
    Gemini 2.5 vs GPT‑4o

    **Last verified:** 19 July 2025

    #gemini#ai#gemini2.5#flash#pro#pricing


    1 · Why this update?

    My blog post compared Gemini 2.5 (Pro & Flash) to GPT‑4o. A couple of numbers have shifted, mostly pricing, so here is a clear, single‑page reference you can drop into the article or link as a footnote.


    2 · Latest official sources

    | Model / Doc | Link | Doc timestamp |
    |---|---|---|
    | Gemini 2.5 Pro – Vertex AI model card | docs.google.com (Model Card) | 11 Jul 2025 |
    | Gemini 2.5 Flash – Vertex AI model card | docs.google.com (Model Card) | 10 Jul 2025 |
    | Gemini thinking‑budget guide | Google API docs | 14 Jun 2025 |
    | Gemini pricing table | AI Studio pricing | 09 Jul 2025 |
    | OpenAI GPT‑4o pricing | openai.com/pricing | 05 Jun 2025 |

    3 · Pricing changes since the original post

    | Tier | Old (May 2025) | Current (19 Jul 2025) | Δ |
    |---|---|---|---|
    | Gemini 2.5 Flash input | $0.15 / M tokens | $0.10 / M | ↓ 33 % |
    | Gemini 2.5 Flash output | $0.40 / M | $0.40 / M | — |
    | Gemini 2.5 Pro input (≤ 200 k) | $1.25 / M | $1.25 / M | — |
    | Gemini 2.5 Pro output (≤ 200 k) | $10 / M | $10 / M | — |
    | Context‑cache price | — | $0.31 / M (≤ 200 k) | new |
    | Flash‑Lite tier | — | $0.10 / M in · $0.40 / M out | new |
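    To make the price cut concrete, here is a quick back‑of‑envelope calculation using the rates from the table above. The workload size (10 M input / 2 M output tokens per month) is made up purely for illustration:

    ```python
    # Cost impact of the Flash input price cut, using rates from the table above.
    # The example workload (10 M input / 2 M output tokens) is hypothetical.

    def monthly_cost(input_m, output_m, in_rate, out_rate):
        """Cost in USD for a workload measured in millions of tokens."""
        return input_m * in_rate + output_m * out_rate

    old = monthly_cost(10, 2, in_rate=0.15, out_rate=0.40)  # old Flash pricing
    new = monthly_cost(10, 2, in_rate=0.10, out_rate=0.40)  # current Flash pricing

    print(f"old: ${old:.2f}, new: ${new:.2f}")             # old: $2.30, new: $1.80
    print(f"input-rate cut: {(0.15 - 0.10) / 0.15:.0%}")   # input-rate cut: 33%
    ```

    Since output pricing is unchanged, how much you save depends entirely on how input‑heavy your workload is.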

    4 · Capabilities & context windows (unchanged)

    | Feature | Gemini 2.5 Pro | Gemini 2.5 Flash | GPT‑4o |
    |---|---|---|---|
    | Max context | 1 M tokens | 1 M tokens | 128 k tokens |
    | Thinking‑budget knob | Yes | Yes | No |
    | Vision + audio | Good (slower) | Vision only (fast) | Real‑time, best‑in‑class |

    No change since the original post.
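    The context gap is still the biggest practical difference. A tiny helper makes it visible (limits hard‑coded from the table above; the model names are just labels, not API identifiers):

    ```python
    # Which models can hold a given prompt in context?
    # Limits (in tokens) come from the comparison table above.
    CONTEXT_LIMITS = {
        "gemini-2.5-pro": 1_000_000,
        "gemini-2.5-flash": 1_000_000,
        "gpt-4o": 128_000,
    }

    def models_that_fit(prompt_tokens):
        """Return the models whose context window holds the prompt."""
        return [m for m, limit in CONTEXT_LIMITS.items() if prompt_tokens <= limit]

    print(models_that_fit(300_000))  # only the Gemini models fit a 300 k-token prompt
    ```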


    5 · Benchmarks & latency snapshots

    • Gemini 2.5 Pro still scores ≈ 85 ± 1 % MMLU, narrowly behind GPT‑4o (≈ 86 %).
    • Independent latency tests report 0.26–0.32 s TTFT for Flash, well within the “sub‑second” claim.
    • GPT‑4o latency remains industry‑leading for multimodal I/O.

    6 · What actually changed?

    1. Flash input cost cut from $0.15 → $0.10 per M tokens.
    2. Flash‑Lite SKU introduced (same pricing as new Flash input, slightly lower weight limit).
    3. Context‑cache billing line added to Google docs.
    4. Minor wording: Google now labels Pro the “most advanced reasoning Gemini model.”

    Everything else (context limits, thinking tokens, latency, benchmark scores, GPT‑4o pricing) remains unchanged.


    7 · Bottom line

    Only three edits needed to keep your comparison current:

    1. Update the Flash input price to $0.10 / M.
    2. Add a footnote that Flash‑Lite is now available (same price, lower latency).
    3. Mention the context‑cache add‑on if your readers care about serving cost at scale.

    👨‍💻

    Ryan Katayi

    Full-stack developer who turns coffee into code. Building things that make the web a better place, one commit at a time.

    more about me