How accurate is the token estimate?
The estimator uses a 4-characters-per-token heuristic — fast, zero-dependency, and within about 0.5x–2x of the real tokenizer count for most English prose. Code and non-English text tokenize differently, so treat the number as a first-order estimate, not an invoice. For a real per-token breakdown, feed the same prompt into Prompt Token Visualizer; that tool shows word-boundary and raw-byte modes side-by-side.
When were these prices current?
Prices were fetched from each vendor's pricing page on 2026-04-24. Anthropic, Google, and DeepSeek numbers are verified against the vendor's own docs. OpenAI numbers come from two independent aggregators (devtk.ai, pricepertoken.com) because openai.com/api/pricing blocks automated fetching. The footer shows the last update date; when a price changes we refresh the constants in one place and bump the sitemap lastmod.
Where can I verify these numbers?
Anthropic: platform.claude.com/docs/en/docs/about-claude/pricing. Google: ai.google.dev/gemini-api/docs/pricing. DeepSeek: api-docs.deepseek.com/quick_start/pricing. OpenAI: openai.com/api/pricing (or platform.openai.com/docs/pricing). Every per-model card on this page links back to the source. If a vendor's page says something different from what you see here, the vendor wins — open an issue and we'll fix it.
Why is GPT-5 cheaper than Claude Opus 4.7 on input?
Price does not equal quality. GPT-5 lists at $1.25 / million input tokens; Claude Opus 4.7 lists at $5. Different vendors optimize for different workloads — OpenAI is aggressive on input cost to drive volume; Anthropic charges a premium that tracks Opus-tier reasoning on long contexts. Pick the one that actually solves your task in fewer rounds; a cheap model that needs 3 retries is more expensive than an expensive model that one-shots. Benchmark both on your own prompt before committing.
What about cached input?
Every model card shows a secondary cache-read stat — typically 10% of the base input rate. Cache hits let you reuse a large system prompt or document across calls at a fraction of the cost, but you pay an extra write fee the first time. The simulator does not model cache writes; it assumes every call is a fresh input. If you know you'll cache, mentally multiply your input cost by roughly 0.1 after the first call.
Why an output-ratio slider instead of fixed numbers?
Output length is the single biggest lever on cost because output tokens are typically 3–5x the input-token price. A 500-token prompt that produces a 50-token answer costs very differently from one that produces a 2000-token answer. The slider lets you model your actual workload — 0.1x is a classification task, 1x is a rewrite, 3x or higher is long-form generation. The default of 1x assumes output is the same length as input.
What's not counted in the cost?
Image tokens, tool-use system-prompt overhead (every tool definition adds a few hundred tokens), web-search fees ($10 per 1,000 searches on the Claude API), fast-mode premiums, data-residency multipliers, regional-endpoint surcharges, and any enterprise-tier discount you might negotiate. The simulator is a back-of-the-envelope comparison. For a production cost model that accounts for team size, kLOC, language mix, and intensity, use AI Stack Cost Estimator instead.
Can I share my results?
Yes. The URL updates as you type and pick options. It encodes the prompt in base64url (?p=...), the selected models as a comma-separated list of short ids (?m=op47,so46,...), the call count (?n=100), and the output ratio (?o=1.5). Copy the URL from your address bar or hit the Copy share URL button below the chart; a colleague opening it lands on your exact setup with the chart already rendered.