💰 Now with prompt caching savings

AI API Cost Calculator

Compare 30+ models with real cache pricing. See what you actually pay.


Default scenario — Monthly: 30K requests | Input: 15M tokens | Output: 9M tokens | Cached: 7.5M tokens

📋 Full Model Pricing

All prices per 1M tokens. Source: models.dev

Columns: Provider | Model | Input | Output | Cache Read | Context | Tier

FAQ

What is prompt caching and how does it save money?
Prompt caching stores frequently used input tokens (like system prompts) so they don't need to be reprocessed. Cached tokens cost 50-90% less than regular input tokens. For example, GPT-5.4 charges $2.50/1M for input but only $0.25/1M for cached input (a 90% discount). If 50% of your input tokens are cached, your effective input cost drops by 45%.
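The blended cost above is just a weighted average of the cached and uncached rates. A minimal sketch of that arithmetic (the prices and hit rate are the example figures from this FAQ, not any provider's guaranteed rates):

```python
def effective_input_price(input_price: float, cached_price: float, hit_rate: float) -> float:
    """Blended per-1M-token input price, given the fraction of tokens served from cache."""
    return hit_rate * cached_price + (1 - hit_rate) * input_price

# Example from the FAQ: $2.50/1M input, $0.25/1M cached, 50% cache hit rate
blended = effective_input_price(2.50, 0.25, 0.5)   # → $1.375 per 1M tokens
savings = 1 - blended / 2.50                       # → 0.45, i.e. a 45% drop
```

The 90% per-token discount only applies to the cached half of the traffic, which is why the overall saving is 45% rather than 90%.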
Which models have the best cache discounts?
OpenAI, Anthropic, and DeepSeek offer the deepest cache discounts (90% off); Google offers 75% off. Most providers require a minimum prompt length (typically 1024-4096 tokens) to enable caching. Anthropic also charges a cache write fee ($3.75/1M for Sonnet), so caching only pays off if you reuse the prompt multiple times.
How accurate is this calculator?
Pricing data comes from models.dev, an open-source database updated by the community. Cache hit rates are estimates — your actual rate depends on your workload. Chatbots with system prompts typically achieve 50-80% cache hit rates; one-off requests may have 0%.