💰 Now with prompt caching savings
AI API Cost Calculator
Compare 30+ models with real cache pricing. See what you actually pay.
[Interactive calculator: set the share of input tokens served from cache. Example workload: 30K requests/month, 15M input tokens, 9M output tokens, 7.5M of them cached (a 50% cache hit rate).]
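The arithmetic behind the example workload above is simple. The sketch below is a minimal illustration, not the calculator's actual code; the `Pricing` shape and the dollar figures are assumptions, not any specific model's rates.

```typescript
// Minimal sketch of a monthly cost formula for the example workload above.
// The Pricing fields and dollar figures are illustrative only.
interface Pricing {
  input: number;     // $ per 1M uncached input tokens
  cacheRead: number; // $ per 1M cached input tokens
  output: number;    // $ per 1M output tokens
}

function monthlyCost(inputTokens: number, outputTokens: number, cachedTokens: number, p: Pricing): number {
  const uncachedTokens = inputTokens - cachedTokens;
  return (
    (uncachedTokens / 1e6) * p.input +
    (cachedTokens / 1e6) * p.cacheRead +
    (outputTokens / 1e6) * p.output
  );
}

// Example workload above: 15M input, 9M output, 7.5M cached (50% hit rate)
const example = monthlyCost(15e6, 9e6, 7.5e6, { input: 2.5, cacheRead: 0.25, output: 10 });
console.log(example.toFixed(2)); // "110.63" with these illustrative prices
```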
📋 Full Model Pricing
All prices per 1M tokens. Source: models.dev
| Provider | Model | Input | Output | Cache Read | Context | Tier |
|---|---|---|---|---|---|---|
FAQ
What is prompt caching and how does it save money?
Prompt caching stores frequently-used input tokens (like system prompts) so they don't need to be reprocessed. Cached tokens cost 50-90% less than regular input tokens. For example, GPT-5.4 charges $2.50/1M for input but only $0.25/1M for cached input — a 90% discount. If 50% of your input tokens are cached, your effective input cost drops by 45%.
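To see where the 45% figure comes from, here is a small sketch of the effective-price arithmetic; the function name is hypothetical and the prices simply mirror the example above.

```typescript
// Effective price per 1M input tokens, given a cache hit rate.
// Prices mirror the example above: $2.50/1M regular input, $0.25/1M cached input.
function effectiveInputPrice(inputPrice: number, cachedPrice: number, hitRate: number): number {
  return (1 - hitRate) * inputPrice + hitRate * cachedPrice;
}

const effective = effectiveInputPrice(2.5, 0.25, 0.5); // 1.375
const savings = 1 - effective / 2.5;                   // 0.45, i.e. a 45% drop
```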
Which models have the best cache discounts?
OpenAI, Anthropic, and DeepSeek offer the deepest cache discounts (90% off); Google offers 75% off. Most providers require a minimum prompt length (typically 1024-4096 tokens) to enable caching. Anthropic also charges a cache write fee ($3.75/1M for Sonnet), so caching only pays off if the same prompt is reused, as the sketch below shows.
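A rough break-even check, as a sketch: the $3.00 input and $0.30 cache-read figures are assumptions chosen to be consistent with the 90% discount and the $3.75 write fee mentioned above; check models.dev for current rates.

```typescript
// Break-even check for a provider that charges a cache write fee.
// Assumed figures: $3.00/1M input, $3.75/1M cache write, $0.30/1M cache read.
function cachingPaysOff(uses: number, inputPrice: number, writePrice: number, readPrice: number): boolean {
  const withoutCache = uses * inputPrice;                 // reprocess the full prompt every time
  const withCache = writePrice + (uses - 1) * readPrice;  // write once, read on later uses
  return withCache < withoutCache;
}

cachingPaysOff(1, 3.0, 3.75, 0.3); // false: a one-off request costs more with caching
cachingPaysOff(2, 3.0, 3.75, 0.3); // true: pays off once the prompt is reused
```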
How accurate is this calculator?
Pricing data comes from models.dev, an open-source database updated by the community. Cache hit rates are estimates — your actual rate depends on your workload. Chatbots with system prompts typically achieve 50-80% cache hit rates; one-off requests may have 0%.