Compare pricing, benchmarks, and capabilities across 562 AI models
| Model | Provider | Input $/1M | Output $/1M | Context | Intelligence | Speed | Latency | API |
|---|---|---|---|---|---|---|---|---|
DeepSeek R2 ★ | DeepSeek | $0.55 | $2.19 | 128K | 91% | 60 tok/s | — | |
GPT-4.1 ★ | OpenAI | $2 | $8 | 1M | 90.5% | 80 tok/s | — | |
Claude Opus 4.6 ★ | Anthropic | $15 | $75 | 200K | 88.7% | 60 tok/s | — | |
GPT-4o ★ | OpenAI | $5 | $15 | 128K | 87.2% | 120 tok/s | — | |
Claude Sonnet 4.6 ★ | Anthropic | $3 | $15 | 200K | 86.8% | 100 tok/s | — | |
Llama 3.3 70B Open ★ | Meta AI | $0.23 | $0.92 | 128K | 86% | 80 tok/s | — | |
o3 | OpenAI | $10 | $40 | 200K | 96.7% | 40 tok/s | — | |
o4-mini | OpenAI | $1.1 | $4.4 | 200K | 93.4% | 100 tok/s | — | |
Gemini 3 Ultra | Google DeepMind | $7 | $21 | 1M | 90.1% | 70 tok/s | — | |
Claude Opus 4.5 (Reasoning) | Anthropic | — | — | — | 49.7 | 68 tok/s | 13.5s | |
Gemini 3 Pro Preview (low) | Google DeepMind | — | — | — | 41.3 | — | — | |
Claude Opus 4.5 (Non-reasoning) | Anthropic | — | — | — | 43.1 | 53 tok/s | 1.1s | |
Gemini 3 Flash Preview (Reasoning) | Google DeepMind | — | — | — | 46.4 | 197 tok/s | 6.1s | |
DeepSeek V3 Open | DeepSeek | $0.27 | $1.1 | 128K | 88.5% | 80 tok/s | — | |
Claude 4.1 Opus (Reasoning) | Anthropic | — | — | — | 42 | 37 tok/s | 8.2s | |
Claude 4.5 Sonnet (Reasoning) | Anthropic | — | — | — | 43 | 56 tok/s | 11.4s | |
MiniMax-M2.1 | MiniMax | — | — | — | 39.4 | 74 tok/s | 1.5s | |
Grok 3 | xAI | $3 | $15 | 131K | 87.5% | 90 tok/s | — | |
Llama 3.1 405B Open | Meta AI | $3 | $3 | 128K | 87.3% | 30 tok/s | — | |
Gemini 3 Pro | Google DeepMind | $3.5 | $10.5 | 1M | 87% | 100 tok/s | — | |
GPT-5.1 (high) | OpenAI | — | — | — | 47.7 | 121 tok/s | 33.8s | |
GPT-5 Codex (high) | OpenAI | — | — | — | 44.6 | 208 tok/s | 8.0s | |
GPT-5 (medium) | OpenAI | — | — | — | 42 | 83 tok/s | 50.4s | |
GPT-5.2 (xhigh) | OpenAI | — | — | — | 51.3 | 76 tok/s | 109.3s | |
Grok 4 | xAI | — | — | — | 41.5 | 60 tok/s | 7.7s | |
GPT-5 (high) | OpenAI | — | — | — | 44.6 | 82 tok/s | 101.8s | |
Qwen3-Max | Alibaba Cloud | $0.4 | $1.2 | 32K | 87% | 90 tok/s | — | |
Claude 4 Opus (Reasoning) | Anthropic | — | — | — | 39 | 39 tok/s | 7.6s | |
GPT-5.2 (medium) | OpenAI | — | — | — | 46.6 | — | — | |
GPT-5.1 Codex (high) | OpenAI | — | — | — | 43.1 | 170 tok/s | 6.4s | |
Gemini 2.5 Pro Preview (Mar '25) | Google DeepMind | — | — | — | 30.3 | — | — | |
DeepSeek V3.2 (Reasoning) | DeepSeek | — | — | — | 41.7 | 32 tok/s | 1.4s | |
DeepSeek V3.2 Speciale | DeepSeek | — | — | — | 29.4 | — | — | |
GPT-5 (low) | OpenAI | — | — | — | 39.2 | 79 tok/s | 10.2s | |
Gemini 2.5 Pro | Google DeepMind | — | — | — | 34.6 | 134 tok/s | 21.4s | |
Claude 4 Opus (Non-reasoning) | Anthropic | — | — | — | 33 | 37 tok/s | 1.3s | |
Claude 4.5 Sonnet (Non-reasoning) | Anthropic | — | — | — | 37.1 | 43 tok/s | 1.0s | |
GLM-4.7 (Reasoning) | Z AI | — | — | — | 42.1 | 107 tok/s | 0.7s | |
Doubao Seed Code | ByteDance Seed | — | — | — | 33.5 | — | — | |
DeepSeek V3.1 (Reasoning) | DeepSeek | — | — | — | 27.7 | — | — | |
Grok 4 Fast (Reasoning) | xAI | — | — | — | 35.1 | 214 tok/s | 2.9s | |
Qwen3-72B Open | Alibaba Cloud | Free | Free | 32K | 85% | 100 tok/s | — | |
Kimi K2 Thinking | Kimi | — | — | — | 40.9 | 50 tok/s | 1.0s | |
DeepSeek V3.2 Exp (Reasoning) | DeepSeek | — | — | — | 32.9 | 33 tok/s | 1.4s | |
DeepSeek R1 0528 (May '25) | DeepSeek | — | — | — | 27.1 | — | — | |
Grok 4.1 Fast (Reasoning) | xAI | — | — | — | 38.6 | 151 tok/s | 9.8s | |
Cogito v2.1 (Reasoning) | Deep Cogito | — | — | — | 85% | 61 tok/s | 0.5s | |
DeepSeek V3.1 Terminus (Reasoning) | DeepSeek | — | — | — | 33.9 | — | — | |
Phi-4 Open | Microsoft | $0.07 | $0.14 | 16K | 84.8% | 300 tok/s | — | |
Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | Google DeepMind | — | — | — | 25.7 | — | — | |
Claude 3.7 Sonnet (Reasoning) | Anthropic | — | — | — | 34.7 | — | — | |
GPT-5 mini (high) | OpenAI | — | — | — | 41.2 | 91 tok/s | 140.6s | |
MiMo-V2-Flash (Reasoning) | Xiaomi | — | — | — | 39.2 | 134 tok/s | 1.7s | |
Qwen3 VL 235B A22B (Reasoning) | Alibaba | — | — | — | 27.6 | 48 tok/s | 1.3s | |
Qwen3 Max (Preview) | Alibaba | — | — | — | 26.1 | 45 tok/s | 1.8s | |
DeepSeek V3.1 Terminus (Non-reasoning) | DeepSeek | — | — | — | 28.5 | — | — | |
Qwen3 235B A22B 2507 (Reasoning) | Alibaba | — | — | — | 29.5 | 40 tok/s | 1.4s | |
o1 | OpenAI | — | — | — | 30.8 | 129 tok/s | 18.5s | |
K-EXAONE (Reasoning) | LG AI Research | — | — | — | 32.1 | — | — | |
Gemini 2.5 Pro Preview (May '25) | Google DeepMind | — | — | — | 29.5 | — | — | |
DeepSeek V3.2 Exp (Non-reasoning) | DeepSeek | — | — | — | 28.4 | 33 tok/s | 1.3s | |
DeepSeek R1 (Jan '25) | DeepSeek | — | — | — | 18.8 | — | — | |
Mistral Large | Mistral AI | $2 | $6 | 128K | 84% | 90 tok/s | — | |
Claude 4 Sonnet (Non-reasoning) | Anthropic | — | — | — | 33 | 47 tok/s | 0.8s | |
Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | Google DeepMind | — | — | — | 31.1 | — | — | |
Claude 4 Sonnet (Reasoning) | Anthropic | — | — | — | 38.7 | 51 tok/s | 9.1s | |
GLM-4.5 (Reasoning) | Z AI | — | — | — | 26.4 | 42 tok/s | 0.9s | |
DeepSeek V3.2 (Non-reasoning) | DeepSeek | — | — | — | 32.1 | 32 tok/s | 1.4s | |
DeepSeek V3.1 (Non-reasoning) | DeepSeek | — | — | — | 28.1 | — | — | |
Grok 3 Mini | xAI | $0.3 | $0.5 | 131K | 83% | 160 tok/s | — | |
ERNIE 5.0 Thinking Preview | Baidu | — | — | — | 29.1 | — | — | |
GPT-5 mini (medium) | OpenAI | — | — | — | 38.9 | 83 tok/s | 18.4s | |
Nova 2.0 Pro Preview (medium) | Amazon | — | — | — | 35.7 | 144 tok/s | 14.5s | |
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | — | — | — | 15 | 43 tok/s | 0.7s | |
Grok 3 mini Reasoning (high) | xAI | — | — | — | 32.1 | 217 tok/s | 0.4s | |
GLM-4.6 (Reasoning) | Z AI | — | — | — | 32.5 | 80 tok/s | 0.7s | |
Qwen3 235B A22B 2507 Instruct | Alibaba | — | — | — | 25 | 69 tok/s | 1.2s | |
Qwen3 235B A22B (Reasoning) | Alibaba | — | — | — | 19.8 | 64 tok/s | 1.2s | |
Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | — | — | — | 18.6 | 34 tok/s | 0.7s | |
Gemini 2.5 Flash (Reasoning) | Google DeepMind | — | — | — | 27 | 231 tok/s | 14.9s | |
Qwen3 Next 80B A3B Instruct | Alibaba | — | — | — | 20.1 | 172 tok/s | 1.1s | |
Qwen3 Max Thinking (Preview) | Alibaba | — | — | — | 32.5 | 43 tok/s | 1.8s | |
Kimi K2 | Kimi | — | — | — | 26.3 | 34 tok/s | 1.3s | |
Qwen3 Next 80B A3B (Reasoning) | Alibaba | — | — | — | 26.7 | 169 tok/s | 1.1s | |
Seed-OSS-36B-Instruct | ByteDance Seed | — | — | — | 25.2 | 43 tok/s | 1.6s | |
Qwen3 VL 32B (Reasoning) | Alibaba | — | — | — | 24.7 | 95 tok/s | 1.4s | |
Qwen3 VL 235B A22B Instruct | Alibaba | — | — | — | 20.8 | 60 tok/s | 1.1s | |
Kimi K2 0905 | Kimi | — | — | — | 30.9 | 24 tok/s | 6.0s | |
GLM-4.5-Air | Z AI | — | — | — | 23.2 | 67 tok/s | 1.1s | |
MiniMax M1 80k | MiniMax | — | — | — | 24.4 | — | — | |
MiniMax-M2 | MiniMax | — | — | — | 36.1 | 68 tok/s | 2.3s | |
Magistral Medium 1.2 | Mistral | — | — | — | 27.1 | 99 tok/s | 0.5s | |
DeepSeek V3 0324 | DeepSeek | — | — | — | 22.3 | — | — | |
GPT-4o mini | OpenAI | $0.15 | $0.6 | 128K | 82% | 200 tok/s | — | |
Gemini 3 Flash | Google DeepMind | $0.075 | $0.3 | 1M | 82% | 250 tok/s | — | |
Nova 2.0 Pro Preview (low) | Amazon | — | — | — | 31.9 | 154 tok/s | 6.0s | |
Nova 2.0 Lite (high) | Amazon | — | — | — | 34.5 | 192 tok/s | 17.9s | |
GPT-5.1 Codex mini (high) | OpenAI | — | — | — | 38.6 | 208 tok/s | 5.6s | |
GPT-5 (ChatGPT) | OpenAI | — | — | — | 21.8 | 154 tok/s | 0.6s | |
Ling-1T | InclusionAI | — | — | — | 19 | — | — | |
INTELLECT-3 | Prime Intellect | — | — | — | 22.2 | — | — | |
EXAONE 4.0 32B (Reasoning) | LG AI Research | — | — | — | 16.7 | — | — | |
Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google DeepMind | — | — | — | 21.6 | — | — | |
Qwen3 VL 30B A3B (Reasoning) | Alibaba | — | — | — | 19.7 | 128 tok/s | 1.0s | |
gpt-oss-120B (high) | OpenAI | — | — | — | 33.3 | 212 tok/s | 0.5s | |
Nova 2.0 Lite (medium) | Amazon | — | — | — | 29.7 | 197 tok/s | 15.3s | |
Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | — | — | — | 18.7 | 66 tok/s | 0.3s | |
Qwen3 30B A3B 2507 (Reasoning) | Alibaba | — | — | — | 22.4 | 146 tok/s | 1.1s | |
Ring-1T | InclusionAI | — | — | — | 22.8 | — | — | |
MiniMax M1 40k | MiniMax | — | — | — | 20.9 | — | — | |
Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | — | — | — | 16 | 74 tok/s | 0.6s | |
GPT-5 (minimal) | OpenAI | — | — | — | 23.9 | 72 tok/s | 1.2s | |
Llama 4 Maverick | Meta | — | — | — | 18.4 | 116 tok/s | 0.6s | |
Gemini 2.5 Flash (Non-reasoning) | Google DeepMind | — | — | — | 20.6 | 189 tok/s | 0.5s | |
Nova 2.0 Omni (medium) | Amazon | — | — | — | 28 | — | — | |
Mistral Large 3 | Mistral | — | — | — | 22.8 | 56 tok/s | 0.6s | |
Gemini 2.0 Pro Experimental (Feb '25) | Google DeepMind | — | — | — | 18.1 | — | — | |
Solar Pro 2 (Reasoning) | Upstage | — | — | — | 14.9 | — | — | |
KAT-Coder-Pro V1 | KwaiKAT | — | — | — | 36 | 119 tok/s | 0.9s | |
K-EXAONE (Non-reasoning) | LG AI Research | — | — | — | 23.4 | — | — | |
Mi:dm K 2.5 Pro Preview | Korea Telecom | — | — | — | 81% | — | — | |
Mi:dm K 2.5 Pro | Korea Telecom | — | — | — | 23.1 | — | — | |
GPT-5.2 (Non-reasoning) | OpenAI | — | — | — | 33.6 | 63 tok/s | 0.6s | |
Gemini 2.5 Flash Preview (Reasoning) | Google DeepMind | — | — | — | 24.3 | — | — | |
Motif-2-12.7B-Reasoning | Motif Technologies | — | — | — | 19.1 | — | — | |
Claude 3.7 Sonnet (Non-reasoning) | Anthropic | — | — | — | 30.8 | — | — | |
Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google DeepMind | — | — | — | 19.6 | — | — | |
o3-mini (high) | OpenAI | — | — | — | 25.2 | 156 tok/s | 26.1s | |
Claude 4.5 Haiku (Non-reasoning) | Anthropic | — | — | — | 31.1 | 100 tok/s | 0.5s | |
GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | — | — | — | 18.6 | — | — | |
GLM-4.6V (Reasoning) | Z AI | — | — | — | 23.4 | 29 tok/s | 1.1s | |
Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | Google DeepMind | — | — | — | 19.4 | — | — | |
Qwen3 32B (Reasoning) | Alibaba | — | — | — | 16.5 | 105 tok/s | 1.1s | |
GPT-5.1 (Non-reasoning) | OpenAI | — | — | — | 27.4 | 120 tok/s | 0.8s | |
DeepSeek R1 Distill Llama 70B | DeepSeek | — | — | — | 16 | 43 tok/s | 0.5s | |
Nova 2.0 Omni (low) | Amazon | — | — | — | 23.2 | — | — | |
Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | — | — | — | 18.5 | — | — | |
Qwen3 Omni 30B A3B (Reasoning) | Alibaba | — | — | — | 15.6 | 92 tok/s | 1.0s | |
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | — | — | — | 24.3 | 162 tok/s | 1.6s | |
Apriel-v1.6-15B-Thinker | ServiceNow | — | — | — | 27.6 | — | — | |
GLM-4.7 (Non-reasoning) | Z AI | — | — | — | 34.2 | 96 tok/s | 0.7s | |
HyperCLOVA X SEED Think (32B) | Naver | — | — | — | 23.7 | — | — | |
K2-V2 (high) | MBZUAI Institute of Foundation Models | — | — | — | 20.6 | — | — | |
GLM-4.5V (Reasoning) | Z AI | — | — | — | 15.1 | 48 tok/s | 0.8s | |
o3-mini | OpenAI | — | — | — | 25.9 | 167 tok/s | 8.5s | |
Grok Code Fast 1 | xAI | — | — | — | 28.7 | 189 tok/s | 3.9s | |
Nova 2.0 Lite (low) | Amazon | — | — | — | 24.6 | 206 tok/s | 4.8s | |
Qwen3 Coder 480B A35B Instruct | Alibaba | — | — | — | 24.8 | 62 tok/s | 1.6s | |
Ring-flash-2.0 | InclusionAI | — | — | — | 14 | 84 tok/s | 1.3s | |
Qwen3 VL 32B Instruct | Alibaba | — | — | — | 17.2 | 78 tok/s | 1.3s | |
Command R+ | Cohere | $2.5 | $10 | 128K | 78% | 80 tok/s | — | |
ERNIE 4.5 300B A47B | Baidu | — | — | — | 15 | 29 tok/s | 1.8s | |
Ling-flash-2.0 | InclusionAI | — | — | — | 15.7 | 99 tok/s | 1.4s | |
GPT-4.1 mini | OpenAI | — | — | — | 22.9 | 99 tok/s | 0.5s | |
GPT-5 nano (high) | OpenAI | — | — | — | 26.8 | 150 tok/s | 86.4s | |
GPT-5 mini (minimal) | OpenAI | — | — | — | 20.7 | 78 tok/s | 1.0s | |
Gemini 2.0 Flash (experimental) | Google DeepMind | — | — | — | 16.8 | — | — | |
Gemini 2.0 Flash (Feb '25) | Google DeepMind | — | — | — | 18.5 | — | — | |
Gemini 2.5 Flash Preview (Non-reasoning) | Google DeepMind | — | — | — | 17.8 | — | — | |
GLM-4.6 (Non-reasoning) | Z AI | — | — | — | 30.2 | 88 tok/s | 0.9s | |
gpt-oss-120B (low) | OpenAI | — | — | — | 24.5 | 210 tok/s | 0.5s | |
Qwen3 30B A3B 2507 Instruct | Alibaba | — | — | — | 15 | 109 tok/s | 1.1s | |
Qwen3 30B A3B (Reasoning) | Alibaba | — | — | — | 15.3 | 70 tok/s | 1.1s | |
GPT-4o (ChatGPT) | OpenAI | — | — | — | 14.1 | — | — | |
Solar Pro 2 (Preview) (Reasoning) | Upstage | — | — | — | 18.8 | — | — | |
Qwen3 14B (Reasoning) | Alibaba | — | — | — | 16.2 | 64 tok/s | 1.2s | |
EXAONE 4.0 32B (Non-reasoning) | LG AI Research | — | — | — | 11.7 | — | — | |
Apriel-v1.5-15B-Thinker | ServiceNow | — | — | — | 28.3 | — | — | |
Magistral Small 1.2 | Mistral | — | — | — | 18.2 | 176 tok/s | 0.4s | |
Nova 2.0 Pro Preview (Non-reasoning) | Amazon | — | — | — | 23.1 | 184 tok/s | 0.7s | |
GPT-5 nano (medium) | OpenAI | — | — | — | 25.9 | 154 tok/s | 39.1s | |
Claude 3.5 Sonnet (Oct '24) | Anthropic | — | — | — | 15.9 | — | — | |
K2-V2 (medium) | MBZUAI Institute of Foundation Models | — | — | — | 18.7 | — | — | |
QwQ 32B | Alibaba | — | — | — | 19.7 | 33 tok/s | 0.4s | |
Devstral 2 | Mistral | — | — | — | 22 | 77 tok/s | 0.7s | |
Mistral Medium 3 | Mistral | — | — | — | 18.8 | 54 tok/s | 0.4s | |
Sonar Pro | Perplexity | — | — | — | 15.2 | — | — | |
Olmo 3.1 32B Think | Allen Institute for AI | — | — | — | 13.9 | — | — | |
Claude 4.5 Haiku (Reasoning) | Anthropic | — | — | — | 37.1 | 145 tok/s | 14.2s | |
Olmo 3 32B Think | Allen Institute for AI | — | — | — | 12.1 | — | — | |
Gemini 2.5 Flash-Lite (Reasoning) | Google DeepMind | — | — | — | 17.6 | 274 tok/s | 17.2s | |
Qwen3 235B A22B (Non-reasoning) | Alibaba | — | — | — | 17 | 65 tok/s | 1.2s | |
Qwen2.5 Max | Alibaba | — | — | — | 16.3 | 49 tok/s | 1.2s | |
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | — | — | — | 14.9 | 152 tok/s | 0.6s | |
Qwen3 VL 30B A3B Instruct | Alibaba | — | — | — | 16.1 | 122 tok/s | 1.1s | |
Claude Haiku 4.5 | Anthropic | $0.8 | $4 | 200K | 75.2% | 250 tok/s | — | |
Gemini 1.5 Pro (Sep '24) | Google DeepMind | — | — | — | 16 | — | — | |
Claude 3.5 Sonnet (June '24) | Anthropic | — | — | — | 14.2 | — | — | |
GLM-4.5V (Non-reasoning) | Z AI | — | — | — | 12.7 | 50 tok/s | 30.9s | |
Gemma 3 27B Open | Google DeepMind | Free | Free | 128K | 75% | 120 tok/s | — | |
Magistral Small 1 | Mistral | — | — | — | 16.8 | — | — | |
Solar Pro 2 (Non-reasoning) | Upstage | — | — | — | 13.6 | — | — | |
Magistral Medium 1 | Mistral | — | — | — | 18.8 | — | — | |
gpt-oss-20B (high) | OpenAI | — | — | — | 24.5 | 276 tok/s | 0.3s | |
GLM-4.6V (Non-reasoning) | Z AI | — | — | — | 17.1 | 23 tok/s | 4.1s | |
Llama 4 Scout | Meta | — | — | — | 13.5 | 128 tok/s | 0.5s | |
Qwen3 VL 8B (Reasoning) | Alibaba | — | — | — | 16.7 | 130 tok/s | 1.1s | |
Nova 2.0 Lite (Non-reasoning) | Amazon | — | — | — | 18 | 173 tok/s | 0.8s | |
DeepSeek R1 0528 Qwen3 8B | DeepSeek | — | — | — | 16.4 | — | — | |
Grok 4.1 Fast (Non-reasoning) | xAI | — | — | — | 23.6 | 148 tok/s | 0.4s | |
NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | — | — | — | 14.8 | 109 tok/s | 0.3s | |
MiMo-V2-Flash (Non-reasoning) | Xiaomi | — | — | — | 30.4 | 138 tok/s | 1.5s | |
Qwen3 8B (Reasoning) | Alibaba | — | — | — | 13.2 | 83 tok/s | 1.0s | |
Qwen3 4B 2507 (Reasoning) | Alibaba | — | — | — | 18.2 | — | — | |
o1-mini | OpenAI | — | — | — | 20.4 | — | — | |
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 138 tok/s | 0.7s | |
GPT-4o (May '24) | OpenAI | — | — | — | 14.5 | 112 tok/s | 0.6s | |
DeepSeek R1 Distill Qwen 14B | DeepSeek | — | — | — | 15.8 | — | — | |
DeepSeek R1 Distill Qwen 32B | DeepSeek | — | — | — | 17.2 | 43 tok/s | 0.4s | |
DBRX Open | Databricks | $0.75 | $2.25 | 33K | 73.7% | 100 tok/s | — | |
Solar Pro 2 (Preview) (Non-reasoning) | Upstage | — | — | — | 16 | — | — | |
Qwen3 Omni 30B A3B Instruct | Alibaba | — | — | — | 10.7 | 105 tok/s | 1.1s | |
Nova Premier | Amazon | — | — | — | 19 | 62 tok/s | 1.1s | |
Qwen3 32B (Non-reasoning) | Alibaba | — | — | — | 14.5 | 102 tok/s | 1.2s | |
Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | — | — | — | 17.6 | 33 tok/s | 0.8s | |
Llama 3.1 Instruct 405B | Meta | — | — | — | 17.4 | 31 tok/s | 0.7s | |
Falcon-H1R-7B | TII UAE | — | — | — | 15.8 | — | — | |
Grok 4 Fast (Non-reasoning) | xAI | — | — | — | 23.1 | 204 tok/s | 0.3s | |
Llama 3.2 11B Vision Open | Meta AI | $0.18 | $0.18 | 128K | 73% | 150 tok/s | — | |
gpt-oss-20B (low) | OpenAI | — | — | — | 20.8 | 263 tok/s | 0.4s | |
Llama 3.1 Tulu3 405B | Allen Institute for AI | — | — | — | 14.1 | — | — | |
Qwen2.5 Instruct 72B | Alibaba | — | — | — | 15.6 | 55 tok/s | 1.2s | |
Gemini 2.5 Flash-Lite (Non-reasoning) | Google DeepMind | — | — | — | 12.7 | 279 tok/s | 0.6s | |
Gemini 2.0 Flash-Lite (Feb '25) | Google DeepMind | — | — | — | 14.7 | — | — | |
Command R | Cohere | $0.15 | $0.6 | 128K | 72% | 150 tok/s | — | |
Mistral Small | Mistral AI | $0.1 | $0.3 | 32K | 72% | 200 tok/s | — | |
Nova 2.0 Omni (Non-reasoning) | Amazon | — | — | — | 16.6 | 223 tok/s | 0.9s | |
Gemini 3.1 Flash-Lite | Google DeepMind | $0.01 | $0.04 | 1M | 72% | 500 tok/s | — | |
Command A | Cohere | — | — | — | 13.5 | 42 tok/s | 0.5s | |
Qwen3 Coder 30B A3B Instruct | Alibaba | — | — | — | 20 | 112 tok/s | 1.5s | |
Llama 3.3 Instruct 70B | Meta | — | — | — | 14.5 | 97 tok/s | 0.6s | |
Grok 2 (Dec '24) | xAI | — | — | — | 13.9 | — | — | |
Devstral Medium | Mistral | — | — | — | 18.7 | 139 tok/s | 0.5s | |
Qwen3 30B A3B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 70 tok/s | 1.2s | |
K2-V2 (low) | MBZUAI Institute of Foundation Models | — | — | — | 14.4 | — | — | |
Falcon 180B Open | TII | Free | Free | 4K | 70.4% | 20 tok/s | — | |
Qwen3 VL 4B (Reasoning) | Alibaba | — | — | — | 13.7 | — | — | |
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | NVIDIA | — | — | — | 14.3 | — | — | |
Qwen3 4B (Reasoning) | Alibaba | — | — | — | 14.2 | 103 tok/s | 1.0s | |
Mistral Large 2 (Nov '24) | Mistral | — | — | — | 15.1 | 38 tok/s | 0.5s | |
Pixtral Large | Mistral | — | — | — | 14 | 52 tok/s | 0.5s | |
Grok Beta | xAI | — | — | — | 13.3 | — | — | |
Qwen2.5 Instruct 32B | Alibaba | — | — | — | 13.2 | — | — | |
Claude 3 Opus | Anthropic | — | — | — | 18 | — | — | |
Sarvam M (Reasoning) | Sarvam | — | — | — | 8.4 | — | — | |
Qwen3 VL 8B Instruct | Alibaba | — | — | — | 14.3 | 145 tok/s | 1.0s | |
GPT-4 Turbo | OpenAI | — | — | — | 13.7 | 34 tok/s | 0.9s | |
Ministral 3 14B | Mistral | — | — | — | 16 | 133 tok/s | 0.3s | |
Nova Pro | Amazon | — | — | — | 13.5 | — | — | |
Llama 3.1 Nemotron Instruct 70B | NVIDIA | — | — | — | 13.4 | 43 tok/s | 0.4s | |
Sonar | Perplexity | — | — | — | 15.5 | — | — | |
Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | — | — | — | 14.6 | 67 tok/s | 0.3s | |
Devstral Small 2 | Mistral | — | — | — | 19.5 | 77 tok/s | 0.5s | |
Mistral Medium 3.1 | Mistral | — | — | — | 21.3 | 82 tok/s | 0.4s | |
Gemini 1.5 Flash (Sep '24) | Google DeepMind | — | — | — | 13.8 | — | — | |
Qwen3 14B (Non-reasoning) | Alibaba | — | — | — | 12.8 | 66 tok/s | 1.1s | |
Mistral Small 3.2 | Mistral | — | — | — | 15.1 | 166 tok/s | 0.4s | |
Llama 3.1 Instruct 70B | Meta | — | — | — | 12.5 | 31 tok/s | 0.7s | |
Mistral Large 2 (Jul '24) | Mistral | — | — | — | 13 | — | — | |
Llama 3.2 Instruct 90B (Vision) | Meta | — | — | — | 11.9 | 42 tok/s | 0.5s | |
Ling-mini-2.0 | InclusionAI | — | — | — | 9.2 | — | — | |
Reka Flash 3 | Reka AI | — | — | — | 9.5 | 96 tok/s | 1.1s | |
Qwen3 4B 2507 Instruct | Alibaba | — | — | — | 12.9 | — | — | |
Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | — | — | — | 12.6 | 71 tok/s | 0.6s | |
Mistral Small 3.1 | Mistral | — | — | — | 14.5 | 148 tok/s | 0.4s | |
GPT-4.1 nano | OpenAI | — | — | — | 13 | 195 tok/s | 0.4s | |
Gemini 1.5 Pro (May '24) | Google DeepMind | — | — | — | 12 | — | — | |
Olmo 3 7B Think | Allen Institute for AI | — | — | — | 9.4 | — | — | |
QwQ 32B-Preview | Alibaba | — | — | — | 15.2 | 44 tok/s | 0.5s | |
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | — | — | — | 10.1 | 174 tok/s | 0.7s | |
Mistral Small 3 | Mistral | — | — | — | 12.7 | 152 tok/s | 0.5s | |
Ministral 3 8B | Mistral | — | — | — | 14.8 | 189 tok/s | 0.3s | |
Qwen3 8B (Non-reasoning) | Alibaba | — | — | — | 10.6 | 89 tok/s | 1.0s | |
Qwen2.5 Coder Instruct 32B | Alibaba | — | — | — | 12.9 | — | — | |
Qwen3 VL 4B Instruct | Alibaba | — | — | — | 9.6 | — | — | |
Claude 3.5 Haiku | Anthropic | — | — | — | 18.7 | — | — | |
Devstral Small (May '25) | Mistral | — | — | — | 18 | — | — | |
Qwen2.5 Turbo | Alibaba | — | — | — | 12 | 68 tok/s | 1.2s | |
Devstral Small (Jul '25) | Mistral | — | — | — | 15.2 | 200 tok/s | 0.4s | |
Qwen2 Instruct 72B | Alibaba | — | — | — | 11.7 | — | — | |
Granite 4.0 H Small | IBM | — | — | — | 10.8 | 416 tok/s | 8.7s | |
Mistral Saba | Mistral | — | — | — | 12.1 | — | — | |
Gemma 3 12B Instruct | Google DeepMind | — | — | — | 8.8 | 31 tok/s | 24.2s | |
Qwen3 4B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 103 tok/s | 1.1s | |
Kimi Linear 48B A3B Instruct | Kimi | — | — | — | 14.4 | — | — | |
Exaone 4.0 1.2B (Reasoning) | LG AI Research | — | — | — | 8.3 | — | — | |
Nova Lite | Amazon | — | — | — | 12.7 | 228 tok/s | 0.7s | |
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 75 tok/s | 0.3s | |
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | Nous Research | — | — | — | 10.9 | — | — | |
Claude 3 Sonnet | Anthropic | — | — | — | 10.3 | — | — | |
Jamba Reasoning 3B | AI21 Labs | — | — | — | 9.6 | — | — | |
Jamba 1.7 Large | AI21 Labs | — | — | — | 10.9 | 52 tok/s | 1.0s | |
Gemini 1.5 Flash-8B | Google DeepMind | — | — | — | 11.1 | — | — | |
Hermes 3 - Llama-3.1 70B | Nous Research | — | — | — | 10.6 | 28 tok/s | 0.4s | |
Jamba 1.5 Large | AI21 Labs | — | — | — | 10.7 | — | — | |
Qwen3 1.7B (Reasoning) | Alibaba | — | — | — | 8 | 140 tok/s | 1.1s | |
Gemini 1.5 Flash (May '24) | Google DeepMind | — | — | — | 10.5 | — | — | |
Llama 3 Instruct 70B | Meta | — | — | — | 8.9 | 39 tok/s | 0.7s | |
Jamba 1.6 Large | AI21 Labs | — | — | — | 10.6 | 53 tok/s | 1.0s | |
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | — | — | — | 14.4 | — | — | |
GPT-5 nano (minimal) | OpenAI | — | — | — | 13.8 | 145 tok/s | 1.0s | |
Mixtral 8x22B Instruct | Mistral | — | — | — | 9.8 | — | — | |
DeepSeek R1 Distill Llama 8B | DeepSeek | — | — | — | 12.1 | — | — | |
Nova Micro | Amazon | — | — | — | 10.3 | 328 tok/s | 0.6s | |
Ministral 3 3B | Mistral | — | — | — | 11.2 | 294 tok/s | 0.3s | |
Olmo 3 7B Instruct | Allen Institute for AI | — | — | — | 8.2 | — | — | |
OLMo 2 32B | Allen Institute for AI | — | — | — | 10.6 | — | — | |
LFM2 8B A1B | Liquid AI | — | — | — | 7 | — | — | |
Exaone 4.0 1.2B (Non-reasoning) | LG AI Research | — | — | — | 8.1 | — | — | |
Claude 2.1 | Anthropic | — | — | — | 9.3 | — | — | |
Mistral Medium | Mistral | — | — | — | 9 | 75 tok/s | 0.4s | |
Claude 2.0 | Anthropic | — | — | — | 9.1 | — | — | |
Phi-4 Multimodal Instruct | Microsoft Azure | — | — | — | 10 | 15 tok/s | 0.2s | |
Gemma 3n E4B Instruct | Google DeepMind | — | — | — | 6.4 | 14 tok/s | 0.3s | |
Llama 3.1 Instruct 8B | Meta | — | — | — | 11.8 | 159 tok/s | 0.4s | |
Gemma 3n E4B Instruct Preview (May '25) | Google DeepMind | — | — | — | 10.1 | — | — | |
Granite 3.3 8B (Non-reasoning) | IBM | — | — | — | 7 | 375 tok/s | 20.3s | |
Qwen2.5 Coder Instruct 7B | Alibaba | — | — | — | 10 | — | — | |
Phi-4 Mini Instruct | Microsoft Azure | — | — | — | 8.4 | 44 tok/s | 0.7s | |
Llama 3.2 Instruct 11B (Vision) | Meta | — | — | — | 8.7 | 77 tok/s | 0.5s | |
GPT-3.5 Turbo | OpenAI | — | — | — | 9 | 107 tok/s | 0.5s | |
Granite 4.0 Micro | IBM | — | — | — | 7.7 | — | — | |
Phi-3 Mini Instruct 3.8B | Microsoft Azure | — | — | — | 10.1 | — | — | |
Command-R+ (Apr '24) | Cohere | — | — | — | 8.3 | — | — | |
Gemini 1.0 Pro | Google DeepMind | — | — | — | 8.5 | — | — | |
LFM 40B | Liquid AI | — | — | — | 8.8 | — | — | |
Claude Instant | Anthropic | — | — | — | 7.4 | — | — | |
DeepSeek Coder V2 Lite Instruct | DeepSeek | — | — | — | 8.5 | — | — | |
Mistral Small (Feb '24) | Mistral | — | — | — | 9 | 146 tok/s | 0.4s | |
Gemma 3 4B Instruct | Google DeepMind | — | — | — | 6.3 | 33 tok/s | 1.1s | |
Qwen3 1.7B (Non-reasoning) | Alibaba | — | — | — | 6.8 | 141 tok/s | 0.9s | |
Llama 3 Instruct 8B | Meta | — | — | — | 6.4 | 83 tok/s | 0.5s | |
Llama 2 Chat 70B | Meta | — | — | — | 8.4 | — | — | |
Llama 2 Chat 13B | Meta | — | — | — | 8.4 | — | — | |
Jamba 1.7 Mini | AI21 Labs | — | — | — | 8.1 | — | — | |
Mixtral 8x7B Instruct | Mistral | — | — | — | 7.7 | — | — | |
Gemma 3n E2B Instruct | Google DeepMind | — | — | — | 4.8 | 52 tok/s | 0.4s | |
Jamba 1.5 Mini | AI21 Labs | — | — | — | 8 | — | — | |
Jamba 1.6 Mini | AI21 Labs | — | — | — | 7.9 | 186 tok/s | 0.8s | |
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | Nous Research | — | — | — | 7.6 | — | — | |
Molmo 7B-D | Allen Institute for AI | — | — | — | 9.2 | — | — | |
Llama 3.2 Instruct 3B | Meta | — | — | — | 9.7 | 54 tok/s | 0.6s | |
Qwen3 0.6B (Reasoning) | Alibaba | — | — | — | 6.5 | 185 tok/s | 1.0s | |
Command-R (Mar '24) | Cohere | — | — | — | 7.4 | — | — | |
Granite 4.0 1B | IBM | — | — | — | 7.3 | — | — | |
OpenChat 3.5 (1210) | OpenChat | — | — | — | 8.3 | — | — | |
LFM2 2.6B | Liquid AI | — | — | — | 8 | — | — | |
OLMo 2 7B | Allen Institute for AI | — | — | — | 9.3 | — | — | |
Granite 4.0 H 1B | IBM | — | — | — | 8 | — | — | |
DeepSeek R1 Distill Qwen 1.5B | DeepSeek | — | — | — | 9.1 | — | — | |
LFM2 1.2B | Liquid AI | — | — | — | 6.3 | — | — | |
Mistral 7B Instruct | Mistral | — | — | — | 7.4 | 193 tok/s | 0.3s | |
Qwen3 0.6B (Non-reasoning) | Alibaba | — | — | — | 5.7 | 190 tok/s | 0.9s | |
Llama 3.2 Instruct 1B | Meta | — | — | — | 6.3 | 87 tok/s | 0.7s | |
Llama 2 Chat 7B | Meta | — | — | — | 9.7 | 118 tok/s | 1.5s | |
Gemma 3 1B Instruct | Google DeepMind | — | — | — | 5.5 | 51 tok/s | 0.7s | |
Granite 4.0 H 350M | IBM | — | — | — | 5.4 | — | — | |
Granite 4.0 350M | IBM | — | — | — | 6.1 | — | — | |
Gemma 3 270M | Google DeepMind | — | — | — | 7.7 | — | — | |
Standard | — | — | — | — | — | — | — | |
Qwen3.5 Omni Flash | Alibaba | — | — | — | — | — | — | |
Octave 2 | Hume AI | — | — | — | — | — | — | |
Nemotron Cascade 2 30B A3B | NVIDIA | — | — | — | 28.4 | — | — | |
Kimi K2.5 (Non-reasoning) | Kimi | — | — | — | 37.3 | 31 tok/s | 1.3s | |
Mercury 2 | Inception | — | — | — | 32.8 | 877 tok/s | 4.4s | |
Molmo2-8B | Allen Institute for AI | — | — | — | 7.3 | — | — | |
MiMo-V2-Pro | Xiaomi | — | — | — | 49.2 | 71 tok/s | 1.9s | |
MiMo-V2-Omni-0327 | Xiaomi | — | — | — | 44.9 | — | — | |
Sarvam 105B (high) | Sarvam | — | — | — | 18.2 | 100 tok/s | 1.3s | |
MiMo-V2-Omni | Xiaomi | — | — | — | 43.4 | — | — | |
MiMo-V2-Flash (Feb 2026) | Xiaomi | — | — | — | 41.5 | 133 tok/s | 1.3s | |
Neural2 | — | — | — | — | — | — | — | |
Sarvam 30B (high) | Sarvam | — | — | — | 12.3 | 272 tok/s | 1.2s | |
KAT Coder Pro V2 | KwaiKAT | — | — | — | 43.8 | 115 tok/s | 1.8s | |
o1-preview | OpenAI | — | — | — | 23.7 | — | — | |
Olmo 3.1 32B Instruct | Allen Institute for AI | — | — | — | 12.2 | 52 tok/s | 0.3s | |
K2 Think V2 | MBZUAI Institute of Foundation Models | — | — | — | 24.1 | — | — | |
LongCat Flash Lite | LongCat | — | — | — | 23.9 | 146 tok/s | 6.0s | |
Tri-21B-Think | Trillion Labs | — | — | — | 18.6 | — | — | |
Tri-21B-think Preview | Trillion Labs | — | — | — | 20 | — | — | |
Apertus 8B Instruct | Swiss AI Initiative | — | — | — | 5.9 | — | — | |
Nanbeige4.1-3B | Nanbeige | — | — | — | 16.1 | — | — | |
Apertus 70B Instruct | Swiss AI Initiative | — | — | — | 7.7 | — | — | |
Trinity Large Thinking | Arcee AI | — | — | — | 31.9 | 126 tok/s | 0.6s | |
GLM-5 (Reasoning) | Z AI | — | — | — | 49.8 | 72 tok/s | 0.9s | |
GLM 5V Turbo (Reasoning) | Z AI | — | — | — | 42.9 | — | — | |
GLM-5.1 (Reasoning) | Z AI | — | — | — | 51.4 | 46 tok/s | 1.0s | |
Step 3.5 Flash 2603 | StepFun | — | — | — | 38.5 | 188 tok/s | 0.9s | |
GLM-5-Turbo | Z AI | — | — | — | 46.8 | — | — | |
GLM-5 (Non-reasoning) | Z AI | — | — | — | 40.6 | 55 tok/s | 1.4s | |
Tiny Aya Global | Cohere | — | — | — | 4.7 | — | — | |
Qwen3.5 2B (Non-reasoning) | Alibaba | — | — | — | 14.7 | 241 tok/s | 0.3s | |
Qwen3.5 397B A17B (Reasoning) | Alibaba | — | — | — | 45 | 52 tok/s | 1.5s | |
Qwen3.5 4B (Non-reasoning) | Alibaba | — | — | — | 22.6 | 189 tok/s | 0.3s | |
Qwen3.5 0.8B (Reasoning) | Alibaba | — | — | — | 10.5 | — | — | |
Qwen3.5 0.8B (Non-reasoning) | Alibaba | — | — | — | 9.9 | 283 tok/s | 0.3s | |
Step3 VL 10B | StepFun | — | — | — | 15.4 | — | — | |
Qwen3.5 9B (Reasoning) | Alibaba | — | — | — | 32.4 | 125 tok/s | 0.3s | |
Qwen3.6 Plus | Alibaba | — | — | — | 50 | 52 tok/s | 1.5s | |
Qwen3.5 4B (Reasoning) | Alibaba | — | — | — | 27.1 | 186 tok/s | 0.3s | |
Qwen3.5 27B (Non-reasoning) | Alibaba | — | — | — | 37.2 | 89 tok/s | 1.4s | |
Qwen3.5 Omni Flash | Alibaba | — | — | — | 25.9 | 170 tok/s | 1.0s | |
Qwen3.5 27B (Reasoning) | Alibaba | — | — | — | 42.1 | 88 tok/s | 1.4s | |
Qwen3.5 122B A10B (Reasoning) | Alibaba | — | — | — | 41.6 | 162 tok/s | 1.2s | |
Qwen3.5 122B A10B (Non-reasoning) | Alibaba | — | — | — | 35.9 | 157 tok/s | 1.2s | |
Qwen3.5 Omni Plus | Alibaba | — | — | — | 38.6 | 51 tok/s | 1.3s | |
Qwen3 Coder Next | Alibaba | — | — | — | 28.3 | 152 tok/s | 0.8s | |
Kimi K2.5 (Reasoning) | Kimi | — | — | — | 46.8 | 33 tok/s | 1.2s | |
Qwen3.5 2B (Reasoning) | Alibaba | — | — | — | 16.3 | — | — | |
Qwen3.5 35B A3B (Non-reasoning) | Alibaba | — | — | — | 30.7 | 142 tok/s | 1.1s | |
Qwen3.5 35B A3B (Reasoning) | Alibaba | — | — | — | 37.1 | 145 tok/s | 1.1s | |
Qwen3.5 397B A17B (Non-reasoning) | Alibaba | — | — | — | 40.1 | 53 tok/s | 1.5s | |
Qwen3 Max Thinking | Alibaba | — | — | — | 39.9 | 34 tok/s | 1.8s | |
Step 3.5 Flash | StepFun | — | — | — | 37.8 | 169 tok/s | 0.8s | |
Llama 65B | Meta | — | — | — | 7.4 | — | — | |
NVIDIA Nemotron 3 Nano 4B | NVIDIA | — | — | — | 14.7 | — | — | |
GPT-3.5 Turbo (0613) | OpenAI | — | — | — | — | — | — | |
o3-pro | OpenAI | — | — | — | 40.7 | 19 tok/s | 106.9s | |
GPT-5.2 Codex (xhigh) | OpenAI | — | — | — | 49 | 110 tok/s | 9.2s | |
Gemini 3.1 Flash TTS | Google DeepMind | — | — | — | — | — | — | |
GPT-4o (Aug '24) | OpenAI | — | — | — | 18.6 | 108 tok/s | 0.5s | |
GPT-5.4 mini (medium) | OpenAI | — | — | — | 37.7 | 177 tok/s | 7.4s | |
NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | — | — | — | 36 | 155 tok/s | 1.2s | |
DeepSeek-V2.5 | DeepSeek | — | — | — | 12.3 | — | — | |
o1-pro | OpenAI | — | — | — | 25.8 | — | — | |
Solar Open 100B (Reasoning) | Upstage | — | — | — | 21.7 | — | — | |
LFM2.5-VL-1.6B | Liquid AI | — | — | — | 6.2 | — | — | |
GPT-4.5 (Preview) | OpenAI | — | — | — | 20 | — | — | |
Solar Pro 3 | Upstage | — | — | — | 25.9 | — | — | |
GPT-4o Realtime (Dec '24) | OpenAI | — | — | — | — | — | — | |
MiniMax-M2.7 | MiniMax | — | — | — | 49.6 | 49 tok/s | 1.7s | |
LFM2 24B A2B | Liquid AI | — | — | — | 10.5 | 148 tok/s | 0.3s | |
GPT-4o mini Realtime (Dec '24) | OpenAI | — | — | — | — | — | — | |
GPT-5.4 nano (Non-reasoning) | OpenAI | — | — | — | 24.4 | 154 tok/s | 0.5s | |
LFM2.5-1.2B-Thinking | Liquid AI | — | — | — | 8.1 | — | — | |
Gemini 2.0 Flash-Lite (Preview) | Google DeepMind | — | — | — | 14.5 | — | — | |
Fish Audio S2 Pro | Fish Audio | — | — | — | — | — | — | |
LFM2.5-1.2B-Instruct | Liquid AI | — | — | — | 8 | — | — | |
GPT-4 | OpenAI | — | — | — | 12.8 | 37 tok/s | 0.8s | |
Gemini 2.0 Flash Thinking Experimental (Dec '24) | Google DeepMind | — | — | — | 12.3 | — | — | |
Gemini 1.0 Ultra | Google DeepMind | — | — | — | 10.1 | — | — | |
PALM-2 | Google | — | — | — | 8.6 | — | — | |
Claude 3 Haiku | Anthropic | — | — | — | 12.3 | 132 tok/s | 0.5s | |
Claude 4.1 Opus (Non-reasoning) | Anthropic | — | — | — | 36 | 36 tok/s | 1.4s | |
Grok 4.20 0309 v2 (Non-reasoning) | xAI | — | — | — | 29 | 162 tok/s | 0.4s | |
Grok 4.20 0309 v2 (Reasoning) | xAI | — | — | — | 49.3 | 225 tok/s | 14.9s | |
R1 1776 | Perplexity | — | — | — | 12 | — | — | |
Codestral | Mistral AI | $0.3 | $0.9 | 32K | — | 180 tok/s | — | |
DeepSeek-V2.5 (Dec '24) | DeepSeek | — | — | — | 12.5 | — | — | |
DeepSeek-Coder-V2 | DeepSeek | — | — | — | 10.6 | — | — | |
DeepSeek LLM 67B Chat (V1) | DeepSeek | — | — | — | 8.4 | — | — | |
Gemini 3.1 Pro Preview | Google DeepMind | — | — | — | 57.2 | 130 tok/s | 24.6s | |
Gemini 3.1 Flash-Lite Preview | Google DeepMind | — | — | — | 33.5 | 338 tok/s | 5.3s | |
Sonar Reasoning | Perplexity | — | — | — | 17.9 | — | — | |
Sonar Reasoning Pro | Perplexity | — | — | — | 24.6 | — | — | |
Grok 3 Reasoning Beta | xAI | — | — | — | 21.6 | — | — | |
Grok 4.20 0309 (Reasoning) | xAI | — | — | — | 48.5 | 215 tok/s | 18.3s | |
Grok 4.20 0309 (Non-reasoning) | xAI | — | — | — | 29.7 | 172 tok/s | 0.4s | |
Magpie-Multilingual 357M (Feb 2026) | NVIDIA | — | — | — | — | — | — | |
Solar Mini | Upstage | — | — | — | 11.9 | 92 tok/s | 1.5s | |
MiniMax-M2.5 | MiniMax | — | — | — | 41.9 | 68 tok/s | 1.8s | |
Mistral Small 4 (Reasoning) | Mistral AI | — | — | — | 27.8 | 175 tok/s | 0.5s | |
Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 51.7 | 71 tok/s | 54.0s | |
Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | — | — | — | 42.6 | 53 tok/s | 1.0s | |
Reka Flash (Sep '24) | Reka AI | — | — | — | 12 | 86 tok/s | 1.3s | |
Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 53 | 57 tok/s | 12.3s | |
Gemma 4 E4B (Reasoning) | Google DeepMind | — | — | — | 18.8 | — | — | |
Gemma 4 E4B (Non-reasoning) | Google DeepMind | — | — | — | 14.8 | — | — | |
Gemma 4 E2B (Reasoning) | Google DeepMind | — | — | — | 15.2 | — | — | |
Magpie Multilingual | NVIDIA | — | — | — | — | — | — | |
GLM-4.7-Flash (Reasoning) | Z AI | — | — | — | 30.1 | 88 tok/s | 0.9s | |
Gemma 4 E2B (Non-reasoning) | Google DeepMind | — | — | — | 12.1 | — | — | |
GLM-4.7-Flash (Non-reasoning) | Z AI | — | — | — | 22.1 | 139 tok/s | 1.3s | |
Grok-1 | xAI | — | — | — | 11.7 | — | — | |
Qwen1.5 Chat 110B | Alibaba | — | — | — | 9.5 | — | — | |
Gemma 4 26B A4B (Non-reasoning) | Google DeepMind | — | — | — | 27.1 | — | — | |
Gemini 3 Deep Think | Google DeepMind | — | — | — | — | — | — | |
Gemma 4 31B (Non-reasoning) | Google DeepMind | — | — | — | 32.3 | — | — | |
Gemma 4 31B (Reasoning) | Google DeepMind | — | — | — | 39.2 | 35 tok/s | 1.0s | |
Muse Spark | Meta | — | — | — | 52.1 | — | — | |
GPT-5.4 (Non-reasoning) | OpenAI | — | — | — | 35.4 | 61 tok/s | 0.7s | |
Arctic Instruct | Snowflake | — | — | — | 8.8 | — | — | |
Qwen Chat 72B | Alibaba | — | — | — | 8.8 | — | — | |
GPT-5.3 Codex (xhigh) | OpenAI | — | — | — | 53.6 | 90 tok/s | 71.3s | |
Gemini 2.5 Flash Lite TTS | Google DeepMind | — | — | — | — | — | — | |
Gemini 2.5 Flash TTS (Dec 2025) | Google DeepMind | — | — | — | — | — | — | |
GPT-5.4 nano (medium) | OpenAI | — | — | — | 38.1 | 161 tok/s | 3.1s | |
Inworld TTS 1.5 Max | Inworld | — | — | — | — | — | — | |
Eleven v3 | ElevenLabs | — | — | — | — | — | — | |
Inworld TTS 1 Max | Inworld | — | — | — | — | — | — | |
Speech 2.8 Turbo | MiniMax | — | — | — | — | — | — | |
Step TTS 2 (Mar 2026) | StepFun | — | — | — | — | — | — | |
Speech 2.6 HD | MiniMax | — | — | — | — | — | — | |
Speech 2.6 Turbo | MiniMax | — | — | — | — | — | — | |
Inworld TTS 1 | Inworld | — | — | — | — | — | — | |
Speech-02-HD | MiniMax | — | — | — | — | — | — | |
Azure HD 2.5 | Microsoft Azure | — | — | — | — | — | — | |
Multilingual v2 | ElevenLabs | — | — | — | — | — | — | |
Step Audio EditX (Mar 2026) | StepFun | — | — | — | — | — | — | |
Speech-02-Turbo | MiniMax | — | — | — | — | — | — | |
TTS-1 | OpenAI | — | — | — | — | — | — | |
TTS-1 HD | OpenAI | — | — | — | — | — | — | |
Turbo v2.5 | ElevenLabs | — | — | — | — | — | — | |
Flash v2.5 | ElevenLabs | — | — | — | — | — | — | |
Sonic 3 | Cartesia | — | — | — | — | — | — | |
OpenAudio S1 | Fish Audio | — | — | — | — | — | — | |
SIMBA 1.6 | Speechify | — | — | — | — | — | — | |
Studio | Google | — | — | — | — | — | — | |
T2A-01-HD | MiniMax | — | — | — | — | — | — | |
Kokoro 82M v1.0 | Kokoro | — | — | — | — | — | — | |
Voxtral TTS | Mistral AI | — | — | — | — | — | — | |
Polly Generative | Amazon | — | — | — | — | — | — | |
AsyncFlow V2, async | async | — | — | — | — | — | — | |
Azure Neural | Microsoft Azure | — | — | — | — | — | — | |
Maya1 | Maya Research | — | — | — | — | — | — | |
Inworld TTS 1.5 Mini | Inworld | — | — | — | — | — | — | |
Polly Long-Form | Amazon | — | — | — | — | — | — | |
Chatterbox HD | Resemble AI | — | — | — | — | — | — | |
Journey | Google | — | — | — | — | — | — | |
SIMBA 1.0 | Speechify | — | — | — | — | — | — | |
MiMo-V2-TTS | Xiaomi | — | — | — | — | — | — | |
Gemini 2.5 Pro (Dec 2025) | Google DeepMind | — | — | — | — | — | — | |
T2A-01-Turbo | MiniMax | — | — | — | — | — | — | |
Lightning v3.1 | Smallest.ai | — | — | — | — | — | — | |
Octave TTS | Hume AI | — | — | — | — | — | — | |
Fish Speech 1.5 | Fish Audio | — | — | — | — | — | — | |
MAI-Voice-1 | Microsoft Azure | — | — | — | — | — | — | |
Chatterbox | Resemble AI | — | — | — | — | — | — | |
Magpie-Multilingual 357M | NVIDIA | — | — | — | — | — | — | |
Zonos-v0.1 | Zyphra | — | — | — | — | — | — | |
LMNT | LMNT | — | — | — | — | — | — | |
VibeVoice 1.5B | Microsoft Azure | — | — | — | — | — | — | |
VibeVoice 7B | Microsoft Azure | — | — | — | — | — | — | |
Murf Speech Gen 2 | Murf AI | — | — | — | — | — | — | |
OpenVoice v2 | OpenVoice | — | — | — | — | — | — | |
Neuphonic TTS | Neuphonic | — | — | — | — | — | — | |
Qwen3 TTS Flash | Alibaba | — | — | — | — | — | — | |
Qwen3 TTS | Alibaba | — | — | — | — | — | — | |
XTTS v2 | Coqui | — | — | — | — | — | — | |
StyleTTS 2 | StyleTTS | — | — | — | — | — | — | |
WaveNet | Google | — | — | — | — | — | — | |
Polly Neural | Amazon | — | — | — | — | — | — | |
Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 57.3 | 53 tok/s | 7.5s | |
Sonic English (Oct 2024) | Cartesia | — | — | — | — | — | — | |
Qwen3.5 9B (Non-reasoning) | Alibaba | — | — | — | 27.3 | 141 tok/s | 0.3s | |
GPT-5.4 mini (Non-reasoning) | OpenAI | — | — | — | 23.3 | 164 tok/s | 0.5s | |
Chirp 3: HD | Google | — | — | — | — | — | — | |
Falcon (Beta) | Murf AI | — | — | — | — | — | — | |
Polly Standard | Amazon | — | — | — | — | — | — | |
JT-MINI | China Mobile | — | — | — | 25.4 | — | — | |
GPT-5.4 Pro (xhigh) | OpenAI | — | — | — | — | — | — | |
Gemma 4 26B A4B (Reasoning) | Google DeepMind | — | — | — | 31.2 | — | — | |
Mistral Small 4 (Non-reasoning) | Mistral AI | — | — | — | 18.6 | 147 tok/s | 0.4s | |
GPT-5.4 nano (xhigh) | OpenAI | — | — | — | 44 | 163 tok/s | 2.8s | |
Qwen Chat 14B | Alibaba | — | — | — | 7.4 | — | — | |
GPT-5.4 (xhigh) | OpenAI | — | — | — | 56.8 | 85 tok/s | 168.3s | |
GLM-5.1 (Non-reasoning) | Z AI | — | — | — | 43.8 | 48 tok/s | 1.3s | |
MetaVoice v1 | MetaVoice | — | — | — | — | — | — | |
GPT-5.4 mini (xhigh) | OpenAI | — | — | — | 48.9 | 188 tok/s | 7.6s | |
DeepSeek-V2-Chat | DeepSeek | — | — | — | 9.1 | — | — | |
Qwen3.6 35B A3B (Reasoning) | Alibaba | — | — | — | 43.5 | 239 tok/s | 1.7s | |
Speech 2.8 HD | MiniMax | — | — | — | — | — | — |
Cost estimator: enter your expected input and output token volume to compare costs across models. As a rough guide, 1,000,000 tokens ≈ 750,000 words, and output volume is usually 30–50% of input volume.
Prices are approximate and may vary. Check provider documentation for current pricing.
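The cost comparison above reduces to simple arithmetic over the per-1M-token prices in the table. A minimal sketch, assuming linear per-token billing (the `estimate_cost` helper is illustrative, not any provider's SDK; the example rates are the Codestral row, $0.30 in / $0.90 out per 1M tokens):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated cost in dollars for a given token volume at per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: 1,000,000 input tokens, with output at 40% of input volume
# (within the typical 30-50% range noted above), at Codestral's table rates.
cost = estimate_cost(1_000_000, 400_000, 0.30, 0.90)
print(f"${cost:.2f}")  # → $0.66
```

Note that reasoning variants inflate the effective output volume (and therefore cost), since thinking tokens are usually billed at the output rate.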