Compare pricing, benchmarks, and capabilities across 496 AI models
| Model | Provider | Input $/1M | Output $/1M | Context | Intelligence | Speed | Latency | API |
|---|---|---|---|---|---|---|---|---|
DeepSeek R2 ★ | DeepSeek | $0.55 | $2.19 | 128K | 91% | 60 tok/s | — | |
GPT-4.1 ★ | OpenAI | $2 | $8 | 1M | 90.5% | 80 tok/s | — | |
Claude Opus 4.6 ★ | Anthropic | $15 | $75 | 200K | 88.7% | 60 tok/s | — | |
GPT-4o ★ | OpenAI | $5 | $15 | 128K | 87.2% | 120 tok/s | — | |
Claude Sonnet 4.6 ★ | Anthropic | $3 | $15 | 200K | 86.8% | 100 tok/s | — | |
Llama 3.3 70B Open ★ | Meta AI | $0.23 | $0.92 | 128K | 86% | 80 tok/s | — | |
o3 | OpenAI | $10 | $40 | 200K | 96.7% | 40 tok/s | — | |
o4-mini | OpenAI | $1.1 | $4.4 | 200K | 93.4% | 100 tok/s | — | |
Gemini 3 Ultra | Google DeepMind | $7 | $21 | 1M | 90.1% | 70 tok/s | — | |
Gemini 3 Pro Preview (low) | Google DeepMind | — | — | — | 41.3 | — | — |
Claude Opus 4.5 (Reasoning) | Anthropic | — | — | — | 49.7 | 72 tok/s | 11.7s | |
Claude Opus 4.5 (Non-reasoning) | Anthropic | — | — | — | 43.1 | 63 tok/s | 1.1s | |
Gemini 3 Flash Preview (Reasoning) | Google DeepMind | — | — | — | 46.4 | 195 tok/s | 5.9s |
DeepSeek V3 Open | DeepSeek | $0.27 | $1.1 | 128K | 88.5% | 80 tok/s | — | |
MiniMax-M2.1 | MiniMax | — | — | — | 39.4 | 59 tok/s | 2.4s | |
Claude 4.5 Sonnet (Reasoning) | Anthropic | — | — | — | 43 | 59 tok/s | 10.4s | |
Claude 4.1 Opus (Reasoning) | Anthropic | — | — | — | 42 | 42 tok/s | 8.0s | |
Grok 3 | xAI | $3 | $15 | 131K | 87.5% | 90 tok/s | — | |
Llama 3.1 405B Open | Meta AI | $3 | $3 | 128K | 87.3% | 30 tok/s | — | |
Grok 4 | xAI | — | — | — | 41.5 | 64 tok/s | 7.4s | |
Gemini 3 Pro | Google DeepMind | $3.5 | $10.5 | 1M | 87% | 100 tok/s | — | |
Qwen3-Max | Alibaba Cloud | $0.4 | $1.2 | 32K | 87% | 90 tok/s | — | |
GPT-5.1 (high) | OpenAI | — | — | — | 47.7 | 118 tok/s | 25.1s | |
GPT-5.2 (xhigh) | OpenAI | — | — | — | 51.3 | 72 tok/s | 81.3s | |
GPT-5 (high) | OpenAI | — | — | — | 44.6 | 86 tok/s | 99.7s | |
GPT-5 (medium) | OpenAI | — | — | — | 42 | 95 tok/s | 40.4s | |
Claude 4 Opus (Reasoning) | Anthropic | — | — | — | 39 | 41 tok/s | 8.0s | |
GPT-5 Codex (high) | OpenAI | — | — | — | 44.6 | 207 tok/s | 11.4s | |
DeepSeek V3.2 (Reasoning) | DeepSeek | — | — | — | 41.7 | 29 tok/s | 1.4s | |
GPT-5.2 (medium) | OpenAI | — | — | — | 46.6 | — | — | |
Gemini 2.5 Pro | Google DeepMind | — | — | — | 34.6 | 127 tok/s | 22.0s |
Claude 4.5 Sonnet (Non-reasoning) | Anthropic | — | — | — | 37.1 | 56 tok/s | 1.2s | |
Claude 4 Opus (Non-reasoning) | Anthropic | — | — | — | 33 | 37 tok/s | 1.4s | |
Gemini 2.5 Pro Preview (Mar '25) | Google DeepMind | — | — | — | 30.3 | — | — |
DeepSeek V3.2 Speciale | DeepSeek | — | — | — | 29.4 | — | — | |
GPT-5 (low) | OpenAI | — | — | — | 39.2 | 75 tok/s | 10.3s | |
GPT-5.1 Codex (high) | OpenAI | — | — | — | 43.1 | 167 tok/s | 6.7s | |
GLM-4.7 (Reasoning) | Z AI | — | — | — | 42.1 | 109 tok/s | 0.7s | |
Kimi K2 Thinking | Kimi | — | — | — | 40.9 | 41 tok/s | 1.1s | |
DeepSeek R1 0528 (May '25) | DeepSeek | — | — | — | 27.1 | — | — | |
Qwen3-72B Open | Alibaba Cloud | Free | Free | 32K | 85% | 100 tok/s | — | |
DeepSeek V3.1 (Reasoning) | DeepSeek | — | — | — | 27.7 | — | — | |
DeepSeek V3.1 Terminus (Reasoning) | DeepSeek | — | — | — | 33.9 | — | — | |
Cogito v2.1 (Reasoning) | Deep Cogito | — | — | — | 85% | 57 tok/s | 0.5s | |
Doubao Seed Code | ByteDance Seed | — | — | — | 33.5 | — | — | |
DeepSeek V3.2 Exp (Reasoning) | DeepSeek | — | — | — | 32.9 | 30 tok/s | 1.4s | |
Grok 4 Fast (Reasoning) | xAI | — | — | — | 35.1 | 216 tok/s | 3.4s | |
Grok 4.1 Fast (Reasoning) | xAI | — | — | — | 38.6 | 142 tok/s | 9.2s | |
Phi-4 Open | Microsoft | $0.07 | $0.14 | 16K | 84.8% | 300 tok/s | — | |
Claude 3.7 Sonnet (Reasoning) | Anthropic | — | — | — | 34.7 | — | — | |
Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | Google DeepMind | — | — | — | 25.7 | — | — |
DeepSeek V3.2 (Non-reasoning) | DeepSeek | — | — | — | 32.1 | 30 tok/s | 1.3s | |
Gemini 2.5 Pro Preview (May '25) | Google DeepMind | — | — | — | 29.5 | — | — |
Qwen3 VL 235B A22B (Reasoning) | Alibaba | — | — | — | 27.6 | 45 tok/s | 1.2s | |
K-EXAONE (Reasoning) | LG AI Research | — | — | — | 32.1 | — | — | |
Qwen3 Max (Preview) | Alibaba | — | — | — | 26.1 | 47 tok/s | 1.8s | |
Qwen3 235B A22B 2507 (Reasoning) | Alibaba | — | — | — | 29.5 | 51 tok/s | 1.3s | |
GPT-5 mini (high) | OpenAI | — | — | — | 41.2 | 74 tok/s | 91.5s | |
o1 | OpenAI | — | — | — | 30.8 | 112 tok/s | 23.6s | |
GLM-4.5 (Reasoning) | Z AI | — | — | — | 26.4 | 38 tok/s | 0.9s | |
MiMo-V2-Flash (Reasoning) | Xiaomi | — | — | — | 39.2 | 123 tok/s | 1.8s | |
DeepSeek R1 (Jan '25) | DeepSeek | — | — | — | 18.8 | — | — | |
DeepSeek V3.1 Terminus (Non-reasoning) | DeepSeek | — | — | — | 28.5 | — | — | |
DeepSeek V3.2 Exp (Non-reasoning) | DeepSeek | — | — | — | 28.4 | 31 tok/s | 1.3s | |
Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | Google DeepMind | — | — | — | 31.1 | — | — |
Mistral Large | Mistral AI | $2 | $6 | 128K | 84% | 90 tok/s | — | |
Claude 4 Sonnet (Reasoning) | Anthropic | — | — | — | 38.7 | 59 tok/s | 8.5s | |
Claude 4 Sonnet (Non-reasoning) | Anthropic | — | — | — | 33 | 52 tok/s | 0.8s | |
Grok 3 Mini | xAI | $0.3 | $0.5 | 131K | 83% | 160 tok/s | — | |
ERNIE 5.0 Thinking Preview | Baidu | — | — | — | 29.1 | — | — | |
DeepSeek V3.1 (Non-reasoning) | DeepSeek | — | — | — | 28.1 | — | — | |
Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | — | — | — | 18.6 | 32 tok/s | 0.8s | |
GLM-4.6 (Reasoning) | Z AI | — | — | — | 32.5 | 36 tok/s | 0.9s | |
Qwen3 235B A22B 2507 Instruct | Alibaba | — | — | — | 25 | 70 tok/s | 1.2s | |
Nova 2.0 Pro Preview (medium) | Amazon | — | — | — | 35.7 | 120 tok/s | 17.9s | |
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | — | — | — | 15 | 42 tok/s | 0.7s | |
Gemini 2.5 Flash (Reasoning) | Google DeepMind | — | — | — | 27 | 205 tok/s | 13.3s |
Grok 3 mini Reasoning (high) | xAI | — | — | — | 32.1 | 216 tok/s | 0.4s | |
Qwen3 235B A22B (Reasoning) | Alibaba | — | — | — | 19.8 | 65 tok/s | 1.3s | |
GPT-5 mini (medium) | OpenAI | — | — | — | 38.9 | 77 tok/s | 20.0s | |
INTELLECT-3 | Prime Intellect | — | — | — | 22.2 | — | — | |
EXAONE 4.0 32B (Reasoning) | LG AI Research | — | — | — | 16.7 | — | — | |
Qwen3 VL 32B (Reasoning) | Alibaba | — | — | — | 24.7 | 97 tok/s | 1.4s | |
Seed-OSS-36B-Instruct | ByteDance Seed | — | — | — | 25.2 | 42 tok/s | 1.8s | |
Qwen3 VL 235B A22B Instruct | Alibaba | — | — | — | 20.8 | 57 tok/s | 1.2s | |
Kimi K2 0905 | Kimi | — | — | — | 30.9 | 22 tok/s | 2.1s | |
GLM-4.5-Air | Z AI | — | — | — | 23.2 | 65 tok/s | 1.3s | |
MiniMax M1 80k | MiniMax | — | — | — | 24.4 | — | — | |
MiniMax-M2 | MiniMax | — | — | — | 36.1 | 61 tok/s | 2.2s | |
DeepSeek V3 0324 | DeepSeek | — | — | — | 22.3 | — | — | |
Magistral Medium 1.2 | Mistral | — | — | — | 27.1 | 95 tok/s | 0.4s | |
GPT-4o mini | OpenAI | $0.15 | $0.6 | 128K | 82% | 200 tok/s | — | |
Gemini 3 Flash | Google DeepMind | $0.075 | $0.3 | 1M | 82% | 250 tok/s | — | |
Qwen3 Max Thinking (Preview) | Alibaba | — | — | — | 32.5 | 43 tok/s | 1.8s | |
Nova 2.0 Lite (high) | Amazon | — | — | — | 34.5 | 195 tok/s | 21.4s | |
Nova 2.0 Pro Preview (low) | Amazon | — | — | — | 31.9 | 143 tok/s | 6.8s | |
GPT-5 (ChatGPT) | OpenAI | — | — | — | 21.8 | 158 tok/s | 0.6s | |
Kimi K2 | Kimi | — | — | — | 26.3 | 35 tok/s | 1.3s | |
GPT-5.1 Codex mini (high) | OpenAI | — | — | — | 38.6 | 197 tok/s | 5.9s | |
Qwen3 Next 80B A3B (Reasoning) | Alibaba | — | — | — | 26.7 | 164 tok/s | 1.1s | |
Ling-1T | InclusionAI | — | — | — | 19 | — | — | |
Qwen3 Next 80B A3B Instruct | Alibaba | — | — | — | 20.1 | 166 tok/s | 1.0s | |
Llama 4 Maverick | Meta | — | — | — | 18.4 | 115 tok/s | 0.6s | |
GPT-5 (minimal) | OpenAI | — | — | — | 23.9 | 74 tok/s | 1.1s | |
K-EXAONE (Non-reasoning) | LG AI Research | — | — | — | 23.4 | — | — | |
Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | — | — | — | 16 | 62 tok/s | 0.6s | |
Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | — | — | — | 18.7 | 60 tok/s | 0.3s | |
Ring-1T | InclusionAI | — | — | — | 22.8 | — | — | |
Gemini 2.5 Flash (Non-reasoning) | Google DeepMind | — | — | — | 20.6 | 180 tok/s | 0.5s |
Nova 2.0 Lite (medium) | Amazon | — | — | — | 29.7 | 177 tok/s | 13.8s | |
Nova 2.0 Omni (medium) | Amazon | — | — | — | 28 | — | — | |
KAT-Coder-Pro V1 | KwaiKAT | — | — | — | 36 | 112 tok/s | 1.0s | |
MiniMax M1 40k | MiniMax | — | — | — | 20.9 | — | — | |
Gemini 2.0 Pro Experimental (Feb '25) | Google DeepMind | — | — | — | 18.1 | — | — |
gpt-oss-120B (high) | OpenAI | — | — | — | 33.3 | 215 tok/s | 0.5s | |
Solar Pro 2 (Reasoning) | Upstage | — | — | — | 14.9 | — | — | |
Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google DeepMind | — | — | — | 21.6 | — | — |
Mi:dm K 2.5 Pro | Korea Telecom | — | — | — | 23.1 | — | — | |
GPT-5.2 (Non-reasoning) | OpenAI | — | — | — | 33.6 | 63 tok/s | 0.8s | |
Mi:dm K 2.5 Pro Preview | Korea Telecom | — | — | — | 81% | — | — | |
Qwen3 30B A3B 2507 (Reasoning) | Alibaba | — | — | — | 22.4 | 148 tok/s | 1.1s | |
Qwen3 VL 30B A3B (Reasoning) | Alibaba | — | — | — | 19.7 | 127 tok/s | 1.0s | |
Mistral Large 3 | Mistral | — | — | — | 22.8 | 56 tok/s | 0.6s | |
Nova 2.0 Omni (low) | Amazon | — | — | — | 23.2 | — | — | |
Claude 4.5 Haiku (Non-reasoning) | Anthropic | — | — | — | 31.1 | 120 tok/s | 0.5s | |
Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | Google DeepMind | — | — | — | 19.4 | — | — |
GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | — | — | — | 18.6 | — | — | |
o3-mini (high) | OpenAI | — | — | — | 25.2 | 149 tok/s | 27.7s | |
GLM-4.6V (Reasoning) | Z AI | — | — | — | 23.4 | 27 tok/s | 1.2s | |
Qwen3 32B (Reasoning) | Alibaba | — | — | — | 16.5 | 103 tok/s | 1.1s | |
GPT-5.1 (Non-reasoning) | OpenAI | — | — | — | 27.4 | 108 tok/s | 0.8s | |
Motif-2-12.7B-Reasoning | Motif Technologies | — | — | — | 19.1 | — | — | |
Gemini 2.5 Flash Preview (Reasoning) | Google DeepMind | — | — | — | 24.3 | — | — |
Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google DeepMind | — | — | — | 19.6 | — | — |
DeepSeek R1 Distill Llama 70B | DeepSeek | — | — | — | 16 | 41 tok/s | 0.5s | |
Claude 3.7 Sonnet (Non-reasoning) | Anthropic | — | — | — | 30.8 | — | — | |
Qwen3 Coder 480B A35B Instruct | Alibaba | — | — | — | 24.8 | 65 tok/s | 1.7s | |
Grok Code Fast 1 | xAI | — | — | — | 28.7 | 185 tok/s | 5.4s | |
Nova 2.0 Lite (low) | Amazon | — | — | — | 24.6 | 210 tok/s | 5.1s | |
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | — | — | — | 24.3 | 133 tok/s | 1.3s | |
Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | — | — | — | 18.5 | — | — | |
K2-V2 (high) | MBZUAI Institute of Foundation Models | — | — | — | 20.6 | — | — | |
HyperCLOVA X SEED Think (32B) | Naver | — | — | — | 23.7 | — | — | |
Apriel-v1.6-15B-Thinker | ServiceNow | — | — | — | 27.6 | — | — | |
Ring-flash-2.0 | InclusionAI | — | — | — | 14 | 87 tok/s | 1.4s | |
Qwen3 Omni 30B A3B (Reasoning) | Alibaba | — | — | — | 15.6 | 93 tok/s | 1.0s | |
o3-mini | OpenAI | — | — | — | 25.9 | 151 tok/s | 8.1s | |
GLM-4.5V (Reasoning) | Z AI | — | — | — | 15.1 | 45 tok/s | 1.0s | |
GLM-4.7 (Non-reasoning) | Z AI | — | — | — | 34.2 | 106 tok/s | 0.7s | |
Qwen3 VL 32B Instruct | Alibaba | — | — | — | 17.2 | 83 tok/s | 1.3s | |
Ling-flash-2.0 | InclusionAI | — | — | — | 15.7 | 94 tok/s | 1.5s | |
GPT-5 mini (minimal) | OpenAI | — | — | — | 20.7 | 96 tok/s | 1.1s | |
GPT-4.1 mini | OpenAI | — | — | — | 22.9 | 90 tok/s | 0.6s | |
GLM-4.6 (Non-reasoning) | Z AI | — | — | — | 30.2 | 67 tok/s | 0.9s | |
Gemini 2.5 Flash Preview (Non-reasoning) | Google DeepMind | — | — | — | 17.8 | — | — |
Qwen3 30B A3B (Reasoning) | Alibaba | — | — | — | 15.3 | 70 tok/s | 1.2s | |
ERNIE 4.5 300B A47B | Baidu | — | — | — | 15 | 29 tok/s | 1.8s | |
gpt-oss-120B (low) | OpenAI | — | — | — | 24.5 | 218 tok/s | 0.5s | |
Gemini 2.0 Flash (Feb '25) | Google DeepMind | — | — | — | 18.5 | — | — |
Gemini 2.0 Flash (experimental) | Google DeepMind | — | — | — | 16.8 | — | — |
Command R+ | Cohere | $2.5 | $10 | 128K | 78% | 80 tok/s | — | |
GPT-5 nano (high) | OpenAI | — | — | — | 26.8 | 144 tok/s | 100.6s | |
Qwen3 30B A3B 2507 Instruct | Alibaba | — | — | — | 15 | 92 tok/s | 1.3s | |
Apriel-v1.5-15B-Thinker | ServiceNow | — | — | — | 28.3 | — | — | |
GPT-4o (ChatGPT) | OpenAI | — | — | — | 14.1 | — | — | |
GPT-5 nano (medium) | OpenAI | — | — | — | 25.9 | 145 tok/s | 50.0s | |
Qwen3 14B (Reasoning) | Alibaba | — | — | — | 16.2 | 65 tok/s | 1.1s | |
EXAONE 4.0 32B (Non-reasoning) | LG AI Research | — | — | — | 11.7 | — | — | |
Magistral Small 1.2 | Mistral | — | — | — | 18.2 | 188 tok/s | 0.4s | |
Nova 2.0 Pro Preview (Non-reasoning) | Amazon | — | — | — | 23.1 | 151 tok/s | 0.7s | |
Solar Pro 2 (Preview) (Reasoning) | Upstage | — | — | — | 18.8 | — | — | |
Claude 3.5 Sonnet (Oct '24) | Anthropic | — | — | — | 15.9 | — | — | |
Qwen2.5 Max | Alibaba | — | — | — | 16.3 | 46 tok/s | 1.1s | |
Mistral Medium 3 | Mistral | — | — | — | 18.8 | 62 tok/s | 0.5s | |
Devstral 2 | Mistral | — | — | — | 22 | 79 tok/s | 0.5s | |
Olmo 3.1 32B Think | Allen Institute for AI | — | — | — | 13.9 | — | — | |
Qwen3 235B A22B (Non-reasoning) | Alibaba | — | — | — | 17 | 63 tok/s | 1.2s | |
Olmo 3 32B Think | Allen Institute for AI | — | — | — | 12.1 | — | — | |
QwQ 32B | Alibaba | — | — | — | 19.7 | 33 tok/s | 0.4s | |
Gemini 2.5 Flash-Lite (Reasoning) | Google DeepMind | — | — | — | 17.6 | 295 tok/s | 12.3s |
Claude 4.5 Haiku (Reasoning) | Anthropic | — | — | — | 37.1 | 156 tok/s | 10.0s | |
K2-V2 (medium) | MBZUAI Institute of Foundation Models | — | — | — | 18.7 | — | — | |
Qwen3 VL 30B A3B Instruct | Alibaba | — | — | — | 16.1 | 123 tok/s | 1.0s | |
Sonar Pro | Perplexity | — | — | — | 15.2 | — | — | |
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | — | — | — | 14.9 | 151 tok/s | 0.5s | |
Claude Haiku 4.5 | Anthropic | $0.8 | $4 | 200K | 75.2% | 250 tok/s | — | |
Magistral Medium 1 | Mistral | — | — | — | 18.8 | — | — | |
Gemini 1.5 Pro (Sep '24) | Google DeepMind | — | — | — | 16 | — | — |
Gemma 3 27B Open | Google DeepMind | Free | Free | 128K | 75% | 120 tok/s | — | |
Solar Pro 2 (Non-reasoning) | Upstage | — | — | — | 13.6 | — | — | |
Llama 4 Scout | Meta | — | — | — | 13.5 | 137 tok/s | 0.5s | |
Magistral Small 1 | Mistral | — | — | — | 16.8 | — | — | |
Qwen3 VL 8B (Reasoning) | Alibaba | — | — | — | 16.7 | 135 tok/s | 1.1s | |
Claude 3.5 Sonnet (June '24) | Anthropic | — | — | — | 14.2 | — | — | |
gpt-oss-20B (high) | OpenAI | — | — | — | 24.5 | 252 tok/s | 0.3s | |
GLM-4.6V (Non-reasoning) | Z AI | — | — | — | 17.1 | 23 tok/s | 5.9s | |
GLM-4.5V (Non-reasoning) | Z AI | — | — | — | 12.7 | 39 tok/s | 29.9s | |
o1-mini | OpenAI | — | — | — | 20.4 | — | — | |
MiMo-V2-Flash (Non-reasoning) | Xiaomi | — | — | — | 30.4 | 124 tok/s | 1.5s | |
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 153 tok/s | 0.7s | |
Nova 2.0 Lite (Non-reasoning) | Amazon | — | — | — | 18 | 182 tok/s | 0.8s | |
DeepSeek R1 Distill Qwen 14B | DeepSeek | — | — | — | 15.8 | — | — | |
Grok 4.1 Fast (Non-reasoning) | xAI | — | — | — | 23.6 | 131 tok/s | 0.4s | |
Qwen3 4B 2507 (Reasoning) | Alibaba | — | — | — | 18.2 | — | — | |
NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | — | — | — | 14.8 | 117 tok/s | 0.3s | |
GPT-4o (May '24) | OpenAI | — | — | — | 14.5 | 101 tok/s | 0.5s | |
DeepSeek R1 0528 Qwen3 8B | DeepSeek | — | — | — | 16.4 | — | — | |
DeepSeek R1 Distill Qwen 32B | DeepSeek | — | — | — | 17.2 | 42 tok/s | 0.5s | |
Qwen3 8B (Reasoning) | Alibaba | — | — | — | 13.2 | 91 tok/s | 1.0s | |
DBRX Open | Databricks | $0.75 | $2.25 | 33K | 73.7% | 100 tok/s | — | |
Qwen3 Omni 30B A3B Instruct | Alibaba | — | — | — | 10.7 | 106 tok/s | 1.1s | |
Llama 3.2 11B Vision Open | Meta AI | $0.18 | $0.18 | 128K | 73% | 150 tok/s | — | |
Llama 3.1 Instruct 405B | Meta | — | — | — | 17.4 | 31 tok/s | 0.7s | |
Nova Premier | Amazon | — | — | — | 19 | 70 tok/s | 1.2s | |
Solar Pro 2 (Preview) (Non-reasoning) | Upstage | — | — | — | 16 | — | — | |
Falcon-H1R-7B | TII UAE | — | — | — | 15.8 | — | — | |
Grok 4 Fast (Non-reasoning) | xAI | — | — | — | 23.1 | 196 tok/s | 0.4s | |
Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | — | — | — | 17.6 | 32 tok/s | 0.9s | |
Qwen3 32B (Non-reasoning) | Alibaba | — | — | — | 14.5 | 102 tok/s | 1.2s | |
Llama 3.1 Tulu3 405B | Allen Institute for AI | — | — | — | 14.1 | — | — | |
Nova 2.0 Omni (Non-reasoning) | Amazon | — | — | — | 16.6 | 227 tok/s | 0.9s | |
Qwen2.5 Instruct 72B | Alibaba | — | — | — | 15.6 | 55 tok/s | 1.2s | |
gpt-oss-20B (low) | OpenAI | — | — | — | 20.8 | 261 tok/s | 0.4s | |
Gemini 3.1 Flash-Lite | Google DeepMind | $0.01 | $0.04 | 1M | 72% | 500 tok/s | — | |
Mistral Small | Mistral AI | $0.1 | $0.3 | 32K | 72% | 200 tok/s | — | |
Command R | Cohere | $0.15 | $0.6 | 128K | 72% | 150 tok/s | — | |
Gemini 2.5 Flash-Lite (Non-reasoning) | Google DeepMind | — | — | — | 12.7 | 260 tok/s | 0.4s |
Gemini 2.0 Flash-Lite (Feb '25) | Google DeepMind | — | — | — | 14.7 | — | — |
Devstral Medium | Mistral | — | — | — | 18.7 | 145 tok/s | 0.5s | |
K2-V2 (low) | MBZUAI Institute of Foundation Models | — | — | — | 14.4 | — | — | |
Qwen3 30B A3B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 67 tok/s | 1.2s | |
Qwen3 Coder 30B A3B Instruct | Alibaba | — | — | — | 20 | 113 tok/s | 1.4s | |
Llama 3.3 Instruct 70B | Meta | — | — | — | 14.5 | 96 tok/s | 0.6s | |
Grok 2 (Dec '24) | xAI | — | — | — | 13.9 | — | — | |
Command A | Cohere | — | — | — | 13.5 | 40 tok/s | 0.6s | |
Falcon 180B Open | TII | Free | Free | 4K | 70.4% | 20 tok/s | — | |
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | NVIDIA | — | — | — | 14.3 | — | — | |
Sarvam M (Reasoning) | Sarvam | — | — | — | 8.4 | — | — | |
Claude 3 Opus | Anthropic | — | — | — | 18 | — | — | |
Qwen3 VL 4B (Reasoning) | Alibaba | — | — | — | 13.7 | — | — | |
Mistral Large 2 (Nov '24) | Mistral | — | — | — | 15.1 | 41 tok/s | 0.5s | |
Qwen2.5 Instruct 32B | Alibaba | — | — | — | 13.2 | — | — | |
Grok Beta | xAI | — | — | — | 13.3 | — | — | |
Qwen3 4B (Reasoning) | Alibaba | — | — | — | 14.2 | 104 tok/s | 1.0s | |
Pixtral Large | Mistral | — | — | — | 14 | 51 tok/s | 0.5s | |
Qwen3 VL 8B Instruct | Alibaba | — | — | — | 14.3 | 148 tok/s | 0.9s | |
Ministral 3 14B | Mistral | — | — | — | 16 | 99 tok/s | 0.3s | |
Nova Pro | Amazon | — | — | — | 13.5 | — | — | |
Llama 3.1 Nemotron Instruct 70B | NVIDIA | — | — | — | 13.4 | 46 tok/s | 0.3s | |
Sonar | Perplexity | — | — | — | 15.5 | — | — | |
Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | — | — | — | 14.6 | 58 tok/s | 0.3s | |
GPT-4 Turbo | OpenAI | — | — | — | 13.7 | 32 tok/s | 1.2s | |
Llama 3.1 Instruct 70B | Meta | — | — | — | 12.5 | 31 tok/s | 0.8s | |
Mistral Medium 3.1 | Mistral | — | — | — | 21.3 | 89 tok/s | 0.4s | |
Mistral Small 3.2 | Mistral | — | — | — | 15.1 | 155 tok/s | 0.3s | |
Mistral Large 2 (Jul '24) | Mistral | — | — | — | 13 | — | — | |
Qwen3 14B (Non-reasoning) | Alibaba | — | — | — | 12.8 | 65 tok/s | 1.0s | |
Gemini 1.5 Flash (Sep '24) | Google DeepMind | — | — | — | 13.8 | — | — |
Devstral Small 2 | Mistral | — | — | — | 19.5 | 80 tok/s | 0.7s | |
Ling-mini-2.0 | InclusionAI | — | — | — | 9.2 | — | — | |
Qwen3 4B 2507 Instruct | Alibaba | — | — | — | 12.9 | — | — | |
Llama 3.2 Instruct 90B (Vision) | Meta | — | — | — | 11.9 | 42 tok/s | 0.5s | |
Reka Flash 3 | Reka AI | — | — | — | 9.5 | 94 tok/s | 1.3s | |
Olmo 3 7B Think | Allen Institute for AI | — | — | — | 9.4 | — | — | |
Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | — | — | — | 12.6 | 63 tok/s | 0.6s | |
GPT-4.1 nano | OpenAI | — | — | — | 13 | 200 tok/s | 0.4s | |
Gemini 1.5 Pro (May '24) | Google DeepMind | — | — | — | 12 | — | — |
Mistral Small 3.1 | Mistral | — | — | — | 14.5 | 153 tok/s | 0.5s | |
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | — | — | — | 10.1 | 175 tok/s | 0.7s | |
Mistral Small 3 | Mistral | — | — | — | 12.7 | 154 tok/s | 0.5s | |
QwQ 32B-Preview | Alibaba | — | — | — | 15.2 | 43 tok/s | 0.5s | |
Qwen3 8B (Non-reasoning) | Alibaba | — | — | — | 10.6 | 94 tok/s | 0.9s | |
Ministral 3 8B | Mistral | — | — | — | 14.8 | 180 tok/s | 0.3s | |
Qwen2.5 Coder Instruct 32B | Alibaba | — | — | — | 12.9 | — | — | |
Devstral Small (May '25) | Mistral | — | — | — | 18 | — | — | |
Claude 3.5 Haiku | Anthropic | — | — | — | 18.7 | — | — | |
Qwen3 VL 4B Instruct | Alibaba | — | — | — | 9.6 | — | — | |
Qwen2.5 Turbo | Alibaba | — | — | — | 12 | 68 tok/s | 1.2s | |
Devstral Small (Jul '25) | Mistral | — | — | — | 15.2 | 202 tok/s | 0.4s | |
Qwen2 Instruct 72B | Alibaba | — | — | — | 11.7 | — | — | |
Granite 4.0 H Small | IBM | — | — | — | 10.8 | 453 tok/s | 8.7s | |
Mistral Saba | Mistral | — | — | — | 12.1 | — | — | |
Gemma 3 12B Instruct | Google DeepMind | — | — | — | 8.8 | 30 tok/s | 10.2s |
Nova Lite | Amazon | — | — | — | 12.7 | 221 tok/s | 0.7s | |
Exaone 4.0 1.2B (Reasoning) | LG AI Research | — | — | — | 8.3 | — | — | |
Kimi Linear 48B A3B Instruct | Kimi | — | — | — | 14.4 | — | — | |
Qwen3 4B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 105 tok/s | 1.0s | |
Claude 3 Sonnet | Anthropic | — | — | — | 10.3 | — | — | |
Jamba 1.7 Large | AI21 Labs | — | — | — | 10.9 | 49 tok/s | 1.1s | |
Jamba Reasoning 3B | AI21 Labs | — | — | — | 9.6 | — | — | |
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 78 tok/s | 0.3s | |
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | Nous Research | — | — | — | 10.9 | — | — | |
Jamba 1.5 Large | AI21 Labs | — | — | — | 10.7 | — | — | |
Llama 3 Instruct 70B | Meta | — | — | — | 8.9 | 42 tok/s | 0.7s | |
Gemini 1.5 Flash-8B | Google DeepMind | — | — | — | 11.1 | — | — |
Hermes 3 - Llama-3.1 70B | Nous Research | — | — | — | 10.6 | 28 tok/s | 0.4s | |
Qwen3 1.7B (Reasoning) | Alibaba | — | — | — | 8 | 138 tok/s | 1.0s | |
Gemini 1.5 Flash (May '24) | Google DeepMind | — | — | — | 10.5 | — | — |
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | — | — | — | 14.4 | — | — | |
Jamba 1.6 Large | AI21 Labs | — | — | — | 10.6 | 48 tok/s | 0.9s | |
GPT-5 nano (minimal) | OpenAI | — | — | — | 13.8 | 142 tok/s | 1.0s | |
DeepSeek R1 Distill Llama 8B | DeepSeek | — | — | — | 12.1 | — | — | |
Mixtral 8x22B Instruct | Mistral | — | — | — | 9.8 | — | — | |
Nova Micro | Amazon | — | — | — | 10.3 | 314 tok/s | 0.6s | |
Ministral 3 3B | Mistral | — | — | — | 11.2 | 307 tok/s | 0.3s | |
Olmo 3 7B Instruct | Allen Institute for AI | — | — | — | 8.2 | — | — | |
OLMo 2 32B | Allen Institute for AI | — | — | — | 10.6 | — | — | |
LFM2 8B A1B | Liquid AI | — | — | — | 7 | — | — | |
Claude 2.1 | Anthropic | — | — | — | 9.3 | — | — | |
Exaone 4.0 1.2B (Non-reasoning) | LG AI Research | — | — | — | 8.1 | — | — | |
Gemma 3n E4B Instruct | Google DeepMind | — | — | — | 6.4 | 14 tok/s | 0.4s |
Claude 2.0 | Anthropic | — | — | — | 9.1 | — | — | |
Mistral Medium | Mistral | — | — | — | 9 | 89 tok/s | 0.4s | |
Phi-4 Multimodal Instruct | Microsoft Azure | — | — | — | 10 | 16 tok/s | 0.4s | |
Llama 3.1 Instruct 8B | Meta | — | — | — | 11.8 | 170 tok/s | 0.4s | |
Gemma 3n E4B Instruct Preview (May '25) | Google DeepMind | — | — | — | 10.1 | — | — |
Granite 3.3 8B (Non-reasoning) | IBM | — | — | — | 7 | 427 tok/s | 7.3s | |
Phi-4 Mini Instruct | Microsoft Azure | — | — | — | 8.4 | 44 tok/s | 0.3s | |
Qwen2.5 Coder Instruct 7B | Alibaba | — | — | — | 10 | — | — | |
Llama 3.2 Instruct 11B (Vision) | Meta | — | — | — | 8.7 | 79 tok/s | 0.5s | |
GPT-3.5 Turbo | OpenAI | — | — | — | 9 | 89 tok/s | 0.5s | |
Granite 4.0 Micro | IBM | — | — | — | 7.7 | — | — | |
Phi-3 Mini Instruct 3.8B | Microsoft Azure | — | — | — | 10.1 | — | — | |
Gemini 1.0 Pro | Google DeepMind | — | — | — | 8.5 | — | — |
Claude Instant | Anthropic | — | — | — | 7.4 | — | — | |
DeepSeek Coder V2 Lite Instruct | DeepSeek | — | — | — | 8.5 | — | — | |
LFM 40B | Liquid AI | — | — | — | 8.8 | — | — | |
Command-R+ (Apr '24) | Cohere | — | — | — | 8.3 | — | — | |
Gemma 3 4B Instruct | Google DeepMind | — | — | — | 6.3 | 30 tok/s | 1.1s |
Mistral Small (Feb '24) | Mistral | — | — | — | 9 | 154 tok/s | 0.5s | |
Qwen3 1.7B (Non-reasoning) | Alibaba | — | — | — | 6.8 | 141 tok/s | 0.9s | |
Llama 2 Chat 13B | Meta | — | — | — | 8.4 | — | — | |
Llama 2 Chat 70B | Meta | — | — | — | 8.4 | — | — | |
Llama 3 Instruct 8B | Meta | — | — | — | 6.4 | 82 tok/s | 0.5s | |
Mixtral 8x7B Instruct | Mistral | — | — | — | 7.7 | — | — | |
Jamba 1.7 Mini | AI21 Labs | — | — | — | 8.1 | — | — | |
Gemma 3n E2B Instruct | Google DeepMind | — | — | — | 4.8 | 51 tok/s | 0.5s |
Molmo 7B-D | Allen Institute for AI | — | — | — | 9.2 | — | — | |
Jamba 1.5 Mini | AI21 Labs | — | — | — | 8 | — | — | |
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | Nous Research | — | — | — | 7.6 | — | — | |
Jamba 1.6 Mini | AI21 Labs | — | — | — | 7.9 | 178 tok/s | 0.8s | |
Qwen3 0.6B (Reasoning) | Alibaba | — | — | — | 6.5 | 189 tok/s | 0.9s | |
Llama 3.2 Instruct 3B | Meta | — | — | — | 9.7 | 53 tok/s | 0.6s | |
Command-R (Mar '24) | Cohere | — | — | — | 7.4 | — | — | |
Granite 4.0 1B | IBM | — | — | — | 7.3 | — | — | |
OpenChat 3.5 (1210) | OpenChat | — | — | — | 8.3 | — | — | |
LFM2 2.6B | Liquid AI | — | — | — | 8 | — | — | |
OLMo 2 7B | Allen Institute for AI | — | — | — | 9.3 | — | — | |
Granite 4.0 H 1B | IBM | — | — | — | 8 | — | — | |
DeepSeek R1 Distill Qwen 1.5B | DeepSeek | — | — | — | 9.1 | — | — | |
LFM2 1.2B | Liquid AI | — | — | — | 6.3 | — | — | |
Mistral 7B Instruct | Mistral | — | — | — | 7.4 | 190 tok/s | 0.3s | |
Qwen3 0.6B (Non-reasoning) | Alibaba | — | — | — | 5.7 | 194 tok/s | 0.9s | |
Llama 3.2 Instruct 1B | Meta | — | — | — | 6.3 | 88 tok/s | 0.6s | |
Llama 2 Chat 7B | Meta | — | — | — | 9.7 | 108 tok/s | 12.6s | |
Gemma 3 1B Instruct | Google DeepMind | — | — | — | 5.5 | 48 tok/s | 0.6s |
Granite 4.0 H 350M | IBM | — | — | — | 5.4 | — | — | |
Granite 4.0 350M | IBM | — | — | — | 6.1 | — | — | |
Gemma 3 270M | Google DeepMind | — | — | — | 7.7 | — | — |
Qwen3.5 27B (Non-reasoning) | Alibaba | — | — | — | 37.2 | 92 tok/s | 1.4s | |
Gemini 3.1 Flash-Lite Preview | Google DeepMind | — | — | — | 33.5 | 319 tok/s | 5.7s |
GLM-4.7-Flash (Non-reasoning) | Z AI | — | — | — | 22.1 | 105 tok/s | 1.0s | |
Qwen3.5 35B A3B (Reasoning) | Alibaba | — | — | — | 37.1 | 149 tok/s | 1.2s | |
GLM-4.7-Flash (Reasoning) | Z AI | — | — | — | 30.1 | 91 tok/s | 0.9s | |
Gemma 4 E4B (Reasoning) | Google DeepMind | — | — | — | 18.8 | — | — |
GPT-5.4 mini (medium) | OpenAI | — | — | — | 37.7 | 181 tok/s | 6.3s | |
Qwen3.5 2B (Reasoning) | Alibaba | — | — | — | 16.3 | — | — | |
GPT-5.4 mini (xhigh) | OpenAI | — | — | — | 48.9 | 189 tok/s | 6.9s | |
Qwen3.5 9B (Reasoning) | Alibaba | — | — | — | 32.4 | 56 tok/s | 0.4s | |
Qwen3 Coder Next | Alibaba | — | — | — | 28.3 | 165 tok/s | 0.8s | |
Gemma 4 31B (Non-reasoning) | Google DeepMind | — | — | — | 32.3 | — | — |
Nemotron Cascade 2 30B A3B | NVIDIA | — | — | — | 28.4 | — | — | |
Step 3.5 Flash | StepFun | — | — | — | 37.8 | 163 tok/s | 0.8s | |
Qwen3.5 4B (Non-reasoning) | Alibaba | — | — | — | 22.6 | 178 tok/s | 0.3s | |
Qwen3.5 0.8B (Non-reasoning) | Alibaba | — | — | — | 9.9 | 285 tok/s | 0.3s | |
Qwen3.5 2B (Non-reasoning) | Alibaba | — | — | — | 14.7 | 232 tok/s | 0.3s | |
Grok-1 | xAI | — | — | — | 11.7 | — | — | |
Qwen3.5 0.8B (Reasoning) | Alibaba | — | — | — | 10.5 | — | — | |
Qwen3.5 397B A17B (Reasoning) | Alibaba | — | — | — | 45 | 52 tok/s | 1.5s | |
GLM-5 (Reasoning) | Z AI | — | — | — | 49.8 | 67 tok/s | 0.9s | |
Gemini 3 Deep Think | Google DeepMind | — | — | — | — | — | — |
Tiny Aya Global | Cohere | — | — | — | 4.7 | — | — | |
Gemma 4 26B A4B (Non-reasoning) | Google DeepMind | — | — | — | 27.1 | — | — |
Muse Spark | Meta | — | — | — | 52.1 | — | — | |
GLM 5V Turbo (Reasoning) | Z AI | — | — | — | 42.9 | — | — | |
Qwen Chat 72B | Alibaba | — | — | — | 8.8 | — | — | |
Gemma 4 31B (Reasoning) | Google DeepMind | — | — | — | 39.2 | 35 tok/s | 1.0s |
Arctic Instruct | Snowflake | — | — | — | 8.8 | — | — | |
GPT-5.4 nano (medium) | OpenAI | — | — | — | 38.1 | 158 tok/s | 3.8s | |
Qwen1.5 Chat 110B | Alibaba | — | — | — | 9.5 | — | — | |
GLM-5-Turbo | Z AI | — | — | — | 46.8 | — | — | |
GLM-5.1 (Reasoning) | Z AI | — | — | — | 51.4 | 43 tok/s | 1.2s | |
GLM-5 (Non-reasoning) | Z AI | — | — | — | 40.6 | 53 tok/s | 1.4s | |
Trinity Large Thinking | Arcee AI | — | — | — | 31.9 | 127 tok/s | 0.6s | |
Apertus 8B Instruct | Swiss AI Initiative | — | — | — | 5.9 | — | — | |
Apertus 70B Instruct | Swiss AI Initiative | — | — | — | 7.7 | — | — | |
Tri-21B-Think | Trillion Labs | — | — | — | 18.6 | — | — | |
Nanbeige4.1-3B | Nanbeige | — | — | — | 16.1 | — | — | |
Ling 2.6 Flash | InclusionAI | — | — | — | 26.2 | 202 tok/s | 0.8s | |
Tri-21B-think Preview | Trillion Labs | — | — | — | 20 | — | — | |
LongCat Flash Lite | LongCat | — | — | — | 23.9 | 115 tok/s | 3.9s | |
Step 3.5 Flash 2603 | StepFun | — | — | — | 38.5 | 186 tok/s | 0.8s | |
Mercury 2 | Inception | — | — | — | 32.8 | 872 tok/s | 4.7s | |
o1-preview | OpenAI | — | — | — | 23.7 | — | — | |
Kimi K2.5 (Non-reasoning) | Kimi | — | — | — | 37.3 | 32 tok/s | 1.4s | |
K2 Think V2 | MBZUAI Institute of Foundation Models | — | — | — | 24.1 | — | — | |
GPT-5.4 nano (Non-Reasoning) | OpenAI | — | — | — | 24.4 | 161 tok/s | 0.6s | |
Sarvam 105B (high) | Sarvam | — | — | — | 18.2 | 124 tok/s | 1.2s | |
Olmo 3.1 32B Instruct | Allen Institute for AI | — | — | — | 12.2 | 54 tok/s | 0.3s | |
Sarvam 30B (high) | Sarvam | — | — | — | 12.3 | 294 tok/s | 1.2s | |
MiMo-V2-Omni-0327 | Xiaomi | — | — | — | 44.9 | — | — | |
Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 57.3 | 57 tok/s | 11.6s | |
Step3 VL 10B | StepFun | — | — | — | 15.4 | — | — | |
KAT Coder Pro V2 | KwaiKAT | — | — | — | 43.8 | 114 tok/s | 1.8s | |
GPT-5.4 (xhigh) | OpenAI | — | — | — | 56.8 | 81 tok/s | 157.8s | |
Mistral Small 4 (Non-reasoning) | Mistral | — | — | — | 18.6 | 149 tok/s | 0.5s | |
MiMo-V2-Pro | Xiaomi | — | — | — | 49.2 | 67 tok/s | 2.1s | |
GPT-5.4 (Non-reasoning) | OpenAI | — | — | — | 35.4 | 62 tok/s | 0.7s | |
MiMo-V2-Omni | Xiaomi | — | — | — | 43.4 | — | — | |
JT-MINI | China Mobile | — | — | — | 25.4 | — | — | |
GLM-5.1 (Non-reasoning) | Z AI | — | — | — | 43.8 | 47 tok/s | 2.1s | |
GPT-5.3 Codex (xhigh) | OpenAI | — | — | — | 53.6 | 85 tok/s | 60.3s | |
Qwen3.5 9B (Non-reasoning) | Alibaba | — | — | — | 27.3 | 143 tok/s | 0.3s | |
GPT-5.4 Pro (xhigh) | OpenAI | — | — | — | — | — | — | |
Gemma 4 26B A4B (Reasoning) | Google DeepMind | — | — | — | 31.2 | — | — |
MiMo-V2-Flash (Feb 2026) | Xiaomi | — | — | — | 41.5 | 127 tok/s | 1.5s | |
Qwen Chat 14B | Alibaba | — | — | — | 7.4 | — | — | |
GPT-5.4 mini (Non-Reasoning) | OpenAI | — | — | — | 23.3 | 176 tok/s | 0.6s | |
DeepSeek-V2-Chat | DeepSeek | — | — | — | 9.1 | — | — | |
Kimi K2.6 | Kimi | — | — | — | 53.9 | 135 tok/s | 0.8s | |
Qwen3.6 35B A3B (Reasoning) | Alibaba | — | — | — | 43.5 | 238 tok/s | 1.7s | |
Qwen3.6 35B A3B (Non-reasoning) | Alibaba | — | — | — | 31.5 | 193 tok/s | 1.5s | |
Molmo2-8B | Allen Institute for AI | — | — | — | 7.3 | — | — | |
Grok 4.20 0309 v2 (Reasoning) | xAI | — | — | — | 49.3 | 175 tok/s | 15.5s | |
PaLM 2 | Google | — | — | — | 8.6 | — | — |
Gemini 2.0 Flash Thinking Experimental (Dec '24) | Google DeepMind | — | — | — | 12.3 | — | — |
Grok 4.20 0309 v2 (Non-reasoning) | xAI | — | — | — | 29 | 177 tok/s | 0.4s | |
Gemini 1.0 Ultra | Google DeepMind | — | — | — | 10.1 | — | — |
LFM2.5-VL-1.6B | Liquid AI | — | — | — | 6.2 | — | — | |
Qwen3.6 Max Preview | Alibaba | — | — | — | 51.8 | 57 tok/s | 1.9s | |
Claude 3 Haiku | Anthropic | — | — | — | 12.3 | 131 tok/s | 0.5s | |
R1 1776 | Perplexity | — | — | — | 12 | — | — | |
Gemini 2.0 Flash-Lite (Preview) | Google DeepMind | — | — | — | 14.5 | — | — |
Solar Pro 3 | Upstage | — | — | — | 25.9 | — | — | |
Codestral | Mistral AI | $0.3 | $0.9 | 32K | — | 180 tok/s | — | |
GPT-4.5 (Preview) | OpenAI | — | — | — | 20 | — | — | |
LFM2.5-1.2B-Instruct | Liquid AI | — | — | — | 8 | — | — | |
GPT-4o mini Realtime (Dec '24) | OpenAI | — | — | — | — | — | — | |
Claude 4.1 Opus (Non-reasoning) | Anthropic | — | — | — | 36 | 39 tok/s | 1.4s | |
GPT-4 | OpenAI | — | — | — | 12.8 | 35 tok/s | 0.8s | |
GPT-4o Realtime (Dec '24) | OpenAI | — | — | — | — | — | — | |
Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | — | — | — | 51.8 | 53 tok/s | 1.2s | |
GPT-5.2 Codex (xhigh) | OpenAI | — | — | — | 49 | 107 tok/s | 7.4s | |
LFM2.5-1.2B-Thinking | Liquid AI | — | — | — | 8.1 | — | — | |
MiniMax-M2.7 | MiniMax | — | — | — | 49.6 | 47 tok/s | 1.6s | |
NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | — | — | — | 36 | 154 tok/s | 1.1s | |
GPT-4o (Aug '24) | OpenAI | — | — | — | 18.6 | 108 tok/s | 0.6s | |
DeepSeek-V2.5 (Dec '24) | DeepSeek | — | — | — | 12.5 | — | — | |
Solar Open 100B (Reasoning) | Upstage | — | — | — | 21.7 | — | — | |
o1-pro | OpenAI | — | — | — | 25.8 | — | — | |
o3-pro | OpenAI | — | — | — | 40.7 | 19 tok/s | 95.4s | |
DeepSeek-V2.5 | DeepSeek | — | — | — | 12.3 | — | — | |
DeepSeek-Coder-V2 | DeepSeek | — | — | — | 10.6 | — | — | |
DeepSeek LLM 67B Chat (V1) | DeepSeek | — | — | — | 8.4 | — | — | |
Gemini 3.1 Pro Preview | Google DeepMind | — | — | — | 57.2 | 124 tok/s | 28.7s |
GPT-3.5 Turbo (0613) | OpenAI | — | — | — | — | — | — | |
LFM2 24B A2B | Liquid AI | — | — | — | 10.5 | 163 tok/s | 0.3s | |
Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | — | — | — | 42.6 | 60 tok/s | 1.0s | |
Qwen3.5 35B A3B (Non-reasoning) | Alibaba | — | — | — | 30.7 | 153 tok/s | 1.1s | |
Grok 4.20 0309 (Non-reasoning) | xAI | — | — | — | 29.7 | 164 tok/s | 0.4s | |
Qwen3.5 27B (Reasoning) | Alibaba | — | — | — | 42.1 | 92 tok/s | 1.4s | |
Sonar Reasoning | Perplexity | — | — | — | 17.9 | — | — | |
Qwen3.5 Omni Flash | Alibaba | — | — | — | 25.9 | 170 tok/s | 1.2s | |
Qwen3.5 122B A10B (Non-reasoning) | Alibaba | — | — | — | 35.9 | 152 tok/s | 1.1s | |
Mistral Small 4 (Reasoning) | Mistral | — | — | — | 27.8 | 173 tok/s | 0.5s | |
Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 51.7 | 72 tok/s | 46.6s | |
Grok 4.20 0309 (Reasoning) | xAI | — | — | — | 48.5 | 183 tok/s | 16.1s | |
Qwen3.5 397B A17B (Non-reasoning) | Alibaba | — | — | — | 40.1 | 52 tok/s | 1.4s | |
Grok 3 Reasoning Beta | xAI | — | — | — | 21.6 | — | — | |
Qwen3.6 Plus | Alibaba | — | — | — | 50 | 53 tok/s | 1.6s | |
Solar Mini | Upstage | — | — | — | 11.9 | 87 tok/s | 1.4s | |
GPT-5.4 nano (xhigh) | OpenAI | — | — | — | 44 | 157 tok/s | 2.5s | |
MiniMax-M2.5 | MiniMax | — | — | — | 41.9 | 59 tok/s | 2.1s | |
Qwen3.5 4B (Reasoning) | Alibaba | — | — | — | 27.1 | 177 tok/s | 0.3s | |
Qwen3.5 Omni Plus | Alibaba | — | — | — | 38.6 | 55 tok/s | 1.3s | |
Gemma 4 E2B (Non-reasoning) | Google DeepMind | — | — | — | 12.1 | — | — |
Gemma 4 E4B (Non-reasoning) | Google DeepMind | — | — | — | 14.8 | — | — |
Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 53 | 53 tok/s | 11.7s | |
Kimi K2.5 (Reasoning) | Kimi | — | — | — | 46.8 | 32 tok/s | 1.3s | |
Sonar Reasoning Pro | Perplexity | — | — | — | 24.6 | — | — | |
Llama 65B | Meta | — | — | — | 7.4 | — | — | |
NVIDIA Nemotron 3 Nano 4B | NVIDIA | — | — | — | 14.7 | — | — | |
Reka Flash (Sep '24) | Reka AI | — | — | — | 12 | 85 tok/s | 1.3s | |
Qwen3 Max Thinking | Alibaba | — | — | — | 39.9 | 36 tok/s | 1.7s | |
Gemma 4 E2B (Reasoning) | Google DeepMind | — | — | — | 15.2 | — | — |
Qwen3.5 122B A10B (Reasoning) | Alibaba | — | — | — | 41.6 | 159 tok/s | 1.1s |
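The table above can be loaded programmatically for sorting and filtering. A minimal sketch, assuming rows use `|`-separated cells as shown and `—` marks missing values; the header names and sample rows below are copied from the table, everything else is illustrative:

```python
# Parse a pipe-delimited row from the comparison table into a dict.
HEADER = ["model", "provider", "input_price", "output_price",
          "context", "intelligence", "speed", "latency", "api"]

def parse_row(line: str) -> dict:
    # Strip optional leading/trailing pipes, then split on the cell separator.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return dict(zip(HEADER, cells))

rows = [
    parse_row("o3 | OpenAI | $10 | $40 | 200K | 96.7% | 40 tok/s | — |"),
    parse_row("GPT-4.1 | OpenAI | $2 | $8 | 1M | 90.5% | 80 tok/s | — |"),
]

# Sort by input price, cheapest first.
rows.sort(key=lambda r: float(r["input_price"].lstrip("$")))
print([r["model"] for r in rows])  # ['GPT-4.1', 'o3']
```

Rows without pricing (the `—` entries) would need a guard before the `float(...)` conversion when applied to the full table.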
To compare costs across models, multiply your expected usage by the per-million-token rates above. As a rule of thumb, 1,000,000 tokens is roughly 750,000 words, and output volume is usually 30–50% of input volume.
Prices are approximate and may vary. Check provider documentation for current pricing.
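The usage-based comparison described above reduces to a one-line formula: cost = (input tokens × input rate + output tokens × output rate) / 1,000,000. A minimal sketch; the rates are copied from the table, while the function name and the 40% default output ratio are illustrative assumptions:

```python
# Per-model (input $/1M tokens, output $/1M tokens), from the table above.
RATES = {
    "GPT-4.1": (2.00, 8.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek R2": (0.55, 2.19),
}

def usage_cost(model: str, input_tokens: int, output_ratio: float = 0.4) -> float:
    """USD cost for `input_tokens` of input, with output volume estimated as
    `output_ratio` times the input (the 30-50% rule of thumb above)."""
    in_rate, out_rate = RATES[model]
    output_tokens = input_tokens * output_ratio
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# 1M input tokens (~750K words) on GPT-4.1, assuming 40% output volume:
print(f"${usage_cost('GPT-4.1', 1_000_000):.2f}")  # $5.20
```

At that usage, the same workload on DeepSeek R2 would cost well under a dollar, which is why the per-token rates dominate any comparison at scale.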