∞AI

AI Model Comparison

Compare pricing, benchmarks, and capabilities across 562 AI models

562 models tracked · 9 open source
| Model | Provider | Input $/1M | Output $/1M | Context | Intelligence | Speed | Latency |
|---|---|---|---|---|---|---|---|
| DeepSeek R2 ★ | DeepSeek | $0.55 | $2.19 | 128K | 91% | 60 tok/s | — |
| GPT-4.1 ★ | OpenAI | $2 | $8 | 1M | 90.5% | 80 tok/s | — |
| Claude Opus 4.6 ★ | Anthropic | $15 | $75 | 200K | 88.7% | 60 tok/s | — |
| GPT-4o ★ | OpenAI | $5 | $15 | 128K | 87.2% | 120 tok/s | — |
| Claude Sonnet 4.6 ★ | Anthropic | $3 | $15 | 200K | 86.8% | 100 tok/s | — |
| Llama 3.3 70B (Open) ★ | Meta AI | $0.23 | $0.92 | 128K | 86% | 80 tok/s | — |
| o3 | OpenAI | $10 | $40 | 200K | 96.7% | 40 tok/s | — |
| o4-mini | OpenAI | $1.1 | $4.4 | 200K | 93.4% | 100 tok/s | — |
| Gemini 3 Ultra | Google DeepMind | $7 | $21 | 1M | 90.1% | 70 tok/s | — |
| Claude Opus 4.5 (Reasoning) | Anthropic | — | — | — | 49.7 | 68 tok/s | 13.5s |
| Gemini 3 Pro Preview (low) | Google | — | — | — | 41.3 | — | — |
| Claude Opus 4.5 (Non-reasoning) | Anthropic | — | — | — | 43.1 | 53 tok/s | 1.1s |
| Gemini 3 Flash Preview (Reasoning) | Google | — | — | — | 46.4 | 197 tok/s | 6.1s |
| DeepSeek V3 (Open) | DeepSeek | $0.27 | $1.1 | 128K | 88.5% | 80 tok/s | — |
| Claude 4.1 Opus (Reasoning) | Anthropic | — | — | — | 42 | 37 tok/s | 8.2s |
| Claude 4.5 Sonnet (Reasoning) | Anthropic | — | — | — | 43 | 56 tok/s | 11.4s |
| MiniMax-M2.1 | MiniMax | — | — | — | 39.4 | 74 tok/s | 1.5s |
| Grok 3 | xAI | $3 | $15 | 131K | 87.5% | 90 tok/s | — |
| Llama 3.1 405B (Open) | Meta AI | $3 | $3 | 128K | 87.3% | 30 tok/s | — |
| Gemini 3 Pro | Google DeepMind | $3.5 | $10.5 | 1M | 87% | 100 tok/s | — |
| GPT-5.1 (high) | OpenAI | — | — | — | 47.7 | 121 tok/s | 33.8s |
| GPT-5 Codex (high) | OpenAI | — | — | — | 44.6 | 208 tok/s | 8.0s |
| GPT-5 (medium) | OpenAI | — | — | — | 42 | 83 tok/s | 50.4s |
| GPT-5.2 (xhigh) | OpenAI | — | — | — | 51.3 | 76 tok/s | 109.3s |
| Grok 4 | xAI | — | — | — | 41.5 | 60 tok/s | 7.7s |
| GPT-5 (high) | OpenAI | — | — | — | 44.6 | 82 tok/s | 101.8s |
| Qwen3-Max | Alibaba Cloud | $0.4 | $1.2 | 32K | 87% | 90 tok/s | — |
Claude 4 Opus (Reasoning)
Anthropic———
39
39 tok/s7.6s
GPT-5.2 (medium)
OpenAI———
46.6
——
GPT-5.1 Codex (high)
OpenAI———
43.1
170 tok/s6.4s
Gemini 2.5 Pro Preview (Mar '25)
Google———
30.3
——
DeepSeek V3.2 (Reasoning)
DeepSeek———
41.7
32 tok/s1.4s
DeepSeek V3.2 Speciale
DeepSeek———
29.4
——
GPT-5 (low)
OpenAI———
39.2
79 tok/s10.2s
Gemini 2.5 Pro
Google———
34.6
134 tok/s21.4s
Claude 4 Opus (Non-reasoning)
Anthropic———
33
37 tok/s1.3s
Claude 4.5 Sonnet (Non-reasoning)
Anthropic———
37.1
43 tok/s1.0s
GLM-4.7 (Reasoning)
Z AI———
42.1
107 tok/s0.7s
Doubao Seed Code
ByteDance Seed———
33.5
——
DeepSeek V3.1 (Reasoning)
DeepSeek———
27.7
——
Grok 4 Fast (Reasoning)
xAI———
35.1
214 tok/s2.9s
Qwen3-72B
Open
Alibaba Cloud · Free in / Free out · 32K context
85%
100 tok/s—
Kimi K2 Thinking
Kimi———
40.9
50 tok/s1.0s
DeepSeek V3.2 Exp (Reasoning)
DeepSeek———
32.9
33 tok/s1.4s
DeepSeek R1 0528 (May '25)
DeepSeek———
27.1
——
Grok 4.1 Fast (Reasoning)
xAI———
38.6
151 tok/s9.8s
Cogito v2.1 (Reasoning)
Deep Cogito———
85%
61 tok/s0.5s
DeepSeek V3.1 Terminus (Reasoning)
DeepSeek———
33.9
——
Phi-4
Open
Microsoft · $0.07 in / $0.14 out per 1M · 16K context
84.8%
300 tok/s—
Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)
Google———
25.7
——
Claude 3.7 Sonnet (Reasoning)
Anthropic———
34.7
——
GPT-5 mini (high)
OpenAI———
41.2
91 tok/s140.6s
MiMo-V2-Flash (Reasoning)
Xiaomi———
39.2
134 tok/s1.7s
Qwen3 VL 235B A22B (Reasoning)
Alibaba———
27.6
48 tok/s1.3s
Qwen3 Max (Preview)
Alibaba———
26.1
45 tok/s1.8s
DeepSeek V3.1 Terminus (Non-reasoning)
DeepSeek———
28.5
——
Qwen3 235B A22B 2507 (Reasoning)
Alibaba———
29.5
40 tok/s1.4s
o1
OpenAI———
30.8
129 tok/s18.5s
K-EXAONE (Reasoning)
LG AI Research———
32.1
——
Gemini 2.5 Pro Preview (May '25)
Google———
29.5
——
DeepSeek V3.2 Exp (Non-reasoning)
DeepSeek———
28.4
33 tok/s1.3s
DeepSeek R1 (Jan '25)
DeepSeek———
18.8
——
Mistral Large
Mistral AI · $2 in / $6 out per 1M · 128K context
84%
90 tok/s—
Claude 4 Sonnet (Non-reasoning)
Anthropic———
33
47 tok/s0.8s
Gemini 2.5 Flash Preview (Sep '25) (Reasoning)
Google———
31.1
——
Claude 4 Sonnet (Reasoning)
Anthropic———
38.7
51 tok/s9.1s
GLM-4.5 (Reasoning)
Z AI———
26.4
42 tok/s0.9s
DeepSeek V3.2 (Non-reasoning)
DeepSeek———
32.1
32 tok/s1.4s
DeepSeek V3.1 (Non-reasoning)
DeepSeek———
28.1
——
Grok 3 Mini
xAI · $0.3 in / $0.5 out per 1M · 131K context
83%
160 tok/s—
ERNIE 5.0 Thinking Preview
Baidu———
29.1
——
GPT-5 mini (medium)
OpenAI———
38.9
83 tok/s18.4s
Nova 2.0 Pro Preview (medium)
Amazon———
35.7
144 tok/s14.5s
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)
NVIDIA———
15
43 tok/s0.7s
Grok 3 mini Reasoning (high)
xAI———
32.1
217 tok/s0.4s
GLM-4.6 (Reasoning)
Z AI———
32.5
80 tok/s0.7s
Qwen3 235B A22B 2507 Instruct
Alibaba———
25
69 tok/s1.2s
Qwen3 235B A22B (Reasoning)
Alibaba———
19.8
64 tok/s1.2s
Hermes 4 - Llama-3.1 405B (Reasoning)
Nous Research———
18.6
34 tok/s0.7s
Gemini 2.5 Flash (Reasoning)
Google———
27
231 tok/s14.9s
Qwen3 Next 80B A3B Instruct
Alibaba———
20.1
172 tok/s1.1s
Qwen3 Max Thinking (Preview)
Alibaba———
32.5
43 tok/s1.8s
Kimi K2
Kimi———
26.3
34 tok/s1.3s
Qwen3 Next 80B A3B (Reasoning)
Alibaba———
26.7
169 tok/s1.1s
Seed-OSS-36B-Instruct
ByteDance Seed———
25.2
43 tok/s1.6s
Qwen3 VL 32B (Reasoning)
Alibaba———
24.7
95 tok/s1.4s
Qwen3 VL 235B A22B Instruct
Alibaba———
20.8
60 tok/s1.1s
Kimi K2 0905
Kimi———
30.9
24 tok/s6.0s
GLM-4.5-Air
Z AI———
23.2
67 tok/s1.1s
MiniMax M1 80k
MiniMax———
24.4
——
MiniMax-M2
MiniMax———
36.1
68 tok/s2.3s
Magistral Medium 1.2
Mistral———
27.1
99 tok/s0.5s
DeepSeek V3 0324
DeepSeek———
22.3
——
GPT-4o mini
OpenAI · $0.15 in / $0.6 out per 1M · 128K context
82%
200 tok/s—
Gemini 3 Flash
Google DeepMind · $0.075 in / $0.3 out per 1M · 1M context
82%
250 tok/s—
Nova 2.0 Pro Preview (low)
Amazon———
31.9
154 tok/s6.0s
Nova 2.0 Lite (high)
Amazon———
34.5
192 tok/s17.9s
GPT-5.1 Codex mini (high)
OpenAI———
38.6
208 tok/s5.6s
GPT-5 (ChatGPT)
OpenAI———
21.8
154 tok/s0.6s
Ling-1T
InclusionAI———
19
——
INTELLECT-3
Prime Intellect———
22.2
——
EXAONE 4.0 32B (Reasoning)
LG AI Research———
16.7
——
Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning)
Google———
21.6
——
Qwen3 VL 30B A3B (Reasoning)
Alibaba———
19.7
128 tok/s1.0s
gpt-oss-120B (high)
OpenAI———
33.3
212 tok/s0.5s
Nova 2.0 Lite (medium)
Amazon———
29.7
197 tok/s15.3s
Llama Nemotron Super 49B v1.5 (Reasoning)
NVIDIA———
18.7
66 tok/s0.3s
Qwen3 30B A3B 2507 (Reasoning)
Alibaba———
22.4
146 tok/s1.1s
Ring-1T
InclusionAI———
22.8
——
MiniMax M1 40k
MiniMax———
20.9
——
Hermes 4 - Llama-3.1 70B (Reasoning)
Nous Research———
16
74 tok/s0.6s
GPT-5 (minimal)
OpenAI———
23.9
72 tok/s1.2s
Llama 4 Maverick
Meta———
18.4
116 tok/s0.6s
Gemini 2.5 Flash (Non-reasoning)
Google———
20.6
189 tok/s0.5s
Nova 2.0 Omni (medium)
Amazon———
28
——
Mistral Large 3
Mistral———
22.8
56 tok/s0.6s
Gemini 2.0 Pro Experimental (Feb '25)
Google———
18.1
——
Solar Pro 2 (Reasoning)
Upstage———
14.9
——
KAT-Coder-Pro V1
KwaiKAT———
36
119 tok/s0.9s
K-EXAONE (Non-reasoning)
LG AI Research———
23.4
——
Mi:dm K 2.5 Pro Preview
Korea Telecom———
81%
——
Mi:dm K 2.5 Pro
Korea Telecom———
23.1
——
GPT-5.2 (Non-reasoning)
OpenAI———
33.6
63 tok/s0.6s
Gemini 2.5 Flash Preview (Reasoning)
Google———
24.3
——
Motif-2-12.7B-Reasoning
Motif Technologies———
19.1
——
Claude 3.7 Sonnet (Non-reasoning)
Anthropic———
30.8
——
Gemini 2.0 Flash Thinking Experimental (Jan '25)
Google———
19.6
——
o3-mini (high)
OpenAI———
25.2
156 tok/s26.1s
Claude 4.5 Haiku (Non-reasoning)
Anthropic———
31.1
100 tok/s0.5s
GPT-4o (March 2025, chatgpt-4o-latest)
OpenAI———
18.6
——
GLM-4.6V (Reasoning)
Z AI———
23.4
29 tok/s1.1s
Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning)
Google———
19.4
——
Qwen3 32B (Reasoning)
Alibaba———
16.5
105 tok/s1.1s
GPT-5.1 (Non-reasoning)
OpenAI———
27.4
120 tok/s0.8s
DeepSeek R1 Distill Llama 70B
DeepSeek———
16
43 tok/s0.5s
Nova 2.0 Omni (low)
Amazon———
23.2
——
Llama 3.3 Nemotron Super 49B v1 (Reasoning)
NVIDIA———
18.5
——
Qwen3 Omni 30B A3B (Reasoning)
Alibaba———
15.6
92 tok/s1.0s
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)
NVIDIA———
24.3
162 tok/s1.6s
Apriel-v1.6-15B-Thinker
ServiceNow———
27.6
——
GLM-4.7 (Non-reasoning)
Z AI———
34.2
96 tok/s0.7s
HyperCLOVA X SEED Think (32B)
Naver———
23.7
——
K2-V2 (high)
MBZUAI Institute of Foundation Models———
20.6
——
GLM-4.5V (Reasoning)
Z AI———
15.1
48 tok/s0.8s
o3-mini
OpenAI———
25.9
167 tok/s8.5s
Grok Code Fast 1
xAI———
28.7
189 tok/s3.9s
Nova 2.0 Lite (low)
Amazon———
24.6
206 tok/s4.8s
Qwen3 Coder 480B A35B Instruct
Alibaba———
24.8
62 tok/s1.6s
Ring-flash-2.0
InclusionAI———
14
84 tok/s1.3s
Qwen3 VL 32B Instruct
Alibaba———
17.2
78 tok/s1.3s
Command R+
Cohere · $2.5 in / $10 out per 1M · 128K context
78%
80 tok/s—
ERNIE 4.5 300B A47B
Baidu———
15
29 tok/s1.8s
Ling-flash-2.0
InclusionAI———
15.7
99 tok/s1.4s
GPT-4.1 mini
OpenAI———
22.9
99 tok/s0.5s
GPT-5 nano (high)
OpenAI———
26.8
150 tok/s86.4s
GPT-5 mini (minimal)
OpenAI———
20.7
78 tok/s1.0s
Gemini 2.0 Flash (experimental)
Google———
16.8
——
Gemini 2.0 Flash (Feb '25)
Google———
18.5
——
Gemini 2.5 Flash Preview (Non-reasoning)
Google———
17.8
——
GLM-4.6 (Non-reasoning)
Z AI———
30.2
88 tok/s0.9s
gpt-oss-120B (low)
OpenAI———
24.5
210 tok/s0.5s
Qwen3 30B A3B 2507 Instruct
Alibaba———
15
109 tok/s1.1s
Qwen3 30B A3B (Reasoning)
Alibaba———
15.3
70 tok/s1.1s
GPT-4o (ChatGPT)
OpenAI———
14.1
——
Solar Pro 2 (Preview) (Reasoning)
Upstage———
18.8
——
Qwen3 14B (Reasoning)
Alibaba———
16.2
64 tok/s1.2s
EXAONE 4.0 32B (Non-reasoning)
LG AI Research———
11.7
——
Apriel-v1.5-15B-Thinker
ServiceNow———
28.3
——
Magistral Small 1.2
Mistral———
18.2
176 tok/s0.4s
Nova 2.0 Pro Preview (Non-reasoning)
Amazon———
23.1
184 tok/s0.7s
GPT-5 nano (medium)
OpenAI———
25.9
154 tok/s39.1s
Claude 3.5 Sonnet (Oct '24)
Anthropic———
15.9
——
K2-V2 (medium)
MBZUAI Institute of Foundation Models———
18.7
——
QwQ 32B
Alibaba———
19.7
33 tok/s0.4s
Devstral 2
Mistral———
22
77 tok/s0.7s
Mistral Medium 3
Mistral———
18.8
54 tok/s0.4s
Sonar Pro
Perplexity———
15.2
——
Olmo 3.1 32B Think
Allen Institute for AI———
13.9
——
Claude 4.5 Haiku (Reasoning)
Anthropic———
37.1
145 tok/s14.2s
Olmo 3 32B Think
Allen Institute for AI———
12.1
——
Gemini 2.5 Flash-Lite (Reasoning)
Google———
17.6
274 tok/s17.2s
Qwen3 235B A22B (Non-reasoning)
Alibaba———
17
65 tok/s1.2s
Qwen2.5 Max
Alibaba———
16.3
49 tok/s1.2s
NVIDIA Nemotron Nano 12B v2 VL (Reasoning)
NVIDIA———
14.9
152 tok/s0.6s
Qwen3 VL 30B A3B Instruct
Alibaba———
16.1
122 tok/s1.1s
Claude Haiku 4.5
Anthropic · $0.8 in / $4 out per 1M · 200K context
75.2%
250 tok/s—
Gemini 1.5 Pro (Sep '24)
Google———
16
——
Claude 3.5 Sonnet (June '24)
Anthropic———
14.2
——
GLM-4.5V (Non-reasoning)
Z AI———
12.7
50 tok/s30.9s
Gemma 3 27B
Open
Google DeepMind · Free in / Free out · 128K context
75%
120 tok/s—
Magistral Small 1
Mistral———
16.8
——
Solar Pro 2 (Non-reasoning)
Upstage———
13.6
——
Magistral Medium 1
Mistral———
18.8
——
gpt-oss-20B (high)
OpenAI———
24.5
276 tok/s0.3s
GLM-4.6V (Non-reasoning)
Z AI———
17.1
23 tok/s4.1s
Llama 4 Scout
Meta———
13.5
128 tok/s0.5s
Qwen3 VL 8B (Reasoning)
Alibaba———
16.7
130 tok/s1.1s
Nova 2.0 Lite (Non-reasoning)
Amazon———
18
173 tok/s0.8s
DeepSeek R1 0528 Qwen3 8B
DeepSeek———
16.4
——
Grok 4.1 Fast (Non-reasoning)
xAI———
23.6
148 tok/s0.4s
NVIDIA Nemotron Nano 9B V2 (Reasoning)
NVIDIA———
14.8
109 tok/s0.3s
MiMo-V2-Flash (Non-reasoning)
Xiaomi———
30.4
138 tok/s1.5s
Qwen3 8B (Reasoning)
Alibaba———
13.2
83 tok/s1.0s
Qwen3 4B 2507 (Reasoning)
Alibaba———
18.2
——
o1-mini
OpenAI———
20.4
——
NVIDIA Nemotron Nano 9B V2 (Non-reasoning)
NVIDIA———
13.2
138 tok/s0.7s
GPT-4o (May '24)
OpenAI———
14.5
112 tok/s0.6s
DeepSeek R1 Distill Qwen 14B
DeepSeek———
15.8
——
DeepSeek R1 Distill Qwen 32B
DeepSeek———
17.2
43 tok/s0.4s
DBRX
Open
Databricks · $0.75 in / $2.25 out per 1M · 33K context
73.7%
100 tok/s—
Solar Pro 2 (Preview) (Non-reasoning)
Upstage———
16
——
Qwen3 Omni 30B A3B Instruct
Alibaba———
10.7
105 tok/s1.1s
Nova Premier
Amazon———
19
62 tok/s1.1s
Qwen3 32B (Non-reasoning)
Alibaba———
14.5
102 tok/s1.2s
Hermes 4 - Llama-3.1 405B (Non-reasoning)
Nous Research———
17.6
33 tok/s0.8s
Llama 3.1 Instruct 405B
Meta———
17.4
31 tok/s0.7s
Falcon-H1R-7B
TII UAE———
15.8
——
Grok 4 Fast (Non-reasoning)
xAI———
23.1
204 tok/s0.3s
Llama 3.2 11B Vision
Open
Meta AI · $0.18 in / $0.18 out per 1M · 128K context
73%
150 tok/s—
gpt-oss-20B (low)
OpenAI———
20.8
263 tok/s0.4s
Llama 3.1 Tulu3 405B
Allen Institute for AI———
14.1
——
Qwen2.5 Instruct 72B
Alibaba———
15.6
55 tok/s1.2s
Gemini 2.5 Flash-Lite (Non-reasoning)
Google———
12.7
279 tok/s0.6s
Gemini 2.0 Flash-Lite (Feb '25)
Google———
14.7
——
Command R
Cohere · $0.15 in / $0.6 out per 1M · 128K context
72%
150 tok/s—
Mistral Small
Mistral AI · $0.1 in / $0.3 out per 1M · 32K context
72%
200 tok/s—
Nova 2.0 Omni (Non-reasoning)
Amazon———
16.6
223 tok/s0.9s
Gemini 3.1 Flash-Lite
Google DeepMind · $0.01 in / $0.04 out per 1M · 1M context
72%
500 tok/s—
Command A
Cohere———
13.5
42 tok/s0.5s
Qwen3 Coder 30B A3B Instruct
Alibaba———
20
112 tok/s1.5s
Llama 3.3 Instruct 70B
Meta———
14.5
97 tok/s0.6s
Grok 2 (Dec '24)
xAI———
13.9
——
Devstral Medium
Mistral———
18.7
139 tok/s0.5s
Qwen3 30B A3B (Non-reasoning)
Alibaba———
12.5
70 tok/s1.2s
K2-V2 (low)
MBZUAI Institute of Foundation Models———
14.4
——
Falcon 180B
Open
TII · Free in / Free out · 4K context
70.4%
20 tok/s—
Qwen3 VL 4B (Reasoning)
Alibaba———
13.7
——
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning)
NVIDIA———
14.3
——
Qwen3 4B (Reasoning)
Alibaba———
14.2
103 tok/s1.0s
Mistral Large 2 (Nov '24)
Mistral———
15.1
38 tok/s0.5s
Pixtral Large
Mistral———
14
52 tok/s0.5s
Grok Beta
xAI———
13.3
——
Qwen2.5 Instruct 32B
Alibaba———
13.2
——
Claude 3 Opus
Anthropic———
18
——
Sarvam M (Reasoning)
Sarvam———
8.4
——
Qwen3 VL 8B Instruct
Alibaba———
14.3
145 tok/s1.0s
GPT-4 Turbo
OpenAI———
13.7
34 tok/s0.9s
Ministral 3 14B
Mistral———
16
133 tok/s0.3s
Nova Pro
Amazon———
13.5
——
Llama 3.1 Nemotron Instruct 70B
NVIDIA———
13.4
43 tok/s0.4s
Sonar
Perplexity———
15.5
——
Llama Nemotron Super 49B v1.5 (Non-reasoning)
NVIDIA———
14.6
67 tok/s0.3s
Devstral Small 2
Mistral———
19.5
77 tok/s0.5s
Mistral Medium 3.1
Mistral———
21.3
82 tok/s0.4s
Gemini 1.5 Flash (Sep '24)
Google———
13.8
——
Qwen3 14B (Non-reasoning)
Alibaba———
12.8
66 tok/s1.1s
Mistral Small 3.2
Mistral———
15.1
166 tok/s0.4s
Llama 3.1 Instruct 70B
Meta———
12.5
31 tok/s0.7s
Mistral Large 2 (Jul '24)
Mistral———
13
——
Llama 3.2 Instruct 90B (Vision)
Meta———
11.9
42 tok/s0.5s
Ling-mini-2.0
InclusionAI———
9.2
——
Reka Flash 3
Reka AI———
9.5
96 tok/s1.1s
Qwen3 4B 2507 Instruct
Alibaba———
12.9
——
Hermes 4 - Llama-3.1 70B (Non-reasoning)
Nous Research———
12.6
71 tok/s0.6s
Mistral Small 3.1
Mistral———
14.5
148 tok/s0.4s
GPT-4.1 nano
OpenAI———
13
195 tok/s0.4s
Gemini 1.5 Pro (May '24)
Google———
12
——
Olmo 3 7B Think
Allen Institute for AI———
9.4
——
QwQ 32B-Preview
Alibaba———
15.2
44 tok/s0.5s
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning)
NVIDIA———
10.1
174 tok/s0.7s
Mistral Small 3
Mistral———
12.7
152 tok/s0.5s
Ministral 3 8B
Mistral———
14.8
189 tok/s0.3s
Qwen3 8B (Non-reasoning)
Alibaba———
10.6
89 tok/s1.0s
Qwen2.5 Coder Instruct 32B
Alibaba———
12.9
——
Qwen3 VL 4B Instruct
Alibaba———
9.6
——
Claude 3.5 Haiku
Anthropic———
18.7
——
Devstral Small (May '25)
Mistral———
18
——
Qwen2.5 Turbo
Alibaba———
12
68 tok/s1.2s
Devstral Small (Jul '25)
Mistral———
15.2
200 tok/s0.4s
Qwen2 Instruct 72B
Alibaba———
11.7
——
Granite 4.0 H Small
IBM———
10.8
416 tok/s8.7s
Mistral Saba
Mistral———
12.1
——
Gemma 3 12B Instruct
Google———
8.8
31 tok/s24.2s
Qwen3 4B (Non-reasoning)
Alibaba———
12.5
103 tok/s1.1s
Kimi Linear 48B A3B Instruct
Kimi———
14.4
——
Exaone 4.0 1.2B (Reasoning)
LG AI Research———
8.3
——
Nova Lite
Amazon———
12.7
228 tok/s0.7s
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning)
NVIDIA———
13.2
75 tok/s0.3s
DeepHermes 3 - Mistral 24B Preview (Non-reasoning)
Nous Research———
10.9
——
Claude 3 Sonnet
Anthropic———
10.3
——
Jamba Reasoning 3B
AI21 Labs———
9.6
——
Jamba 1.7 Large
AI21 Labs———
10.9
52 tok/s1.0s
Gemini 1.5 Flash-8B
Google———
11.1
——
Hermes 3 - Llama-3.1 70B
Nous Research———
10.6
28 tok/s0.4s
Jamba 1.5 Large
AI21 Labs———
10.7
——
Qwen3 1.7B (Reasoning)
Alibaba———
8
140 tok/s1.1s
Gemini 1.5 Flash (May '24)
Google———
10.5
——
Llama 3 Instruct 70B
Meta———
8.9
39 tok/s0.7s
Jamba 1.6 Large
AI21 Labs———
10.6
53 tok/s1.0s
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning)
NVIDIA———
14.4
——
GPT-5 nano (minimal)
OpenAI———
13.8
145 tok/s1.0s
Mixtral 8x22B Instruct
Mistral———
9.8
——
DeepSeek R1 Distill Llama 8B
DeepSeek———
12.1
——
Nova Micro
Amazon———
10.3
328 tok/s0.6s
Ministral 3 3B
Mistral———
11.2
294 tok/s0.3s
Olmo 3 7B Instruct
Allen Institute for AI———
8.2
——
OLMo 2 32B
Allen Institute for AI———
10.6
——
LFM2 8B A1B
Liquid AI———
7
——
Exaone 4.0 1.2B (Non-reasoning)
LG AI Research———
8.1
——
Claude 2.1
Anthropic———
9.3
——
Mistral Medium
Mistral———
9
75 tok/s0.4s
Claude 2.0
Anthropic———
9.1
——
Phi-4 Multimodal Instruct
Microsoft Azure———
10
15 tok/s0.2s
Gemma 3n E4B Instruct
Google———
6.4
14 tok/s0.3s
Llama 3.1 Instruct 8B
Meta———
11.8
159 tok/s0.4s
Gemma 3n E4B Instruct Preview (May '25)
Google———
10.1
——
Granite 3.3 8B (Non-reasoning)
IBM———
7
375 tok/s20.3s
Qwen2.5 Coder Instruct 7B
Alibaba———
10
——
Phi-4 Mini Instruct
Microsoft Azure———
8.4
44 tok/s0.7s
Llama 3.2 Instruct 11B (Vision)
Meta———
8.7
77 tok/s0.5s
GPT-3.5 Turbo
OpenAI———
9
107 tok/s0.5s
Granite 4.0 Micro
IBM———
7.7
——
Phi-3 Mini Instruct 3.8B
Microsoft Azure———
10.1
——
Command-R+ (Apr '24)
Cohere———
8.3
——
Gemini 1.0 Pro
Google———
8.5
——
LFM 40B
Liquid AI———
8.8
——
Claude Instant
Anthropic———
7.4
——
DeepSeek Coder V2 Lite Instruct
DeepSeek———
8.5
——
Mistral Small (Feb '24)
Mistral———
9
146 tok/s0.4s
Gemma 3 4B Instruct
Google———
6.3
33 tok/s1.1s
Qwen3 1.7B (Non-reasoning)
Alibaba———
6.8
141 tok/s0.9s
Llama 3 Instruct 8B
Meta———
6.4
83 tok/s0.5s
Llama 2 Chat 70B
Meta———
8.4
——
Llama 2 Chat 13B
Meta———
8.4
——
Jamba 1.7 Mini
AI21 Labs———
8.1
——
Mixtral 8x7B Instruct
Mistral———
7.7
——
Gemma 3n E2B Instruct
Google———
4.8
52 tok/s0.4s
Jamba 1.5 Mini
AI21 Labs———
8
——
Jamba 1.6 Mini
AI21 Labs———
7.9
186 tok/s0.8s
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning)
Nous Research———
7.6
——
Molmo 7B-D
Allen Institute for AI———
9.2
——
Llama 3.2 Instruct 3B
Meta———
9.7
54 tok/s0.6s
Qwen3 0.6B (Reasoning)
Alibaba———
6.5
185 tok/s1.0s
Command-R (Mar '24)
Cohere———
7.4
——
Granite 4.0 1B
IBM———
7.3
——
OpenChat 3.5 (1210)
OpenChat———
8.3
——
LFM2 2.6B
Liquid AI———
8
——
OLMo 2 7B
Allen Institute for AI———
9.3
——
Granite 4.0 H 1B
IBM———
8
——
DeepSeek R1 Distill Qwen 1.5B
DeepSeek———
9.1
——
LFM2 1.2B
Liquid AI———
6.3
——
Mistral 7B Instruct
Mistral———
7.4
193 tok/s0.3s
Qwen3 0.6B (Non-reasoning)
Alibaba———
5.7
190 tok/s0.9s
Llama 3.2 Instruct 1B
Meta———
6.3
87 tok/s0.7s
Llama 2 Chat 7B
Meta———
9.7
118 tok/s1.5s
Gemma 3 1B Instruct
Google———
5.5
51 tok/s0.7s
Granite 4.0 H 350M
IBM———
5.4
——
Granite 4.0 350M
IBM———
6.1
——
Gemma 3 270M
Google———
7.7
——
Standard
Google——————
Qwen3.5 Omni Flash
Alibaba——————
Octave 2
Hume AI——————
Nemotron Cascade 2 30B A3B
NVIDIA———
28.4
——
Kimi K2.5 (Non-reasoning)
Kimi———
37.3
31 tok/s1.3s
Mercury 2
Inception———
32.8
877 tok/s4.4s
Molmo2-8B
Allen Institute for AI———
7.3
——
MiMo-V2-Pro
Xiaomi———
49.2
71 tok/s1.9s
MiMo-V2-Omni-0327
Xiaomi———
44.9
——
Sarvam 105B (high)
Sarvam———
18.2
100 tok/s1.3s
MiMo-V2-Omni
Xiaomi———
43.4
——
MiMo-V2-Flash (Feb 2026)
Xiaomi———
41.5
133 tok/s1.3s
Neural2
Google——————
Sarvam 30B (high)
Sarvam———
12.3
272 tok/s1.2s
KAT Coder Pro V2
KwaiKAT———
43.8
115 tok/s1.8s
o1-preview
OpenAI———
23.7
——
Olmo 3.1 32B Instruct
Allen Institute for AI———
12.2
52 tok/s0.3s
K2 Think V2
MBZUAI Institute of Foundation Models———
24.1
——
LongCat Flash Lite
LongCat———
23.9
146 tok/s6.0s
Tri-21B-Think
Trillion Labs———
18.6
——
Tri-21B-think Preview
Trillion Labs———
20
——
Apertus 8B Instruct
Swiss AI Initiative———
5.9
——
Nanbeige4.1-3B
Nanbeige———
16.1
——
Apertus 70B Instruct
Swiss AI Initiative———
7.7
——
Trinity Large Thinking
Arcee AI———
31.9
126 tok/s0.6s
GLM-5 (Reasoning)
Z AI———
49.8
72 tok/s0.9s
GLM 5V Turbo (Reasoning)
Z AI———
42.9
——
GLM-5.1 (Reasoning)
Z AI———
51.4
46 tok/s1.0s
Step 3.5 Flash 2603
StepFun———
38.5
188 tok/s0.9s
GLM-5-Turbo
Z AI———
46.8
——
GLM-5 (Non-reasoning)
Z AI———
40.6
55 tok/s1.4s
Tiny Aya Global
Cohere———
4.7
——
Qwen3.5 2B (Non-reasoning)
Alibaba———
14.7
241 tok/s0.3s
Qwen3.5 397B A17B (Reasoning)
Alibaba———
45
52 tok/s1.5s
Qwen3.5 4B (Non-reasoning)
Alibaba———
22.6
189 tok/s0.3s
Qwen3.5 0.8B (Reasoning)
Alibaba———
10.5
——
Qwen3.5 0.8B (Non-reasoning)
Alibaba———
9.9
283 tok/s0.3s
Step3 VL 10B
StepFun———
15.4
——
Qwen3.5 9B (Reasoning)
Alibaba———
32.4
125 tok/s0.3s
Qwen3.6 Plus
Alibaba———
50
52 tok/s1.5s
Qwen3.5 4B (Reasoning)
Alibaba———
27.1
186 tok/s0.3s
Qwen3.5 27B (Non-reasoning)
Alibaba———
37.2
89 tok/s1.4s
Qwen3.5 Omni Flash
Alibaba———
25.9
170 tok/s1.0s
Qwen3.5 27B (Reasoning)
Alibaba———
42.1
88 tok/s1.4s
Qwen3.5 122B A10B (Reasoning)
Alibaba———
41.6
162 tok/s1.2s
Qwen3.5 122B A10B (Non-reasoning)
Alibaba———
35.9
157 tok/s1.2s
Qwen3.5 Omni Plus
Alibaba———
38.6
51 tok/s1.3s
Qwen3 Coder Next
Alibaba———
28.3
152 tok/s0.8s
Kimi K2.5 (Reasoning)
Kimi———
46.8
33 tok/s1.2s
Qwen3.5 2B (Reasoning)
Alibaba———
16.3
——
Qwen3.5 35B A3B (Non-reasoning)
Alibaba———
30.7
142 tok/s1.1s
Qwen3.5 35B A3B (Reasoning)
Alibaba———
37.1
145 tok/s1.1s
Qwen3.5 397B A17B (Non-reasoning)
Alibaba———
40.1
53 tok/s1.5s
Qwen3 Max Thinking
Alibaba———
39.9
34 tok/s1.8s
Step 3.5 Flash
StepFun———
37.8
169 tok/s0.8s
Llama 65B
Meta———
7.4
——
NVIDIA Nemotron 3 Nano 4B
NVIDIA———
14.7
——
GPT-3.5 Turbo (0613)
OpenAI——————
o3-pro
OpenAI———
40.7
19 tok/s106.9s
GPT-5.2 Codex (xhigh)
OpenAI———
49
110 tok/s9.2s
Gemini 3.1 Flash TTS
Google——————
GPT-4o (Aug '24)
OpenAI———
18.6
108 tok/s0.5s
GPT-5.4 mini (medium)
OpenAI———
37.7
177 tok/s7.4s
NVIDIA Nemotron 3 Super 120B A12B (Reasoning)
NVIDIA———
36
155 tok/s1.2s
DeepSeek-V2.5
DeepSeek———
12.3
——
o1-pro
OpenAI———
25.8
——
Solar Open 100B (Reasoning)
Upstage———
21.7
——
LFM2.5-VL-1.6B
Liquid AI———
6.2
——
GPT-4.5 (Preview)
OpenAI———
20
——
Solar Pro 3
Upstage———
25.9
——
GPT-4o Realtime (Dec '24)
OpenAI——————
MiniMax-M2.7
MiniMax———
49.6
49 tok/s1.7s
LFM2 24B A2B
Liquid AI———
10.5
148 tok/s0.3s
GPT-4o mini Realtime (Dec '24)
OpenAI——————
GPT-5.4 nano (Non-reasoning)
OpenAI———
24.4
154 tok/s0.5s
LFM2.5-1.2B-Thinking
Liquid AI———
8.1
——
Gemini 2.0 Flash-Lite (Preview)
Google———
14.5
——
Fish Audio S2 Pro
Fish Audio——————
LFM2.5-1.2B-Instruct
Liquid AI———
8
——
GPT-4
OpenAI———
12.8
37 tok/s0.8s
Gemini 2.0 Flash Thinking Experimental (Dec '24)
Google———
12.3
——
Gemini 1.0 Ultra
Google———
10.1
——
PALM-2
Google———
8.6
——
Claude 3 Haiku
Anthropic———
12.3
132 tok/s0.5s
Claude 4.1 Opus (Non-reasoning)
Anthropic———
36
36 tok/s1.4s
Grok 4.20 0309 v2 (Non-reasoning)
xAI———
29
162 tok/s0.4s
Grok 4.20 0309 v2 (Reasoning)
xAI———
49.3
225 tok/s14.9s
R1 1776
Perplexity———
12
——
Codestral
Mistral AI · $0.3 in / $0.9 out per 1M · 32K context · 180 tok/s
DeepSeek-V2.5 (Dec '24)
DeepSeek———
12.5
——
DeepSeek-Coder-V2
DeepSeek———
10.6
——
DeepSeek LLM 67B Chat (V1)
DeepSeek———
8.4
——
Gemini 3.1 Pro Preview
Google———
57.2
130 tok/s24.6s
Gemini 3.1 Flash-Lite Preview
Google———
33.5
338 tok/s5.3s
Sonar Reasoning
Perplexity———
17.9
——
Sonar Reasoning Pro
Perplexity———
24.6
——
Grok 3 Reasoning Beta
xAI———
21.6
——
Grok 4.20 0309 (Reasoning)
xAI———
48.5
215 tok/s18.3s
Grok 4.20 0309 (Non-reasoning)
xAI———
29.7
172 tok/s0.4s
Magpie-Multilingual 357M (Feb 2026)
NVIDIA——————
Solar Mini
Upstage———
11.9
92 tok/s1.5s
MiniMax-M2.5
MiniMax———
41.9
68 tok/s1.8s
Mistral Small 4 (Reasoning)
Mistral———
27.8
175 tok/s0.5s
Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
Anthropic———
51.7
71 tok/s54.0s
Claude Sonnet 4.6 (Non-reasoning, Low Effort)
Anthropic———
42.6
53 tok/s1.0s
Reka Flash (Sep '24)
Reka AI———
12
86 tok/s1.3s
Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
Anthropic———
53
57 tok/s12.3s
Gemma 4 E4B (Reasoning)
Google———
18.8
——
Gemma 4 E4B (Non-reasoning)
Google———
14.8
——
Gemma 4 E2B (Reasoning)
Google———
15.2
——
Magpie Multilingual
NVIDIA——————
GLM-4.7-Flash (Reasoning)
Z AI———
30.1
88 tok/s0.9s
Gemma 4 E2B (Non-reasoning)
Google———
12.1
——
GLM-4.7-Flash (Non-reasoning)
Z AI———
22.1
139 tok/s1.3s
Grok-1
xAI———
11.7
——
Qwen1.5 Chat 110B
Alibaba———
9.5
——
Gemma 4 26B A4B (Non-reasoning)
Google———
27.1
——
Gemini 3 Deep Think
Google——————
Gemma 4 31B (Non-reasoning)
Google———
32.3
——
Gemma 4 31B (Reasoning)
Google———
39.2
35 tok/s1.0s
Muse Spark
Meta———
52.1
——
GPT-5.4 (Non-reasoning)
OpenAI———
35.4
61 tok/s0.7s
Arctic Instruct
Snowflake———
8.8
——
Qwen Chat 72B
Alibaba———
8.8
——
GPT-5.3 Codex (xhigh)
OpenAI———
53.6
90 tok/s71.3s
Gemini 2.5 Flash Lite TTS
Google——————
Gemini 2.5 Flash TTS (Dec 2025)
Google——————
GPT-5.4 nano (medium)
OpenAI———
38.1
161 tok/s3.1s
Inworld TTS 1.5 Max
Inworld——————
Eleven v3
ElevenLabs——————
Inworld TTS 1 Max
Inworld——————
Speech 2.8 Turbo
MiniMax——————
Step TTS 2 (Mar 2026)
StepFun——————
Speech 2.6 HD
MiniMax——————
Speech 2.6 Turbo
MiniMax——————
Inworld TTS 1
Inworld——————
Speech-02-HD
MiniMax——————
Azure HD 2.5
Microsoft Azure——————
Multilingual v2
ElevenLabs——————
Step Audio EditX (Mar 2026)
StepFun——————
Speech-02-Turbo
MiniMax——————
TTS-1
OpenAI——————
TTS-1 HD
OpenAI——————
Turbo v2.5
ElevenLabs——————
Flash v2.5
ElevenLabs——————
Sonic 3
Cartesia——————
OpenAudio S1
Fish Audio——————
SIMBA 1.6
Speechify——————
Studio
Google——————
T2A-01-HD
MiniMax——————
Kokoro 82M v1.0
Kokoro——————
Voxtral TTS
Mistral——————
Polly Generative
Amazon——————
AsyncFlow V2, async
async——————
Azure Neural
Microsoft Azure——————
Maya1
Maya Research——————
Inworld TTS 1.5 Mini
Inworld——————
Polly Long-Form
Amazon——————
Chatterbox HD
Resemble AI——————
Journey
Google——————
SIMBA 1.0
Speechify——————
MiMo-V2-TTS
Xiaomi——————
Gemini 2.5 Pro (Dec 2025)
Google——————
T2A-01-Turbo
MiniMax——————
Lightning v3.1
Smallest.ai——————
Octave TTS
Hume AI——————
Fish Speech 1.5
Fish Audio——————
MAI-Voice-1
Microsoft Azure——————
Chatterbox
Resemble AI——————
Magpie-Multilingual 357M
NVIDIA——————
Zonos-v0.1
Zyphra——————
LMNT
LMNT——————
VibeVoice 1.5B
Microsoft Azure——————
VibeVoice 7B
Microsoft Azure——————
Murf Speech Gen 2
Murf AI——————
OpenVoice v2
OpenVoice——————
Neuphonic TTS
Neuphonic——————
Qwen3 TTS Flash
Alibaba——————
Qwen3 TTS
Alibaba——————
XTTS v2
Coqui——————
StyleTTS 2
StyleTTS ——————
WaveNet
Google——————
Polly Neural
Amazon——————
Claude Opus 4.7 (Adaptive Reasoning, Max Effort)
Anthropic———
57.3
53 tok/s7.5s
Sonic English (Oct 2024)
Cartesia——————
Qwen3.5 9B (Non-reasoning)
Alibaba———
27.3
141 tok/s0.3s
GPT-5.4 mini (Non-reasoning)
OpenAI———
23.3
164 tok/s0.5s
Chirp 3: HD
Google——————
Falcon (Beta)
Murf AI——————
Polly Standard
Amazon——————
JT-MINI
China Mobile———
25.4
——
GPT-5.4 Pro (xhigh)
OpenAI——————
Gemma 4 26B A4B (Reasoning)
Google———
31.2
——
Mistral Small 4 (Non-reasoning)
Mistral———
18.6
147 tok/s0.4s
GPT-5.4 nano (xhigh)
OpenAI———
44
163 tok/s2.8s
Qwen Chat 14B
Alibaba———
7.4
——
GPT-5.4 (xhigh)
OpenAI———
56.8
85 tok/s168.3s
GLM-5.1 (Non-reasoning)
Z AI———
43.8
48 tok/s1.3s
MetaVoice v1
MetaVoice——————
GPT-5.4 mini (xhigh)
OpenAI———
48.9
188 tok/s7.6s
DeepSeek-V2-Chat
DeepSeek———
9.1
——
Qwen3.6 35B A3B (Reasoning)
Alibaba———
43.5
239 tok/s1.7s
Speech 2.8 HD
MiniMax——————
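Comparisons over rows like these, cheapest input price, best benchmark score, or an under-$1/M price cut, reduce to one-line operations once the data is structured. A minimal sketch: the tuple layout and variable names are my own, but the numbers are transcribed from a few rows of the table above.

```python
# A few rows transcribed from the comparison table:
# (model, input $/1M, output $/1M, intelligence score).
MODELS = [
    ("DeepSeek R2", 0.55, 2.19, 91.0),
    ("GPT-4.1", 2.00, 8.00, 90.5),
    ("Claude Opus 4.6", 15.00, 75.00, 88.7),
    ("Llama 3.3 70B", 0.23, 0.92, 86.0),
    ("o3", 10.00, 40.00, 96.7),
]

# Cheapest first: ascending by input price per 1M tokens.
cheapest = sorted(MODELS, key=lambda m: m[1])

# Best benchmark: highest intelligence score.
best = max(MODELS, key=lambda m: m[3])

# Under $1/M input: a simple price predicate.
under_a_dollar = [m[0] for m in MODELS if m[1] < 1.0]

print(cheapest[0][0])   # Llama 3.3 70B
print(best[0])          # o3
print(under_a_dollar)   # ['DeepSeek R2', 'Llama 3.3 70B']
```

The same pattern extends to any column: swap the `key` function to sort by output price, context length, or speed.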

© 2026 ∞AI. Built for the AI community. everythingai.tech

Estimate Your Monthly Cost

Enter your expected usage to compare costs across models. As a rough guide, 1,000,000 tokens ≈ 750,000 words, and output volume is usually 30–50% of input.

The comparison below covers 6 selected models; the figures shown work out to 1M input tokens and 500K output tokens per month.

| Model | Provider | Input Cost | Output Cost | Total/Month | vs Cheapest |
|---|---|---|---|---|---|
| Llama 3.3 70B | Meta AI | $0.23 | $0.46 | $0.69 | ✓ Best value |
| DeepSeek R2 | DeepSeek | $0.55 | $1.09 | $1.65 | 2.4× more |
| GPT-4.1 | OpenAI | $2.00 | $4.00 | $6.00 | 8.7× more |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $7.50 | $10.50 | 15.2× more |
| GPT-4o | OpenAI | $5.00 | $7.50 | $12.50 | 18.1× more |
| Claude Opus 4.6 | Anthropic | $15.00 | $37.50 | $52.50 | 76.1× more |
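The estimator's totals follow from per-million-token arithmetic: cost = (input tokens ÷ 1M) × input price + (output tokens ÷ 1M) × output price. A minimal sketch: the function name and dict layout are my own, the prices are transcribed from the comparison table, and the 1M input / 500K output usage is inferred from the totals shown.

```python
# Per-1M-token prices (input, output) in USD, transcribed from the table.
PRICES = {
    "Llama 3.3 70B": (0.23, 0.92),
    "DeepSeek R2": (0.55, 2.19),
    "GPT-4.1": (2.00, 8.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (5.00, 15.00),
    "Claude Opus 4.6": (15.00, 75.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD per month at the listed per-1M-token rates."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# 1M input / 500K output tokens reproduces the estimator's totals
# (up to cent rounding).
for model in PRICES:
    print(f"{model}: ${monthly_cost(model, 1_000_000, 500_000):.2f}")
```

Changing the two token arguments rescales every row linearly, so the "vs Cheapest" ratios stay the same for any usage level.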

Prices are approximate and may vary. Check provider documentation for current pricing.