∞AI

AI Model Comparison

Compare pricing, benchmarks, and capabilities across 559 AI models

559 models tracked · 0 open source
| Model | Provider | Input $/1M | Output $/1M | Context | Intelligence | Speed | Latency |
|---|---|---|---|---|---|---|---|
| DeepSeek R2 ★ | DeepSeek | $0.55 | $2.19 | 128K | 91% | 60 tok/s | — |
| GPT-4.1 ★ | OpenAI | $2 | $8 | 1M | 90.5% | 80 tok/s | — |
| Claude Opus 4.6 ★ | Anthropic | $15 | $75 | 200K | 88.7% | 60 tok/s | — |
| GPT-4o ★ | OpenAI | $5 | $15 | 128K | 87.2% | 120 tok/s | — |
| Claude Sonnet 4.6 ★ | Anthropic | $3 | $15 | 200K | 86.8% | 100 tok/s | — |
| o3 | OpenAI | $10 | $40 | 200K | 96.7% | 40 tok/s | — |
| o4-mini | OpenAI | $1.1 | $4.4 | 200K | 93.4% | 100 tok/s | — |
| Gemini 3 Ultra | Google DeepMind | $7 | $21 | 1M | 90.1% | 70 tok/s | — |
Claude Opus 4.5 (Reasoning)
Anthropic———
49.7
72 tok/s11.7s
Gemini 3 Pro Preview (low)
Google———
41.3
——
Gemini 3 Flash Preview (Reasoning)
Google———
46.4
195 tok/s5.9s
Claude Opus 4.5 (Non-reasoning)
Anthropic———
43.1
63 tok/s1.1s
Claude 4.5 Sonnet (Reasoning)
Anthropic———
43
59 tok/s10.4s
Claude 4.1 Opus (Reasoning)
Anthropic———
42
42 tok/s8.0s
MiniMax-M2.1
MiniMax———
39.4
59 tok/s2.4s
Grok 3
xAI$3$15131K
87.5%
90 tok/s—
GPT-5 Codex (high)
OpenAI———
44.6
207 tok/s11.4s
GPT-5.1 (high)
OpenAI———
47.7
118 tok/s25.1s
GPT-5.2 (xhigh)
OpenAI———
51.3
72 tok/s81.3s
GPT-5 (high)
OpenAI———
44.6
86 tok/s99.7s
Grok 4
xAI———
41.5
64 tok/s7.4s
GPT-5 (medium)
OpenAI———
42
95 tok/s40.4s
Gemini 3 Pro
Google DeepMind$3.5$10.51M
87%
100 tok/s—
Claude 4 Opus (Reasoning)
Anthropic———
39
41 tok/s8.0s
Qwen3-Max
Alibaba Cloud$0.4$1.232K
87%
90 tok/s—
GPT-5 (low)
OpenAI———
39.2
75 tok/s10.3s
Claude 4 Opus (Non-reasoning)
Anthropic———
33
37 tok/s1.4s
Gemini 2.5 Pro Preview (Mar '25)
Google———
30.3
——
DeepSeek V3.2 (Reasoning)
DeepSeek———
41.7
29 tok/s1.4s
DeepSeek V3.2 Speciale
DeepSeek———
29.4
——
GPT-5.2 (medium)
OpenAI———
46.6
——
GLM-4.7 (Reasoning)
Z AI———
42.1
109 tok/s0.7s
Gemini 2.5 Pro
Google———
34.6
127 tok/s22.0s
Claude 4.5 Sonnet (Non-reasoning)
Anthropic———
37.1
56 tok/s1.2s
GPT-5.1 Codex (high)
OpenAI———
43.1
167 tok/s6.7s
Cogito v2.1 (Reasoning)
Deep Cogito———
85%
57 tok/s0.5s
Grok 4 Fast (Reasoning)
xAI———
35.1
216 tok/s3.4s
Doubao Seed Code
ByteDance Seed———
33.5
——
DeepSeek V3.2 Exp (Reasoning)
DeepSeek———
32.9
30 tok/s1.4s
Kimi K2 Thinking
Kimi———
40.9
41 tok/s1.1s
DeepSeek V3.1 Terminus (Reasoning)
DeepSeek———
33.9
——
Grok 4.1 Fast (Reasoning)
xAI———
38.6
142 tok/s9.2s
DeepSeek R1 0528 (May '25)
DeepSeek———
27.1
——
DeepSeek V3.1 (Reasoning)
DeepSeek———
27.7
——
Qwen3 235B A22B 2507 (Reasoning)
Alibaba———
29.5
51 tok/s1.3s
MiMo-V2-Flash (Reasoning)
Xiaomi———
39.2
123 tok/s1.8s
Qwen3 Max (Preview)
Alibaba———
26.1
47 tok/s1.8s
DeepSeek V3.2 (Non-reasoning)
DeepSeek———
32.1
30 tok/s1.3s
Claude 4 Sonnet (Non-reasoning)
Anthropic———
33
52 tok/s0.8s
DeepSeek R1 (Jan '25)
DeepSeek———
18.8
——
DeepSeek V3.1 Terminus (Non-reasoning)
DeepSeek———
28.5
——
DeepSeek V3.2 Exp (Non-reasoning)
DeepSeek———
28.4
31 tok/s1.3s
K-EXAONE (Reasoning)
LG AI Research———
32.1
——
Qwen3 VL 235B A22B (Reasoning)
Alibaba———
27.6
45 tok/s1.2s
Gemini 2.5 Pro Preview (May '25)
Google———
29.5
——
Gemini 2.5 Flash Preview (Sep '25) (Reasoning)
Google———
31.1
——
Claude 4 Sonnet (Reasoning)
Anthropic———
38.7
59 tok/s8.5s
Mistral Large
Mistral AI$2$6128K
84%
90 tok/s—
Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)
Google———
25.7
——
Claude 3.7 Sonnet (Reasoning)
Anthropic———
34.7
——
GPT-5 mini (high)
OpenAI———
41.2
74 tok/s91.5s
GLM-4.5 (Reasoning)
Z AI———
26.4
38 tok/s0.9s
o1
OpenAI———
30.8
112 tok/s23.6s
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)
NVIDIA———
15
42 tok/s0.7s
Grok 3 mini Reasoning (high)
xAI———
32.1
216 tok/s0.4s
GPT-5 mini (medium)
OpenAI———
38.9
77 tok/s20.0s
Nova 2.0 Pro Preview (medium)
Amazon———
35.7
120 tok/s17.9s
GLM-4.6 (Reasoning)
Z AI———
32.5
36 tok/s0.9s
Grok 3 Mini
xAI$0.3$0.5131K
83%
160 tok/s—
DeepSeek V3.1 (Non-reasoning)
DeepSeek———
28.1
——
Gemini 2.5 Flash (Reasoning)
Google———
27
205 tok/s13.3s
Hermes 4 - Llama-3.1 405B (Reasoning)
Nous Research———
18.6
32 tok/s0.8s
Qwen3 235B A22B (Reasoning)
Alibaba———
19.8
65 tok/s1.3s
ERNIE 5.0 Thinking Preview
Baidu———
29.1
——
Qwen3 235B A22B 2507 Instruct
Alibaba———
25
70 tok/s1.2s
MiniMax M1 80k
MiniMax———
24.4
——
EXAONE 4.0 32B (Reasoning)
LG AI Research———
16.7
——
Qwen3 VL 32B (Reasoning)
Alibaba———
24.7
97 tok/s1.4s
Seed-OSS-36B-Instruct
ByteDance Seed———
25.2
42 tok/s1.8s
Qwen3 VL 235B A22B Instruct
Alibaba———
20.8
57 tok/s1.2s
Kimi K2 0905
Kimi———
30.9
22 tok/s2.1s
GLM-4.5-Air
Z AI———
23.2
65 tok/s1.3s
INTELLECT-3
Prime Intellect———
22.2
——
MiniMax-M2
MiniMax———
36.1
61 tok/s2.2s
DeepSeek V3 0324
DeepSeek———
22.3
——
Magistral Medium 1.2
Mistral———
27.1
95 tok/s0.4s
GPT-4o mini
OpenAI$0.15$0.6128K
82%
200 tok/s—
Gemini 3 Flash
Google DeepMind$0.075$0.31M
82%
250 tok/s—
Qwen3 Max Thinking (Preview)
Alibaba———
32.5
43 tok/s1.8s
Nova 2.0 Lite (high)
Amazon———
34.5
195 tok/s21.4s
Nova 2.0 Pro Preview (low)
Amazon———
31.9
143 tok/s6.8s
GPT-5 (ChatGPT)
OpenAI———
21.8
158 tok/s0.6s
Kimi K2
Kimi———
26.3
35 tok/s1.3s
GPT-5.1 Codex mini (high)
OpenAI———
38.6
197 tok/s5.9s
Qwen3 Next 80B A3B (Reasoning)
Alibaba———
26.7
164 tok/s1.1s
Ling-1T
InclusionAI———
19
——
Qwen3 Next 80B A3B Instruct
Alibaba———
20.1
166 tok/s1.0s
Ring-1T
InclusionAI———
22.8
——
gpt-oss-120B (high)
OpenAI———
33.3
215 tok/s0.5s
Qwen3 30B A3B 2507 (Reasoning)
Alibaba———
22.4
148 tok/s1.1s
GPT-5.2 (Non-reasoning)
OpenAI———
33.6
63 tok/s0.8s
Mi:dm K 2.5 Pro Preview
Korea Telecom———
81%
——
Qwen3 VL 30B A3B (Reasoning)
Alibaba———
19.7
127 tok/s1.0s
Hermes 4 - Llama-3.1 70B (Reasoning)
Nous Research———
16
62 tok/s0.6s
Llama Nemotron Super 49B v1.5 (Reasoning)
NVIDIA———
18.7
60 tok/s0.3s
Llama 4 Maverick
Meta———
18.4
115 tok/s0.6s
GPT-5 (minimal)
OpenAI———
23.9
74 tok/s1.1s
K-EXAONE (Non-reasoning)
LG AI Research———
23.4
——
Nova 2.0 Omni (medium)
Amazon———
28
——
Nova 2.0 Lite (medium)
Amazon———
29.7
177 tok/s13.8s
Gemini 2.5 Flash (Non-reasoning)
Google———
20.6
180 tok/s0.5s
Mistral Large 3
Mistral———
22.8
56 tok/s0.6s
Gemini 2.0 Pro Experimental (Feb '25)
Google———
18.1
——
Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning)
Google———
21.6
——
KAT-Coder-Pro V1
KwaiKAT———
36
112 tok/s1.0s
Solar Pro 2 (Reasoning)
Upstage———
14.9
——
MiniMax M1 40k
MiniMax———
20.9
——
Mi:dm K 2.5 Pro
Korea Telecom———
23.1
——
Claude 3.7 Sonnet (Non-reasoning)
Anthropic———
30.8
——
Motif-2-12.7B-Reasoning
Motif Technologies———
19.1
——
Claude 4.5 Haiku (Non-reasoning)
Anthropic———
31.1
120 tok/s0.5s
Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning)
Google———
19.4
——
Qwen3 32B (Reasoning)
Alibaba———
16.5
103 tok/s1.1s
Gemini 2.5 Flash Preview (Reasoning)
Google———
24.3
——
GPT-4o (March 2025, chatgpt-4o-latest)
OpenAI———
18.6
——
o3-mini (high)
OpenAI———
25.2
149 tok/s27.7s
Gemini 2.0 Flash Thinking Experimental (Jan '25)
Google———
19.6
——
DeepSeek R1 Distill Llama 70B
DeepSeek———
16
41 tok/s0.5s
Nova 2.0 Omni (low)
Amazon———
23.2
——
GPT-5.1 (Non-reasoning)
OpenAI———
27.4
108 tok/s0.8s
GLM-4.6V (Reasoning)
Z AI———
23.4
27 tok/s1.2s
Qwen3 Coder 480B A35B Instruct
Alibaba———
24.8
65 tok/s1.7s
Grok Code Fast 1
xAI———
28.7
185 tok/s5.4s
Nova 2.0 Lite (low)
Amazon———
24.6
210 tok/s5.1s
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)
NVIDIA———
24.3
133 tok/s1.3s
Llama 3.3 Nemotron Super 49B v1 (Reasoning)
NVIDIA———
18.5
——
K2-V2 (high)
MBZUAI Institute of Foundation Models———
20.6
——
HyperCLOVA X SEED Think (32B)
Naver———
23.7
——
Apriel-v1.6-15B-Thinker
ServiceNow———
27.6
——
Ring-flash-2.0
InclusionAI———
14
87 tok/s1.4s
Qwen3 Omni 30B A3B (Reasoning)
Alibaba———
15.6
93 tok/s1.0s
o3-mini
OpenAI———
25.9
151 tok/s8.1s
GLM-4.5V (Reasoning)
Z AI———
15.1
45 tok/s1.0s
GLM-4.7 (Non-reasoning)
Z AI———
34.2
106 tok/s0.7s
Qwen3 VL 32B Instruct
Alibaba———
17.2
83 tok/s1.3s
ERNIE 4.5 300B A47B
Baidu———
15
29 tok/s1.8s
GLM-4.6 (Non-reasoning)
Z AI———
30.2
67 tok/s0.9s
Command R+
Cohere$2.5$10128K
78%
80 tok/s—
GPT-4.1 mini
OpenAI———
22.9
90 tok/s0.6s
Qwen3 30B A3B 2507 Instruct
Alibaba———
15
92 tok/s1.3s
Gemini 2.0 Flash (experimental)
Google———
16.8
——
Ling-flash-2.0
InclusionAI———
15.7
94 tok/s1.5s
Gemini 2.5 Flash Preview (Non-reasoning)
Google———
17.8
——
GPT-5 mini (minimal)
OpenAI———
20.7
96 tok/s1.1s
gpt-oss-120B (low)
OpenAI———
24.5
218 tok/s0.5s
Gemini 2.0 Flash (Feb '25)
Google———
18.5
——
Qwen3 30B A3B (Reasoning)
Alibaba———
15.3
70 tok/s1.2s
GPT-5 nano (high)
OpenAI———
26.8
144 tok/s100.6s
GPT-5 nano (medium)
OpenAI———
25.9
145 tok/s50.0s
GPT-4o (ChatGPT)
OpenAI———
14.1
——
Nova 2.0 Pro Preview (Non-reasoning)
Amazon———
23.1
151 tok/s0.7s
Magistral Small 1.2
Mistral———
18.2
188 tok/s0.4s
Claude 3.5 Sonnet (Oct '24)
Anthropic———
15.9
——
Solar Pro 2 (Preview) (Reasoning)
Upstage———
18.8
——
EXAONE 4.0 32B (Non-reasoning)
LG AI Research———
11.7
——
Qwen3 14B (Reasoning)
Alibaba———
16.2
65 tok/s1.1s
Apriel-v1.5-15B-Thinker
ServiceNow———
28.3
——
Mistral Medium 3
Mistral———
18.8
62 tok/s0.5s
Sonar Pro
Perplexity———
15.2
——
Devstral 2
Mistral———
22
79 tok/s0.5s
Claude 4.5 Haiku (Reasoning)
Anthropic———
37.1
156 tok/s10.0s
Olmo 3 32B Think
Allen Institute for AI———
12.1
——
NVIDIA Nemotron Nano 12B v2 VL (Reasoning)
NVIDIA———
14.9
151 tok/s0.5s
Qwen3 235B A22B (Non-reasoning)
Alibaba———
17
63 tok/s1.2s
Gemini 2.5 Flash-Lite (Reasoning)
Google———
17.6
295 tok/s12.3s
Qwen3 VL 30B A3B Instruct
Alibaba———
16.1
123 tok/s1.0s
K2-V2 (medium)
MBZUAI Institute of Foundation Models———
18.7
——
Olmo 3.1 32B Think
Allen Institute for AI———
13.9
——
Qwen2.5 Max
Alibaba———
16.3
46 tok/s1.1s
QwQ 32B
Alibaba———
19.7
33 tok/s0.4s
Claude Haiku 4.5
Anthropic$0.8$4200K
75.2%
250 tok/s—
Claude 3.5 Sonnet (June '24)
Anthropic———
14.2
——
Magistral Small 1
Mistral———
16.8
——
GLM-4.6V (Non-reasoning)
Z AI———
17.1
23 tok/s5.9s
Magistral Medium 1
Mistral———
18.8
——
Solar Pro 2 (Non-reasoning)
Upstage———
13.6
——
Llama 4 Scout
Meta———
13.5
137 tok/s0.5s
Gemini 1.5 Pro (Sep '24)
Google———
16
——
gpt-oss-20B (high)
OpenAI———
24.5
252 tok/s0.3s
Qwen3 VL 8B (Reasoning)
Alibaba———
16.7
135 tok/s1.1s
GLM-4.5V (Non-reasoning)
Z AI———
12.7
39 tok/s29.9s
DeepSeek R1 Distill Qwen 14B
DeepSeek———
15.8
——
Qwen3 4B 2507 (Reasoning)
Alibaba———
18.2
——
MiMo-V2-Flash (Non-reasoning)
Xiaomi———
30.4
124 tok/s1.5s
GPT-4o (May '24)
OpenAI———
14.5
101 tok/s0.5s
Qwen3 8B (Reasoning)
Alibaba———
13.2
91 tok/s1.0s
NVIDIA Nemotron Nano 9B V2 (Reasoning)
NVIDIA———
14.8
117 tok/s0.3s
DeepSeek R1 Distill Qwen 32B
DeepSeek———
17.2
42 tok/s0.5s
NVIDIA Nemotron Nano 9B V2 (Non-reasoning)
NVIDIA———
13.2
153 tok/s0.7s
Nova 2.0 Lite (Non-reasoning)
Amazon———
18
182 tok/s0.8s
DeepSeek R1 0528 Qwen3 8B
DeepSeek———
16.4
——
o1-mini
OpenAI———
20.4
——
Grok 4.1 Fast (Non-reasoning)
xAI———
23.6
131 tok/s0.4s
Llama 3.1 Instruct 405B
Meta———
17.4
31 tok/s0.7s
Solar Pro 2 (Preview) (Non-reasoning)
Upstage———
16
——
Hermes 4 - Llama-3.1 405B (Non-reasoning)
Nous Research———
17.6
32 tok/s0.9s
Qwen3 32B (Non-reasoning)
Alibaba———
14.5
102 tok/s1.2s
Qwen3 Omni 30B A3B Instruct
Alibaba———
10.7
106 tok/s1.1s
Grok 4 Fast (Non-reasoning)
xAI———
23.1
196 tok/s0.4s
Falcon-H1R-7B
TII UAE———
15.8
——
Nova Premier
Amazon———
19
70 tok/s1.2s
Gemini 2.0 Flash-Lite (Feb '25)
Google———
14.7
——
Gemini 3.1 Flash-Lite
Google DeepMind$0.01$0.041M
72%
500 tok/s—
Command R
Cohere$0.15$0.6128K
72%
150 tok/s—
Mistral Small
Mistral AI$0.1$0.332K
72%
200 tok/s—
Nova 2.0 Omni (Non-reasoning)
Amazon———
16.6
227 tok/s0.9s
Qwen2.5 Instruct 72B
Alibaba———
15.6
55 tok/s1.2s
gpt-oss-20B (low)
OpenAI———
20.8
261 tok/s0.4s
Llama 3.1 Tulu3 405B
Allen Institute for AI———
14.1
——
Gemini 2.5 Flash-Lite (Non-reasoning)
Google———
12.7
260 tok/s0.4s
K2-V2 (low)
MBZUAI Institute of Foundation Models———
14.4
——
Grok 2 (Dec '24)
xAI———
13.9
——
Command A
Cohere———
13.5
40 tok/s0.6s
Qwen3 Coder 30B A3B Instruct
Alibaba———
20
113 tok/s1.4s
Qwen3 30B A3B (Non-reasoning)
Alibaba———
12.5
67 tok/s1.2s
Llama 3.3 Instruct 70B
Meta———
14.5
96 tok/s0.6s
Devstral Medium
Mistral———
18.7
145 tok/s0.5s
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning)
NVIDIA———
14.3
——
Sarvam M (Reasoning)
Sarvam———
8.4
——
Mistral Large 2 (Nov '24)
Mistral———
15.1
41 tok/s0.5s
Qwen2.5 Instruct 32B
Alibaba———
13.2
——
Qwen3 4B (Reasoning)
Alibaba———
14.2
104 tok/s1.0s
Grok Beta
xAI———
13.3
——
Qwen3 VL 4B (Reasoning)
Alibaba———
13.7
——
Pixtral Large
Mistral———
14
51 tok/s0.5s
Claude 3 Opus
Anthropic———
18
——
Llama 3.1 Nemotron Instruct 70B
NVIDIA———
13.4
46 tok/s0.3s
Ministral 3 14B
Mistral———
16
99 tok/s0.3s
GPT-4 Turbo
OpenAI———
13.7
32 tok/s1.2s
Sonar
Perplexity———
15.5
——
Qwen3 VL 8B Instruct
Alibaba———
14.3
148 tok/s0.9s
Llama Nemotron Super 49B v1.5 (Non-reasoning)
NVIDIA———
14.6
58 tok/s0.3s
Nova Pro
Amazon———
13.5
——
Gemini 1.5 Flash (Sep '24)
Google———
13.8
——
Llama 3.1 Instruct 70B
Meta———
12.5
31 tok/s0.8s
Mistral Small 3.2
Mistral———
15.1
155 tok/s0.3s
Mistral Medium 3.1
Mistral———
21.3
89 tok/s0.4s
Qwen3 14B (Non-reasoning)
Alibaba———
12.8
65 tok/s1.0s
Mistral Large 2 (Jul '24)
Mistral———
13
——
Devstral Small 2
Mistral———
19.5
80 tok/s0.7s
Llama 3.2 Instruct 90B (Vision)
Meta———
11.9
42 tok/s0.5s
Qwen3 4B 2507 Instruct
Alibaba———
12.9
——
Reka Flash 3
Reka AI———
9.5
94 tok/s1.3s
Ling-mini-2.0
InclusionAI———
9.2
——
GPT-4.1 nano
OpenAI———
13
200 tok/s0.4s
Olmo 3 7B Think
Allen Institute for AI———
9.4
——
Gemini 1.5 Pro (May '24)
Google———
12
——
Mistral Small 3.1
Mistral———
14.5
153 tok/s0.5s
Hermes 4 - Llama-3.1 70B (Non-reasoning)
Nous Research———
12.6
63 tok/s0.6s
QwQ 32B-Preview
Alibaba———
15.2
43 tok/s0.5s
Mistral Small 3
Mistral———
12.7
154 tok/s0.5s
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning)
NVIDIA———
10.1
175 tok/s0.7s
Qwen3 8B (Non-reasoning)
Alibaba———
10.6
94 tok/s0.9s
Ministral 3 8B
Mistral———
14.8
180 tok/s0.3s
Qwen2.5 Coder Instruct 32B
Alibaba———
12.9
——
Qwen2.5 Turbo
Alibaba———
12
68 tok/s1.2s
Claude 3.5 Haiku
Anthropic———
18.7
——
Qwen3 VL 4B Instruct
Alibaba———
9.6
——
Devstral Small (May '25)
Mistral———
18
——
Qwen2 Instruct 72B
Alibaba———
11.7
——
Granite 4.0 H Small
IBM———
10.8
453 tok/s8.7s
Devstral Small (Jul '25)
Mistral———
15.2
202 tok/s0.4s
Mistral Saba
Mistral———
12.1
——
Gemma 3 12B Instruct
Google———
8.8
30 tok/s10.2s
Exaone 4.0 1.2B (Reasoning)
LG AI Research———
8.3
——
Kimi Linear 48B A3B Instruct
Kimi———
14.4
——
Nova Lite
Amazon———
12.7
221 tok/s0.7s
Qwen3 4B (Non-reasoning)
Alibaba———
12.5
105 tok/s1.0s
Claude 3 Sonnet
Anthropic———
10.3
——
Jamba 1.7 Large
AI21 Labs———
10.9
49 tok/s1.1s
Jamba Reasoning 3B
AI21 Labs———
9.6
——
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning)
NVIDIA———
13.2
78 tok/s0.3s
DeepHermes 3 - Mistral 24B Preview (Non-reasoning)
Nous Research———
10.9
——
Llama 3 Instruct 70B
Meta———
8.9
42 tok/s0.7s
Gemini 1.5 Flash-8B
Google———
11.1
——
Hermes 3 - Llama-3.1 70B
Nous Research———
10.6
28 tok/s0.4s
Qwen3 1.7B (Reasoning)
Alibaba———
8
138 tok/s1.0s
Jamba 1.5 Large
AI21 Labs———
10.7
——
Gemini 1.5 Flash (May '24)
Google———
10.5
——
Jamba 1.6 Large
AI21 Labs———
10.6
48 tok/s0.9s
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning)
NVIDIA———
14.4
——
GPT-5 nano (minimal)
OpenAI———
13.8
142 tok/s1.0s
DeepSeek R1 Distill Llama 8B
DeepSeek———
12.1
——
Mixtral 8x22B Instruct
Mistral———
9.8
——
Nova Micro
Amazon———
10.3
314 tok/s0.6s
Olmo 3 7B Instruct
Allen Institute for AI———
8.2
——
Ministral 3 3B
Mistral———
11.2
307 tok/s0.3s
LFM2 8B A1B
Liquid AI———
7
——
OLMo 2 32B
Allen Institute for AI———
10.6
——
Claude 2.1
Anthropic———
9.3
——
Exaone 4.0 1.2B (Non-reasoning)
LG AI Research———
8.1
——
Mistral Medium
Mistral———
9
89 tok/s0.4s
Phi-4 Multimodal Instruct
Microsoft Azure———
10
16 tok/s0.4s
Claude 2.0
Anthropic———
9.1
——
Gemma 3n E4B Instruct
Google———
6.4
14 tok/s0.4s
Gemma 3n E4B Instruct Preview (May '25)
Google———
10.1
——
Llama 3.1 Instruct 8B
Meta———
11.8
170 tok/s0.4s
Phi-4 Mini Instruct
Microsoft Azure———
8.4
44 tok/s0.3s
Granite 3.3 8B (Non-reasoning)
IBM———
7
427 tok/s7.3s
Qwen2.5 Coder Instruct 7B
Alibaba———
10
——
Llama 3.2 Instruct 11B (Vision)
Meta———
8.7
79 tok/s0.5s
GPT-3.5 Turbo
OpenAI———
9
89 tok/s0.5s
Granite 4.0 Micro
IBM———
7.7
——
Phi-3 Mini Instruct 3.8B
Microsoft Azure———
10.1
——
Claude Instant
Anthropic———
7.4
——
Gemini 1.0 Pro
Google———
8.5
——
LFM 40B
Liquid AI———
8.8
——
DeepSeek Coder V2 Lite Instruct
DeepSeek———
8.5
——
Command-R+ (Apr '24)
Cohere———
8.3
——
Mistral Small (Feb '24)
Mistral———
9
154 tok/s0.5s
Gemma 3 4B Instruct
Google———
6.3
30 tok/s1.1s
Qwen3 1.7B (Non-reasoning)
Alibaba———
6.8
141 tok/s0.9s
Llama 2 Chat 13B
Meta———
8.4
——
Llama 2 Chat 70B
Meta———
8.4
——
Llama 3 Instruct 8B
Meta———
6.4
82 tok/s0.5s
Mixtral 8x7B Instruct
Mistral———
7.7
——
Jamba 1.7 Mini
AI21 Labs———
8.1
——
Gemma 3n E2B Instruct
Google———
4.8
51 tok/s0.5s
Molmo 7B-D
Allen Institute for AI———
9.2
——
Jamba 1.5 Mini
AI21 Labs———
8
——
Jamba 1.6 Mini
AI21 Labs———
7.9
178 tok/s0.8s
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning)
Nous Research———
7.6
——
Llama 3.2 Instruct 3B
Meta———
9.7
53 tok/s0.6s
Qwen3 0.6B (Reasoning)
Alibaba———
6.5
189 tok/s0.9s
Command-R (Mar '24)
Cohere———
7.4
——
Granite 4.0 1B
IBM———
7.3
——
OpenChat 3.5 (1210)
OpenChat———
8.3
——
LFM2 2.6B
Liquid AI———
8
——
Granite 4.0 H 1B
IBM———
8
——
OLMo 2 7B
Allen Institute for AI———
9.3
——
DeepSeek R1 Distill Qwen 1.5B
DeepSeek———
9.1
——
LFM2 1.2B
Liquid AI———
6.3
——
Mistral 7B Instruct
Mistral———
7.4
190 tok/s0.3s
Qwen3 0.6B (Non-reasoning)
Alibaba———
5.7
194 tok/s0.9s
Llama 3.2 Instruct 1B
Meta———
6.3
88 tok/s0.6s
Llama 2 Chat 7B
Meta———
9.7
108 tok/s12.6s
Gemma 3 1B Instruct
Google———
5.5
48 tok/s0.6s
Granite 4.0 H 350M
IBM———
5.4
——
Granite 4.0 350M
IBM———
6.1
——
Gemma 3 270M
Google———
7.7
——
Gemini 3.1 Flash TTS
Google——————
GPT-5.4 nano (xhigh)
OpenAI———
44
157 tok/s2.5s
Mercury 2
Inception———
32.8
872 tok/s4.7s
NVIDIA Nemotron 3 Nano 4B
NVIDIA———
14.7
——
Gemma 4 31B (Non-reasoning)
Google———
32.3
——
Molmo2-8B
Allen Institute for AI———
7.3
——
MiMo-V2-Flash (Feb 2026)
Xiaomi———
41.5
127 tok/s1.5s
MiMo-V2-Omni
Xiaomi———
43.4
——
MiMo-V2-Pro
Xiaomi———
49.2
67 tok/s2.1s
KAT Coder Pro V2
KwaiKAT———
43.8
114 tok/s1.8s
MiMo-V2-Omni-0327
Xiaomi———
44.9
——
Sarvam 30B (high)
Sarvam———
12.3
294 tok/s1.2s
Sarvam 105B (high)
Sarvam———
18.2
124 tok/s1.2s
K2 Think V2
MBZUAI Institute of Foundation Models———
24.1
——
Step3 VL 10B
StepFun———
15.4
——
o1-preview
OpenAI———
23.7
——
Olmo 3.1 32B Instruct
Allen Institute for AI———
12.2
54 tok/s0.3s
LongCat Flash Lite
LongCat———
23.9
115 tok/s3.9s
Tri-21B-think Preview
Trillion Labs———
20
——
Fish Audio S2 Pro
Fish Audio——————
Nanbeige4.1-3B
Nanbeige———
16.1
——
Tri-21B-Think
Trillion Labs———
18.6
——
Apertus 70B Instruct
Swiss AI Initiative———
7.7
——
Apertus 8B Instruct
Swiss AI Initiative———
5.9
——
Trinity Large Thinking
Arcee AI———
31.9
127 tok/s0.6s
GLM-5 (Non-reasoning)
Z AI———
40.6
53 tok/s1.4s
GLM-5.1 (Reasoning)
Z AI———
51.4
43 tok/s1.2s
GLM-5-Turbo
Z AI———
46.8
——
GLM 5V Turbo (Reasoning)
Z AI———
42.9
——
Tiny Aya Global
Cohere———
4.7
——
GLM-5 (Reasoning)
Z AI———
49.8
67 tok/s0.9s
Qwen3.5 397B A17B (Reasoning)
Alibaba———
45
52 tok/s1.5s
Qwen3.5 0.8B (Reasoning)
Alibaba———
10.5
——
Qwen3.5 2B (Non-reasoning)
Alibaba———
14.7
232 tok/s0.3s
Qwen3.5 0.8B (Non-reasoning)
Alibaba———
9.9
285 tok/s0.3s
Qwen3.5 4B (Non-reasoning)
Alibaba———
22.6
178 tok/s0.3s
Kimi K2.5 (Non-reasoning)
Kimi———
37.3
32 tok/s1.4s
Qwen3 Coder Next
Alibaba———
28.3
165 tok/s0.8s
Qwen3.5 9B (Reasoning)
Alibaba———
32.4
56 tok/s0.4s
Qwen3.5 2B (Reasoning)
Alibaba———
16.3
——
Qwen3.5 35B A3B (Reasoning)
Alibaba———
37.1
149 tok/s1.2s
Qwen3.5 27B (Non-reasoning)
Alibaba———
37.2
92 tok/s1.4s
Qwen3.5 122B A10B (Reasoning)
Alibaba———
41.6
159 tok/s1.1s
Qwen3 Max Thinking
Alibaba———
39.9
36 tok/s1.7s
Step 3.5 Flash 2603
StepFun———
38.5
186 tok/s0.8s
Step 3.5 Flash
StepFun———
37.8
163 tok/s0.8s
Nemotron Cascade 2 30B A3B
NVIDIA———
28.4
——
Qwen3.5 Omni Plus
Alibaba———
38.6
55 tok/s1.3s
Qwen3.5 4B (Reasoning)
Alibaba———
27.1
177 tok/s0.3s
Arcana v3
Rime——————
Magpie Multilingual
NVIDIA——————
Qwen3.6 Plus
Alibaba———
50
53 tok/s1.6s
Qwen3.5 397B A17B (Non-reasoning)
Alibaba———
40.1
52 tok/s1.4s
Qwen3.5 122B A10B (Non-reasoning)
Alibaba———
35.9
152 tok/s1.1s
Qwen3.5 Omni Flash
Alibaba———
25.9
170 tok/s1.2s
Qwen3.5 27B (Reasoning)
Alibaba———
42.1
92 tok/s1.4s
Llama 65B
Meta———
7.4
——
Kimi K2.5 (Reasoning)
Kimi———
46.8
32 tok/s1.3s
Qwen3.5 35B A3B (Non-reasoning)
Alibaba———
30.7
153 tok/s1.1s
GPT-3.5 Turbo (0613)
OpenAI——————
DeepSeek-V2.5
DeepSeek———
12.3
——
o3-pro
OpenAI———
40.7
19 tok/s95.4s
LFM2 24B A2B
Liquid AI———
10.5
163 tok/s0.3s
o1-pro
OpenAI———
25.8
——
GPT-4o (Aug '24)
OpenAI———
18.6
108 tok/s0.6s
Solar Open 100B (Reasoning)
Upstage———
21.7
——
NVIDIA Nemotron 3 Super 120B A12B (Reasoning)
NVIDIA———
36
154 tok/s1.1s
GPT-5.2 Codex (xhigh)
OpenAI———
49
107 tok/s7.4s
GPT-4o Realtime (Dec '24)
OpenAI——————
GPT-4
OpenAI———
12.8
35 tok/s0.8s
MiniMax-M2.7
MiniMax———
49.6
47 tok/s1.6s
LFM2.5-1.2B-Thinking
Liquid AI———
8.1
——
GPT-4o mini Realtime (Dec '24)
OpenAI——————
GPT-4.5 (Preview)
OpenAI———
20
——
Gemini 2.0 Flash-Lite (Preview)
Google———
14.5
——
LFM2.5-1.2B-Instruct
Liquid AI———
8
——
Solar Pro 3
Upstage———
25.9
——
LFM2.5-VL-1.6B
Liquid AI———
6.2
——
Gemini 1.0 Ultra
Google———
10.1
——
Gemini 2.0 Flash Thinking Experimental (Dec '24)
Google———
12.3
——
PALM-2
Google———
8.6
——
Grok 4.20 0309 v2 (Reasoning)
xAI———
49.3
175 tok/s15.5s
Grok 4.20 0309 v2 (Non-reasoning)
xAI———
29
177 tok/s0.4s
Qwen3.6 Max Preview
Alibaba———
51.8
57 tok/s1.9s
Claude 3 Haiku
Anthropic———
12.3
131 tok/s0.5s
R1 1776
Perplexity———
12
——
Codestral
Mistral AI$0.3$0.932K—180 tok/s—
Claude 4.1 Opus (Non-reasoning)
Anthropic———
36
39 tok/s1.4s
Claude Opus 4.7 (Non-reasoning, High Effort)
Anthropic———
51.8
53 tok/s1.2s
DeepSeek-V2.5 (Dec '24)
DeepSeek———
12.5
——
DeepSeek-Coder-V2
DeepSeek———
10.6
——
DeepSeek LLM 67B Chat (V1)
DeepSeek———
8.4
——
Gemini 3.1 Pro Preview
Google———
57.2
124 tok/s28.7s
Magpie-Multilingual 357M (Feb 2026)
NVIDIA——————
Claude Sonnet 4.6 (Non-reasoning, Low Effort)
Anthropic———
42.6
60 tok/s1.0s
Grok 4.20 0309 (Non-reasoning)
xAI———
29.7
164 tok/s0.4s
Sonar Reasoning
Perplexity———
17.9
——
Mistral Small 4 (Reasoning)
Mistral———
27.8
173 tok/s0.5s
Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
Anthropic———
51.7
72 tok/s46.6s
Grok 4.20 0309 (Reasoning)
xAI———
48.5
183 tok/s16.1s
Grok 3 Reasoning Beta
xAI———
21.6
——
Solar Mini
Upstage———
11.9
87 tok/s1.4s
MiniMax-M2.5
MiniMax———
41.9
59 tok/s2.1s
Gemma 4 E2B (Non-reasoning)
Google———
12.1
——
Gemma 4 E4B (Non-reasoning)
Google———
14.8
——
Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
Anthropic———
53
53 tok/s11.7s
Sonar Reasoning Pro
Perplexity———
24.6
——
Reka Flash (Sep '24)
Reka AI———
12
85 tok/s1.3s
Gemma 4 E2B (Reasoning)
Google———
15.2
——
Gemini 3.1 Flash-Lite Preview
Google———
33.5
319 tok/s5.7s
GLM-4.7-Flash (Non-reasoning)
Z AI———
22.1
105 tok/s1.0s
GLM-4.7-Flash (Reasoning)
Z AI———
30.1
91 tok/s0.9s
Gemma 4 E4B (Reasoning)
Google———
18.8
——
GPT-5.4 mini (medium)
OpenAI———
37.7
181 tok/s6.3s
GPT-5.4 mini (xhigh)
OpenAI———
48.9
189 tok/s6.9s
Gemini 2.5 Flash Lite TTS
Google——————
Grok-1
xAI———
11.7
——
Gemini 3 Deep Think
Google——————
Gemma 4 26B A4B (Non-reasoning)
Google———
27.1
——
Muse Spark
Meta———
52.1
——
Qwen Chat 72B
Alibaba———
8.8
——
Gemma 4 31B (Reasoning)
Google———
39.2
35 tok/s1.0s
Arctic Instruct
Snowflake———
8.8
——
GPT-5.4 nano (medium)
OpenAI———
38.1
158 tok/s3.8s
Qwen1.5 Chat 110B
Alibaba———
9.5
——
Gemini 2.5 Flash TTS (Dec 2025)
Google——————
Inworld TTS 1.5 Max
Inworld——————
Eleven v3
ElevenLabs——————
GPT-5.4 nano (Non-Reasoning)
OpenAI———
24.4
161 tok/s0.6s
Inworld TTS 1 Max
Inworld——————
Speech 2.6 HD
MiniMax——————
Speech 2.8 Turbo
MiniMax——————
Speech 2.6 Turbo
MiniMax——————
Inworld TTS 1
Inworld——————
Speech-02-HD
MiniMax——————
Azure HD 2.5
Microsoft Azure——————
Multilingual v2
ElevenLabs——————
Speech-02-Turbo
MiniMax——————
TTS-1
OpenAI——————
Step Audio EditX (Mar 2026)
StepFun——————
Turbo v2.5
ElevenLabs——————
Flash v2.5
ElevenLabs——————
TTS-1 HD
OpenAI——————
Sonic 3
Cartesia——————
OpenAudio S1
Fish Audio——————
Studio
Google——————
Kokoro 82M v1.0
Kokoro——————
T2A-01-HD
MiniMax——————
SIMBA 1.6
Speechify——————
Polly Generative
Amazon——————
AsyncFlow V2, async
async——————
Maya1
Maya Research——————
Voxtral TTS
Mistral——————
Azure Neural
Microsoft Azure——————
Inworld TTS 1.5 Mini
Inworld——————
Step TTS 2 (Mar 2026)
StepFun——————
Chatterbox HD
Resemble AI——————
Journey
Google——————
SIMBA 1.0
Speechify——————
MAI-Voice-1
Microsoft Azure——————
Octave TTS
Hume AI——————
T2A-01-Turbo
MiniMax——————
MiMo-V2-TTS
Xiaomi——————
Fish Speech 1.5
Fish Audio——————
Lightning v3.1
Smallest.ai——————
Chatterbox
Resemble AI——————
Gemini 2.5 Pro (Dec 2025)
Google——————
Magpie-Multilingual 357M
NVIDIA——————
Zonos-v0.1
Zyphra——————
LMNT
LMNT——————
VibeVoice 7B
Microsoft Azure——————
Murf Speech Gen 2
Murf AI——————
VibeVoice 1.5B
Microsoft Azure——————
OpenVoice v2
OpenVoice——————
Neuphonic TTS
Neuphonic——————
Qwen3 TTS
Alibaba——————
XTTS v2
Coqui——————
Qwen3 TTS Flash
Alibaba——————
StyleTTS 2
StyleTTS ——————
WaveNet
Google——————
Polly Neural
Amazon——————
Claude Opus 4.7 (Adaptive Reasoning, Max Effort)
Anthropic———
57.3
57 tok/s11.6s
Sonic English (Oct 2024)
Cartesia——————
Polly Long-Form
Amazon——————
Falcon (Beta)
Murf AI——————
Polly Standard
Amazon——————
GPT-5.4 (xhigh)
OpenAI———
56.8
81 tok/s157.8s
Mistral Small 4 (Non-reasoning)
Mistral———
18.6
149 tok/s0.5s
GPT-5.4 (Non-reasoning)
OpenAI———
35.4
62 tok/s0.7s
JT-MINI
China Mobile———
25.4
——
GLM-5.1 (Non-reasoning)
Z AI———
43.8
47 tok/s2.1s
GPT-5.3 Codex (xhigh)
OpenAI———
53.6
85 tok/s60.3s
Qwen3.5 9B (Non-reasoning)
Alibaba———
27.3
143 tok/s0.3s
GPT-5.4 Pro (xhigh)
OpenAI——————
Gemma 4 26B A4B (Reasoning)
Google———
31.2
——
Qwen Chat 14B
Alibaba———
7.4
——
Chirp 3: HD
Google——————
MetaVoice v1
MetaVoice——————
GPT-5.4 mini (Non-Reasoning)
OpenAI———
23.3
176 tok/s0.6s
DeepSeek-V2-Chat
DeepSeek———
9.1
——
Kimi K2.6
Kimi———
53.9
135 tok/s0.8s
Qwen3.6 35B A3B (Reasoning)
Alibaba———
43.5
238 tok/s1.7s
Speech 2.8 HD
MiniMax——————
Standard
Google——————
Qwen3.5 Omni Flash
Alibaba——————
Qwen3.6 35B A3B (Non-reasoning)
Alibaba———
31.5
193 tok/s1.5s
Octave 2
Hume AI——————
Neural2
Google——————
Ling 2.6 Flash
InclusionAI———
26.2
202 tok/s0.8s
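The Speed and Latency columns above can be combined into a rough end-to-end estimate: if Latency is time to first token and Speed is steady-state decode throughput (an assumption — the page does not define these columns), total response time ≈ latency + tokens ÷ speed. A minimal sketch using the Claude Opus 4.5 (Reasoning) figures from the table:

```python
def response_time(num_tokens: int, latency_s: float, tokens_per_s: float) -> float:
    """Rough wall-clock time for one response.

    Assumes latency_s is time to first token and tokens_per_s is steady
    decode throughput — an assumption, since the table does not define
    its Speed/Latency columns precisely.
    """
    return latency_s + num_tokens / tokens_per_s

# Claude Opus 4.5 (Reasoning) from the table: 72 tok/s, 11.7 s latency.
print(f"{response_time(1000, 11.7, 72):.1f} s for a 1000-token answer")
```

This is why a model with low first-token latency can still feel slower on long outputs: throughput dominates once the answer runs to thousands of tokens.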
© 2026 ∞AI · everythingai.tech

Estimate Your Monthly Cost

Example comparison for a monthly usage of 1,000,000 input tokens (≈750,000 words) and 500,000 output tokens — output volume is typically 30–50% of input — across six representative models.

| Model | Provider | Input Cost | Output Cost | Total/Month | vs Cheapest |
|---|---|---|---|---|---|
| DeepSeek R2 | DeepSeek | $0.55 | $1.09 | $1.65 | ✓ Best value |
| GPT-4.1 | OpenAI | $2.00 | $4.00 | $6.00 | 3.6× more |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $7.50 | $10.50 | 6.4× more |
| GPT-4o | OpenAI | $5.00 | $7.50 | $12.50 | 7.6× more |
| o3 | OpenAI | $10.00 | $20.00 | $30.00 | 18.2× more |
| Claude Opus 4.6 | Anthropic | $15.00 | $37.50 | $52.50 | 31.9× more |

Prices are approximate and may vary. Check provider documentation for current pricing.
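The cost figures above are simple per-million-token arithmetic: monthly cost = (input tokens ÷ 1M) × input rate + (output tokens ÷ 1M) × output rate. A minimal sketch, assuming the example usage of 1,000,000 input and 500,000 output tokens per month, with the $/1M rates taken from the comparison table:

```python
# $/1M-token (input, output) rates from the comparison table above.
MODELS = {
    "DeepSeek R2": (0.55, 2.19),
    "GPT-4.1": (2.00, 8.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (5.00, 15.00),
    "o3": (10.00, 40.00),
    "Claude Opus 4.6": (15.00, 75.00),
}

def monthly_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Dollar cost for one month of usage at $/1M-token rates."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

def compare(input_tokens=1_000_000, output_tokens=500_000):
    """Cost per model for the given usage, printed cheapest-first."""
    costs = {name: monthly_cost(input_tokens, output_tokens, r_in, r_out)
             for name, (r_in, r_out) in MODELS.items()}
    cheapest = min(costs.values())
    for name, total in sorted(costs.items(), key=lambda kv: kv[1]):
        print(f"{name}: ${total:.2f} ({total / cheapest:.1f}x cheapest)")
    return costs

compare()
```

Note that the "vs Cheapest" ratios depend on the input/output mix: an output-heavy workload widens the gap for models with a large output-rate premium, such as Claude Opus 4.6 at $15 in / $75 out.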