∞AI

AI Model Comparison

Compare pricing, benchmarks, and capabilities across 527 AI models

527 models tracked · 9 open source
Prices are in US dollars per 1M tokens; "—" means no data. The Intelligence column mixes two scales: some rows report a percentage, others a unitless index score, so values on different scales should not be compared directly.

Model | Provider | Input ($/1M tokens) | Output ($/1M tokens) | Context | Intelligence | Speed | Latency
DeepSeek R2 ★ | DeepSeek | $0.55 | $2.19 | 128K | 91% | 60 tok/s | —
GPT-4.1 ★ | OpenAI | $2 | $8 | 1M | 90.5% | 80 tok/s | —
Claude Opus 4.6 ★ | Anthropic | $15 | $75 | 200K | 88.7% | 60 tok/s | —
GPT-4o ★ | OpenAI | $5 | $15 | 128K | 87.2% | 120 tok/s | —
Claude Sonnet 4.6 ★ | Anthropic | $3 | $15 | 200K | 86.8% | 100 tok/s | —
Llama 3.3 70B (open source) ★ | Meta AI | $0.23 | $0.92 | 128K | 86% | 80 tok/s | —
o3 | OpenAI | $10 | $40 | 200K | 96.7% | 40 tok/s | —
o4-mini | OpenAI | $1.1 | $4.4 | 200K | 93.4% | 100 tok/s | —
Gemini 3 Ultra | Google DeepMind | $7 | $21 | 1M | 90.1% | 70 tok/s | —
Gemini 3 Pro Preview (low) | Google | — | — | — | 41.3 | — | —
Claude Opus 4.5 (Reasoning) | Anthropic | — | — | — | 49.7 | 70 tok/s | 13.9s
Claude Opus 4.5 (Non-reasoning) | Anthropic | — | — | — | 43.1 | 58 tok/s | 1.2s
Gemini 3 Flash Preview (Reasoning) | Google | — | — | — | 46.4 | 203 tok/s | 6.4s
DeepSeek V3 (open source) | DeepSeek | $0.27 | $1.1 | 128K | 88.5% | 80 tok/s | —
Claude 4.5 Sonnet (Reasoning) | Anthropic | — | — | — | 43 | 46 tok/s | 11.3s
Claude 4.1 Opus (Reasoning) | Anthropic | — | — | — | 42 | 36 tok/s | 9.3s
MiniMax-M2.1 | MiniMax | — | — | — | 39.4 | 85 tok/s | 1.1s
Grok 3 | xAI | $3 | $15 | 131K | 87.5% | 90 tok/s | —
Llama 3.1 405B (open source) | Meta AI | $3 | $3 | 128K | 87.3% | 30 tok/s | —
GPT-5.1 (high) | OpenAI | — | — | — | 47.7 | 168 tok/s | 21.6s
GPT-5 Codex (high) | OpenAI | — | — | — | 44.6 | 178 tok/s | 5.6s
Qwen3-Max | Alibaba Cloud | $0.4 | $1.2 | 32K | 87% | 90 tok/s | —
GPT-5 (medium) | OpenAI | — | — | — | 42 | 84 tok/s | 45.4s
Grok 4 | xAI | — | — | — | 41.5 | 44 tok/s | 15.3s
GPT-5 (high) | OpenAI | — | — | — | 44.6 | 88 tok/s | 102.1s
GPT-5.2 (xhigh) | OpenAI | — | — | — | 51.3 | 80 tok/s | 91.1s
Claude 4 Opus (Reasoning) | Anthropic | — | — | — | 39 | 35 tok/s | 7.6s
Gemini 3 Pro | Google DeepMind | $3.5 | $10.5 | 1M | 87% | 100 tok/s | —
Gemini 2.5 Pro | Google | — | — | — | 34.6 | 135 tok/s | 20.0s
GPT-5.1 Codex (high) | OpenAI | — | — | — | 43.1 | 196 tok/s | 5.9s
DeepSeek V3.2 Speciale | DeepSeek | — | — | — | 29.4 | — | —
GPT-5.2 (medium) | OpenAI | — | — | — | 46.6 | — | —
GPT-5 (low) | OpenAI | — | — | — | 39.2 | 73 tok/s | 13.5s
GLM-4.7 (Reasoning) | Z AI | — | — | — | 42.1 | 106 tok/s | 0.7s
Gemini 2.5 Pro Preview (Mar '25) | Google | — | — | — | 30.3 | — | —
Claude 4 Opus (Non-reasoning) | Anthropic | — | — | — | 33 | 36 tok/s | 1.6s
DeepSeek V3.2 (Reasoning) | DeepSeek | — | — | — | 41.7 | — | —
Claude 4.5 Sonnet (Non-reasoning) | Anthropic | — | — | — | 37.1 | 46 tok/s | 0.9s
Kimi K2 Thinking | Kimi | — | — | — | 40.9 | 107 tok/s | 0.9s
DeepSeek V3.1 (Reasoning) | DeepSeek | — | — | — | 27.7 | — | —
Doubao Seed Code | ByteDance Seed | — | — | — | 33.5 | — | —
Cogito v2.1 (Reasoning) | Deep Cogito | — | — | — | 85% | 86 tok/s | 0.5s
DeepSeek V3.2 Exp (Reasoning) | DeepSeek | — | — | — | 32.9 | — | —
Qwen3-72B (open source) | Alibaba Cloud | Free | Free | 32K | 85% | 100 tok/s | —
DeepSeek R1 0528 (May '25) | DeepSeek | — | — | — | 27.1 | — | —
Grok 4 Fast (Reasoning) | xAI | — | — | — | 35.1 | 92 tok/s | 9.0s
DeepSeek V3.1 Terminus (Reasoning) | DeepSeek | — | — | — | 33.9 | — | —
Grok 4.1 Fast (Reasoning) | xAI | — | — | — | 38.6 | 88 tok/s | 20.1s
Phi-4 (open source) | Microsoft | $0.07 | $0.14 | 16K | 84.8% | 300 tok/s | —
DeepSeek V3.1 Terminus (Non-reasoning) | DeepSeek | — | — | — | 28.5 | — | —
MiMo-V2-Flash (Reasoning) | Xiaomi | — | — | — | 39.2 | 140 tok/s | 1.6s
DeepSeek R1 (Jan '25) | DeepSeek | — | — | — | 18.8 | — | —
Claude 4 Sonnet (Reasoning) | Anthropic | — | — | — | 38.7 | 50 tok/s | 12.5s
Gemini 2.5 Pro Preview (May '25) | Google | — | — | — | 29.5 | — | —
GLM-4.5 (Reasoning) | Z AI | — | — | — | 26.4 | 42 tok/s | 1.2s
Claude 3.7 Sonnet (Reasoning) | Anthropic | — | — | — | 34.7 | — | —
Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | Google | — | — | — | 31.1 | — | —
o1 | OpenAI | — | — | — | 30.8 | 108 tok/s | 20.4s
Qwen3 VL 235B A22B (Reasoning) | Alibaba | — | — | — | 27.6 | 38 tok/s | 1.2s
DeepSeek V3.2 Exp (Non-reasoning) | DeepSeek | — | — | — | 28.4 | — | —
Qwen3 Max (Preview) | Alibaba | — | — | — | 26.1 | 45 tok/s | 1.7s
DeepSeek V3.2 (Non-reasoning) | DeepSeek | — | — | — | 32.1 | — | —
Mistral Large | Mistral AI | $2 | $6 | 128K | 84% | 90 tok/s | —
Claude 4 Sonnet (Non-reasoning) | Anthropic | — | — | — | 33 | 47 tok/s | 0.8s
Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | Google | — | — | — | 25.7 | — | —
Qwen3 235B A22B 2507 (Reasoning) | Alibaba | — | — | — | 29.5 | 58 tok/s | 1.2s
K-EXAONE (Reasoning) | LG AI Research | — | — | — | 32.1 | — | —
GPT-5 mini (high) | OpenAI | — | — | — | 41.2 | 85 tok/s | 116.6s
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | — | — | — | 15 | 41 tok/s | 0.7s
DeepSeek V3.1 (Non-reasoning) | DeepSeek | — | — | — | 28.1 | — | —
Nova 2.0 Pro Preview (medium) | Amazon | — | — | — | 35.7 | 155 tok/s | 19.7s
Grok 3 Mini | xAI | $0.3 | $0.5 | 131K | 83% | 160 tok/s | —
GLM-4.6 (Reasoning) | Z AI | — | — | — | 32.5 | 38 tok/s | 0.8s
ERNIE 5.0 Thinking Preview | Baidu | — | — | — | 29.1 | — | —
Gemini 2.5 Flash (Reasoning) | Google | — | — | — | 27 | 246 tok/s | 14.3s
Qwen3 235B A22B 2507 Instruct | Alibaba | — | — | — | 25 | 66 tok/s | 1.1s
Qwen3 235B A22B (Reasoning) | Alibaba | — | — | — | 19.8 | 61 tok/s | 1.2s
GPT-5 mini (medium) | OpenAI | — | — | — | 38.9 | 98 tok/s | 20.5s
Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | — | — | — | 18.6 | — | —
Grok 3 mini Reasoning (high) | xAI | — | — | — | 32.1 | 191 tok/s | 0.4s
MiniMax-M2 | MiniMax | — | — | — | 36.1 | 77 tok/s | 1.2s
Kimi K2 0905 | Kimi | — | — | — | 30.9 | 16 tok/s | 1.9s
GLM-4.5-Air | Z AI | — | — | — | 23.2 | 58 tok/s | 1.5s
Qwen3 Max Thinking (Preview) | Alibaba | — | — | — | 32.5 | 42 tok/s | 1.8s
Kimi K2 | Kimi | — | — | — | 26.3 | 35 tok/s | 1.1s
GPT-5.1 Codex mini (high) | OpenAI | — | — | — | 38.6 | 198 tok/s | 4.3s
Qwen3 Next 80B A3B (Reasoning) | Alibaba | — | — | — | 26.7 | 163 tok/s | 1.1s
GPT-5 (ChatGPT) | OpenAI | — | — | — | 21.8 | 178 tok/s | 0.6s
Seed-OSS-36B-Instruct | ByteDance Seed | — | — | — | 25.2 | 37 tok/s | 1.7s
Ling-1T | InclusionAI | — | — | — | 19 | — | —
Magistral Medium 1.2 | Mistral | — | — | — | 27.1 | 44 tok/s | 0.5s
GPT-4o mini | OpenAI | $0.15 | $0.6 | 128K | 82% | 200 tok/s | —
Gemini 3 Flash | Google DeepMind | $0.075 | $0.3 | 1M | 82% | 250 tok/s | —
Qwen3 Next 80B A3B Instruct | Alibaba | — | — | — | 20.1 | 166 tok/s | 1.1s
DeepSeek V3 0324 | DeepSeek | — | — | — | 22.3 | — | —
Qwen3 VL 32B (Reasoning) | Alibaba | — | — | — | 24.7 | 96 tok/s | 1.3s
Nova 2.0 Lite (high) | Amazon | — | — | — | 34.5 | 153 tok/s | 18.2s
Nova 2.0 Pro Preview (low) | Amazon | — | — | — | 31.9 | 162 tok/s | 8.4s
INTELLECT-3 | Prime Intellect | — | — | — | 22.2 | — | —
EXAONE 4.0 32B (Reasoning) | LG AI Research | — | — | — | 16.7 | — | —
MiniMax M1 80k | MiniMax | — | — | — | 24.4 | — | —
Qwen3 VL 235B A22B Instruct | Alibaba | — | — | — | 20.8 | 51 tok/s | 1.1s
Nova 2.0 Omni (medium) | Amazon | — | — | — | 28 | — | —
Gemini 2.5 Flash (Non-reasoning) | Google | — | — | — | 20.6 | 215 tok/s | 0.5s
Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | — | — | — | 18.7 | 51 tok/s | 0.3s
Qwen3 30B A3B 2507 (Reasoning) | Alibaba | — | — | — | 22.4 | 148 tok/s | 1.1s
Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | — | — | — | 16 | — | —
Qwen3 VL 30B A3B (Reasoning) | Alibaba | — | — | — | 19.7 | 125 tok/s | 1.1s
gpt-oss-120B (high) | OpenAI | — | — | — | 33.3 | 224 tok/s | 0.5s
Gemini 2.0 Pro Experimental (Feb '25) | Google | — | — | — | 18.1 | — | —
GPT-5.2 (Non-reasoning) | OpenAI | — | — | — | 33.6 | 68 tok/s | 0.7s
MiniMax M1 40k | MiniMax | — | — | — | 20.9 | — | —
Llama 4 Maverick | Meta | — | — | — | 18.4 | 113 tok/s | 0.6s
KAT-Coder-Pro V1 | KwaiKAT | — | — | — | 36 | 118 tok/s | 1.4s
Mi:dm K 2.5 Pro Preview | Korea Telecom | — | — | — | 81% | — | —
Nova 2.0 Lite (medium) | Amazon | — | — | — | 29.7 | 153 tok/s | 16.9s
Ring-1T | InclusionAI | — | — | — | 22.8 | — | —
Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google | — | — | — | 21.6 | — | —
Mistral Large 3 | Mistral | — | — | — | 22.8 | 55 tok/s | 0.6s
K-EXAONE (Non-reasoning) | LG AI Research | — | — | — | 23.4 | — | —
Mi:dm K 2.5 Pro | Korea Telecom | — | — | — | 23.1 | — | —
GPT-5 (minimal) | OpenAI | — | — | — | 23.9 | 74 tok/s | 1.0s
Solar Pro 2 (Reasoning) | Upstage | — | — | — | 14.9 | — | —
DeepSeek R1 Distill Llama 70B | DeepSeek | — | — | — | 16 | 44 tok/s | 0.3s
Claude 3.7 Sonnet (Non-reasoning) | Anthropic | — | — | — | 30.8 | — | —
Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google | — | — | — | 19.6 | — | —
Claude 4.5 Haiku (Non-reasoning) | Anthropic | — | — | — | 31.1 | 100 tok/s | 0.6s
Gemini 2.5 Flash Preview (Reasoning) | Google | — | — | — | 24.3 | — | —
GPT-5.1 (Non-reasoning) | OpenAI | — | — | — | 27.4 | 167 tok/s | 0.6s
o3-mini (high) | OpenAI | — | — | — | 25.2 | 164 tok/s | 27.4s
Nova 2.0 Omni (low) | Amazon | — | — | — | 23.2 | — | —
Qwen3 32B (Reasoning) | Alibaba | — | — | — | 16.5 | 104 tok/s | 1.0s
Motif-2-12.7B-Reasoning | Motif Technologies | — | — | — | 19.1 | — | —
GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | — | — | — | 18.6 | — | —
GLM-4.6V (Reasoning) | Z AI | — | — | — | 23.4 | 30 tok/s | 1.4s
Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | Google | — | — | — | 19.4 | — | —
Apriel-v1.6-15B-Thinker | ServiceNow | — | — | — | 27.6 | — | —
Qwen3 Coder 480B A35B Instruct | Alibaba | — | — | — | 24.8 | 66 tok/s | 1.7s
Nova 2.0 Lite (low) | Amazon | — | — | — | 24.6 | 157 tok/s | 6.4s
Grok Code Fast 1 | xAI | — | — | — | 28.7 | 133 tok/s | 5.7s
Qwen3 VL 32B Instruct | Alibaba | — | — | — | 17.2 | 72 tok/s | 1.1s
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | — | — | — | 24.3 | 137 tok/s | 1.2s
Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | — | — | — | 18.5 | — | —
GLM-4.5V (Reasoning) | Z AI | — | — | — | 15.1 | 47 tok/s | 1.2s
GLM-4.7 (Non-reasoning) | Z AI | — | — | — | 34.2 | 107 tok/s | 0.6s
K2-V2 (high) | MBZUAI Institute of Foundation Models | — | — | — | 20.6 | — | —
HyperCLOVA X SEED Think (32B) | Naver | — | — | — | 23.7 | — | —
Qwen3 Omni 30B A3B (Reasoning) | Alibaba | — | — | — | 15.6 | 97 tok/s | 1.1s
Ring-flash-2.0 | InclusionAI | — | — | — | 14 | 86 tok/s | 1.5s
o3-mini | OpenAI | — | — | — | 25.9 | 162 tok/s | 6.6s
GPT-4.1 mini | OpenAI | — | — | — | 22.9 | 79 tok/s | 0.5s
Qwen3 30B A3B 2507 Instruct | Alibaba | — | — | — | 15 | 114 tok/s | 1.1s
GPT-5 nano (high) | OpenAI | — | — | — | 26.8 | 148 tok/s | 109.5s
GLM-4.6 (Non-reasoning) | Z AI | — | — | — | 30.2 | 52 tok/s | 1.1s
Gemini 2.0 Flash (experimental) | Google | — | — | — | 16.8 | — | —
Command R+ | Cohere | $2.5 | $10 | 128K | 78% | 80 tok/s | —
Qwen3 30B A3B (Reasoning) | Alibaba | — | — | — | 15.3 | 71 tok/s | 1.3s
Ling-flash-2.0 | InclusionAI | — | — | — | 15.7 | 84 tok/s | 1.4s
Gemini 2.0 Flash (Feb '25) | Google | — | — | — | 18.5 | — | —
gpt-oss-120B (low) | OpenAI | — | — | — | 24.5 | 220 tok/s | 0.5s
GPT-5 mini (minimal) | OpenAI | — | — | — | 20.7 | 87 tok/s | 0.8s
ERNIE 4.5 300B A47B | Baidu | — | — | — | 15 | 23 tok/s | 1.5s
Gemini 2.5 Flash Preview (Non-reasoning) | Google | — | — | — | 17.8 | — | —
GPT-4o (ChatGPT) | OpenAI | — | — | — | 14.1 | — | —
Qwen3 14B (Reasoning) | Alibaba | — | — | — | 16.2 | 64 tok/s | 1.0s
Apriel-v1.5-15B-Thinker | ServiceNow | — | — | — | 28.3 | — | —
Claude 3.5 Sonnet (Oct '24) | Anthropic | — | — | — | 15.9 | — | —
EXAONE 4.0 32B (Non-reasoning) | LG AI Research | — | — | — | 11.7 | — | —
Solar Pro 2 (Preview) (Reasoning) | Upstage | — | — | — | 18.8 | — | —
GPT-5 nano (medium) | OpenAI | — | — | — | 25.9 | 144 tok/s | 55.2s
Magistral Small 1.2 | Mistral | — | — | — | 18.2 | 109 tok/s | 0.4s
Nova 2.0 Pro Preview (Non-reasoning) | Amazon | — | — | — | 23.1 | 155 tok/s | 0.7s
Olmo 3.1 32B Think | Allen Institute for AI | — | — | — | 13.9 | — | —
Claude 4.5 Haiku (Reasoning) | Anthropic | — | — | — | 37.1 | 139 tok/s | 16.7s
Devstral 2 | Mistral | — | — | — | 22 | 64 tok/s | 0.6s
QwQ 32B | Alibaba | — | — | — | 19.7 | 31 tok/s | 0.5s
Mistral Medium 3 | Mistral | — | — | — | 18.8 | 48 tok/s | 0.5s
Sonar Pro | Perplexity | — | — | — | 15.2 | — | —
K2-V2 (medium) | MBZUAI Institute of Foundation Models | — | — | — | 18.7 | — | —
Qwen2.5 Max | Alibaba | — | — | — | 16.3 | 49 tok/s | 1.2s
Qwen3 VL 30B A3B Instruct | Alibaba | — | — | — | 16.1 | 123 tok/s | 1.0s
Gemini 2.5 Flash-Lite (Reasoning) | Google | — | — | — | 17.6 | 302 tok/s | 24.7s
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | — | — | — | 14.9 | — | —
Qwen3 235B A22B (Non-reasoning) | Alibaba | — | — | — | 17 | 62 tok/s | 1.1s
Olmo 3 32B Think | Allen Institute for AI | — | — | — | 12.1 | — | —
Claude Haiku 4.5 | Anthropic | $0.8 | $4 | 200K | 75.2% | 250 tok/s | —
Magistral Small 1 | Mistral | — | — | — | 16.8 | — | —
Claude 3.5 Sonnet (June '24) | Anthropic | — | — | — | 14.2 | — | —
Magistral Medium 1 | Mistral | — | — | — | 18.8 | — | —
Llama 4 Scout | Meta | — | — | — | 13.5 | 134 tok/s | 0.6s
Solar Pro 2 (Non-reasoning) | Upstage | — | — | — | 13.6 | — | —
Gemini 1.5 Pro (Sep '24) | Google | — | — | — | 16 | — | —
Gemma 3 27B (open source) | Google DeepMind | Free | Free | 128K | 75% | 120 tok/s | —
GLM-4.6V (Non-reasoning) | Z AI | — | — | — | 17.1 | 34 tok/s | 4.8s
GLM-4.5V (Non-reasoning) | Z AI | — | — | — | 12.7 | 49 tok/s | 29.7s
gpt-oss-20B (high) | OpenAI | — | — | — | 24.5 | 253 tok/s | 0.4s
Qwen3 VL 8B (Reasoning) | Alibaba | — | — | — | 16.7 | 131 tok/s | 1.1s
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 141 tok/s | 0.5s
NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | — | — | — | 14.8 | 125 tok/s | 0.2s
MiMo-V2-Flash (Non-reasoning) | Xiaomi | — | — | — | 30.4 | 137 tok/s | 1.2s
DeepSeek R1 Distill Qwen 32B | DeepSeek | — | — | — | 17.2 | — | —
Nova 2.0 Lite (Non-reasoning) | Amazon | — | — | — | 18 | 229 tok/s | 0.8s
DeepSeek R1 0528 Qwen3 8B | DeepSeek | — | — | — | 16.4 | — | —
DeepSeek R1 Distill Qwen 14B | DeepSeek | — | — | — | 15.8 | — | —
Grok 4.1 Fast (Non-reasoning) | xAI | — | — | — | 23.6 | 75 tok/s | 0.4s
Qwen3 8B (Reasoning) | Alibaba | — | — | — | 13.2 | 91 tok/s | 1.0s
Qwen3 4B 2507 (Reasoning) | Alibaba | — | — | — | 18.2 | — | —
o1-mini | OpenAI | — | — | — | 20.4 | — | —
GPT-4o (May '24) | OpenAI | — | — | — | 14.5 | 132 tok/s | 0.6s
DBRX (open source) | Databricks | $0.75 | $2.25 | 33K | 73.7% | 100 tok/s | —
Solar Pro 2 (Preview) (Non-reasoning) | Upstage | — | — | — | 16 | — | —
Qwen3 Omni 30B A3B Instruct | Alibaba | — | — | — | 10.7 | 108 tok/s | 0.9s
Llama 3.2 11B Vision (open source) | Meta AI | $0.18 | $0.18 | 128K | 73% | 150 tok/s | —
Falcon-H1R-7B | TII UAE | — | — | — | 15.8 | — | —
Grok 4 Fast (Non-reasoning) | xAI | — | — | — | 23.1 | 85 tok/s | 0.4s
Llama 3.1 Instruct 405B | Meta | — | — | — | 17.4 | 66 tok/s | 0.6s
Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | — | — | — | 17.6 | 35 tok/s | 0.8s
Qwen3 32B (Non-reasoning) | Alibaba | — | — | — | 14.5 | 104 tok/s | 1.1s
Nova Premier | Amazon | — | — | — | 19 | 28 tok/s | 1.5s
Gemini 2.5 Flash-Lite (Non-reasoning) | Google | — | — | — | 12.7 | 268 tok/s | 1.7s
Gemini 2.0 Flash-Lite (Feb '25) | Google | — | — | — | 14.7 | — | —
Gemini 3.1 Flash-Lite | Google DeepMind | $0.01 | $0.04 | 1M | 72% | 500 tok/s | —
Command R | Cohere | $0.15 | $0.6 | 128K | 72% | 150 tok/s | —
Nova 2.0 Omni (Non-reasoning) | Amazon | — | — | — | 16.6 | — | —
Llama 3.1 Tulu3 405B | Allen Institute for AI | — | — | — | 14.1 | — | —
Mistral Small | Mistral AI | $0.1 | $0.3 | 32K | 72% | 200 tok/s | —
Qwen2.5 Instruct 72B | Alibaba | — | — | — | 15.6 | 55 tok/s | 1.3s
gpt-oss-20B (low) | OpenAI | — | — | — | 20.8 | 251 tok/s | 0.4s
K2-V2 (low) | MBZUAI Institute of Foundation Models | — | — | — | 14.4 | — | —
Qwen3 30B A3B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 68 tok/s | 1.1s
Command A | Cohere | — | — | — | 13.5 | 36 tok/s | 0.6s
Devstral Medium | Mistral | — | — | — | 18.7 | 69 tok/s | 0.5s
Llama 3.3 Instruct 70B | Meta | — | — | — | 14.5 | 93 tok/s | 0.6s
Qwen3 Coder 30B A3B Instruct | Alibaba | — | — | — | 20 | 111 tok/s | 1.5s
Grok 2 (Dec '24) | xAI | — | — | — | 13.9 | — | —
Falcon 180B (open source) | TII | Free | Free | 4K | 70.4% | 20 tok/s | —
Mistral Large 2 (Nov '24) | Mistral | — | — | — | 15.1 | 31 tok/s | 0.6s
Qwen3 4B (Reasoning) | Alibaba | — | — | — | 14.2 | 104 tok/s | 1.1s
Qwen2.5 Instruct 32B | Alibaba | — | — | — | 13.2 | — | —
Qwen3 VL 4B (Reasoning) | Alibaba | — | — | — | 13.7 | — | —
Pixtral Large | Mistral | — | — | — | 14 | 55 tok/s | 0.6s
Claude 3 Opus | Anthropic | — | — | — | 18 | — | —
Sarvam M (Reasoning) | Sarvam | — | — | — | 8.4 | 141 tok/s | 1.2s
Grok Beta | xAI | — | — | — | 13.3 | — | —
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | NVIDIA | — | — | — | 14.3 | — | —
Sonar | Perplexity | — | — | — | 15.5 | — | —
Qwen3 VL 8B Instruct | Alibaba | — | — | — | 14.3 | 144 tok/s | 0.9s
GPT-4 Turbo | OpenAI | — | — | — | 13.7 | 37 tok/s | 1.0s
Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | — | — | — | 14.6 | 52 tok/s | 0.3s
Llama 3.1 Nemotron Instruct 70B | NVIDIA | — | — | — | 13.4 | 284 tok/s | 0.3s
Nova Pro | Amazon | — | — | — | 13.5 | — | —
Ministral 3 14B | Mistral | — | — | — | 16 | 149 tok/s | 0.4s
Mistral Medium 3.1 | Mistral | — | — | — | 21.3 | 85 tok/s | 0.5s
Devstral Small 2 | Mistral | — | — | — | 19.5 | 66 tok/s | 0.5s
Mistral Small 3.2 | Mistral | — | — | — | 15.1 | 152 tok/s | 0.4s
Qwen3 14B (Non-reasoning) | Alibaba | — | — | — | 12.8 | 64 tok/s | 1.0s
Llama 3.1 Instruct 70B | Meta | — | — | — | 12.5 | 34 tok/s | 0.6s
Gemini 1.5 Flash (Sep '24) | Google | — | — | — | 13.8 | — | —
Mistral Large 2 (Jul '24) | Mistral | — | — | — | 13 | — | —
Llama 3.2 Instruct 90B (Vision) | Meta | — | — | — | 11.9 | 48 tok/s | 0.6s
Reka Flash 3 | Reka AI | — | — | — | 9.5 | 92 tok/s | 1.6s
Ling-mini-2.0 | InclusionAI | — | — | — | 9.2 | — | —
Qwen3 4B 2507 Instruct | Alibaba | — | — | — | 12.9 | — | —
Gemini 1.5 Pro (May '24) | Google | — | — | — | 12 | — | —
GPT-4.1 nano | OpenAI | — | — | — | 13 | 126 tok/s | 0.4s
Mistral Small 3.1 | Mistral | — | — | — | 14.5 | 137 tok/s | 0.5s
Olmo 3 7B Think | Allen Institute for AI | — | — | — | 9.4 | — | —
Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | — | — | — | 12.6 | 76 tok/s | 0.6s
QwQ 32B-Preview | Alibaba | — | — | — | 15.2 | — | —
Mistral Small 3 | Mistral | — | — | — | 12.7 | 138 tok/s | 0.5s
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | — | — | — | 10.1 | 235 tok/s | 0.7s
Qwen3 8B (Non-reasoning) | Alibaba | — | — | — | 10.6 | 84 tok/s | 1.0s
Ministral 3 8B | Mistral | — | — | — | 14.8 | 166 tok/s | 0.3s
Qwen2.5 Coder Instruct 32B | Alibaba | — | — | — | 12.9 | — | —
Devstral Small (May '25) | Mistral | — | — | — | 18 | — | —
Claude 3.5 Haiku | Anthropic | — | — | — | 18.7 | — | —
Qwen2.5 Turbo | Alibaba | — | — | — | 12 | 68 tok/s | 1.2s
Qwen3 VL 4B Instruct | Alibaba | — | — | — | 9.6 | — | —
Devstral Small (Jul '25) | Mistral | — | — | — | 15.2 | 196 tok/s | 0.4s
Granite 4.0 H Small | IBM | — | — | — | 10.8 | 286 tok/s | 8.8s
Qwen2 Instruct 72B | Alibaba | — | — | — | 11.7 | — | —
Mistral Saba | Mistral | — | — | — | 12.1 | — | —
Gemma 3 12B Instruct | Google | — | — | — | 8.8 | — | —
Qwen3 4B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 105 tok/s | 1.0s
Nova Lite | Amazon | — | — | — | 12.7 | 200 tok/s | 0.7s
Kimi Linear 48B A3B Instruct | Kimi | — | — | — | 14.4 | — | —
Exaone 4.0 1.2B (Reasoning) | LG AI Research | — | — | — | 8.3 | — | —
Claude 3 Sonnet | Anthropic | — | — | — | 10.3 | — | —
Jamba Reasoning 3B | AI21 Labs | — | — | — | 9.6 | — | —
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 81 tok/s | 0.3s
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | Nous Research | — | — | — | 10.9 | — | —
Jamba 1.7 Large | AI21 Labs | — | — | — | 10.9 | 50 tok/s | 1.0s
Llama 3 Instruct 70B | Meta | — | — | — | 8.9 | 46 tok/s | 0.7s
Qwen3 1.7B (Reasoning) | Alibaba | — | — | — | 8 | 138 tok/s | 0.9s
Gemini 1.5 Flash-8B | Google | — | — | — | 11.1 | — | —
Gemini 1.5 Flash (May '24) | Google | — | — | — | 10.5 | — | —
Hermes 3 - Llama-3.1 70B | Nous Research | — | — | — | 10.6 | 30 tok/s | 0.4s
Jamba 1.5 Large | AI21 Labs | — | — | — | 10.7 | — | —
Jamba 1.6 Large | AI21 Labs | — | — | — | 10.6 | 56 tok/s | 0.9s
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | — | — | — | 14.4 | — | —
GPT-5 nano (minimal) | OpenAI | — | — | — | 13.8 | 142 tok/s | 0.8s
DeepSeek R1 Distill Llama 8B | DeepSeek | — | — | — | 12.1 | — | —
Mixtral 8x22B Instruct | Mistral | — | — | — | 9.8 | — | —
Nova Micro | Amazon | — | — | — | 10.3 | 305 tok/s | 0.6s
Ministral 3 3B | Mistral | — | — | — | 11.2 | 297 tok/s | 0.3s
Olmo 3 7B Instruct | Allen Institute for AI | — | — | — | 8.2 | — | —
LFM2 8B A1B | Liquid AI | — | — | — | 7 | — | —
OLMo 2 32B | Allen Institute for AI | — | — | — | 10.6 | — | —
Exaone 4.0 1.2B (Non-reasoning) | LG AI Research | — | — | — | 8.1 | — | —
Claude 2.1 | Anthropic | — | — | — | 9.3 | — | —
Claude 2.0 | Anthropic | — | — | — | 9.1 | — | —
Phi-4 Multimodal Instruct | Microsoft | — | — | — | 10 | 16 tok/s | 1.8s
Mistral Medium | Mistral | — | — | — | 9 | 83 tok/s | 0.6s
Gemma 3n E4B Instruct | Google | — | — | — | 6.4 | 30 tok/s | 0.7s
Llama 3.1 Instruct 8B | Meta | — | — | — | 11.8 | 203 tok/s | 0.5s
Gemma 3n E4B Instruct Preview (May '25) | Google | — | — | — | 10.1 | — | —
Granite 3.3 8B (Non-reasoning) | IBM | — | — | — | 7 | 392 tok/s | 21.9s
Phi-4 Mini Instruct | Microsoft | — | — | — | 8.4 | 45 tok/s | 0.3s
Qwen2.5 Coder Instruct 7B | Alibaba | — | — | — | 10 | — | —
GPT-3.5 Turbo | OpenAI | — | — | — | 9 | 93 tok/s | 0.4s
Llama 3.2 Instruct 11B (Vision) | Meta | — | — | — | 8.7 | 86 tok/s | 0.4s
Granite 4.0 Micro | IBM | — | — | — | 7.7 | — | —
Phi-3 Mini Instruct 3.8B | Microsoft | — | — | — | 10.1 | — | —
Claude Instant | Anthropic | — | — | — | 7.4 | — | —
Gemini 1.0 Pro | Google | — | — | — | 8.5 | — | —
Command-R+ (Apr '24) | Cohere | — | — | — | 8.3 | — | —
LFM 40B | Liquid AI | — | — | — | 8.8 | — | —
DeepSeek Coder V2 Lite Instruct | DeepSeek | — | — | — | 8.5 | — | —
Gemma 3 4B Instruct | Google | — | — | — | 6.3 | — | —
Mistral Small (Feb '24) | Mistral | — | — | — | 9 | 143 tok/s | 0.6s
Llama 3 Instruct 8B | Meta | — | — | — | 6.4 | 82 tok/s | 0.5s
Qwen3 1.7B (Non-reasoning) | Alibaba | — | — | — | 6.8 | 139 tok/s | 1.0s
Llama 2 Chat 13B | Meta | — | — | — | 8.4 | — | —
Llama 2 Chat 70B | Meta | — | — | — | 8.4 | — | —
Mixtral 8x7B Instruct | Mistral | — | — | — | 7.7 | — | —
Jamba 1.7 Mini | AI21 Labs | — | — | — | 8.1 | — | —
Gemma 3n E2B Instruct | Google | — | — | — | 4.8 | — | —
Jamba 1.5 Mini | AI21 Labs | — | — | — | 8 | — | —
Jamba 1.6 Mini | AI21 Labs | — | — | — | 7.9 | 180 tok/s | 0.7s
Molmo 7B-D | Allen Institute for AI | — | — | — | 9.2 | — | —
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | Nous Research | — | — | — | 7.6 | — | —
Qwen3 0.6B (Reasoning) | Alibaba | — | — | — | 6.5 | 225 tok/s | 0.9s
Llama 3.2 Instruct 3B | Meta | — | — | — | 9.7 | 52 tok/s | 0.7s
Command-R (Mar '24) | Cohere | — | — | — | 7.4 | — | —
Granite 4.0 1B | IBM | — | — | — | 7.3 | — | —
OpenChat 3.5 (1210) | OpenChat | — | — | — | 8.3 | — | —
LFM2 2.6B | Liquid AI | — | — | — | 8 | — | —
OLMo 2 7B | Allen Institute for AI | — | — | — | 9.3 | — | —
Granite 4.0 H 1B | IBM | — | — | — | 8 | — | —
DeepSeek R1 Distill Qwen 1.5B | DeepSeek | — | — | — | 9.1 | — | —
LFM2 1.2B | Liquid AI | — | — | — | 6.3 | — | —
Mistral 7B Instruct | Mistral | — | — | — | 7.4 | 163 tok/s | 0.4s
Qwen3 0.6B (Non-reasoning) | Alibaba | — | — | — | 5.7 | 222 tok/s | 0.9s
Llama 3.2 Instruct 1B | Meta | — | — | — | 6.3 | 98 tok/s | 0.6s
Llama 2 Chat 7B | Meta | — | — | — | 9.7 | 98 tok/s | 10.3s
Gemma 3 1B Instruct | Google | — | — | — | 5.5 | — | —
Granite 4.0 H 350M | IBM | — | — | — | 5.4 | — | —
Granite 4.0 350M | IBM | — | — | — | 6.1 | — | —
Gemma 3 270M | Google | — | — | — | 7.7 | — | —
Kimi K2.5 (Reasoning) | Kimi | — | — | — | 46.8 | 42 tok/s | 1.1s
Molmo2-8B | Allen Institute for AI | — | — | — | 7.3 | — | —
LFM2 24B A2B | Liquid AI | — | — | — | 10.5 | 158 tok/s | 0.2s
Solar Pro 3 | Upstage | — | — | — | 25.9 | — | —
LFM2.5-1.2B-Thinking | Liquid AI | — | — | — | 8.1 | — | —
Granite 4.1 3B | IBM | — | — | — | 8.5 | — | —
DeepSeek V4 Pro (Reasoning, Max Effort) | DeepSeek | — | — | — | 51.5 | 31 tok/s | 1.0s
Qwen1.5 Chat 110B | Alibaba | — | — | — | 9.5 | — | —
GPT-5.5 (high) | OpenAI | — | — | — | 58.9 | 55 tok/s | 19.6s
Grok 4.3 | xAI | — | — | — | 53.2 | 92 tok/s | 9.6s
GPT-5.5 (low) | OpenAI | — | — | — | 50.8 | 55 tok/s | 2.0s
Kimi K2.6 (Non-reasoning) | Kimi | — | — | — | 43 | 40 tok/s | 1.3s
LFM2.5-VL-1.6B | Liquid AI | — | — | — | 6.2 | — | —
LFM2.5-1.2B-Instruct | Liquid AI | — | — | — | 8 | — | —
NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | — | — | — | 36 | 159 tok/s | 0.9s
Solar Open 100B (Reasoning) | Upstage | — | — | — | 21.7 | — | —
MiniMax-M2.7 | MiniMax | — | — | — | 49.6 | 47 tok/s | 1.2s
R1 1776 | Perplexity | — | — | — | 12 | — | —
Qwen Chat 72B | Alibaba | — | — | — | 8.8 | — | —
Grok 4.20 0309 v2 (Non-reasoning) | xAI | — | — | — | 29 | 89 tok/s | 0.5s
Grok 4.20 0309 v2 (Reasoning) | xAI | — | — | — | 49.3 | 97 tok/s | 33.2s
Codestral | Mistral AI | $0.3 | $0.9 | 32K | — | 180 tok/s | —
Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | — | — | — | 42.6 | 50 tok/s | 0.9s
Gemma 4 E2B (Reasoning) | Google | — | — | — | 15.2 | — | —
Gemini 3.1 Pro Preview | Google | — | — | — | 57.2 | 130 tok/s | 22.5s
GPT-5.5 (xhigh) | OpenAI | — | — | — | 60.2 | 67 tok/s | 62.9s
Gemma 4 E4B (Reasoning) | Google | — | — | — | 18.8 | 44 tok/s | 1.0s
Mistral Small 4 (Reasoning) | Mistral | — | — | — | 27.8 | 162 tok/s | 0.6s
Gemma 4 E4B (Non-reasoning) | Google | — | — | — | 14.8 | 55 tok/s | 0.5s
Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 51.7 | 79 tok/s | 55.2s
Mistral Medium 3.5 | Mistral | — | — | — | 39.2 | 169 tok/s | 0.6s
GPT-5.5 (Non-reasoning) | OpenAI | — | — | — | 40.9 | 56 tok/s | 1.0s
MiMo-V2.5 | Xiaomi | — | — | — | 49 | 96 tok/s | 1.6s
Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 53 | 52 tok/s | 16.5s
Gemma 4 E2B (Non-reasoning) | Google | — | — | — | 12.1 | — | —
Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 57.3 | 60 tok/s | 27.0s
GPT-5.4 (xhigh) | OpenAI | — | — | — | 56.8 | 94 tok/s | 179.1s
GLM-5.1 (Non-reasoning) | Z AI | — | — | — | 43.8 | 45 tok/s | 1.1s
Qwen3.5 9B (Non-reasoning) | Alibaba | — | — | — | 27.3 | — | —
GPT-5.4 Pro (xhigh) | OpenAI | — | — | — | — | — | —
Gemini 3.1 Flash-Lite Preview | Google | — | — | — | 33.5 | 350 tok/s | 5.1s
Mistral Small 4 (Non-reasoning) | Mistral | — | — | — | 18.6 | 142 tok/s | 0.5s
GPT-5.4 nano (xhigh) | OpenAI | — | — | — | 44 | 169 tok/s | 3.7s
Grok-1 | xAI | — | — | — | 11.7 | — | —
Gemma 4 26B A4B (Non-reasoning) | Google | — | — | — | 27.1 | — | —
Qwen Chat 14B | Alibaba | — | — | — | 7.4 | — | —
Qwen3.6 27B (Reasoning) | Alibaba | — | — | — | 45.8 | 64 tok/s | 1.5s
JT-MINI | China Mobile | — | — | — | 25.4 | — | —
Muse Spark | Meta | — | — | — | 52.1 | — | —
Gemma 4 26B A4B (Reasoning) | Google | — | — | — | 31.2 | — | —
Gemini 3 Deep Think | Google | — | — | — | — | — | —
EXAONE 4.5 33B | LG AI Research | — | — | — | 30.2 | — | —
DeepSeek-V2-Chat | DeepSeek | — | — | — | 9.1 | — | —
DeepSeek V4 Pro (Reasoning, High Effort) | DeepSeek | — | — | — | 49.8 | 31 tok/s | 1.1s
Grok 4.3 (Non-reasoning) | xAI | — | — | — | 31 | 79 tok/s | 0.6s
GPT-5.4 mini (xhigh) | OpenAI | — | — | — | 48.9 | 185 tok/s | 5.7s
Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | — | — | — | 51.8 | 47 tok/s | 1.6s
DeepSeek V4 Flash (Reasoning, High Effort) | DeepSeek | — | — | — | 44.9 | — | —
Kimi K2.6 | Kimi | — | — | — | 53.9 | 41 tok/s | 1.4s
Qwen3.6 35B A3B (Reasoning) | Alibaba | — | — | — | 43.5 | 189 tok/s | 1.5s
Ling-2.6-1T | InclusionAI | — | — | — | 33.6 | — | —
Qwen3.6 35B A3B (Non-reasoning) | Alibaba | — | — | — | 31.5 | 182 tok/s | 1.4s
DeepSeek V4 Flash (Non-reasoning) | DeepSeek | — | — | — | 36.5 | 66 tok/s | 0.8s
DeepSeek V4 Flash (Reasoning, Max Effort) | DeepSeek | — | — | — | 46.5 | 65 tok/s | 0.8s
Hy3-preview (Reasoning) | Tencent | — | — | — | 41.9 | 115 tok/s | 2.3s
GPT-5.4 nano (medium) | OpenAI | — | — | — | 38.1 | 168 tok/s | 3.6s
Qwen3.6 27B (Non-reasoning) | Alibaba | — | — | — | 37.1 | 61 tok/s | 1.4s
Ling 2.6 Flash | InclusionAI | — | — | — | 26.2 | 211 tok/s | 1.2s
GPT-5.4 (Non-reasoning) | OpenAI | — | — | — | 35.4 | 68 tok/s | 0.7s
GPT-5.4 mini (medium) | OpenAI | — | — | — | 37.7 | 184 tok/s | 5.4s
GPT-5.4 nano (Non-reasoning) | OpenAI | — | — | — | 24.4 | 167 tok/s | 0.5s
GPT-5.4 mini (Non-reasoning) | OpenAI | — | — | — | 23.3 | 170 tok/s | 0.5s
Arctic Instruct | Snowflake | — | — | — | 8.8 | — | —
MiMo-V2.5-Pro | Xiaomi | — | — | — | 53.8 | 60 tok/s | 1.9s
GPT-5.4 (low) | OpenAI | — | — | — | 47.9 | 64 tok/s | 1.6s
Granite 4.1 30B | IBM | — | — | — | 14.7 | — | —
Gemma 4 31B (Reasoning) | Google | — | — | — | 39.2 | 35 tok/s | 1.0s
GPT-5.3 Codex (xhigh) | OpenAI | — | — | — | 53.6 | 96 tok/s | 67.2s
Gemma 4 31B (Non-reasoning) | Google | — | — | — | 32.3 | — | —
EXAONE 4.5 33B (Non-reasoning) | LG AI Research | — | — | — | — | — | —
GPT-5.5 Pro (xhigh) | OpenAI | — | — | — | — | — | —
DeepSeek V4 Pro (Non-reasoning) | DeepSeek | — | — | — | 39.3 | 31 tok/s | 1.1s
MiMo-V2.5-Pro (Non-reasoning) | Xiaomi | — | — | — | 35.6 | 59 tok/s | 1.9s
DeepSeek-V2.5 | DeepSeek | — | — | — | 12.3 | — | —
GPT-4o Realtime (Dec '24) | OpenAI | — | — | — | — | — | —
GPT-4 | OpenAI | — | — | — | 12.8 | 30 tok/s | 1.1s
GPT-4o mini Realtime (Dec '24) | OpenAI | — | — | — | — | — | —
Gemini 2.0 Flash-Lite (Preview) | Google | — | — | — | 14.5 | — | —
GPT-4.5 (Preview) | OpenAI | — | — | — | 20 | — | —
o1-pro | OpenAI | — | — | — | 25.8 | — | —
o3-pro | OpenAI | — | — | — | 40.7 | 21 tok/s | 84.7s
Gemini 1.0 Ultra | Google | — | — | — | 10.1 | — | —
GPT-4o (Aug '24) | OpenAI | — | — | — | 18.6 | 106 tok/s | 0.5s
PaLM 2 | Google | — | — | — | 8.6 | — | —
Gemini 2.0 Flash Thinking Experimental (Dec '24) | Google | — | — | — | 12.3 | — | —
GPT-5.2 Codex (xhigh) | OpenAI | — | — | — | 49 | 103 tok/s | 1.3s
Qwen3.6 Max Preview | Alibaba | — | — | — | 51.8 | 38 tok/s | 2.0s
GPT-3.5 Turbo (0613) | OpenAI | — | — | — | — | — | —
Qwen3 Max Thinking | Alibaba | — | — | — | 39.9 | 46 tok/s | 1.5s
Qwen3.5 122B A10B (Non-reasoning) | Alibaba | — | — | — | 35.9 | 163 tok/s | 1.1s
Qwen3.5 27B (Reasoning) | Alibaba | — | — | — | 42.1 | 92 tok/s | 1.4s
Qwen3.5 2B (Reasoning) | Alibaba | — | — | — | 16.3 | — | —
Qwen3.6 Plus | Alibaba | — | — | — | 50 | 53 tok/s | 1.7s
Qwen3.5 27B (Non-reasoning) | Alibaba | — | — | — | 37.2 | 94 tok/s | 1.4s
Qwen3.5 9B (Reasoning) | Alibaba | — | — | — | 32.4 | 71 tok/s | 0.4s
Qwen3 Coder Next | Alibaba | — | — | — | 28.3 | 127 tok/s | 1.0s
Claude 3 Haiku | Anthropic | — | — | — | 12.3 | — | —
Qwen3.5 4B (Reasoning) | Alibaba | — | — | — | 27.1 | 199 tok/s | 0.2s
Qwen3.5 Omni Flash | Alibaba | — | — | — | 25.9 | 243 tok/s | 0.9s
Claude 4.1 Opus (Non-reasoning) | Anthropic | — | — | — | 36 | 36 tok/s | 1.7s
Qwen3.5 Omni Plus | Alibaba | — | — | — | 38.6 | 56 tok/s | 1.3s
Qwen3.5 35B A3B (Reasoning) | Alibaba | — | — | — | 37.1 | 118 tok/s | 1.1s
Qwen3.5 35B A3B (Non-reasoning) | Alibaba | — | — | — | 30.7 | 134 tok/s | 1.2s
Qwen3.5 397B A17B (Non-reasoning) | Alibaba | — | — | — | 40.1 | 53 tok/s | 1.8s
Qwen3.5 122B A10B (Reasoning) | Alibaba | — | — | — | 41.6 | 162 tok/s | 1.1s
GLM-5 (Reasoning) | Z AI | — | — | — | 49.8 | 68 tok/s | 0.7s
Apertus 8B Instruct | Swiss AI Initiative | — | — | — | 5.9 | — | —
Qwen3.5 0.8B (Non-reasoning) | Alibaba | — | — | — | 9.9 | 356 tok/s | 0.2s
Tiny Aya Global | Cohere | — | — | — | 4.7 | 124 tok/s | 0.3s
GLM 5V Turbo (Reasoning) | Z AI | — | — | — | 42.9 | — | —
DeepSeek LLM 67B Chat (V1) | DeepSeek | — | — | — | 8.4 | — | —
GLM-5-Turbo | Z AI | — | — | — | 46.8 | — | —
GLM-5 (Non-reasoning) | Z AI | — | — | — | 40.6 | 58 tok/s | 1.3s
Nanbeige4.1-3B | Nanbeige | — | — | — | 16.1 | — | —
Tri-21B-think Preview | Trillion Labs | — | — | — | 20 | — | —
Trinity Large Thinking | Arcee AI | — | — | — | 31.9 | 127 tok/s | 0.6s
Qwen3.5 2B (Non-reasoning) | Alibaba | — | — | — | 14.7 | 343 tok/s | 0.2s
Qwen3.5 397B A17B (Reasoning) | Alibaba | — | — | — | 45 | 52 tok/s | 1.7s
Tri-21B-Think | Trillion Labs | — | — | — | 18.6 | — | —
Qwen3.5 0.8B (Reasoning) | Alibaba | — | — | — | 10.5 | — | —
LongCat Flash Lite | LongCat | — | — | — | 23.9 | 122 tok/s | 4.6s
Qwen3.5 4B (Non-reasoning) | Alibaba | — | — | — | 22.6 | 201 tok/s | 0.2s
DeepSeek-V2.5 (Dec '24) | DeepSeek | — | — | — | 12.5 | — | —
DeepSeek-Coder-V2 | DeepSeek | — | — | — | 10.6 | — | —
Apertus 70B Instruct | Swiss AI Initiative | — | — | — | 7.7 | — | —
GLM-5.1 (Reasoning) | Z AI | — | — | — | 51.4 | 56 tok/s | 0.9s
Nemotron 3 Nano Omni 30B A3B Reasoning | NVIDIA | — | — | — | 21.4 | 307 tok/s | 0.6s
Sarvam 105B (high) | Sarvam | — | — | — | 18.2 | 158 tok/s | 1.3s
Grok 4.20 0309 (Non-reasoning) | xAI | — | — | — | 29.7 | 88 tok/s | 0.6s
KAT Coder Pro V2 | KwaiKAT | — | — | — | 43.8 | 117 tok/s | 1.7s
MiMo-V2-Omni-0327 | Xiaomi | — | — | — | 44.9 | 109 tok/s | 1.3s
MiMo-V2-Omni | Xiaomi | — | — | — | 43.4 | 107 tok/s | 1.8s
MiniMax-M2.5 | MiniMax | — | — | — | 41.9 | 86 tok/s | 1.2s
MiMo-V2-Pro | Xiaomi | — | — | — | 49.2 | 62 tok/s | 2.1s
o1-preview | OpenAI | — | — | — | 23.7 | — | —
Sarvam 30B (high) | Sarvam | — | — | — | 12.3 | 170 tok/s | 1.2s
K2 Think V2 | MBZUAI Institute of Foundation Models | — | — | — | 24.1 | — | —
Sonar Reasoning Pro | Perplexity | — | — | — | 24.6 | — | —
Granite 4.1 8B | IBM | — | — | — | 12.4 | 121 tok/s | 0.5s
Sonar Reasoning | Perplexity | — | — | — | 17.9 | — | —
Hy3-preview (Non-reasoning) | Tencent | — | — | — | 33.7 | 114 tok/s | 2.6s
MiMo-V2-Flash (Feb 2026) | Xiaomi | — | — | — | 41.5 | 138 tok/s | 1.4s
Solar Mini | Upstage | — | — | — | 11.9 | 66 tok/s | 1.0s
Grok 3 Reasoning Beta | xAI | — | — | — | 21.6 | — | —
Grok 4.20 0309 (Reasoning) | xAI | — | — | — | 48.5 | 86 tok/s | 34.1s
Step 3.5 Flash | StepFun | — | — | — | 37.8 | 143 tok/s | 0.9s
Step 3.5 Flash 2603 | StepFun | — | — | — | 38.5 | 159 tok/s | 0.9s
Kimi K2.5 (Non-reasoning) | Kimi | — | — | — | 37.3 | 49 tok/s | 1.3s
GLM-4.7-Flash (Reasoning) | Z AI | — | — | — | 30.1 | 86 tok/s | 0.9s
Reka Flash (Sep '24) | Reka AI | — | — | — | 12 | 84 tok/s | 1.8s
Llama 65B | Meta | — | — | — | 7.4 | — | —
GLM-4.7-Flash (Non-reasoning) | Z AI | — | — | — | 22.1 | 153 tok/s | 1.0s
Olmo 3.1 32B Instruct | Allen Institute for AI | — | — | — | 12.2 | — | —
NVIDIA Nemotron 3 Nano 4B | NVIDIA | — | — | — | 14.7 | — | —
Step3 VL 10B | StepFun | — | — | — | 15.4 | — | —
Nemotron Cascade 2 30B A3B | NVIDIA | — | — | — | 28.4 | — | —
GPT-5.5 (medium) | OpenAI | — | — | — | 56.7 | 57 tok/s | 6.1s
Mercury 2 | Inception | — | — | — | 32.8 | 812 tok/s | 4.1s
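Speed and Latency are reported separately, but for a single request what matters is roughly how long the full answer takes. The sketch below combines the two columns under a simple assumption (not stated by the site): Latency is the wait before output starts, and Speed is a constant output rate, so total time is latency plus output tokens divided by speed. The helper name `estimate_response_seconds` is hypothetical, and reasoning models in particular may not follow this model.

```python
# Hypothetical estimate: assumes "Latency" is the wait before output starts and
# "Speed" is a constant output rate, so total time = latency + tokens / speed.
# Real responses (especially from reasoning models) may not follow this model.


def estimate_response_seconds(latency_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Approximate seconds until a response of `output_tokens` tokens is complete."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return latency_s + output_tokens / tokens_per_s


# (latency in seconds, speed in tokens/s) copied from two rows of the table.
ROWS = {
    "Claude 4.5 Sonnet (Non-reasoning)": (0.9, 46),
    "GPT-5 (high)": (102.1, 88),
}

if __name__ == "__main__":
    for name, (latency, speed) in ROWS.items():
        total = estimate_response_seconds(latency, speed, 1_000)
        print(f"{name}: ~{total:.1f} s for a 1,000-token answer")
```

Under this simple model, GPT-5 (high) streams faster (88 vs 46 tok/s), yet its 102.1 s latency dominates a 1,000-token answer (about 113 s versus about 23 s), which is why a high tok/s figure alone can mislead.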
© 2026 ∞AI. Built for the AI community. everythingai.tech

Estimate Your Monthly Cost

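Enter your expected monthly usage to compare costs across models.

Input tokens per month: e.g. 1,000,000 tokens ≈ 750,000 words

Output tokens per month: usually 30–50% of input volume

6 models selected. The figures below correspond to 1,000,000 input tokens and 500,000 output tokens per month.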

Model | Provider | Input cost | Output cost | Total/month | vs. cheapest
Llama 3.3 70B | Meta AI | $0.23 | $0.46 | $0.69 | ✓ Best value
DeepSeek R2 | DeepSeek | $0.55 | $1.09 | $1.65 | 2.4× more
GPT-4.1 | OpenAI | $2.00 | $4.00 | $6.00 | 8.7× more
Claude Sonnet 4.6 | Anthropic | $3.00 | $7.50 | $10.50 | 15.2× more
GPT-4o | OpenAI | $5.00 | $7.50 | $12.50 | 18.1× more
Claude Opus 4.6 | Anthropic | $15.00 | $37.50 | $52.50 | 76.1× more

Prices are approximate and may vary. Check provider documentation for current pricing.
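The estimator's arithmetic appears to be: cost = tokens / 1,000,000 × price per 1M tokens, computed separately for input and output, then summed and compared against the cheapest total. The sketch below reproduces that under this assumption, using the six prices from the comparison table. The function names are hypothetical, and the 0.75 words-per-token factor is just the page's "1,000,000 ≈ 750,000 words" rule of thumb, not an exact tokenizer ratio.

```python
# Sketch of the cost estimator's arithmetic (assumption: cost = tokens / 1M x price,
# summed over input and output; models are ranked by total cost).

# (input $/1M tokens, output $/1M tokens), copied from the comparison table.
PRICES = {
    "Llama 3.3 70B": (0.23, 0.92),
    "DeepSeek R2": (0.55, 2.19),
    "GPT-4.1": (2.00, 8.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (5.00, 15.00),
    "Claude Opus 4.6": (15.00, 75.00),
}


def tokens_from_words(words: int, words_per_token: float = 0.75) -> int:
    """Rough token count using the page's rule of thumb (1M tokens ~ 750K words)."""
    return round(words / words_per_token)


def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> tuple[float, float, float]:
    """Return (input cost, output cost, total) in dollars."""
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return input_cost, output_cost, input_cost + output_cost


def compare(input_tokens: int, output_tokens: int) -> list[tuple[str, float, float, float, float]]:
    """Rows of (model, input cost, output cost, total, multiple of cheapest), cheapest first."""
    costs = {name: monthly_cost(input_tokens, output_tokens, *prices)
             for name, prices in PRICES.items()}
    cheapest = min(total for _, _, total in costs.values())
    rows = sorted(costs.items(), key=lambda item: item[1][2])
    return [(name, i, o, total, total / cheapest) for name, (i, o, total) in rows]


if __name__ == "__main__":
    input_tokens = tokens_from_words(750_000)  # ~1,000,000 tokens
    output_tokens = input_tokens // 2          # 50%, the top of the page's 30-50% range
    for name, i, o, total, ratio in compare(input_tokens, output_tokens):
        print(f"{name:<18} ${i:7.2f} ${o:7.2f} ${total:7.2f}  {ratio:5.1f}x")
```

Rounded to one decimal place, the multiples this produces (2.4×, 8.7×, 15.2×, 18.1×, 76.1×) match the estimator's "vs. cheapest" column, which supports the assumed formula; sub-cent totals may still differ slightly because of floating-point rounding.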