∞AI

AI Model Comparison

Compare pricing, benchmarks, and capabilities across 487 AI models

487 models tracked · 0 open source
Model | Provider | Input $/1M | Output $/1M | Context | Intelligence | Speed | Latency | API
DeepSeek R2
★
DeepSeek · $0.55 / $2.19 · 128K
91%
60 tok/s—
GPT-4.1
★
OpenAI · $2 / $8 · 1M
90.5%
80 tok/s—
Claude Opus 4.6
★
Anthropic · $15 / $75 · 200K
88.7%
60 tok/s—
GPT-4o
★
OpenAI · $5 / $15 · 128K
87.2%
120 tok/s—
Claude Sonnet 4.6
★
Anthropic · $3 / $15 · 200K
86.8%
100 tok/s—
o3
OpenAI · $10 / $40 · 200K
96.7%
40 tok/s—
o4-mini
OpenAI · $1.10 / $4.40 · 200K
93.4%
100 tok/s—
Gemini 3 Ultra
Google DeepMind · $7 / $21 · 1M
90.1%
70 tok/s—
Claude Opus 4.5 (Reasoning)
Anthropic———
49.7
72 tok/s11.7s
Gemini 3 Pro Preview (low)
Google———
41.3
——
Claude Opus 4.5 (Non-reasoning)
Anthropic———
43.1
63 tok/s1.1s
Gemini 3 Flash Preview (Reasoning)
Google———
46.4
195 tok/s5.9s
Claude 4.1 Opus (Reasoning)
Anthropic———
42
42 tok/s8.0s
MiniMax-M2.1
MiniMax———
39.4
59 tok/s2.4s
Claude 4.5 Sonnet (Reasoning)
Anthropic———
43
59 tok/s10.4s
Grok 3
xAI · $3 / $15 · 131K
87.5%
90 tok/s—
Grok 4
xAI———
41.5
64 tok/s7.4s
Gemini 3 Pro
Google DeepMind · $3.50 / $10.50 · 1M
87%
100 tok/s—
Claude 4 Opus (Reasoning)
Anthropic———
39
41 tok/s8.0s
GPT-5 (medium)
OpenAI———
42
95 tok/s40.4s
GPT-5 Codex (high)
OpenAI———
44.6
207 tok/s11.4s
Qwen3-Max
Alibaba Cloud · $0.40 / $1.20 · 32K
87%
90 tok/s—
GPT-5 (high)
OpenAI———
44.6
86 tok/s99.7s
GPT-5.1 (high)
OpenAI———
47.7
118 tok/s25.1s
GPT-5.2 (xhigh)
OpenAI———
51.3
72 tok/s81.3s
DeepSeek V3.2 (Reasoning)
DeepSeek———
41.7
29 tok/s1.4s
GPT-5 (low)
OpenAI———
39.2
75 tok/s10.3s
GLM-4.7 (Reasoning)
Z AI———
42.1
109 tok/s0.7s
GPT-5.2 (medium)
OpenAI———
46.6
——
Claude 4 Opus (Non-reasoning)
Anthropic———
33
37 tok/s1.4s
GPT-5.1 Codex (high)
OpenAI———
43.1
167 tok/s6.7s
Gemini 2.5 Pro
Google———
34.6
127 tok/s22.0s
Claude 4.5 Sonnet (Non-reasoning)
Anthropic———
37.1
56 tok/s1.2s
DeepSeek V3.2 Speciale
DeepSeek———
29.4
——
Gemini 2.5 Pro Preview (Mar '25)
Google———
30.3
——
DeepSeek V3.1 (Reasoning)
DeepSeek———
27.7
——
DeepSeek R1 0528 (May '25)
DeepSeek———
27.1
——
Kimi K2 Thinking
Kimi———
40.9
41 tok/s1.1s
Grok 4 Fast (Reasoning)
xAI———
35.1
216 tok/s3.4s
Cogito v2.1 (Reasoning)
Deep Cogito———
85%
57 tok/s0.5s
Grok 4.1 Fast (Reasoning)
xAI———
38.6
142 tok/s9.2s
DeepSeek V3.2 Exp (Reasoning)
DeepSeek———
32.9
30 tok/s1.4s
DeepSeek V3.1 Terminus (Reasoning)
DeepSeek———
33.9
——
Doubao Seed Code
ByteDance Seed———
33.5
——
GLM-4.5 (Reasoning)
Z AI———
26.4
38 tok/s0.9s
Claude 4 Sonnet (Reasoning)
Anthropic———
38.7
59 tok/s8.5s
Claude 4 Sonnet (Non-reasoning)
Anthropic———
33
52 tok/s0.8s
Claude 3.7 Sonnet (Reasoning)
Anthropic———
34.7
——
Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)
Google———
25.7
——
o1
OpenAI———
30.8
112 tok/s23.6s
Qwen3 VL 235B A22B (Reasoning)
Alibaba———
27.6
45 tok/s1.2s
Qwen3 Max (Preview)
Alibaba———
26.1
47 tok/s1.8s
Qwen3 235B A22B 2507 (Reasoning)
Alibaba———
29.5
51 tok/s1.3s
K-EXAONE (Reasoning)
LG AI Research———
32.1
——
GPT-5 mini (high)
OpenAI———
41.2
74 tok/s91.5s
MiMo-V2-Flash (Reasoning)
Xiaomi———
39.2
123 tok/s1.8s
Mistral Large
Mistral AI · $2 / $6 · 128K
84%
90 tok/s—
DeepSeek R1 (Jan '25)
DeepSeek———
18.8
——
DeepSeek V3.2 (Non-reasoning)
DeepSeek———
32.1
30 tok/s1.3s
DeepSeek V3.1 Terminus (Non-reasoning)
DeepSeek———
28.5
——
DeepSeek V3.2 Exp (Non-reasoning)
DeepSeek———
28.4
31 tok/s1.3s
Gemini 2.5 Flash Preview (Sep '25) (Reasoning)
Google———
31.1
——
Gemini 2.5 Pro Preview (May '25)
Google———
29.5
——
Grok 3 Mini
xAI · $0.30 / $0.50 · 131K
83%
160 tok/s—
ERNIE 5.0 Thinking Preview
Baidu———
29.1
——
GPT-5 mini (medium)
OpenAI———
38.9
77 tok/s20.0s
DeepSeek V3.1 (Non-reasoning)
DeepSeek———
28.1
——
Nova 2.0 Pro Preview (medium)
Amazon———
35.7
120 tok/s17.9s
Qwen3 235B A22B 2507 Instruct
Alibaba———
25
70 tok/s1.2s
Hermes 4 - Llama-3.1 405B (Reasoning)
Nous Research———
18.6
32 tok/s0.8s
GLM-4.6 (Reasoning)
Z AI———
32.5
36 tok/s0.9s
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)
NVIDIA———
15
42 tok/s0.7s
Grok 3 mini Reasoning (high)
xAI———
32.1
216 tok/s0.4s
Gemini 2.5 Flash (Reasoning)
Google———
27
205 tok/s13.3s
Qwen3 235B A22B (Reasoning)
Alibaba———
19.8
65 tok/s1.3s
Nova 2.0 Lite (high)
Amazon———
34.5
195 tok/s21.4s
Kimi K2
Kimi———
26.3
35 tok/s1.3s
INTELLECT-3
Prime Intellect———
22.2
——
Qwen3 VL 32B (Reasoning)
Alibaba———
24.7
97 tok/s1.4s
GPT-4o mini
OpenAI · $0.15 / $0.60 · 128K
82%
200 tok/s—
Ling-1T
InclusionAI———
19
——
Magistral Medium 1.2
Mistral———
27.1
95 tok/s0.4s
Gemini 3 Flash
Google DeepMind · $0.075 / $0.30 · 1M
82%
250 tok/s—
MiniMax M1 80k
MiniMax———
24.4
——
GPT-5 (ChatGPT)
OpenAI———
21.8
158 tok/s0.6s
MiniMax-M2
MiniMax———
36.1
61 tok/s2.2s
GLM-4.5-Air
Z AI———
23.2
65 tok/s1.3s
GPT-5.1 Codex mini (high)
OpenAI———
38.6
197 tok/s5.9s
EXAONE 4.0 32B (Reasoning)
LG AI Research———
16.7
——
Kimi K2 0905
Kimi———
30.9
22 tok/s2.1s
Qwen3 Next 80B A3B Instruct
Alibaba———
20.1
166 tok/s1.0s
Nova 2.0 Pro Preview (low)
Amazon———
31.9
143 tok/s6.8s
Seed-OSS-36B-Instruct
ByteDance Seed———
25.2
42 tok/s1.8s
Qwen3 VL 235B A22B Instruct
Alibaba———
20.8
57 tok/s1.2s
DeepSeek V3 0324
DeepSeek———
22.3
——
Qwen3 Max Thinking (Preview)
Alibaba———
32.5
43 tok/s1.8s
Qwen3 Next 80B A3B (Reasoning)
Alibaba———
26.7
164 tok/s1.1s
Hermes 4 - Llama-3.1 70B (Reasoning)
Nous Research———
16
62 tok/s0.6s
Mi:dm K 2.5 Pro Preview
Korea Telecom———
81%
——
GPT-5 (minimal)
OpenAI———
23.9
74 tok/s1.1s
Llama 4 Maverick
Meta———
18.4
115 tok/s0.6s
Qwen3 VL 30B A3B (Reasoning)
Alibaba———
19.7
127 tok/s1.0s
Qwen3 30B A3B 2507 (Reasoning)
Alibaba———
22.4
148 tok/s1.1s
gpt-oss-120B (high)
OpenAI———
33.3
215 tok/s0.5s
Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning)
Google———
21.6
——
Mistral Large 3
Mistral———
22.8
56 tok/s0.6s
MiniMax M1 40k
MiniMax———
20.9
——
Nova 2.0 Lite (medium)
Amazon———
29.7
177 tok/s13.8s
Nova 2.0 Omni (medium)
Amazon———
28
——
Solar Pro 2 (Reasoning)
Upstage———
14.9
——
Gemini 2.5 Flash (Non-reasoning)
Google———
20.6
180 tok/s0.5s
Llama Nemotron Super 49B v1.5 (Reasoning)
NVIDIA———
18.7
60 tok/s0.3s
Gemini 2.0 Pro Experimental (Feb '25)
Google———
18.1
——
GPT-5.2 (Non-reasoning)
OpenAI———
33.6
63 tok/s0.8s
K-EXAONE (Non-reasoning)
LG AI Research———
23.4
——
Ring-1T
InclusionAI———
22.8
——
KAT-Coder-Pro V1
KwaiKAT———
36
112 tok/s1.0s
Mi:dm K 2.5 Pro
Korea Telecom———
23.1
——
Nova 2.0 Omni (low)
Amazon———
23.2
——
Qwen3 32B (Reasoning)
Alibaba———
16.5
103 tok/s1.1s
DeepSeek R1 Distill Llama 70B
DeepSeek———
16
41 tok/s0.5s
Motif-2-12.7B-Reasoning
Motif Technologies———
19.1
——
Gemini 2.5 Flash Preview (Reasoning)
Google———
24.3
——
GPT-4o (March 2025, chatgpt-4o-latest)
OpenAI———
18.6
——
o3-mini (high)
OpenAI———
25.2
149 tok/s27.7s
GLM-4.6V (Reasoning)
Z AI———
23.4
27 tok/s1.2s
Claude 3.7 Sonnet (Non-reasoning)
Anthropic———
30.8
——
Claude 4.5 Haiku (Non-reasoning)
Anthropic———
31.1
120 tok/s0.5s
Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning)
Google———
19.4
——
Gemini 2.0 Flash Thinking Experimental (Jan '25)
Google———
19.6
——
GPT-5.1 (Non-reasoning)
OpenAI———
27.4
108 tok/s0.8s
Qwen3 Coder 480B A35B Instruct
Alibaba———
24.8
65 tok/s1.7s
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)
NVIDIA———
24.3
133 tok/s1.3s
Ring-flash-2.0
InclusionAI———
14
87 tok/s1.4s
GLM-4.7 (Non-reasoning)
Z AI———
34.2
106 tok/s0.7s
Llama 3.3 Nemotron Super 49B v1 (Reasoning)
NVIDIA———
18.5
——
Grok Code Fast 1
xAI———
28.7
185 tok/s5.4s
GLM-4.5V (Reasoning)
Z AI———
15.1
45 tok/s1.0s
Qwen3 VL 32B Instruct
Alibaba———
17.2
83 tok/s1.3s
Apriel-v1.6-15B-Thinker
ServiceNow———
27.6
——
Qwen3 Omni 30B A3B (Reasoning)
Alibaba———
15.6
93 tok/s1.0s
K2-V2 (high)
MBZUAI Institute of Foundation Models———
20.6
——
o3-mini
OpenAI———
25.9
151 tok/s8.1s
HyperCLOVA X SEED Think (32B)
Naver———
23.7
——
Nova 2.0 Lite (low)
Amazon———
24.6
210 tok/s5.1s
GPT-5 mini (minimal)
OpenAI———
20.7
96 tok/s1.1s
Command R+
Cohere · $2.50 / $10 · 128K
78%
80 tok/s—
Gemini 2.0 Flash (Feb '25)
Google———
18.5
——
Gemini 2.0 Flash (experimental)
Google———
16.8
——
GLM-4.6 (Non-reasoning)
Z AI———
30.2
67 tok/s0.9s
GPT-4.1 mini
OpenAI———
22.9
90 tok/s0.6s
GPT-5 nano (high)
OpenAI———
26.8
144 tok/s100.6s
gpt-oss-120B (low)
OpenAI———
24.5
218 tok/s0.5s
Ling-flash-2.0
InclusionAI———
15.7
94 tok/s1.5s
Qwen3 30B A3B 2507 Instruct
Alibaba———
15
92 tok/s1.3s
Qwen3 30B A3B (Reasoning)
Alibaba———
15.3
70 tok/s1.2s
Gemini 2.5 Flash Preview (Non-reasoning)
Google———
17.8
——
ERNIE 4.5 300B A47B
Baidu———
15
29 tok/s1.8s
Claude 3.5 Sonnet (Oct '24)
Anthropic———
15.9
——
GPT-5 nano (medium)
OpenAI———
25.9
145 tok/s50.0s
Apriel-v1.5-15B-Thinker
ServiceNow———
28.3
——
Magistral Small 1.2
Mistral———
18.2
188 tok/s0.4s
Solar Pro 2 (Preview) (Reasoning)
Upstage———
18.8
——
Nova 2.0 Pro Preview (Non-reasoning)
Amazon———
23.1
151 tok/s0.7s
EXAONE 4.0 32B (Non-reasoning)
LG AI Research———
11.7
——
Qwen3 14B (Reasoning)
Alibaba———
16.2
65 tok/s1.1s
GPT-4o (ChatGPT)
OpenAI———
14.1
——
Qwen3 235B A22B (Non-reasoning)
Alibaba———
17
63 tok/s1.2s
Claude 4.5 Haiku (Reasoning)
Anthropic———
37.1
156 tok/s10.0s
NVIDIA Nemotron Nano 12B v2 VL (Reasoning)
NVIDIA———
14.9
151 tok/s0.5s
Olmo 3.1 32B Think
Allen Institute for AI———
13.9
——
K2-V2 (medium)
MBZUAI Institute of Foundation Models———
18.7
——
Gemini 2.5 Flash-Lite (Reasoning)
Google———
17.6
295 tok/s12.3s
Mistral Medium 3
Mistral———
18.8
62 tok/s0.5s
Sonar Pro
Perplexity———
15.2
——
Qwen3 VL 30B A3B Instruct
Alibaba———
16.1
123 tok/s1.0s
Qwen2.5 Max
Alibaba———
16.3
46 tok/s1.1s
QwQ 32B
Alibaba———
19.7
33 tok/s0.4s
Devstral 2
Mistral———
22
79 tok/s0.5s
Olmo 3 32B Think
Allen Institute for AI———
12.1
——
Claude Haiku 4.5
Anthropic · $0.80 / $4 · 200K
75.2%
250 tok/s—
Magistral Medium 1
Mistral———
18.8
——
Claude 3.5 Sonnet (June '24)
Anthropic———
14.2
——
Magistral Small 1
Mistral———
16.8
——
Qwen3 VL 8B (Reasoning)
Alibaba———
16.7
135 tok/s1.1s
Solar Pro 2 (Non-reasoning)
Upstage———
13.6
——
Llama 4 Scout
Meta———
13.5
137 tok/s0.5s
GLM-4.6V (Non-reasoning)
Z AI———
17.1
23 tok/s5.9s
gpt-oss-20B (high)
OpenAI———
24.5
252 tok/s0.3s
GLM-4.5V (Non-reasoning)
Z AI———
12.7
39 tok/s29.9s
Gemini 1.5 Pro (Sep '24)
Google———
16
——
DeepSeek R1 Distill Qwen 14B
DeepSeek———
15.8
——
NVIDIA Nemotron Nano 9B V2 (Reasoning)
NVIDIA———
14.8
117 tok/s0.3s
GPT-4o (May '24)
OpenAI———
14.5
101 tok/s0.5s
DeepSeek R1 0528 Qwen3 8B
DeepSeek———
16.4
——
Nova 2.0 Lite (Non-reasoning)
Amazon———
18
182 tok/s0.8s
DeepSeek R1 Distill Qwen 32B
DeepSeek———
17.2
42 tok/s0.5s
Grok 4.1 Fast (Non-reasoning)
xAI———
23.6
131 tok/s0.4s
NVIDIA Nemotron Nano 9B V2 (Non-reasoning)
NVIDIA———
13.2
153 tok/s0.7s
MiMo-V2-Flash (Non-reasoning)
Xiaomi———
30.4
124 tok/s1.5s
o1-mini
OpenAI———
20.4
——
Qwen3 4B 2507 (Reasoning)
Alibaba———
18.2
——
Qwen3 8B (Reasoning)
Alibaba———
13.2
91 tok/s1.0s
Grok 4 Fast (Non-reasoning)
xAI———
23.1
196 tok/s0.4s
Hermes 4 - Llama-3.1 405B (Non-reasoning)
Nous Research———
17.6
32 tok/s0.9s
Nova Premier
Amazon———
19
70 tok/s1.2s
Llama 3.1 Instruct 405B
Meta———
17.4
31 tok/s0.7s
Qwen3 32B (Non-reasoning)
Alibaba———
14.5
102 tok/s1.2s
Qwen3 Omni 30B A3B Instruct
Alibaba———
10.7
106 tok/s1.1s
Falcon-H1R-7B
TII UAE———
15.8
——
Solar Pro 2 (Preview) (Non-reasoning)
Upstage———
16
——
Llama 3.1 Tulu3 405B
Allen Institute for AI———
14.1
——
Command R
Cohere · $0.15 / $0.60 · 128K
72%
150 tok/s—
Mistral Small
Mistral AI · $0.10 / $0.30 · 32K
72%
200 tok/s—
Gemini 3.1 Flash-Lite
Google DeepMind · $0.01 / $0.04 · 1M
72%
500 tok/s—
Nova 2.0 Omni (Non-reasoning)
Amazon———
16.6
227 tok/s0.9s
gpt-oss-20B (low)
OpenAI———
20.8
261 tok/s0.4s
Gemini 2.0 Flash-Lite (Feb '25)
Google———
14.7
——
Qwen2.5 Instruct 72B
Alibaba———
15.6
55 tok/s1.2s
Gemini 2.5 Flash-Lite (Non-reasoning)
Google———
12.7
260 tok/s0.4s
K2-V2 (low)
MBZUAI Institute of Foundation Models———
14.4
——
Grok 2 (Dec '24)
xAI———
13.9
——
Command A
Cohere———
13.5
40 tok/s0.6s
Qwen3 Coder 30B A3B Instruct
Alibaba———
20
113 tok/s1.4s
Qwen3 30B A3B (Non-reasoning)
Alibaba———
12.5
67 tok/s1.2s
Llama 3.3 Instruct 70B
Meta———
14.5
96 tok/s0.6s
Devstral Medium
Mistral———
18.7
145 tok/s0.5s
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning)
NVIDIA———
14.3
——
Qwen2.5 Instruct 32B
Alibaba———
13.2
——
Sarvam M (Reasoning)
Sarvam———
8.4
——
Qwen3 4B (Reasoning)
Alibaba———
14.2
104 tok/s1.0s
Qwen3 VL 4B (Reasoning)
Alibaba———
13.7
——
Grok Beta
xAI———
13.3
——
Pixtral Large
Mistral———
14
51 tok/s0.5s
Claude 3 Opus
Anthropic———
18
——
Mistral Large 2 (Nov '24)
Mistral———
15.1
41 tok/s0.5s
Ministral 3 14B
Mistral———
16
99 tok/s0.3s
Sonar
Perplexity———
15.5
——
Nova Pro
Amazon———
13.5
——
Llama 3.1 Nemotron Instruct 70B
NVIDIA———
13.4
46 tok/s0.3s
GPT-4 Turbo
OpenAI———
13.7
32 tok/s1.2s
Qwen3 VL 8B Instruct
Alibaba———
14.3
148 tok/s0.9s
Llama Nemotron Super 49B v1.5 (Non-reasoning)
NVIDIA———
14.6
58 tok/s0.3s
Mistral Large 2 (Jul '24)
Mistral———
13
——
Mistral Medium 3.1
Mistral———
21.3
89 tok/s0.4s
Devstral Small 2
Mistral———
19.5
80 tok/s0.7s
Gemini 1.5 Flash (Sep '24)
Google———
13.8
——
Qwen3 14B (Non-reasoning)
Alibaba———
12.8
65 tok/s1.0s
Mistral Small 3.2
Mistral———
15.1
155 tok/s0.3s
Llama 3.1 Instruct 70B
Meta———
12.5
31 tok/s0.8s
Qwen3 4B 2507 Instruct
Alibaba———
12.9
——
Ling-mini-2.0
InclusionAI———
9.2
——
Reka Flash 3
Reka AI———
9.5
94 tok/s1.3s
Llama 3.2 Instruct 90B (Vision)
Meta———
11.9
42 tok/s0.5s
Mistral Small 3.1
Mistral———
14.5
153 tok/s0.5s
Gemini 1.5 Pro (May '24)
Google———
12
——
Hermes 4 - Llama-3.1 70B (Non-reasoning)
Nous Research———
12.6
63 tok/s0.6s
Olmo 3 7B Think
Allen Institute for AI———
9.4
——
GPT-4.1 nano
OpenAI———
13
200 tok/s0.4s
Mistral Small 3
Mistral———
12.7
154 tok/s0.5s
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning)
NVIDIA———
10.1
175 tok/s0.7s
QwQ 32B-Preview
Alibaba———
15.2
43 tok/s0.5s
Qwen2.5 Coder Instruct 32B
Alibaba———
12.9
——
Qwen3 8B (Non-reasoning)
Alibaba———
10.6
94 tok/s0.9s
Ministral 3 8B
Mistral———
14.8
180 tok/s0.3s
Qwen2.5 Turbo
Alibaba———
12
68 tok/s1.2s
Devstral Small (May '25)
Mistral———
18
——
Qwen3 VL 4B Instruct
Alibaba———
9.6
——
Claude 3.5 Haiku
Anthropic———
18.7
——
Devstral Small (Jul '25)
Mistral———
15.2
202 tok/s0.4s
Granite 4.0 H Small
IBM———
10.8
453 tok/s8.7s
Qwen2 Instruct 72B
Alibaba———
11.7
——
Mistral Saba
Mistral———
12.1
——
Gemma 3 12B Instruct
Google———
8.8
30 tok/s10.2s
Exaone 4.0 1.2B (Reasoning)
LG AI Research———
8.3
——
Kimi Linear 48B A3B Instruct
Kimi———
14.4
——
Qwen3 4B (Non-reasoning)
Alibaba———
12.5
105 tok/s1.0s
Nova Lite
Amazon———
12.7
221 tok/s0.7s
DeepHermes 3 - Mistral 24B Preview (Non-reasoning)
Nous Research———
10.9
——
Jamba Reasoning 3B
AI21 Labs———
9.6
——
Jamba 1.7 Large
AI21 Labs———
10.9
49 tok/s1.1s
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning)
NVIDIA———
13.2
78 tok/s0.3s
Claude 3 Sonnet
Anthropic———
10.3
——
Gemini 1.5 Flash-8B
Google———
11.1
——
Jamba 1.5 Large
AI21 Labs———
10.7
——
Hermes 3 - Llama-3.1 70B
Nous Research———
10.6
28 tok/s0.4s
Qwen3 1.7B (Reasoning)
Alibaba———
8
138 tok/s1.0s
Gemini 1.5 Flash (May '24)
Google———
10.5
——
Llama 3 Instruct 70B
Meta———
8.9
42 tok/s0.7s
Jamba 1.6 Large
AI21 Labs———
10.6
48 tok/s0.9s
GPT-5 nano (minimal)
OpenAI———
13.8
142 tok/s1.0s
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning)
NVIDIA———
14.4
——
Mixtral 8x22B Instruct
Mistral———
9.8
——
DeepSeek R1 Distill Llama 8B
DeepSeek———
12.1
——
Nova Micro
Amazon———
10.3
314 tok/s0.6s
Ministral 3 3B
Mistral———
11.2
307 tok/s0.3s
Olmo 3 7B Instruct
Allen Institute for AI———
8.2
——
OLMo 2 32B
Allen Institute for AI———
10.6
——
LFM2 8B A1B
Liquid AI———
7
——
Claude 2.1
Anthropic———
9.3
——
Exaone 4.0 1.2B (Non-reasoning)
LG AI Research———
8.1
——
Gemma 3n E4B Instruct
Google———
6.4
14 tok/s0.4s
Phi-4 Multimodal Instruct
Microsoft Azure———
10
16 tok/s0.4s
Mistral Medium
Mistral———
9
89 tok/s0.4s
Claude 2.0
Anthropic———
9.1
——
Llama 3.1 Instruct 8B
Meta———
11.8
170 tok/s0.4s
Gemma 3n E4B Instruct Preview (May '25)
Google———
10.1
——
Qwen2.5 Coder Instruct 7B
Alibaba———
10
——
Phi-4 Mini Instruct
Microsoft Azure———
8.4
44 tok/s0.3s
Granite 3.3 8B (Non-reasoning)
IBM———
7
427 tok/s7.3s
Llama 3.2 Instruct 11B (Vision)
Meta———
8.7
79 tok/s0.5s
GPT-3.5 Turbo
OpenAI———
9
89 tok/s0.5s
Granite 4.0 Micro
IBM———
7.7
——
Phi-3 Mini Instruct 3.8B
Microsoft Azure———
10.1
——
Claude Instant
Anthropic———
7.4
——
DeepSeek Coder V2 Lite Instruct
DeepSeek———
8.5
——
LFM 40B
Liquid AI———
8.8
——
Command-R+ (Apr '24)
Cohere———
8.3
——
Gemini 1.0 Pro
Google———
8.5
——
Mistral Small (Feb '24)
Mistral———
9
154 tok/s0.5s
Gemma 3 4B Instruct
Google———
6.3
30 tok/s1.1s
Qwen3 1.7B (Non-reasoning)
Alibaba———
6.8
141 tok/s0.9s
Llama 2 Chat 13B
Meta———
8.4
——
Llama 3 Instruct 8B
Meta———
6.4
82 tok/s0.5s
Llama 2 Chat 70B
Meta———
8.4
——
Jamba 1.7 Mini
AI21 Labs———
8.1
——
Mixtral 8x7B Instruct
Mistral———
7.7
——
Gemma 3n E2B Instruct
Google———
4.8
51 tok/s0.5s
Molmo 7B-D
Allen Institute for AI———
9.2
——
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning)
Nous Research———
7.6
——
Jamba 1.6 Mini
AI21 Labs———
7.9
178 tok/s0.8s
Jamba 1.5 Mini
AI21 Labs———
8
——
Llama 3.2 Instruct 3B
Meta———
9.7
53 tok/s0.6s
Qwen3 0.6B (Reasoning)
Alibaba———
6.5
189 tok/s0.9s
Command-R (Mar '24)
Cohere———
7.4
——
Granite 4.0 1B
IBM———
7.3
——
OpenChat 3.5 (1210)
OpenChat———
8.3
——
LFM2 2.6B
Liquid AI———
8
——
OLMo 2 7B
Allen Institute for AI———
9.3
——
Granite 4.0 H 1B
IBM———
8
——
DeepSeek R1 Distill Qwen 1.5B
DeepSeek———
9.1
——
LFM2 1.2B
Liquid AI———
6.3
——
Mistral 7B Instruct
Mistral———
7.4
190 tok/s0.3s
Qwen3 0.6B (Non-reasoning)
Alibaba———
5.7
194 tok/s0.9s
Llama 3.2 Instruct 1B
Meta———
6.3
88 tok/s0.6s
Llama 2 Chat 7B
Meta———
9.7
108 tok/s12.6s
Gemma 3 1B Instruct
Google———
5.5
48 tok/s0.6s
Granite 4.0 H 350M
IBM———
5.4
——
Granite 4.0 350M
IBM———
6.1
——
Gemma 3 270M
Google———
7.7
——
GLM-5.1 (Reasoning)
Z AI———
51.4
43 tok/s1.2s
GLM 5V Turbo (Reasoning)
Z AI———
42.9
——
Tiny Aya Global
Cohere———
4.7
——
GLM-5 (Reasoning)
Z AI———
49.8
67 tok/s0.9s
Qwen3.5 397B A17B (Reasoning)
Alibaba———
45
52 tok/s1.5s
Qwen3.5 0.8B (Reasoning)
Alibaba———
10.5
——
Qwen3.5 2B (Non-reasoning)
Alibaba———
14.7
232 tok/s0.3s
Qwen3.5 0.8B (Non-reasoning)
Alibaba———
9.9
285 tok/s0.3s
Qwen3.5 4B (Non-reasoning)
Alibaba———
22.6
178 tok/s0.3s
o1-preview
OpenAI———
23.7
——
Qwen3 Coder Next
Alibaba———
28.3
165 tok/s0.8s
Qwen3.5 9B (Reasoning)
Alibaba———
32.4
56 tok/s0.4s
Qwen3.5 2B (Reasoning)
Alibaba———
16.3
——
Qwen3.5 35B A3B (Reasoning)
Alibaba———
37.1
149 tok/s1.2s
Qwen3.5 27B (Non-reasoning)
Alibaba———
37.2
92 tok/s1.4s
Qwen3.5 122B A10B (Reasoning)
Alibaba———
41.6
159 tok/s1.1s
Qwen3 Max Thinking
Alibaba———
39.9
36 tok/s1.7s
K2 Think V2
MBZUAI Institute of Foundation Models———
24.1
——
Sarvam 105B (high)
Sarvam———
18.2
124 tok/s1.2s
Sarvam 30B (high)
Sarvam———
12.3
294 tok/s1.2s
Qwen3.5 Omni Plus
Alibaba———
38.6
55 tok/s1.3s
Qwen3.5 4B (Reasoning)
Alibaba———
27.1
177 tok/s0.3s
MiMo-V2-Omni-0327
Xiaomi———
44.9
——
KAT Coder Pro V2
KwaiKAT———
43.8
114 tok/s1.8s
Qwen3.6 Plus
Alibaba———
50
53 tok/s1.6s
Qwen3.5 397B A17B (Non-reasoning)
Alibaba———
40.1
52 tok/s1.4s
Qwen3.5 122B A10B (Non-reasoning)
Alibaba———
35.9
152 tok/s1.1s
Qwen3.5 Omni Flash
Alibaba———
25.9
170 tok/s1.2s
Qwen3.5 27B (Reasoning)
Alibaba———
42.1
92 tok/s1.4s
MiMo-V2-Pro
Xiaomi———
49.2
67 tok/s2.1s
MiMo-V2-Omni
Xiaomi———
43.4
——
Qwen3.5 35B A3B (Non-reasoning)
Alibaba———
30.7
153 tok/s1.1s
MiMo-V2-Flash (Feb 2026)
Xiaomi———
41.5
127 tok/s1.5s
GPT-3.5 Turbo (0613)
OpenAI——————
NVIDIA Nemotron 3 Nano 4B
NVIDIA———
14.7
——
DeepSeek-V2.5
DeepSeek———
12.3
——
o3-pro
OpenAI———
40.7
19 tok/s95.4s
o1-pro
OpenAI———
25.8
——
Mercury 2
Inception———
32.8
872 tok/s4.7s
GPT-4o (Aug '24)
OpenAI———
18.6
108 tok/s0.6s
Molmo2-8B
Allen Institute for AI———
7.3
——
GPT-5.2 Codex (xhigh)
OpenAI———
49
107 tok/s7.4s
GPT-4o Realtime (Dec '24)
OpenAI——————
GPT-4
OpenAI———
12.8
35 tok/s0.8s
GPT-4o mini Realtime (Dec '24)
OpenAI——————
Step3 VL 10B
StepFun———
15.4
——
GPT-4.5 (Preview)
OpenAI———
20
——
Olmo 3.1 32B Instruct
Allen Institute for AI———
12.2
54 tok/s0.3s
Gemini 2.0 Flash-Lite (Preview)
Google———
14.5
——
Kimi K2.5 (Non-reasoning)
Kimi———
37.3
32 tok/s1.4s
Step 3.5 Flash 2603
StepFun———
38.5
186 tok/s0.8s
Step 3.5 Flash
StepFun———
37.8
163 tok/s0.8s
Nemotron Cascade 2 30B A3B
NVIDIA———
28.4
——
Llama 65B
Meta———
7.4
——
Kimi K2.5 (Reasoning)
Kimi———
46.8
32 tok/s1.3s
Gemini 1.0 Ultra
Google———
10.1
——
Gemini 2.0 Flash Thinking Experimental (Dec '24)
Google———
12.3
——
PALM-2
Google———
8.6
——
LFM2 24B A2B
Liquid AI———
10.5
163 tok/s0.3s
Solar Open 100B (Reasoning)
Upstage———
21.7
——
Ling 2.6 Flash
InclusionAI———
26.2
202 tok/s0.8s
Qwen3.6 Max Preview
Alibaba———
51.8
57 tok/s1.9s
Claude 3 Haiku
Anthropic———
12.3
131 tok/s0.5s
NVIDIA Nemotron 3 Super 120B A12B (Reasoning)
NVIDIA———
36
154 tok/s1.1s
MiniMax-M2.7
MiniMax———
49.6
47 tok/s1.6s
LFM2.5-1.2B-Thinking
Liquid AI———
8.1
——
LFM2.5-1.2B-Instruct
Liquid AI———
8
——
Solar Pro 3
Upstage———
25.9
——
LFM2.5-VL-1.6B
Liquid AI———
6.2
——
Claude 4.1 Opus (Non-reasoning)
Anthropic———
36
39 tok/s1.4s
Claude Opus 4.7 (Non-reasoning, High Effort)
Anthropic———
51.8
53 tok/s1.2s
DeepSeek-V2.5 (Dec '24)
DeepSeek———
12.5
——
DeepSeek-Coder-V2
DeepSeek———
10.6
——
DeepSeek LLM 67B Chat (V1)
DeepSeek———
8.4
——
Grok 4.20 0309 v2 (Reasoning)
xAI———
49.3
175 tok/s15.5s
Grok 4.20 0309 v2 (Non-reasoning)
xAI———
29
177 tok/s0.4s
R1 1776
Perplexity———
12
——
Grok 4.20 0309 (Non-reasoning)
xAI———
29.7
164 tok/s0.4s
GPT-5.4 mini (Non-Reasoning)
OpenAI———
23.3
176 tok/s0.6s
Sonar Reasoning
Perplexity———
17.9
——
Grok 4.20 0309 (Reasoning)
xAI———
48.5
183 tok/s16.1s
Grok 3 Reasoning Beta
xAI———
21.6
——
Solar Mini
Upstage———
11.9
87 tok/s1.4s
Codestral
Mistral AI · $0.30 / $0.90 · 32K · — · 180 tok/s · —
MiniMax-M2.5
MiniMax———
41.9
59 tok/s2.1s
Gemini 3.1 Pro Preview
Google———
57.2
124 tok/s28.7s
Sonar Reasoning Pro
Perplexity———
24.6
——
Reka Flash (Sep '24)
Reka AI———
12
85 tok/s1.3s
Claude Sonnet 4.6 (Non-reasoning, Low Effort)
Anthropic———
42.6
60 tok/s1.0s
GLM-4.7-Flash (Non-reasoning)
Z AI———
22.1
105 tok/s1.0s
GLM-4.7-Flash (Reasoning)
Z AI———
30.1
91 tok/s0.9s
Mistral Small 4 (Reasoning)
Mistral———
27.8
173 tok/s0.5s
Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
Anthropic———
51.7
72 tok/s46.6s
Gemma 4 E2B (Non-reasoning)
Google———
12.1
——
Gemma 4 E4B (Non-reasoning)
Google———
14.8
——
Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
Anthropic———
53
53 tok/s11.7s
Gemma 4 E2B (Reasoning)
Google———
15.2
——
Gemini 3.1 Flash-Lite Preview
Google———
33.5
319 tok/s5.7s
GPT-5.4 nano (xhigh)
OpenAI———
44
157 tok/s2.5s
Gemma 4 E4B (Reasoning)
Google———
18.8
——
GPT-5.4 mini (medium)
OpenAI———
37.7
181 tok/s6.3s
GPT-5.4 mini (xhigh)
OpenAI———
48.9
189 tok/s6.9s
Gemma 4 31B (Non-reasoning)
Google———
32.3
——
Qwen Chat 72B
Alibaba———
8.8
——
Grok-1
xAI———
11.7
——
Arctic Instruct
Snowflake———
8.8
——
Qwen1.5 Chat 110B
Alibaba———
9.5
——
Gemini 3 Deep Think
Google——————
Gemma 4 26B A4B (Non-reasoning)
Google———
27.1
——
Muse Spark
Meta———
52.1
——
Gemma 4 31B (Reasoning)
Google———
39.2
35 tok/s1.0s
GPT-5.4 nano (medium)
OpenAI———
38.1
158 tok/s3.8s
GPT-5.4 nano (Non-Reasoning)
OpenAI———
24.4
161 tok/s0.6s
Claude Opus 4.7 (Adaptive Reasoning, Max Effort)
Anthropic———
57.3
57 tok/s11.6s
GPT-5.4 (xhigh)
OpenAI———
56.8
81 tok/s157.8s
Mistral Small 4 (Non-reasoning)
Mistral———
18.6
149 tok/s0.5s
JT-MINI
China Mobile———
25.4
——
GLM-5.1 (Non-reasoning)
Z AI———
43.8
47 tok/s2.1s
GPT-5.4 (Non-reasoning)
OpenAI———
35.4
62 tok/s0.7s
Qwen3.5 9B (Non-reasoning)
Alibaba———
27.3
143 tok/s0.3s
GPT-5.4 Pro (xhigh)
OpenAI——————
Gemma 4 26B A4B (Reasoning)
Google———
31.2
——
Qwen Chat 14B
Alibaba———
7.4
——
GPT-5.3 Codex (xhigh)
OpenAI———
53.6
85 tok/s60.3s
DeepSeek-V2-Chat
DeepSeek———
9.1
——
Kimi K2.6
Kimi———
53.9
135 tok/s0.8s
Qwen3.6 35B A3B (Reasoning)
Alibaba———
43.5
238 tok/s1.7s
Qwen3.6 35B A3B (Non-reasoning)
Alibaba———
31.5
193 tok/s1.5s
Tri-21B-think Preview
Trillion Labs———
20
——
LongCat Flash Lite
LongCat———
23.9
115 tok/s3.9s
Nanbeige4.1-3B
Nanbeige———
16.1
——
Tri-21B-Think
Trillion Labs———
18.6
——
Apertus 70B Instruct
Swiss AI Initiative———
7.7
——
Apertus 8B Instruct
Swiss AI Initiative———
5.9
——
Trinity Large Thinking
Arcee AI———
31.9
127 tok/s0.6s
GLM-5 (Non-reasoning)
Z AI———
40.6
53 tok/s1.4s
GLM-5-Turbo
Z AI———
46.8
——
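The listing above is straightforward to work with programmatically. A minimal sketch that sorts a handful of the priced rows by input price and by benchmark score; the record layout and field names are my own, and the values are copied from the table above:

```python
# A few priced rows from the comparison table, modeled as records.
# Prices are USD per 1M tokens; "intelligence" is the benchmark % shown above.
models = [
    {"name": "DeepSeek R2", "input": 0.55, "output": 2.19, "context": "128K", "intelligence": 91.0},
    {"name": "GPT-4.1", "input": 2.00, "output": 8.00, "context": "1M", "intelligence": 90.5},
    {"name": "Claude Opus 4.6", "input": 15.00, "output": 75.00, "context": "200K", "intelligence": 88.7},
    {"name": "o3", "input": 10.00, "output": 40.00, "context": "200K", "intelligence": 96.7},
    {"name": "o4-mini", "input": 1.10, "output": 4.40, "context": "200K", "intelligence": 93.4},
]

# Cheapest first: ascending input price per 1M tokens.
cheapest_first = sorted(models, key=lambda m: m["input"])
# Best benchmark: descending intelligence score.
best_benchmark = sorted(models, key=lambda m: m["intelligence"], reverse=True)

print(cheapest_first[0]["name"])   # DeepSeek R2
print(best_benchmark[0]["name"])   # o3
```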

© 2026 ∞AI · everythingai.tech. Built for the AI community.

Estimate Your Monthly Cost

Enter your expected usage to compare costs across models

Input tokens per month: e.g. 1,000,000 ≈ 750,000 words

Output tokens per month: usually 30–50% of input volume

6 models selected (example usage: 1M input tokens, 500K output tokens per month)

Model | Provider | Input Cost | Output Cost | Total/Month | vs Cheapest
DeepSeek R2 | DeepSeek | $0.55 | $1.09 | $1.65 | ✓ Best value
GPT-4.1 | OpenAI | $2.00 | $4.00 | $6.00 | 3.6× more
Claude Sonnet 4.6 | Anthropic | $3.00 | $7.50 | $10.50 | 6.4× more
GPT-4o | OpenAI | $5.00 | $7.50 | $12.50 | 7.6× more
o3 | OpenAI | $10.00 | $20.00 | $30.00 | 18.2× more
Claude Opus 4.6 | Anthropic | $15.00 | $37.50 | $52.50 | 31.9× more

Prices are approximate and may vary. Check provider documentation for current pricing.
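The estimator's arithmetic is simple to reproduce: monthly cost = (input tokens / 1M) × input price + (output tokens / 1M) × output price. A minimal sketch; the prices come from the comparison table above, and the 1M-input / 500K-output volume is an assumption chosen to match the example totals:

```python
# USD per 1M tokens (input, output), taken from the comparison table above.
PRICES = {
    "DeepSeek R2": (0.55, 2.19),
    "GPT-4.1": (2.00, 8.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (5.00, 15.00),
    "o3": (10.00, 40.00),
    "Claude Opus 4.6": (15.00, 75.00),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Monthly cost in USD for a given token volume."""
    inp, out = PRICES[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out

# Assumed example volume: 1M input tokens, 500K output tokens per month.
costs = {m: monthly_cost(m, 1_000_000, 500_000) for m in PRICES}
cheapest = min(costs.values())
for model, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{model}: ${cost:.2f} ({cost / cheapest:.1f}x cheapest)")
```

Run as-is, this reproduces the table's ordering, with DeepSeek R2 cheapest at about $1.65/month and Claude Opus 4.6 at $52.50.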