∞AI

AI Model Comparison

Compare pricing, benchmarks, and capabilities across 496 AI models

496 models tracked · 9 open source
| Model | Provider | Input $/1M | Output $/1M | Context | Intelligence | Speed | Latency |
|---|---|---|---|---|---|---|---|
| DeepSeek R2 ★ | DeepSeek | $0.55 | $2.19 | 128K | 91% | 60 tok/s | — |
| GPT-4.1 ★ | OpenAI | $2 | $8 | 1M | 90.5% | 80 tok/s | — |
| Claude Opus 4.6 ★ | Anthropic | $15 | $75 | 200K | 88.7% | 60 tok/s | — |
| GPT-4o ★ | OpenAI | $5 | $15 | 128K | 87.2% | 120 tok/s | — |
| Claude Sonnet 4.6 ★ | Anthropic | $3 | $15 | 200K | 86.8% | 100 tok/s | — |
| Llama 3.3 70B (Open) ★ | Meta AI | $0.23 | $0.92 | 128K | 86% | 80 tok/s | — |
| o3 | OpenAI | $10 | $40 | 200K | 96.7% | 40 tok/s | — |
| o4-mini | OpenAI | $1.1 | $4.4 | 200K | 93.4% | 100 tok/s | — |
| Gemini 3 Ultra | Google DeepMind | $7 | $21 | 1M | 90.1% | 70 tok/s | — |
| Gemini 3 Pro Preview (low) | Google | — | — | — | 41.3 | — | — |
| Claude Opus 4.5 (Reasoning) | Anthropic | — | — | — | 49.7 | 72 tok/s | 11.7s |
| Claude Opus 4.5 (Non-reasoning) | Anthropic | — | — | — | 43.1 | 63 tok/s | 1.1s |
| Gemini 3 Flash Preview (Reasoning) | Google | — | — | — | 46.4 | 195 tok/s | 5.9s |
| DeepSeek V3 (Open) | DeepSeek | $0.27 | $1.1 | 128K | 88.5% | 80 tok/s | — |
| MiniMax-M2.1 | MiniMax | — | — | — | 39.4 | 59 tok/s | 2.4s |
| Claude 4.5 Sonnet (Reasoning) | Anthropic | — | — | — | 43 | 59 tok/s | 10.4s |
| Claude 4.1 Opus (Reasoning) | Anthropic | — | — | — | 42 | 42 tok/s | 8.0s |
| Grok 3 | xAI | $3 | $15 | 131K | 87.5% | 90 tok/s | — |
| Llama 3.1 405B (Open) | Meta AI | $3 | $3 | 128K | 87.3% | 30 tok/s | — |
| Grok 4 | xAI | — | — | — | 41.5 | 64 tok/s | 7.4s |
| Gemini 3 Pro | Google DeepMind | $3.5 | $10.5 | 1M | 87% | 100 tok/s | — |
| Qwen3-Max | Alibaba Cloud | $0.4 | $1.2 | 32K | 87% | 90 tok/s | — |
| GPT-5.1 (high) | OpenAI | — | — | — | 47.7 | 118 tok/s | 25.1s |
| GPT-5.2 (xhigh) | OpenAI | — | — | — | 51.3 | 72 tok/s | 81.3s |
| GPT-5 (high) | OpenAI | — | — | — | 44.6 | 86 tok/s | 99.7s |
| GPT-5 (medium) | OpenAI | — | — | — | 42 | 95 tok/s | 40.4s |
| Claude 4 Opus (Reasoning) | Anthropic | — | — | — | 39 | 41 tok/s | 8.0s |
| GPT-5 Codex (high) | OpenAI | — | — | — | 44.6 | 207 tok/s | 11.4s |
| DeepSeek V3.2 (Reasoning) | DeepSeek | — | — | — | 41.7 | 29 tok/s | 1.4s |
| GPT-5.2 (medium) | OpenAI | — | — | — | 46.6 | — | — |
| Gemini 2.5 Pro | Google | — | — | — | 34.6 | 127 tok/s | 22.0s |
| Claude 4.5 Sonnet (Non-reasoning) | Anthropic | — | — | — | 37.1 | 56 tok/s | 1.2s |
| Claude 4 Opus (Non-reasoning) | Anthropic | — | — | — | 33 | 37 tok/s | 1.4s |
| Gemini 2.5 Pro Preview (Mar '25) | Google | — | — | — | 30.3 | — | — |
| DeepSeek V3.2 Speciale | DeepSeek | — | — | — | 29.4 | — | — |
| GPT-5 (low) | OpenAI | — | — | — | 39.2 | 75 tok/s | 10.3s |
| GPT-5.1 Codex (high) | OpenAI | — | — | — | 43.1 | 167 tok/s | 6.7s |
| GLM-4.7 (Reasoning) | Z AI | — | — | — | 42.1 | 109 tok/s | 0.7s |
| Kimi K2 Thinking | Kimi | — | — | — | 40.9 | 41 tok/s | 1.1s |
| DeepSeek R1 0528 (May '25) | DeepSeek | — | — | — | 27.1 | — | — |
| Qwen3-72B (Open) | Alibaba Cloud | Free | Free | 32K | 85% | 100 tok/s | — |
| DeepSeek V3.1 (Reasoning) | DeepSeek | — | — | — | 27.7 | — | — |
| DeepSeek V3.1 Terminus (Reasoning) | DeepSeek | — | — | — | 33.9 | — | — |
| Cogito v2.1 (Reasoning) | Deep Cogito | — | — | — | 85% | 57 tok/s | 0.5s |
| Doubao Seed Code | ByteDance Seed | — | — | — | 33.5 | — | — |
| DeepSeek V3.2 Exp (Reasoning) | DeepSeek | — | — | — | 32.9 | 30 tok/s | 1.4s |
| Grok 4 Fast (Reasoning) | xAI | — | — | — | 35.1 | 216 tok/s | 3.4s |
| Grok 4.1 Fast (Reasoning) | xAI | — | — | — | 38.6 | 142 tok/s | 9.2s |
| Phi-4 (Open) | Microsoft | $0.07 | $0.14 | 16K | 84.8% | 300 tok/s | — |
| Claude 3.7 Sonnet (Reasoning) | Anthropic | — | — | — | 34.7 | — | — |
| Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | Google | — | — | — | 25.7 | — | — |
| DeepSeek V3.2 (Non-reasoning) | DeepSeek | — | — | — | 32.1 | 30 tok/s | 1.3s |
| Gemini 2.5 Pro Preview (May '25) | Google | — | — | — | 29.5 | — | — |
| Qwen3 VL 235B A22B (Reasoning) | Alibaba | — | — | — | 27.6 | 45 tok/s | 1.2s |
| K-EXAONE (Reasoning) | LG AI Research | — | — | — | 32.1 | — | — |
| Qwen3 Max (Preview) | Alibaba | — | — | — | 26.1 | 47 tok/s | 1.8s |
| Qwen3 235B A22B 2507 (Reasoning) | Alibaba | — | — | — | 29.5 | 51 tok/s | 1.3s |
| GPT-5 mini (high) | OpenAI | — | — | — | 41.2 | 74 tok/s | 91.5s |
| o1 | OpenAI | — | — | — | 30.8 | 112 tok/s | 23.6s |
| GLM-4.5 (Reasoning) | Z AI | — | — | — | 26.4 | 38 tok/s | 0.9s |
| MiMo-V2-Flash (Reasoning) | Xiaomi | — | — | — | 39.2 | 123 tok/s | 1.8s |
| DeepSeek R1 (Jan '25) | DeepSeek | — | — | — | 18.8 | — | — |
| DeepSeek V3.1 Terminus (Non-reasoning) | DeepSeek | — | — | — | 28.5 | — | — |
| DeepSeek V3.2 Exp (Non-reasoning) | DeepSeek | — | — | — | 28.4 | 31 tok/s | 1.3s |
| Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | Google | — | — | — | 31.1 | — | — |
| Mistral Large | Mistral AI | $2 | $6 | 128K | 84% | 90 tok/s | — |
| Claude 4 Sonnet (Reasoning) | Anthropic | — | — | — | 38.7 | 59 tok/s | 8.5s |
| Claude 4 Sonnet (Non-reasoning) | Anthropic | — | — | — | 33 | 52 tok/s | 0.8s |
| Grok 3 Mini | xAI | $0.3 | $0.5 | 131K | 83% | 160 tok/s | — |
| ERNIE 5.0 Thinking Preview | Baidu | — | — | — | 29.1 | — | — |
| DeepSeek V3.1 (Non-reasoning) | DeepSeek | — | — | — | 28.1 | — | — |
| Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | — | — | — | 18.6 | 32 tok/s | 0.8s |
| GLM-4.6 (Reasoning) | Z AI | — | — | — | 32.5 | 36 tok/s | 0.9s |
| Qwen3 235B A22B 2507 Instruct | Alibaba | — | — | — | 25 | 70 tok/s | 1.2s |
| Nova 2.0 Pro Preview (medium) | Amazon | — | — | — | 35.7 | 120 tok/s | 17.9s |
| Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | — | — | — | 15 | 42 tok/s | 0.7s |
| Gemini 2.5 Flash (Reasoning) | Google | — | — | — | 27 | 205 tok/s | 13.3s |
| Grok 3 mini Reasoning (high) | xAI | — | — | — | 32.1 | 216 tok/s | 0.4s |
| Qwen3 235B A22B (Reasoning) | Alibaba | — | — | — | 19.8 | 65 tok/s | 1.3s |
| GPT-5 mini (medium) | OpenAI | — | — | — | 38.9 | 77 tok/s | 20.0s |
| INTELLECT-3 | Prime Intellect | — | — | — | 22.2 | — | — |
| EXAONE 4.0 32B (Reasoning) | LG AI Research | — | — | — | 16.7 | — | — |
| Qwen3 VL 32B (Reasoning) | Alibaba | — | — | — | 24.7 | 97 tok/s | 1.4s |
| Seed-OSS-36B-Instruct | ByteDance Seed | — | — | — | 25.2 | 42 tok/s | 1.8s |
| Qwen3 VL 235B A22B Instruct | Alibaba | — | — | — | 20.8 | 57 tok/s | 1.2s |
| Kimi K2 0905 | Kimi | — | — | — | 30.9 | 22 tok/s | 2.1s |
| GLM-4.5-Air | Z AI | — | — | — | 23.2 | 65 tok/s | 1.3s |
| MiniMax M1 80k | MiniMax | — | — | — | 24.4 | — | — |
| MiniMax-M2 | MiniMax | — | — | — | 36.1 | 61 tok/s | 2.2s |
| DeepSeek V3 0324 | DeepSeek | — | — | — | 22.3 | — | — |
| Magistral Medium 1.2 | Mistral | — | — | — | 27.1 | 95 tok/s | 0.4s |
| GPT-4o mini | OpenAI | $0.15 | $0.6 | 128K | 82% | 200 tok/s | — |
| Gemini 3 Flash | Google DeepMind | $0.075 | $0.3 | 1M | 82% | 250 tok/s | — |
| Qwen3 Max Thinking (Preview) | Alibaba | — | — | — | 32.5 | 43 tok/s | 1.8s |
| Nova 2.0 Lite (high) | Amazon | — | — | — | 34.5 | 195 tok/s | 21.4s |
| Nova 2.0 Pro Preview (low) | Amazon | — | — | — | 31.9 | 143 tok/s | 6.8s |
| GPT-5 (ChatGPT) | OpenAI | — | — | — | 21.8 | 158 tok/s | 0.6s |
| Kimi K2 | Kimi | — | — | — | 26.3 | 35 tok/s | 1.3s |
| GPT-5.1 Codex mini (high) | OpenAI | — | — | — | 38.6 | 197 tok/s | 5.9s |
| Qwen3 Next 80B A3B (Reasoning) | Alibaba | — | — | — | 26.7 | 164 tok/s | 1.1s |
| Ling-1T | InclusionAI | — | — | — | 19 | — | — |
| Qwen3 Next 80B A3B Instruct | Alibaba | — | — | — | 20.1 | 166 tok/s | 1.0s |
| Llama 4 Maverick | Meta | — | — | — | 18.4 | 115 tok/s | 0.6s |
| GPT-5 (minimal) | OpenAI | — | — | — | 23.9 | 74 tok/s | 1.1s |
| K-EXAONE (Non-reasoning) | LG AI Research | — | — | — | 23.4 | — | — |
| Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | — | — | — | 16 | 62 tok/s | 0.6s |
| Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | — | — | — | 18.7 | 60 tok/s | 0.3s |
| Ring-1T | InclusionAI | — | — | — | 22.8 | — | — |
| Gemini 2.5 Flash (Non-reasoning) | Google | — | — | — | 20.6 | 180 tok/s | 0.5s |
| Nova 2.0 Lite (medium) | Amazon | — | — | — | 29.7 | 177 tok/s | 13.8s |
| Nova 2.0 Omni (medium) | Amazon | — | — | — | 28 | — | — |
| KAT-Coder-Pro V1 | KwaiKAT | — | — | — | 36 | 112 tok/s | 1.0s |
| MiniMax M1 40k | MiniMax | — | — | — | 20.9 | — | — |
| Gemini 2.0 Pro Experimental (Feb '25) | Google | — | — | — | 18.1 | — | — |
| gpt-oss-120B (high) | OpenAI | — | — | — | 33.3 | 215 tok/s | 0.5s |
| Solar Pro 2 (Reasoning) | Upstage | — | — | — | 14.9 | — | — |
| Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google | — | — | — | 21.6 | — | — |
| Mi:dm K 2.5 Pro | Korea Telecom | — | — | — | 23.1 | — | — |
| GPT-5.2 (Non-reasoning) | OpenAI | — | — | — | 33.6 | 63 tok/s | 0.8s |
| Mi:dm K 2.5 Pro Preview | Korea Telecom | — | — | — | 81% | — | — |
| Qwen3 30B A3B 2507 (Reasoning) | Alibaba | — | — | — | 22.4 | 148 tok/s | 1.1s |
| Qwen3 VL 30B A3B (Reasoning) | Alibaba | — | — | — | 19.7 | 127 tok/s | 1.0s |
| Mistral Large 3 | Mistral | — | — | — | 22.8 | 56 tok/s | 0.6s |
| Nova 2.0 Omni (low) | Amazon | — | — | — | 23.2 | — | — |
| Claude 4.5 Haiku (Non-reasoning) | Anthropic | — | — | — | 31.1 | 120 tok/s | 0.5s |
| Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | Google | — | — | — | 19.4 | — | — |
| GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | — | — | — | 18.6 | — | — |
| o3-mini (high) | OpenAI | — | — | — | 25.2 | 149 tok/s | 27.7s |
| GLM-4.6V (Reasoning) | Z AI | — | — | — | 23.4 | 27 tok/s | 1.2s |
| Qwen3 32B (Reasoning) | Alibaba | — | — | — | 16.5 | 103 tok/s | 1.1s |
| GPT-5.1 (Non-reasoning) | OpenAI | — | — | — | 27.4 | 108 tok/s | 0.8s |
| Motif-2-12.7B-Reasoning | Motif Technologies | — | — | — | 19.1 | — | — |
| Gemini 2.5 Flash Preview (Reasoning) | Google | — | — | — | 24.3 | — | — |
| Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google | — | — | — | 19.6 | — | — |
| DeepSeek R1 Distill Llama 70B | DeepSeek | — | — | — | 16 | 41 tok/s | 0.5s |
| Claude 3.7 Sonnet (Non-reasoning) | Anthropic | — | — | — | 30.8 | — | — |
| Qwen3 Coder 480B A35B Instruct | Alibaba | — | — | — | 24.8 | 65 tok/s | 1.7s |
| Grok Code Fast 1 | xAI | — | — | — | 28.7 | 185 tok/s | 5.4s |
| Nova 2.0 Lite (low) | Amazon | — | — | — | 24.6 | 210 tok/s | 5.1s |
| NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | — | — | — | 24.3 | 133 tok/s | 1.3s |
| Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | — | — | — | 18.5 | — | — |
| K2-V2 (high) | MBZUAI Institute of Foundation Models | — | — | — | 20.6 | — | — |
| HyperCLOVA X SEED Think (32B) | Naver | — | — | — | 23.7 | — | — |
| Apriel-v1.6-15B-Thinker | ServiceNow | — | — | — | 27.6 | — | — |
| Ring-flash-2.0 | InclusionAI | — | — | — | 14 | 87 tok/s | 1.4s |
| Qwen3 Omni 30B A3B (Reasoning) | Alibaba | — | — | — | 15.6 | 93 tok/s | 1.0s |
| o3-mini | OpenAI | — | — | — | 25.9 | 151 tok/s | 8.1s |
| GLM-4.5V (Reasoning) | Z AI | — | — | — | 15.1 | 45 tok/s | 1.0s |
| GLM-4.7 (Non-reasoning) | Z AI | — | — | — | 34.2 | 106 tok/s | 0.7s |
| Qwen3 VL 32B Instruct | Alibaba | — | — | — | 17.2 | 83 tok/s | 1.3s |
| Ling-flash-2.0 | InclusionAI | — | — | — | 15.7 | 94 tok/s | 1.5s |
| GPT-5 mini (minimal) | OpenAI | — | — | — | 20.7 | 96 tok/s | 1.1s |
| GPT-4.1 mini | OpenAI | — | — | — | 22.9 | 90 tok/s | 0.6s |
| GLM-4.6 (Non-reasoning) | Z AI | — | — | — | 30.2 | 67 tok/s | 0.9s |
| Gemini 2.5 Flash Preview (Non-reasoning) | Google | — | — | — | 17.8 | — | — |
| Qwen3 30B A3B (Reasoning) | Alibaba | — | — | — | 15.3 | 70 tok/s | 1.2s |
| ERNIE 4.5 300B A47B | Baidu | — | — | — | 15 | 29 tok/s | 1.8s |
| gpt-oss-120B (low) | OpenAI | — | — | — | 24.5 | 218 tok/s | 0.5s |
| Gemini 2.0 Flash (Feb '25) | Google | — | — | — | 18.5 | — | — |
| Gemini 2.0 Flash (experimental) | Google | — | — | — | 16.8 | — | — |
| Command R+ | Cohere | $2.5 | $10 | 128K | 78% | 80 tok/s | — |
| GPT-5 nano (high) | OpenAI | — | — | — | 26.8 | 144 tok/s | 100.6s |
| Qwen3 30B A3B 2507 Instruct | Alibaba | — | — | — | 15 | 92 tok/s | 1.3s |
| Apriel-v1.5-15B-Thinker | ServiceNow | — | — | — | 28.3 | — | — |
| GPT-4o (ChatGPT) | OpenAI | — | — | — | 14.1 | — | — |
| GPT-5 nano (medium) | OpenAI | — | — | — | 25.9 | 145 tok/s | 50.0s |
| Qwen3 14B (Reasoning) | Alibaba | — | — | — | 16.2 | 65 tok/s | 1.1s |
| EXAONE 4.0 32B (Non-reasoning) | LG AI Research | — | — | — | 11.7 | — | — |
| Magistral Small 1.2 | Mistral | — | — | — | 18.2 | 188 tok/s | 0.4s |
| Nova 2.0 Pro Preview (Non-reasoning) | Amazon | — | — | — | 23.1 | 151 tok/s | 0.7s |
| Solar Pro 2 (Preview) (Reasoning) | Upstage | — | — | — | 18.8 | — | — |
| Claude 3.5 Sonnet (Oct '24) | Anthropic | — | — | — | 15.9 | — | — |
| Qwen2.5 Max | Alibaba | — | — | — | 16.3 | 46 tok/s | 1.1s |
| Mistral Medium 3 | Mistral | — | — | — | 18.8 | 62 tok/s | 0.5s |
| Devstral 2 | Mistral | — | — | — | 22 | 79 tok/s | 0.5s |
| Olmo 3.1 32B Think | Allen Institute for AI | — | — | — | 13.9 | — | — |
| Qwen3 235B A22B (Non-reasoning) | Alibaba | — | — | — | 17 | 63 tok/s | 1.2s |
| Olmo 3 32B Think | Allen Institute for AI | — | — | — | 12.1 | — | — |
| QwQ 32B | Alibaba | — | — | — | 19.7 | 33 tok/s | 0.4s |
| Gemini 2.5 Flash-Lite (Reasoning) | Google | — | — | — | 17.6 | 295 tok/s | 12.3s |
| Claude 4.5 Haiku (Reasoning) | Anthropic | — | — | — | 37.1 | 156 tok/s | 10.0s |
| K2-V2 (medium) | MBZUAI Institute of Foundation Models | — | — | — | 18.7 | — | — |
| Qwen3 VL 30B A3B Instruct | Alibaba | — | — | — | 16.1 | 123 tok/s | 1.0s |
| Sonar Pro | Perplexity | — | — | — | 15.2 | — | — |
| NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | — | — | — | 14.9 | 151 tok/s | 0.5s |
| Claude Haiku 4.5 | Anthropic | $0.8 | $4 | 200K | 75.2% | 250 tok/s | — |
| Magistral Medium 1 | Mistral | — | — | — | 18.8 | — | — |
| Gemini 1.5 Pro (Sep '24) | Google | — | — | — | 16 | — | — |
| Gemma 3 27B (Open) | Google DeepMind | Free | Free | 128K | 75% | 120 tok/s | — |
| Solar Pro 2 (Non-reasoning) | Upstage | — | — | — | 13.6 | — | — |
| Llama 4 Scout | Meta | — | — | — | 13.5 | 137 tok/s | 0.5s |
| Magistral Small 1 | Mistral | — | — | — | 16.8 | — | — |
| Qwen3 VL 8B (Reasoning) | Alibaba | — | — | — | 16.7 | 135 tok/s | 1.1s |
| Claude 3.5 Sonnet (June '24) | Anthropic | — | — | — | 14.2 | — | — |
| gpt-oss-20B (high) | OpenAI | — | — | — | 24.5 | 252 tok/s | 0.3s |
| GLM-4.6V (Non-reasoning) | Z AI | — | — | — | 17.1 | 23 tok/s | 5.9s |
| GLM-4.5V (Non-reasoning) | Z AI | — | — | — | 12.7 | 39 tok/s | 29.9s |
| o1-mini | OpenAI | — | — | — | 20.4 | — | — |
| MiMo-V2-Flash (Non-reasoning) | Xiaomi | — | — | — | 30.4 | 124 tok/s | 1.5s |
| NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 153 tok/s | 0.7s |
| Nova 2.0 Lite (Non-reasoning) | Amazon | — | — | — | 18 | 182 tok/s | 0.8s |
| DeepSeek R1 Distill Qwen 14B | DeepSeek | — | — | — | 15.8 | — | — |
| Grok 4.1 Fast (Non-reasoning) | xAI | — | — | — | 23.6 | 131 tok/s | 0.4s |
| Qwen3 4B 2507 (Reasoning) | Alibaba | — | — | — | 18.2 | — | — |
| NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | — | — | — | 14.8 | 117 tok/s | 0.3s |
| GPT-4o (May '24) | OpenAI | — | — | — | 14.5 | 101 tok/s | 0.5s |
| DeepSeek R1 0528 Qwen3 8B | DeepSeek | — | — | — | 16.4 | — | — |
| DeepSeek R1 Distill Qwen 32B | DeepSeek | — | — | — | 17.2 | 42 tok/s | 0.5s |
| Qwen3 8B (Reasoning) | Alibaba | — | — | — | 13.2 | 91 tok/s | 1.0s |
| DBRX (Open) | Databricks | $0.75 | $2.25 | 33K | 73.7% | 100 tok/s | — |
| Qwen3 Omni 30B A3B Instruct | Alibaba | — | — | — | 10.7 | 106 tok/s | 1.1s |
| Llama 3.2 11B Vision (Open) | Meta AI | $0.18 | $0.18 | 128K | 73% | 150 tok/s | — |
| Llama 3.1 Instruct 405B | Meta | — | — | — | 17.4 | 31 tok/s | 0.7s |
| Nova Premier | Amazon | — | — | — | 19 | 70 tok/s | 1.2s |
| Solar Pro 2 (Preview) (Non-reasoning) | Upstage | — | — | — | 16 | — | — |
| Falcon-H1R-7B | TII UAE | — | — | — | 15.8 | — | — |
| Grok 4 Fast (Non-reasoning) | xAI | — | — | — | 23.1 | 196 tok/s | 0.4s |
| Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | — | — | — | 17.6 | 32 tok/s | 0.9s |
| Qwen3 32B (Non-reasoning) | Alibaba | — | — | — | 14.5 | 102 tok/s | 1.2s |
| Llama 3.1 Tulu3 405B | Allen Institute for AI | — | — | — | 14.1 | — | — |
| Nova 2.0 Omni (Non-reasoning) | Amazon | — | — | — | 16.6 | 227 tok/s | 0.9s |
| Qwen2.5 Instruct 72B | Alibaba | — | — | — | 15.6 | 55 tok/s | 1.2s |
| gpt-oss-20B (low) | OpenAI | — | — | — | 20.8 | 261 tok/s | 0.4s |
| Gemini 3.1 Flash-Lite | Google DeepMind | $0.01 | $0.04 | 1M | 72% | 500 tok/s | — |
| Mistral Small | Mistral AI | $0.1 | $0.3 | 32K | 72% | 200 tok/s | — |
| Command R | Cohere | $0.15 | $0.6 | 128K | 72% | 150 tok/s | — |
| Gemini 2.5 Flash-Lite (Non-reasoning) | Google | — | — | — | 12.7 | 260 tok/s | 0.4s |
| Gemini 2.0 Flash-Lite (Feb '25) | Google | — | — | — | 14.7 | — | — |
| Devstral Medium | Mistral | — | — | — | 18.7 | 145 tok/s | 0.5s |
| K2-V2 (low) | MBZUAI Institute of Foundation Models | — | — | — | 14.4 | — | — |
| Qwen3 30B A3B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 67 tok/s | 1.2s |
| Qwen3 Coder 30B A3B Instruct | Alibaba | — | — | — | 20 | 113 tok/s | 1.4s |
| Llama 3.3 Instruct 70B | Meta | — | — | — | 14.5 | 96 tok/s | 0.6s |
| Grok 2 (Dec '24) | xAI | — | — | — | 13.9 | — | — |
| Command A | Cohere | — | — | — | 13.5 | 40 tok/s | 0.6s |
| Falcon 180B (Open) | TII | Free | Free | 4K | 70.4% | 20 tok/s | — |
| Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | NVIDIA | — | — | — | 14.3 | — | — |
| Sarvam M (Reasoning) | Sarvam | — | — | — | 8.4 | — | — |
| Claude 3 Opus | Anthropic | — | — | — | 18 | — | — |
| Qwen3 VL 4B (Reasoning) | Alibaba | — | — | — | 13.7 | — | — |
| Mistral Large 2 (Nov '24) | Mistral | — | — | — | 15.1 | 41 tok/s | 0.5s |
| Qwen2.5 Instruct 32B | Alibaba | — | — | — | 13.2 | — | — |
| Grok Beta | xAI | — | — | — | 13.3 | — | — |
| Qwen3 4B (Reasoning) | Alibaba | — | — | — | 14.2 | 104 tok/s | 1.0s |
| Pixtral Large | Mistral | — | — | — | 14 | 51 tok/s | 0.5s |
| Qwen3 VL 8B Instruct | Alibaba | — | — | — | 14.3 | 148 tok/s | 0.9s |
| Ministral 3 14B | Mistral | — | — | — | 16 | 99 tok/s | 0.3s |
| Nova Pro | Amazon | — | — | — | 13.5 | — | — |
| Llama 3.1 Nemotron Instruct 70B | NVIDIA | — | — | — | 13.4 | 46 tok/s | 0.3s |
| Sonar | Perplexity | — | — | — | 15.5 | — | — |
| Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | — | — | — | 14.6 | 58 tok/s | 0.3s |
| GPT-4 Turbo | OpenAI | — | — | — | 13.7 | 32 tok/s | 1.2s |
| Llama 3.1 Instruct 70B | Meta | — | — | — | 12.5 | 31 tok/s | 0.8s |
| Mistral Medium 3.1 | Mistral | — | — | — | 21.3 | 89 tok/s | 0.4s |
| Mistral Small 3.2 | Mistral | — | — | — | 15.1 | 155 tok/s | 0.3s |
| Mistral Large 2 (Jul '24) | Mistral | — | — | — | 13 | — | — |
| Qwen3 14B (Non-reasoning) | Alibaba | — | — | — | 12.8 | 65 tok/s | 1.0s |
| Gemini 1.5 Flash (Sep '24) | Google | — | — | — | 13.8 | — | — |
| Devstral Small 2 | Mistral | — | — | — | 19.5 | 80 tok/s | 0.7s |
| Ling-mini-2.0 | InclusionAI | — | — | — | 9.2 | — | — |
| Qwen3 4B 2507 Instruct | Alibaba | — | — | — | 12.9 | — | — |
| Llama 3.2 Instruct 90B (Vision) | Meta | — | — | — | 11.9 | 42 tok/s | 0.5s |
| Reka Flash 3 | Reka AI | — | — | — | 9.5 | 94 tok/s | 1.3s |
| Olmo 3 7B Think | Allen Institute for AI | — | — | — | 9.4 | — | — |
| Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | — | — | — | 12.6 | 63 tok/s | 0.6s |
| GPT-4.1 nano | OpenAI | — | — | — | 13 | 200 tok/s | 0.4s |
| Gemini 1.5 Pro (May '24) | Google | — | — | — | 12 | — | — |
| Mistral Small 3.1 | Mistral | — | — | — | 14.5 | 153 tok/s | 0.5s |
| NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | — | — | — | 10.1 | 175 tok/s | 0.7s |
| Mistral Small 3 | Mistral | — | — | — | 12.7 | 154 tok/s | 0.5s |
| QwQ 32B-Preview | Alibaba | — | — | — | 15.2 | 43 tok/s | 0.5s |
| Qwen3 8B (Non-reasoning) | Alibaba | — | — | — | 10.6 | 94 tok/s | 0.9s |
| Ministral 3 8B | Mistral | — | — | — | 14.8 | 180 tok/s | 0.3s |
| Qwen2.5 Coder Instruct 32B | Alibaba | — | — | — | 12.9 | — | — |
| Devstral Small (May '25) | Mistral | — | — | — | 18 | — | — |
| Claude 3.5 Haiku | Anthropic | — | — | — | 18.7 | — | — |
| Qwen3 VL 4B Instruct | Alibaba | — | — | — | 9.6 | — | — |
| Qwen2.5 Turbo | Alibaba | — | — | — | 12 | 68 tok/s | 1.2s |
| Devstral Small (Jul '25) | Mistral | — | — | — | 15.2 | 202 tok/s | 0.4s |
| Qwen2 Instruct 72B | Alibaba | — | — | — | 11.7 | — | — |
| Granite 4.0 H Small | IBM | — | — | — | 10.8 | 453 tok/s | 8.7s |
| Mistral Saba | Mistral | — | — | — | 12.1 | — | — |
| Gemma 3 12B Instruct | Google | — | — | — | 8.8 | 30 tok/s | 10.2s |
| Nova Lite | Amazon | — | — | — | 12.7 | 221 tok/s | 0.7s |
| Exaone 4.0 1.2B (Reasoning) | LG AI Research | — | — | — | 8.3 | — | — |
| Kimi Linear 48B A3B Instruct | Kimi | — | — | — | 14.4 | — | — |
| Qwen3 4B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 105 tok/s | 1.0s |
| Claude 3 Sonnet | Anthropic | — | — | — | 10.3 | — | — |
| Jamba 1.7 Large | AI21 Labs | — | — | — | 10.9 | 49 tok/s | 1.1s |
| Jamba Reasoning 3B | AI21 Labs | — | — | — | 9.6 | — | — |
| NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 78 tok/s | 0.3s |
| DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | Nous Research | — | — | — | 10.9 | — | — |
| Jamba 1.5 Large | AI21 Labs | — | — | — | 10.7 | — | — |
| Llama 3 Instruct 70B | Meta | — | — | — | 8.9 | 42 tok/s | 0.7s |
| Gemini 1.5 Flash-8B | Google | — | — | — | 11.1 | — | — |
| Hermes 3 - Llama-3.1 70B | Nous Research | — | — | — | 10.6 | 28 tok/s | 0.4s |
| Qwen3 1.7B (Reasoning) | Alibaba | — | — | — | 8 | 138 tok/s | 1.0s |
| Gemini 1.5 Flash (May '24) | Google | — | — | — | 10.5 | — | — |
| Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | — | — | — | 14.4 | — | — |
| Jamba 1.6 Large | AI21 Labs | — | — | — | 10.6 | 48 tok/s | 0.9s |
| GPT-5 nano (minimal) | OpenAI | — | — | — | 13.8 | 142 tok/s | 1.0s |
| DeepSeek R1 Distill Llama 8B | DeepSeek | — | — | — | 12.1 | — | — |
| Mixtral 8x22B Instruct | Mistral | — | — | — | 9.8 | — | — |
| Nova Micro | Amazon | — | — | — | 10.3 | 314 tok/s | 0.6s |
| Ministral 3 3B | Mistral | — | — | — | 11.2 | 307 tok/s | 0.3s |
| Olmo 3 7B Instruct | Allen Institute for AI | — | — | — | 8.2 | — | — |
| OLMo 2 32B | Allen Institute for AI | — | — | — | 10.6 | — | — |
| LFM2 8B A1B | Liquid AI | — | — | — | 7 | — | — |
| Claude 2.1 | Anthropic | — | — | — | 9.3 | — | — |
| Exaone 4.0 1.2B (Non-reasoning) | LG AI Research | — | — | — | 8.1 | — | — |
| Gemma 3n E4B Instruct | Google | — | — | — | 6.4 | 14 tok/s | 0.4s |
| Claude 2.0 | Anthropic | — | — | — | 9.1 | — | — |
| Mistral Medium | Mistral | — | — | — | 9 | 89 tok/s | 0.4s |
| Phi-4 Multimodal Instruct | Microsoft Azure | — | — | — | 10 | 16 tok/s | 0.4s |
| Llama 3.1 Instruct 8B | Meta | — | — | — | 11.8 | 170 tok/s | 0.4s |
| Gemma 3n E4B Instruct Preview (May '25) | Google | — | — | — | 10.1 | — | — |
| Granite 3.3 8B (Non-reasoning) | IBM | — | — | — | 7 | 427 tok/s | 7.3s |
| Phi-4 Mini Instruct | Microsoft Azure | — | — | — | 8.4 | 44 tok/s | 0.3s |
| Qwen2.5 Coder Instruct 7B | Alibaba | — | — | — | 10 | — | — |
| Llama 3.2 Instruct 11B (Vision) | Meta | — | — | — | 8.7 | 79 tok/s | 0.5s |
| GPT-3.5 Turbo | OpenAI | — | — | — | 9 | 89 tok/s | 0.5s |
| Granite 4.0 Micro | IBM | — | — | — | 7.7 | — | — |
| Phi-3 Mini Instruct 3.8B | Microsoft Azure | — | — | — | 10.1 | — | — |
| Gemini 1.0 Pro | Google | — | — | — | 8.5 | — | — |
| Claude Instant | Anthropic | — | — | — | 7.4 | — | — |
| DeepSeek Coder V2 Lite Instruct | DeepSeek | — | — | — | 8.5 | — | — |
| LFM 40B | Liquid AI | — | — | — | 8.8 | — | — |
| Command-R+ (Apr '24) | Cohere | — | — | — | 8.3 | — | — |
| Gemma 3 4B Instruct | Google | — | — | — | 6.3 | 30 tok/s | 1.1s |
| Mistral Small (Feb '24) | Mistral | — | — | — | 9 | 154 tok/s | 0.5s |
| Qwen3 1.7B (Non-reasoning) | Alibaba | — | — | — | 6.8 | 141 tok/s | 0.9s |
| Llama 2 Chat 13B | Meta | — | — | — | 8.4 | — | — |
| Llama 2 Chat 70B | Meta | — | — | — | 8.4 | — | — |
| Llama 3 Instruct 8B | Meta | — | — | — | 6.4 | 82 tok/s | 0.5s |
| Mixtral 8x7B Instruct | Mistral | — | — | — | 7.7 | — | — |
| Jamba 1.7 Mini | AI21 Labs | — | — | — | 8.1 | — | — |
| Gemma 3n E2B Instruct | Google | — | — | — | 4.8 | 51 tok/s | 0.5s |
| Molmo 7B-D | Allen Institute for AI | — | — | — | 9.2 | — | — |
| Jamba 1.5 Mini | AI21 Labs | — | — | — | 8 | — | — |
| DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | Nous Research | — | — | — | 7.6 | — | — |
| Jamba 1.6 Mini | AI21 Labs | — | — | — | 7.9 | 178 tok/s | 0.8s |
| Qwen3 0.6B (Reasoning) | Alibaba | — | — | — | 6.5 | 189 tok/s | 0.9s |
| Llama 3.2 Instruct 3B | Meta | — | — | — | 9.7 | 53 tok/s | 0.6s |
| Command-R (Mar '24) | Cohere | — | — | — | 7.4 | — | — |
| Granite 4.0 1B | IBM | — | — | — | 7.3 | — | — |
| OpenChat 3.5 (1210) | OpenChat | — | — | — | 8.3 | — | — |
| LFM2 2.6B | Liquid AI | — | — | — | 8 | — | — |
| OLMo 2 7B | Allen Institute for AI | — | — | — | 9.3 | — | — |
| Granite 4.0 H 1B | IBM | — | — | — | 8 | — | — |
| DeepSeek R1 Distill Qwen 1.5B | DeepSeek | — | — | — | 9.1 | — | — |
| LFM2 1.2B | Liquid AI | — | — | — | 6.3 | — | — |
| Mistral 7B Instruct | Mistral | — | — | — | 7.4 | 190 tok/s | 0.3s |
| Qwen3 0.6B (Non-reasoning) | Alibaba | — | — | — | 5.7 | 194 tok/s | 0.9s |
| Llama 3.2 Instruct 1B | Meta | — | — | — | 6.3 | 88 tok/s | 0.6s |
| Llama 2 Chat 7B | Meta | — | — | — | 9.7 | 108 tok/s | 12.6s |
| Gemma 3 1B Instruct | Google | — | — | — | 5.5 | 48 tok/s | 0.6s |
| Granite 4.0 H 350M | IBM | — | — | — | 5.4 | — | — |
| Granite 4.0 350M | IBM | — | — | — | 6.1 | — | — |
| Gemma 3 270M | Google | — | — | — | 7.7 | — | — |
| Qwen3.5 27B (Non-reasoning) | Alibaba | — | — | — | 37.2 | 92 tok/s | 1.4s |
| Gemini 3.1 Flash-Lite Preview | Google | — | — | — | 33.5 | 319 tok/s | 5.7s |
| GLM-4.7-Flash (Non-reasoning) | Z AI | — | — | — | 22.1 | 105 tok/s | 1.0s |
| Qwen3.5 35B A3B (Reasoning) | Alibaba | — | — | — | 37.1 | 149 tok/s | 1.2s |
| GLM-4.7-Flash (Reasoning) | Z AI | — | — | — | 30.1 | 91 tok/s | 0.9s |
| Gemma 4 E4B (Reasoning) | Google | — | — | — | 18.8 | — | — |
| GPT-5.4 mini (medium) | OpenAI | — | — | — | 37.7 | 181 tok/s | 6.3s |
| Qwen3.5 2B (Reasoning) | Alibaba | — | — | — | 16.3 | — | — |
| GPT-5.4 mini (xhigh) | OpenAI | — | — | — | 48.9 | 189 tok/s | 6.9s |
| Qwen3.5 9B (Reasoning) | Alibaba | — | — | — | 32.4 | 56 tok/s | 0.4s |
| Qwen3 Coder Next | Alibaba | — | — | — | 28.3 | 165 tok/s | 0.8s |
| Gemma 4 31B (Non-reasoning) | Google | — | — | — | 32.3 | — | — |
| Nemotron Cascade 2 30B A3B | NVIDIA | — | — | — | 28.4 | — | — |
| Step 3.5 Flash | StepFun | — | — | — | 37.8 | 163 tok/s | 0.8s |
| Qwen3.5 4B (Non-reasoning) | Alibaba | — | — | — | 22.6 | 178 tok/s | 0.3s |
| Qwen3.5 0.8B (Non-reasoning) | Alibaba | — | — | — | 9.9 | 285 tok/s | 0.3s |
| Qwen3.5 2B (Non-reasoning) | Alibaba | — | — | — | 14.7 | 232 tok/s | 0.3s |
| Grok-1 | xAI | — | — | — | 11.7 | — | — |
| Qwen3.5 0.8B (Reasoning) | Alibaba | — | — | — | 10.5 | — | — |
| Qwen3.5 397B A17B (Reasoning) | Alibaba | — | — | — | 45 | 52 tok/s | 1.5s |
| GLM-5 (Reasoning) | Z AI | — | — | — | 49.8 | 67 tok/s | 0.9s |
| Gemini 3 Deep Think | Google | — | — | — | — | — | — |
| Tiny Aya Global | Cohere | — | — | — | 4.7 | — | — |
| Gemma 4 26B A4B (Non-reasoning) | Google | — | — | — | 27.1 | — | — |
| Muse Spark | Meta | — | — | — | 52.1 | — | — |
| GLM 5V Turbo (Reasoning) | Z AI | — | — | — | 42.9 | — | — |
| Qwen Chat 72B | Alibaba | — | — | — | 8.8 | — | — |
| Gemma 4 31B (Reasoning) | Google | — | — | — | 39.2 | 35 tok/s | 1.0s |
| Arctic Instruct | Snowflake | — | — | — | 8.8 | — | — |
| GPT-5.4 nano (medium) | OpenAI | — | — | — | 38.1 | 158 tok/s | 3.8s |
| Qwen1.5 Chat 110B | Alibaba | — | — | — | 9.5 | — | — |
| GLM-5-Turbo | Z AI | — | — | — | 46.8 | — | — |
| GLM-5.1 (Reasoning) | Z AI | — | — | — | 51.4 | 43 tok/s | 1.2s |
| GLM-5 (Non-reasoning) | Z AI | — | — | — | 40.6 | 53 tok/s | 1.4s |
| Trinity Large Thinking | Arcee AI | — | — | — | 31.9 | 127 tok/s | 0.6s |
| Apertus 8B Instruct | Swiss AI Initiative | — | — | — | 5.9 | — | — |
| Apertus 70B Instruct | Swiss AI Initiative | — | — | — | 7.7 | — | — |
| Tri-21B-Think | Trillion Labs | — | — | — | 18.6 | — | — |
| Nanbeige4.1-3B | Nanbeige | — | — | — | 16.1 | — | — |
| Ling 2.6 Flash | InclusionAI | — | — | — | 26.2 | 202 tok/s | 0.8s |
| Tri-21B-think Preview | Trillion Labs | — | — | — | 20 | — | — |
| LongCat Flash Lite | LongCat | — | — | — | 23.9 | 115 tok/s | 3.9s |
| Step 3.5 Flash 2603 | StepFun | — | — | — | 38.5 | 186 tok/s | 0.8s |
| Mercury 2 | Inception | — | — | — | 32.8 | 872 tok/s | 4.7s |
| o1-preview | OpenAI | — | — | — | 23.7 | — | — |
| Kimi K2.5 (Non-reasoning) | Kimi | — | — | — | 37.3 | 32 tok/s | 1.4s |
| K2 Think V2 | MBZUAI Institute of Foundation Models | — | — | — | 24.1 | — | — |
| GPT-5.4 nano (Non-reasoning) | OpenAI | — | — | — | 24.4 | 161 tok/s | 0.6s |
| Sarvam 105B (high) | Sarvam | — | — | — | 18.2 | 124 tok/s | 1.2s |
| Olmo 3.1 32B Instruct | Allen Institute for AI | — | — | — | 12.2 | 54 tok/s | 0.3s |
| Sarvam 30B (high) | Sarvam | — | — | — | 12.3 | 294 tok/s | 1.2s |
| MiMo-V2-Omni-0327 | Xiaomi | — | — | — | 44.9 | — | — |
| Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 57.3 | 57 tok/s | 11.6s |
| Step3 VL 10B | StepFun | — | — | — | 15.4 | — | — |
| KAT Coder Pro V2 | KwaiKAT | — | — | — | 43.8 | 114 tok/s | 1.8s |
| GPT-5.4 (xhigh) | OpenAI | — | — | — | 56.8 | 81 tok/s | 157.8s |
| Mistral Small 4 (Non-reasoning) | Mistral | — | — | — | 18.6 | 149 tok/s | 0.5s |
| MiMo-V2-Pro | Xiaomi | — | — | — | 49.2 | 67 tok/s | 2.1s |
| GPT-5.4 (Non-reasoning) | OpenAI | — | — | — | 35.4 | 62 tok/s | 0.7s |
| MiMo-V2-Omni | Xiaomi | — | — | — | 43.4 | — | — |
| JT-MINI | China Mobile | — | — | — | 25.4 | — | — |
| GLM-5.1 (Non-reasoning) | Z AI | — | — | — | 43.8 | 47 tok/s | 2.1s |
| GPT-5.3 Codex (xhigh) | OpenAI | — | — | — | 53.6 | 85 tok/s | 60.3s |
| Qwen3.5 9B (Non-reasoning) | Alibaba | — | — | — | 27.3 | 143 tok/s | 0.3s |
| GPT-5.4 Pro (xhigh) | OpenAI | — | — | — | — | — | — |
| Gemma 4 26B A4B (Reasoning) | Google | — | — | — | 31.2 | — | — |
| MiMo-V2-Flash (Feb 2026) | Xiaomi | — | — | — | 41.5 | 127 tok/s | 1.5s |
| Qwen Chat 14B | Alibaba | — | — | — | 7.4 | — | — |
| GPT-5.4 mini (Non-reasoning) | OpenAI | — | — | — | 23.3 | 176 tok/s | 0.6s |
| DeepSeek-V2-Chat | DeepSeek | — | — | — | 9.1 | — | — |
| Kimi K2.6 | Kimi | — | — | — | 53.9 | 135 tok/s | 0.8s |
| Qwen3.6 35B A3B (Reasoning) | Alibaba | — | — | — | 43.5 | 238 tok/s | 1.7s |
| Qwen3.6 35B A3B (Non-reasoning) | Alibaba | — | — | — | 31.5 | 193 tok/s | 1.5s |
| Molmo2-8B | Allen Institute for AI | — | — | — | 7.3 | — | — |
| Grok 4.20 0309 v2 (Reasoning) | xAI | — | — | — | 49.3 | 175 tok/s | 15.5s |
| PALM-2 | Google | — | — | — | 8.6 | — | — |
| Gemini 2.0 Flash Thinking Experimental (Dec '24) | Google | — | — | — | 12.3 | — | — |
| Grok 4.20 0309 v2 (Non-reasoning) | xAI | — | — | — | 29 | 177 tok/s | 0.4s |
| Gemini 1.0 Ultra | Google | — | — | — | 10.1 | — | — |
| LFM2.5-VL-1.6B | Liquid AI | — | — | — | 6.2 | — | — |
| Qwen3.6 Max Preview | Alibaba | — | — | — | 51.8 | 57 tok/s | 1.9s |
| Claude 3 Haiku | Anthropic | — | — | — | 12.3 | 131 tok/s | 0.5s |
| R1 1776 | Perplexity | — | — | — | 12 | — | — |
| Gemini 2.0 Flash-Lite (Preview) | Google | — | — | — | 14.5 | — | — |
| Solar Pro 3 | Upstage | — | — | — | 25.9 | — | — |
| Codestral | Mistral AI | $0.3 | $0.9 | 32K | — | 180 tok/s | — |
| GPT-4.5 (Preview) | OpenAI | — | — | — | 20 | — | — |
| LFM2.5-1.2B-Instruct | Liquid AI | — | — | — | 8 | — | — |
| GPT-4o mini Realtime (Dec '24) | OpenAI | — | — | — | — | — | — |
| Claude 4.1 Opus (Non-reasoning) | Anthropic | — | — | — | 36 | 39 tok/s | 1.4s |
| GPT-4 | OpenAI | — | — | — | 12.8 | 35 tok/s | 0.8s |
| GPT-4o Realtime (Dec '24) | OpenAI | — | — | — | — | — | — |
| Claude Opus 4.7 (Non-reasoning, High Effort) | Anthropic | — | — | — | 51.8 | 53 tok/s | 1.2s |
| GPT-5.2 Codex (xhigh) | OpenAI | — | — | — | 49 | 107 tok/s | 7.4s |
| LFM2.5-1.2B-Thinking | Liquid AI | — | — | — | 8.1 | — | — |
| MiniMax-M2.7 | MiniMax | — | — | — | 49.6 | 47 tok/s | 1.6s |
| NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | — | — | — | 36 | 154 tok/s | 1.1s |
| GPT-4o (Aug '24) | OpenAI | — | — | — | 18.6 | 108 tok/s | 0.6s |
| DeepSeek-V2.5 (Dec '24) | DeepSeek | — | — | — | 12.5 | — | — |
| Solar Open 100B (Reasoning) | Upstage | — | — | — | 21.7 | — | — |
| o1-pro | OpenAI | — | — | — | 25.8 | — | — |
| o3-pro | OpenAI | — | — | — | 40.7 | 19 tok/s | 95.4s |
| DeepSeek-V2.5 | DeepSeek | — | — | — | 12.3 | — | — |
| DeepSeek-Coder-V2 | DeepSeek | — | — | — | 10.6 | — | — |
| DeepSeek LLM 67B Chat (V1) | DeepSeek | — | — | — | 8.4 | — | — |
| Gemini 3.1 Pro Preview | Google | — | — | — | 57.2 | 124 tok/s | 28.7s |
| GPT-3.5 Turbo (0613) | OpenAI | — | — | — | — | — | — |
| LFM2 24B A2B | Liquid AI | — | — | — | 10.5 | 163 tok/s | 0.3s |
| Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | — | — | — | 42.6 | 60 tok/s | 1.0s |
| Qwen3.5 35B A3B (Non-reasoning) | Alibaba | — | — | — | 30.7 | 153 tok/s | 1.1s |
| Grok 4.20 0309 (Non-reasoning) | xAI | — | — | — | 29.7 | 164 tok/s | 0.4s |
| Qwen3.5 27B (Reasoning) | Alibaba | — | — | — | 42.1 | 92 tok/s | 1.4s |
| Sonar Reasoning | Perplexity | — | — | — | 17.9 | — | — |
| Qwen3.5 Omni Flash | Alibaba | — | — | — | 25.9 | 170 tok/s | 1.2s |
| Qwen3.5 122B A10B (Non-reasoning) | Alibaba | — | — | — | 35.9 | 152 tok/s | 1.1s |
| Mistral Small 4 (Reasoning) | Mistral | — | — | — | 27.8 | 173 tok/s | 0.5s |
| Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 51.7 | 72 tok/s | 46.6s |
| Grok 4.20 0309 (Reasoning) | xAI | — | — | — | 48.5 | 183 tok/s | 16.1s |
| Qwen3.5 397B A17B (Non-reasoning) | Alibaba | — | — | — | 40.1 | 52 tok/s | 1.4s |
| Grok 3 Reasoning Beta | xAI | — | — | — | 21.6 | — | — |
| Qwen3.6 Plus | Alibaba | — | — | — | 50 | 53 tok/s | 1.6s |
| Solar Mini | Upstage | — | — | — | 11.9 | 87 tok/s | 1.4s |
| GPT-5.4 nano (xhigh) | OpenAI | — | — | — | 44 | 157 tok/s | 2.5s |
| MiniMax-M2.5 | MiniMax | — | — | — | 41.9 | 59 tok/s | 2.1s |
| Qwen3.5 4B (Reasoning) | Alibaba | — | — | — | 27.1 | 177 tok/s | 0.3s |
| Qwen3.5 Omni Plus | Alibaba | — | — | — | 38.6 | 55 tok/s | 1.3s |
| Gemma 4 E2B (Non-reasoning) | Google | — | — | — | 12.1 | — | — |
| Gemma 4 E4B (Non-reasoning) | Google | — | — | — | 14.8 | — | — |
| Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 53 | 53 tok/s | 11.7s |
| Kimi K2.5 (Reasoning) | Kimi | — | — | — | 46.8 | 32 tok/s | 1.3s |
| Sonar Reasoning Pro | Perplexity | — | — | — | 24.6 | — | — |
| Llama 65B | Meta | — | — | — | 7.4 | — | — |
| NVIDIA Nemotron 3 Nano 4B | NVIDIA | — | — | — | 14.7 | — | — |
| Reka Flash (Sep '24) | Reka AI | — | — | — | 12 | 85 tok/s | 1.3s |
| Qwen3 Max Thinking | Alibaba | — | — | — | 39.9 | 36 tok/s | 1.7s |
| Gemma 4 E2B (Reasoning) | Google | — | — | — | 15.2 | — | — |
| Qwen3.5 122B A10B (Reasoning) | Alibaba | — | — | — | 41.6 | 159 tok/s | 1.1s |
© 2026 ∞AI. Built for the AI community. everythingai.tech

Estimate Your Monthly Cost

Enter your expected input and output token volumes to compare costs across models. As a rule of thumb, 1,000,000 tokens ≈ 750,000 words, and output volume is usually 30–50% of input volume. The example below compares 6 selected models.

| Model | Provider | Input Cost | Output Cost | Total/Month | vs Cheapest |
|---|---|---|---|---|---|
| Llama 3.3 70B | Meta AI | $0.23 | $0.46 | $0.69 | ✓ Best value |
| DeepSeek R2 | DeepSeek | $0.55 | $1.09 | $1.65 | 2.4× more |
| GPT-4.1 | OpenAI | $2.00 | $4.00 | $6.00 | 8.7× more |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $7.50 | $10.50 | 15.2× more |
| GPT-4o | OpenAI | $5.00 | $7.50 | $12.50 | 18.1× more |
| Claude Opus 4.6 | Anthropic | $15.00 | $37.50 | $52.50 | 76.1× more |

Prices are approximate and may vary. Check provider documentation for current pricing.
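The estimator's arithmetic is simple: each direction costs (tokens ÷ 1,000,000) × price-per-1M-tokens, and the two are summed. The sample totals above are consistent with 1,000,000 input tokens and 500,000 output tokens. A minimal sketch of that calculation, using prices copied from the table (which, as noted, may be out of date):

```python
# Per-1M-token prices (input, output) in USD, taken from the comparison table.
PRICES = {
    "Llama 3.3 70B": (0.23, 0.92),
    "DeepSeek R2": (0.55, 2.19),
    "GPT-4.1": (2.00, 8.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (5.00, 15.00),
    "Claude Opus 4.6": (15.00, 75.00),
}

def monthly_cost(in_tokens, out_tokens, in_price, out_price):
    """Dollar cost: (tokens / 1M) * price-per-1M, summed over input and output."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

# Assumed usage: 1M input tokens/month, output at 50% of input volume.
costs = {m: monthly_cost(1_000_000, 500_000, *p) for m, p in PRICES.items()}
cheapest = min(costs.values())
for model, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{model:<20} ${cost:6.2f}  {cost / cheapest:.1f}x cheapest")
```

Changing the two token arguments reproduces the estimate for any other usage profile.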