Compare pricing, benchmarks, and capabilities across 562 AI models
| Model | Provider | Input $/1M | Output $/1M | Context | Intelligence | Speed | Latency | API |
|---|---|---|---|---|---|---|---|---|
DeepSeek R2 ★ | DeepSeek | $0.55 | $2.19 | 128K | 91% | 60 tok/s | — | |
GPT-4.1 ★ | OpenAI | $2 | $8 | 1M | 90.5% | 80 tok/s | — | |
Claude Opus 4.6 ★ | Anthropic | $15 | $75 | 200K | 88.7% | 60 tok/s | — | |
GPT-4o ★ | OpenAI | $5 | $15 | 128K | 87.2% | 120 tok/s | — | |
Claude Sonnet 4.6 ★ | Anthropic | $3 | $15 | 200K | 86.8% | 100 tok/s | — | |
Llama 3.3 70B Open ★ | Meta AI | $0.23 | $0.92 | 128K | 86% | 80 tok/s | — | |
o3 | OpenAI | $10 | $40 | 200K | 96.7% | 40 tok/s | — | |
o4-mini | OpenAI | $1.1 | $4.4 | 200K | 93.4% | 100 tok/s | — | |
Gemini 3 Ultra | Google DeepMind | $7 | $21 | 1M | 90.1% | 70 tok/s | — | |
Claude Opus 4.5 (Reasoning) | Anthropic | — | — | — | 49.7 | 68 tok/s | 13.5s | |
Gemini 3 Pro Preview (low) | Google DeepMind | — | — | — | 41.3 | — | — | |
Claude Opus 4.5 (Non-reasoning) | Anthropic | — | — | — | 43.1 | 53 tok/s | 1.1s | |
Gemini 3 Flash Preview (Reasoning) | Google DeepMind | — | — | — | 46.4 | 197 tok/s | 6.1s | |
DeepSeek V3 Open | DeepSeek | $0.27 | $1.1 | 128K | 88.5% | 80 tok/s | — | |
Claude 4.1 Opus (Reasoning) | Anthropic | — | — | — | 42 | 37 tok/s | 8.2s | |
Claude 4.5 Sonnet (Reasoning) | Anthropic | — | — | — | 43 | 56 tok/s | 11.4s | |
MiniMax-M2.1 | MiniMax | — | — | — | 39.4 | 74 tok/s | 1.5s | |
Grok 3 | xAI | $3 | $15 | 131K | 87.5% | 90 tok/s | — | |
Llama 3.1 405B Open | Meta AI | $3 | $3 | 128K | 87.3% | 30 tok/s | — | |
Gemini 3 Pro | Google DeepMind | $3.5 | $10.5 | 1M | 87% | 100 tok/s | — | |
GPT-5.1 (high) | OpenAI | — | — | — | 47.7 | 121 tok/s | 33.8s | |
GPT-5 Codex (high) | OpenAI | — | — | — | 44.6 | 208 tok/s | 8.0s | |
GPT-5 (medium) | OpenAI | — | — | — | 42 | 83 tok/s | 50.4s | |
GPT-5.2 (xhigh) | OpenAI | — | — | — | 51.3 | 76 tok/s | 109.3s | |
Grok 4 | xAI | — | — | — | 41.5 | 60 tok/s | 7.7s | |
GPT-5 (high) | OpenAI | — | — | — | 44.6 | 82 tok/s | 101.8s | |
Qwen3-Max | Alibaba Cloud | $0.4 | $1.2 | 32K | 87% | 90 tok/s | — | |
Claude 4 Opus (Reasoning) | Anthropic | — | — | — | 39 | 39 tok/s | 7.6s | |
GPT-5.2 (medium) | OpenAI | — | — | — | 46.6 | — | — | |
GPT-5.1 Codex (high) | OpenAI | — | — | — | 43.1 | 170 tok/s | 6.4s | |
Gemini 2.5 Pro Preview (Mar '25) | Google DeepMind | — | — | — | 30.3 | — | — | |
DeepSeek V3.2 (Reasoning) | DeepSeek | — | — | — | 41.7 | 32 tok/s | 1.4s | |
DeepSeek V3.2 Speciale | DeepSeek | — | — | — | 29.4 | — | — | |
GPT-5 (low) | OpenAI | — | — | — | 39.2 | 79 tok/s | 10.2s | |
Gemini 2.5 Pro | Google DeepMind | — | — | — | 34.6 | 134 tok/s | 21.4s | |
Claude 4 Opus (Non-reasoning) | Anthropic | — | — | — | 33 | 37 tok/s | 1.3s | |
Claude 4.5 Sonnet (Non-reasoning) | Anthropic | — | — | — | 37.1 | 43 tok/s | 1.0s | |
GLM-4.7 (Reasoning) | Z AI | — | — | — | 42.1 | 107 tok/s | 0.7s | |
Doubao Seed Code | ByteDance Seed | — | — | — | 33.5 | — | — | |
DeepSeek V3.1 (Reasoning) | DeepSeek | — | — | — | 27.7 | — | — | |
Grok 4 Fast (Reasoning) | xAI | — | — | — | 35.1 | 214 tok/s | 2.9s | |
Qwen3-72B Open | Alibaba Cloud | Free | Free | 32K | 85% | 100 tok/s | — | |
Kimi K2 Thinking | Kimi | — | — | — | 40.9 | 50 tok/s | 1.0s | |
DeepSeek V3.2 Exp (Reasoning) | DeepSeek | — | — | — | 32.9 | 33 tok/s | 1.4s | |
DeepSeek R1 0528 (May '25) | DeepSeek | — | — | — | 27.1 | — | — | |
Grok 4.1 Fast (Reasoning) | xAI | — | — | — | 38.6 | 151 tok/s | 9.8s | |
Cogito v2.1 (Reasoning) | Deep Cogito | — | — | — | 85% | 61 tok/s | 0.5s | |
DeepSeek V3.1 Terminus (Reasoning) | DeepSeek | — | — | — | 33.9 | — | — | |
Phi-4 Open | Microsoft | $0.07 | $0.14 | 16K | 84.8% | 300 tok/s | — | |
Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning) | Google DeepMind | — | — | — | 25.7 | — | — | |
Claude 3.7 Sonnet (Reasoning) | Anthropic | — | — | — | 34.7 | — | — | |
GPT-5 mini (high) | OpenAI | — | — | — | 41.2 | 91 tok/s | 140.6s | |
MiMo-V2-Flash (Reasoning) | Xiaomi | — | — | — | 39.2 | 134 tok/s | 1.7s | |
Qwen3 VL 235B A22B (Reasoning) | Alibaba | — | — | — | 27.6 | 48 tok/s | 1.3s | |
Qwen3 Max (Preview) | Alibaba | — | — | — | 26.1 | 45 tok/s | 1.8s | |
DeepSeek V3.1 Terminus (Non-reasoning) | DeepSeek | — | — | — | 28.5 | — | — | |
Qwen3 235B A22B 2507 (Reasoning) | Alibaba | — | — | — | 29.5 | 40 tok/s | 1.4s | |
o1 | OpenAI | — | — | — | 30.8 | 129 tok/s | 18.5s | |
K-EXAONE (Reasoning) | LG AI Research | — | — | — | 32.1 | — | — | |
Gemini 2.5 Pro Preview (May '25) | Google DeepMind | — | — | — | 29.5 | — | — | |
DeepSeek V3.2 Exp (Non-reasoning) | DeepSeek | — | — | — | 28.4 | 33 tok/s | 1.3s | |
DeepSeek R1 (Jan '25) | DeepSeek | — | — | — | 18.8 | — | — | |
Mistral Large | Mistral AI | $2 | $6 | 128K | 84% | 90 tok/s | — | |
Claude 4 Sonnet (Non-reasoning) | Anthropic | — | — | — | 33 | 47 tok/s | 0.8s | |
Gemini 2.5 Flash Preview (Sep '25) (Reasoning) | Google DeepMind | — | — | — | 31.1 | — | — | |
Claude 4 Sonnet (Reasoning) | Anthropic | — | — | — | 38.7 | 51 tok/s | 9.1s | |
GLM-4.5 (Reasoning) | Z AI | — | — | — | 26.4 | 42 tok/s | 0.9s | |
DeepSeek V3.2 (Non-reasoning) | DeepSeek | — | — | — | 32.1 | 32 tok/s | 1.4s | |
DeepSeek V3.1 (Non-reasoning) | DeepSeek | — | — | — | 28.1 | — | — | |
Grok 3 Mini | xAI | $0.3 | $0.5 | 131K | 83% | 160 tok/s | — | |
ERNIE 5.0 Thinking Preview | Baidu | — | — | — | 29.1 | — | — | |
GPT-5 mini (medium) | OpenAI | — | — | — | 38.9 | 83 tok/s | 18.4s | |
Nova 2.0 Pro Preview (medium) | Amazon | — | — | — | 35.7 | 144 tok/s | 14.5s | |
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | — | — | — | 15 | 43 tok/s | 0.7s | |
Grok 3 mini Reasoning (high) | xAI | — | — | — | 32.1 | 217 tok/s | 0.4s | |
GLM-4.6 (Reasoning) | Z AI | — | — | — | 32.5 | 80 tok/s | 0.7s | |
Qwen3 235B A22B 2507 Instruct | Alibaba | — | — | — | 25 | 69 tok/s | 1.2s | |
Qwen3 235B A22B (Reasoning) | Alibaba | — | — | — | 19.8 | 64 tok/s | 1.2s | |
Hermes 4 - Llama-3.1 405B (Reasoning) | Nous Research | — | — | — | 18.6 | 34 tok/s | 0.7s | |
Gemini 2.5 Flash (Reasoning) | Google DeepMind | — | — | — | 27 | 231 tok/s | 14.9s | |
Qwen3 Next 80B A3B Instruct | Alibaba | — | — | — | 20.1 | 172 tok/s | 1.1s | |
Qwen3 Max Thinking (Preview) | Alibaba | — | — | — | 32.5 | 43 tok/s | 1.8s | |
Kimi K2 | Kimi | — | — | — | 26.3 | 34 tok/s | 1.3s | |
Qwen3 Next 80B A3B (Reasoning) | Alibaba | — | — | — | 26.7 | 169 tok/s | 1.1s | |
Seed-OSS-36B-Instruct | ByteDance Seed | — | — | — | 25.2 | 43 tok/s | 1.6s | |
Qwen3 VL 32B (Reasoning) | Alibaba | — | — | — | 24.7 | 95 tok/s | 1.4s | |
Qwen3 VL 235B A22B Instruct | Alibaba | — | — | — | 20.8 | 60 tok/s | 1.1s | |
Kimi K2 0905 | Kimi | — | — | — | 30.9 | 24 tok/s | 6.0s | |
GLM-4.5-Air | Z AI | — | — | — | 23.2 | 67 tok/s | 1.1s | |
MiniMax M1 80k | MiniMax | — | — | — | 24.4 | — | — | |
MiniMax-M2 | MiniMax | — | — | — | 36.1 | 68 tok/s | 2.3s | |
Magistral Medium 1.2 | Mistral | — | — | — | 27.1 | 99 tok/s | 0.5s | |
DeepSeek V3 0324 | DeepSeek | — | — | — | 22.3 | — | — | |
GPT-4o mini | OpenAI | $0.15 | $0.6 | 128K | 82% | 200 tok/s | — | |
Gemini 3 Flash | Google DeepMind | $0.075 | $0.3 | 1M | 82% | 250 tok/s | — | |
Nova 2.0 Pro Preview (low) | Amazon | — | — | — | 31.9 | 154 tok/s | 6.0s | |
Nova 2.0 Lite (high) | Amazon | — | — | — | 34.5 | 192 tok/s | 17.9s | |
GPT-5.1 Codex mini (high) | OpenAI | — | — | — | 38.6 | 208 tok/s | 5.6s | |
GPT-5 (ChatGPT) | OpenAI | — | — | — | 21.8 | 154 tok/s | 0.6s | |
Ling-1T | InclusionAI | — | — | — | 19 | — | — | |
INTELLECT-3 | Prime Intellect | — | — | — | 22.2 | — | — | |
EXAONE 4.0 32B (Reasoning) | LG AI Research | — | — | — | 16.7 | — | — | |
Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning) | Google DeepMind | — | — | — | 21.6 | — | — | |
Qwen3 VL 30B A3B (Reasoning) | Alibaba | — | — | — | 19.7 | 128 tok/s | 1.0s | |
gpt-oss-120B (high) | OpenAI | — | — | — | 33.3 | 212 tok/s | 0.5s | |
Nova 2.0 Lite (medium) | Amazon | — | — | — | 29.7 | 197 tok/s | 15.3s | |
Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | — | — | — | 18.7 | 66 tok/s | 0.3s | |
Qwen3 30B A3B 2507 (Reasoning) | Alibaba | — | — | — | 22.4 | 146 tok/s | 1.1s | |
Ring-1T | InclusionAI | — | — | — | 22.8 | — | — | |
MiniMax M1 40k | MiniMax | — | — | — | 20.9 | — | — | |
Hermes 4 - Llama-3.1 70B (Reasoning) | Nous Research | — | — | — | 16 | 74 tok/s | 0.6s | |
GPT-5 (minimal) | OpenAI | — | — | — | 23.9 | 72 tok/s | 1.2s | |
Llama 4 Maverick | Meta | — | — | — | 18.4 | 116 tok/s | 0.6s | |
Gemini 2.5 Flash (Non-reasoning) | Google DeepMind | — | — | — | 20.6 | 189 tok/s | 0.5s | |
Nova 2.0 Omni (medium) | Amazon | — | — | — | 28 | — | — | |
Mistral Large 3 | Mistral | — | — | — | 22.8 | 56 tok/s | 0.6s | |
Gemini 2.0 Pro Experimental (Feb '25) | Google DeepMind | — | — | — | 18.1 | — | — | |
Solar Pro 2 (Reasoning) | Upstage | — | — | — | 14.9 | — | — | |
KAT-Coder-Pro V1 | KwaiKAT | — | — | — | 36 | 119 tok/s | 0.9s | |
K-EXAONE (Non-reasoning) | LG AI Research | — | — | — | 23.4 | — | — | |
Mi:dm K 2.5 Pro Preview | Korea Telecom | — | — | — | 81% | — | — | |
Mi:dm K 2.5 Pro | Korea Telecom | — | — | — | 23.1 | — | — | |
GPT-5.2 (Non-reasoning) | OpenAI | — | — | — | 33.6 | 63 tok/s | 0.6s | |
Gemini 2.5 Flash Preview (Reasoning) | Google DeepMind | — | — | — | 24.3 | — | — | |
Motif-2-12.7B-Reasoning | Motif Technologies | — | — | — | 19.1 | — | — | |
Claude 3.7 Sonnet (Non-reasoning) | Anthropic | — | — | — | 30.8 | — | — | |
Gemini 2.0 Flash Thinking Experimental (Jan '25) | Google DeepMind | — | — | — | 19.6 | — | — | |
o3-mini (high) | OpenAI | — | — | — | 25.2 | 156 tok/s | 26.1s | |
Claude 4.5 Haiku (Non-reasoning) | Anthropic | — | — | — | 31.1 | 100 tok/s | 0.5s | |
GPT-4o (March 2025, chatgpt-4o-latest) | OpenAI | — | — | — | 18.6 | — | — | |
GLM-4.6V (Reasoning) | Z AI | — | — | — | 23.4 | 29 tok/s | 1.1s | |
Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning) | Google DeepMind | — | — | — | 19.4 | — | — | |
Qwen3 32B (Reasoning) | Alibaba | — | — | — | 16.5 | 105 tok/s | 1.1s | |
GPT-5.1 (Non-reasoning) | OpenAI | — | — | — | 27.4 | 120 tok/s | 0.8s | |
DeepSeek R1 Distill Llama 70B | DeepSeek | — | — | — | 16 | 43 tok/s | 0.5s | |
Nova 2.0 Omni (low) | Amazon | — | — | — | 23.2 | — | — | |
Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | — | — | — | 18.5 | — | — | |
Qwen3 Omni 30B A3B (Reasoning) | Alibaba | — | — | — | 15.6 | 92 tok/s | 1.0s | |
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | — | — | — | 24.3 | 162 tok/s | 1.6s | |
Apriel-v1.6-15B-Thinker | ServiceNow | — | — | — | 27.6 | — | — | |
GLM-4.7 (Non-reasoning) | Z AI | — | — | — | 34.2 | 96 tok/s | 0.7s | |
HyperCLOVA X SEED Think (32B) | Naver | — | — | — | 23.7 | — | — | |
K2-V2 (high) | MBZUAI Institute of Foundation Models | — | — | — | 20.6 | — | — | |
GLM-4.5V (Reasoning) | Z AI | — | — | — | 15.1 | 48 tok/s | 0.8s | |
o3-mini | OpenAI | — | — | — | 25.9 | 167 tok/s | 8.5s | |
Grok Code Fast 1 | xAI | — | — | — | 28.7 | 189 tok/s | 3.9s | |
Nova 2.0 Lite (low) | Amazon | — | — | — | 24.6 | 206 tok/s | 4.8s | |
Qwen3 Coder 480B A35B Instruct | Alibaba | — | — | — | 24.8 | 62 tok/s | 1.6s | |
Ring-flash-2.0 | InclusionAI | — | — | — | 14 | 84 tok/s | 1.3s | |
Qwen3 VL 32B Instruct | Alibaba | — | — | — | 17.2 | 78 tok/s | 1.3s | |
Command R+ | Cohere | $2.5 | $10 | 128K | 78% | 80 tok/s | — | |
ERNIE 4.5 300B A47B | Baidu | — | — | — | 15 | 29 tok/s | 1.8s | |
Ling-flash-2.0 | InclusionAI | — | — | — | 15.7 | 99 tok/s | 1.4s | |
GPT-4.1 mini | OpenAI | — | — | — | 22.9 | 99 tok/s | 0.5s | |
GPT-5 nano (high) | OpenAI | — | — | — | 26.8 | 150 tok/s | 86.4s | |
GPT-5 mini (minimal) | OpenAI | — | — | — | 20.7 | 78 tok/s | 1.0s | |
Gemini 2.0 Flash (experimental) | Google DeepMind | — | — | — | 16.8 | — | — | |
Gemini 2.0 Flash (Feb '25) | Google DeepMind | — | — | — | 18.5 | — | — | |
Gemini 2.5 Flash Preview (Non-reasoning) | Google DeepMind | — | — | — | 17.8 | — | — | |
GLM-4.6 (Non-reasoning) | Z AI | — | — | — | 30.2 | 88 tok/s | 0.9s | |
gpt-oss-120B (low) | OpenAI | — | — | — | 24.5 | 210 tok/s | 0.5s | |
Qwen3 30B A3B 2507 Instruct | Alibaba | — | — | — | 15 | 109 tok/s | 1.1s | |
Qwen3 30B A3B (Reasoning) | Alibaba | — | — | — | 15.3 | 70 tok/s | 1.1s | |
GPT-4o (ChatGPT) | OpenAI | — | — | — | 14.1 | — | — | |
Solar Pro 2 (Preview) (Reasoning) | Upstage | — | — | — | 18.8 | — | — | |
Qwen3 14B (Reasoning) | Alibaba | — | — | — | 16.2 | 64 tok/s | 1.2s | |
EXAONE 4.0 32B (Non-reasoning) | LG AI Research | — | — | — | 11.7 | — | — | |
Apriel-v1.5-15B-Thinker | ServiceNow | — | — | — | 28.3 | — | — | |
Magistral Small 1.2 | Mistral | — | — | — | 18.2 | 176 tok/s | 0.4s | |
Nova 2.0 Pro Preview (Non-reasoning) | Amazon | — | — | — | 23.1 | 184 tok/s | 0.7s | |
GPT-5 nano (medium) | OpenAI | — | — | — | 25.9 | 154 tok/s | 39.1s | |
Claude 3.5 Sonnet (Oct '24) | Anthropic | — | — | — | 15.9 | — | — | |
K2-V2 (medium) | MBZUAI Institute of Foundation Models | — | — | — | 18.7 | — | — | |
QwQ 32B | Alibaba | — | — | — | 19.7 | 33 tok/s | 0.4s | |
Devstral 2 | Mistral | — | — | — | 22 | 77 tok/s | 0.7s | |
Mistral Medium 3 | Mistral | — | — | — | 18.8 | 54 tok/s | 0.4s | |
Sonar Pro | Perplexity | — | — | — | 15.2 | — | — | |
Olmo 3.1 32B Think | Allen Institute for AI | — | — | — | 13.9 | — | — | |
Claude 4.5 Haiku (Reasoning) | Anthropic | — | — | — | 37.1 | 145 tok/s | 14.2s | |
Olmo 3 32B Think | Allen Institute for AI | — | — | — | 12.1 | — | — | |
Gemini 2.5 Flash-Lite (Reasoning) | Google DeepMind | — | — | — | 17.6 | 274 tok/s | 17.2s | |
Qwen3 235B A22B (Non-reasoning) | Alibaba | — | — | — | 17 | 65 tok/s | 1.2s | |
Qwen2.5 Max | Alibaba | — | — | — | 16.3 | 49 tok/s | 1.2s | |
NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | — | — | — | 14.9 | 152 tok/s | 0.6s | |
Qwen3 VL 30B A3B Instruct | Alibaba | — | — | — | 16.1 | 122 tok/s | 1.1s | |
Claude Haiku 4.5 | Anthropic | $0.8 | $4 | 200K | 75.2% | 250 tok/s | — | |
Gemini 1.5 Pro (Sep '24) | Google DeepMind | — | — | — | 16 | — | — | |
Claude 3.5 Sonnet (June '24) | Anthropic | — | — | — | 14.2 | — | — | |
GLM-4.5V (Non-reasoning) | Z AI | — | — | — | 12.7 | 50 tok/s | 30.9s | |
Gemma 3 27B Open | Google DeepMind | Free | Free | 128K | 75% | 120 tok/s | — | |
Magistral Small 1 | Mistral | — | — | — | 16.8 | — | — | |
Solar Pro 2 (Non-reasoning) | Upstage | — | — | — | 13.6 | — | — | |
Magistral Medium 1 | Mistral | — | — | — | 18.8 | — | — | |
gpt-oss-20B (high) | OpenAI | — | — | — | 24.5 | 276 tok/s | 0.3s | |
GLM-4.6V (Non-reasoning) | Z AI | — | — | — | 17.1 | 23 tok/s | 4.1s | |
Llama 4 Scout | Meta | — | — | — | 13.5 | 128 tok/s | 0.5s | |
Qwen3 VL 8B (Reasoning) | Alibaba | — | — | — | 16.7 | 130 tok/s | 1.1s | |
Nova 2.0 Lite (Non-reasoning) | Amazon | — | — | — | 18 | 173 tok/s | 0.8s | |
DeepSeek R1 0528 Qwen3 8B | DeepSeek | — | — | — | 16.4 | — | — | |
Grok 4.1 Fast (Non-reasoning) | xAI | — | — | — | 23.6 | 148 tok/s | 0.4s | |
NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | — | — | — | 14.8 | 109 tok/s | 0.3s | |
MiMo-V2-Flash (Non-reasoning) | Xiaomi | — | — | — | 30.4 | 138 tok/s | 1.5s | |
Qwen3 8B (Reasoning) | Alibaba | — | — | — | 13.2 | 83 tok/s | 1.0s | |
Qwen3 4B 2507 (Reasoning) | Alibaba | — | — | — | 18.2 | — | — | |
o1-mini | OpenAI | — | — | — | 20.4 | — | — | |
NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 138 tok/s | 0.7s | |
GPT-4o (May '24) | OpenAI | — | — | — | 14.5 | 112 tok/s | 0.6s | |
DeepSeek R1 Distill Qwen 14B | DeepSeek | — | — | — | 15.8 | — | — | |
DeepSeek R1 Distill Qwen 32B | DeepSeek | — | — | — | 17.2 | 43 tok/s | 0.4s | |
DBRX Open | Databricks | $0.75 | $2.25 | 33K | 73.7% | 100 tok/s | — | |
Solar Pro 2 (Preview) (Non-reasoning) | Upstage | — | — | — | 16 | — | — | |
Qwen3 Omni 30B A3B Instruct | Alibaba | — | — | — | 10.7 | 105 tok/s | 1.1s | |
Nova Premier | Amazon | — | — | — | 19 | 62 tok/s | 1.1s | |
Qwen3 32B (Non-reasoning) | Alibaba | — | — | — | 14.5 | 102 tok/s | 1.2s | |
Hermes 4 - Llama-3.1 405B (Non-reasoning) | Nous Research | — | — | — | 17.6 | 33 tok/s | 0.8s | |
Llama 3.1 Instruct 405B | Meta | — | — | — | 17.4 | 31 tok/s | 0.7s | |
Falcon-H1R-7B | TII UAE | — | — | — | 15.8 | — | — | |
Grok 4 Fast (Non-reasoning) | xAI | — | — | — | 23.1 | 204 tok/s | 0.3s | |
Llama 3.2 11B Vision Open | Meta AI | $0.18 | $0.18 | 128K | 73% | 150 tok/s | — | |
gpt-oss-20B (low) | OpenAI | — | — | — | 20.8 | 263 tok/s | 0.4s | |
Llama 3.1 Tulu3 405B | Allen Institute for AI | — | — | — | 14.1 | — | — | |
Qwen2.5 Instruct 72B | Alibaba | — | — | — | 15.6 | 55 tok/s | 1.2s | |
Gemini 2.5 Flash-Lite (Non-reasoning) | Google DeepMind | — | — | — | 12.7 | 279 tok/s | 0.6s | |
Gemini 2.0 Flash-Lite (Feb '25) | Google DeepMind | — | — | — | 14.7 | — | — | |
Command R | Cohere | $0.15 | $0.6 | 128K | 72% | 150 tok/s | — | |
Mistral Small | Mistral AI | $0.1 | $0.3 | 32K | 72% | 200 tok/s | — | |
Nova 2.0 Omni (Non-reasoning) | Amazon | — | — | — | 16.6 | 223 tok/s | 0.9s | |
Gemini 3.1 Flash-Lite | Google DeepMind | $0.01 | $0.04 | 1M | 72% | 500 tok/s | — | |
Command A | Cohere | — | — | — | 13.5 | 42 tok/s | 0.5s | |
Qwen3 Coder 30B A3B Instruct | Alibaba | — | — | — | 20 | 112 tok/s | 1.5s | |
Llama 3.3 Instruct 70B | Meta | — | — | — | 14.5 | 97 tok/s | 0.6s | |
Grok 2 (Dec '24) | xAI | — | — | — | 13.9 | — | — | |
Devstral Medium | Mistral | — | — | — | 18.7 | 139 tok/s | 0.5s | |
Qwen3 30B A3B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 70 tok/s | 1.2s | |
K2-V2 (low) | MBZUAI Institute of Foundation Models | — | — | — | 14.4 | — | — | |
Falcon 180B Open | TII | Free | Free | 4K | 70.4% | 20 tok/s | — | |
Qwen3 VL 4B (Reasoning) | Alibaba | — | — | — | 13.7 | — | — | |
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | NVIDIA | — | — | — | 14.3 | — | — | |
Qwen3 4B (Reasoning) | Alibaba | — | — | — | 14.2 | 103 tok/s | 1.0s | |
Mistral Large 2 (Nov '24) | Mistral | — | — | — | 15.1 | 38 tok/s | 0.5s | |
Pixtral Large | Mistral | — | — | — | 14 | 52 tok/s | 0.5s | |
Grok Beta | xAI | — | — | — | 13.3 | — | — | |
Qwen2.5 Instruct 32B | Alibaba | — | — | — | 13.2 | — | — | |
Claude 3 Opus | Anthropic | — | — | — | 18 | — | — | |
Sarvam M (Reasoning) | Sarvam | — | — | — | 8.4 | — | — | |
Qwen3 VL 8B Instruct | Alibaba | — | — | — | 14.3 | 145 tok/s | 1.0s | |
GPT-4 Turbo | OpenAI | — | — | — | 13.7 | 34 tok/s | 0.9s | |
Ministral 3 14B | Mistral | — | — | — | 16 | 133 tok/s | 0.3s | |
Nova Pro | Amazon | — | — | — | 13.5 | — | — | |
Llama 3.1 Nemotron Instruct 70B | NVIDIA | — | — | — | 13.4 | 43 tok/s | 0.4s | |
Sonar | Perplexity | — | — | — | 15.5 | — | — | |
Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | — | — | — | 14.6 | 67 tok/s | 0.3s | |
Devstral Small 2 | Mistral | — | — | — | 19.5 | 77 tok/s | 0.5s | |
Mistral Medium 3.1 | Mistral | — | — | — | 21.3 | 82 tok/s | 0.4s | |
Gemini 1.5 Flash (Sep '24) | Google DeepMind | — | — | — | 13.8 | — | — | |
Qwen3 14B (Non-reasoning) | Alibaba | — | — | — | 12.8 | 66 tok/s | 1.1s | |
Mistral Small 3.2 | Mistral | — | — | — | 15.1 | 166 tok/s | 0.4s | |
Llama 3.1 Instruct 70B | Meta | — | — | — | 12.5 | 31 tok/s | 0.7s | |
Mistral Large 2 (Jul '24) | Mistral | — | — | — | 13 | — | — | |
Llama 3.2 Instruct 90B (Vision) | Meta | — | — | — | 11.9 | 42 tok/s | 0.5s | |
Ling-mini-2.0 | InclusionAI | — | — | — | 9.2 | — | — | |
Reka Flash 3 | Reka AI | — | — | — | 9.5 | 96 tok/s | 1.1s | |
Qwen3 4B 2507 Instruct | Alibaba | — | — | — | 12.9 | — | — | |
Hermes 4 - Llama-3.1 70B (Non-reasoning) | Nous Research | — | — | — | 12.6 | 71 tok/s | 0.6s | |
Mistral Small 3.1 | Mistral | — | — | — | 14.5 | 148 tok/s | 0.4s | |
GPT-4.1 nano | OpenAI | — | — | — | 13 | 195 tok/s | 0.4s | |
Gemini 1.5 Pro (May '24) | Google DeepMind | — | — | — | 12 | — | — | |
Olmo 3 7B Think | Allen Institute for AI | — | — | — | 9.4 | — | — | |
QwQ 32B-Preview | Alibaba | — | — | — | 15.2 | 44 tok/s | 0.5s | |
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | — | — | — | 10.1 | 174 tok/s | 0.7s | |
Mistral Small 3 | Mistral | — | — | — | 12.7 | 152 tok/s | 0.5s | |
Ministral 3 8B | Mistral | — | — | — | 14.8 | 189 tok/s | 0.3s | |
Qwen3 8B (Non-reasoning) | Alibaba | — | — | — | 10.6 | 89 tok/s | 1.0s | |
Qwen2.5 Coder Instruct 32B | Alibaba | — | — | — | 12.9 | — | — | |
Qwen3 VL 4B Instruct | Alibaba | — | — | — | 9.6 | — | — | |
Claude 3.5 Haiku | Anthropic | — | — | — | 18.7 | — | — | |
Devstral Small (May '25) | Mistral | — | — | — | 18 | — | — | |
Qwen2.5 Turbo | Alibaba | — | — | — | 12 | 68 tok/s | 1.2s | |
Devstral Small (Jul '25) | Mistral | — | — | — | 15.2 | 200 tok/s | 0.4s | |
Qwen2 Instruct 72B | Alibaba | — | — | — | 11.7 | — | — | |
Granite 4.0 H Small | IBM | — | — | — | 10.8 | 416 tok/s | 8.7s | |
Mistral Saba | Mistral | — | — | — | 12.1 | — | — | |
Gemma 3 12B Instruct | Google DeepMind | — | — | — | 8.8 | 31 tok/s | 24.2s | |
Qwen3 4B (Non-reasoning) | Alibaba | — | — | — | 12.5 | 103 tok/s | 1.1s | |
Kimi Linear 48B A3B Instruct | Kimi | — | — | — | 14.4 | — | — | |
Exaone 4.0 1.2B (Reasoning) | LG AI Research | — | — | — | 8.3 | — | — | |
Nova Lite | Amazon | — | — | — | 12.7 | 228 tok/s | 0.7s | |
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 75 tok/s | 0.3s | |
DeepHermes 3 - Mistral 24B Preview (Non-reasoning) | Nous Research | — | — | — | 10.9 | — | — | |
Claude 3 Sonnet | Anthropic | — | — | — | 10.3 | — | — | |
Jamba Reasoning 3B | AI21 Labs | — | — | — | 9.6 | — | — | |
Jamba 1.7 Large | AI21 Labs | — | — | — | 10.9 | 52 tok/s | 1.0s | |
Gemini 1.5 Flash-8B | Google DeepMind | — | — | — | 11.1 | — | — | |
Hermes 3 - Llama-3.1 70B | Nous Research | — | — | — | 10.6 | 28 tok/s | 0.4s | |
Jamba 1.5 Large | AI21 Labs | — | — | — | 10.7 | — | — | |
Qwen3 1.7B (Reasoning) | Alibaba | — | — | — | 8 | 140 tok/s | 1.1s | |
Gemini 1.5 Flash (May '24) | Google DeepMind | — | — | — | 10.5 | — | — | |
Llama 3 Instruct 70B | Meta | — | — | — | 8.9 | 39 tok/s | 0.7s | |
Jamba 1.6 Large | AI21 Labs | — | — | — | 10.6 | 53 tok/s | 1.0s | |
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | — | — | — | 14.4 | — | — | |
GPT-5 nano (minimal) | OpenAI | — | — | — | 13.8 | 145 tok/s | 1.0s | |
Mixtral 8x22B Instruct | Mistral | — | — | — | 9.8 | — | — | |
DeepSeek R1 Distill Llama 8B | DeepSeek | — | — | — | 12.1 | — | — | |
Nova Micro | Amazon | — | — | — | 10.3 | 328 tok/s | 0.6s | |
Ministral 3 3B | Mistral | — | — | — | 11.2 | 294 tok/s | 0.3s | |
Olmo 3 7B Instruct | Allen Institute for AI | — | — | — | 8.2 | — | — | |
OLMo 2 32B | Allen Institute for AI | — | — | — | 10.6 | — | — | |
LFM2 8B A1B | Liquid AI | — | — | — | 7 | — | — | |
Exaone 4.0 1.2B (Non-reasoning) | LG AI Research | — | — | — | 8.1 | — | — | |
Claude 2.1 | Anthropic | — | — | — | 9.3 | — | — | |
Mistral Medium | Mistral | — | — | — | 9 | 75 tok/s | 0.4s | |
Claude 2.0 | Anthropic | — | — | — | 9.1 | — | — | |
Phi-4 Multimodal Instruct | Microsoft Azure | — | — | — | 10 | 15 tok/s | 0.2s | |
Gemma 3n E4B Instruct | Google DeepMind | — | — | — | 6.4 | 14 tok/s | 0.3s | |
Llama 3.1 Instruct 8B | Meta | — | — | — | 11.8 | 159 tok/s | 0.4s | |
Gemma 3n E4B Instruct Preview (May '25) | Google DeepMind | — | — | — | 10.1 | — | — | |
Granite 3.3 8B (Non-reasoning) | IBM | — | — | — | 7 | 375 tok/s | 20.3s | |
Qwen2.5 Coder Instruct 7B | Alibaba | — | — | — | 10 | — | — | |
Phi-4 Mini Instruct | Microsoft Azure | — | — | — | 8.4 | 44 tok/s | 0.7s | |
Llama 3.2 Instruct 11B (Vision) | Meta | — | — | — | 8.7 | 77 tok/s | 0.5s | |
GPT-3.5 Turbo | OpenAI | — | — | — | 9 | 107 tok/s | 0.5s | |
Granite 4.0 Micro | IBM | — | — | — | 7.7 | — | — | |
Phi-3 Mini Instruct 3.8B | Microsoft Azure | — | — | — | 10.1 | — | — | |
Command-R+ (Apr '24) | Cohere | — | — | — | 8.3 | — | — | |
Gemini 1.0 Pro | Google DeepMind | — | — | — | 8.5 | — | — | |
LFM 40B | Liquid AI | — | — | — | 8.8 | — | — | |
Claude Instant | Anthropic | — | — | — | 7.4 | — | — | |
DeepSeek Coder V2 Lite Instruct | DeepSeek | — | — | — | 8.5 | — | — | |
Mistral Small (Feb '24) | Mistral | — | — | — | 9 | 146 tok/s | 0.4s | |
Gemma 3 4B Instruct | Google DeepMind | — | — | — | 6.3 | 33 tok/s | 1.1s | |
Qwen3 1.7B (Non-reasoning) | Alibaba | — | — | — | 6.8 | 141 tok/s | 0.9s | |
Llama 3 Instruct 8B | Meta | — | — | — | 6.4 | 83 tok/s | 0.5s | |
Llama 2 Chat 70B | Meta | — | — | — | 8.4 | — | — | |
Llama 2 Chat 13B | Meta | — | — | — | 8.4 | — | — | |
Jamba 1.7 Mini | AI21 Labs | — | — | — | 8.1 | — | — | |
Mixtral 8x7B Instruct | Mistral | — | — | — | 7.7 | — | — | |
Gemma 3n E2B Instruct | Google DeepMind | — | — | — | 4.8 | 52 tok/s | 0.4s | |
Jamba 1.5 Mini | AI21 Labs | — | — | — | 8 | — | — | |
Jamba 1.6 Mini | AI21 Labs | — | — | — | 7.9 | 186 tok/s | 0.8s | |
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning) | Nous Research | — | — | — | 7.6 | — | — | |
Molmo 7B-D | Allen Institute for AI | — | — | — | 9.2 | — | — | |
Llama 3.2 Instruct 3B | Meta | — | — | — | 9.7 | 54 tok/s | 0.6s | |
Qwen3 0.6B (Reasoning) | Alibaba | — | — | — | 6.5 | 185 tok/s | 1.0s | |
Command-R (Mar '24) | Cohere | — | — | — | 7.4 | — | — | |
Granite 4.0 1B | IBM | — | — | — | 7.3 | — | — | |
OpenChat 3.5 (1210) | OpenChat | — | — | — | 8.3 | — | — | |
LFM2 2.6B | Liquid AI | — | — | — | 8 | — | — | |
OLMo 2 7B | Allen Institute for AI | — | — | — | 9.3 | — | — | |
Granite 4.0 H 1B | IBM | — | — | — | 8 | — | — | |
DeepSeek R1 Distill Qwen 1.5B | DeepSeek | — | — | — | 9.1 | — | — | |
LFM2 1.2B | Liquid AI | — | — | — | 6.3 | — | — | |
Mistral 7B Instruct | Mistral | — | — | — | 7.4 | 193 tok/s | 0.3s | |
Qwen3 0.6B (Non-reasoning) | Alibaba | — | — | — | 5.7 | 190 tok/s | 0.9s | |
Llama 3.2 Instruct 1B | Meta | — | — | — | 6.3 | 87 tok/s | 0.7s | |
Llama 2 Chat 7B | Meta | — | — | — | 9.7 | 118 tok/s | 1.5s | |
Gemma 3 1B Instruct | Google DeepMind | — | — | — | 5.5 | 51 tok/s | 0.7s | |
Granite 4.0 H 350M | IBM | — | — | — | 5.4 | — | — | |
Granite 4.0 350M | IBM | — | — | — | 6.1 | — | — | |
Gemma 3 270M | Google DeepMind | — | — | — | 7.7 | — | — | |
Standard | — | — | — | — | — | — | — | |
Qwen3.5 Omni Flash | Alibaba | — | — | — | — | — | — | |
Octave 2 | Hume AI | — | — | — | — | — | — | |
Nemotron Cascade 2 30B A3B | NVIDIA | — | — | — | 28.4 | — | — | |
Kimi K2.5 (Non-reasoning) | Kimi | — | — | — | 37.3 | 31 tok/s | 1.3s | |
Mercury 2 | Inception | — | — | — | 32.8 | 877 tok/s | 4.4s | |
Molmo2-8B | Allen Institute for AI | — | — | — | 7.3 | — | — | |
MiMo-V2-Pro | Xiaomi | — | — | — | 49.2 | 71 tok/s | 1.9s | |
MiMo-V2-Omni-0327 | Xiaomi | — | — | — | 44.9 | — | — | |
Sarvam 105B (high) | Sarvam | — | — | — | 18.2 | 100 tok/s | 1.3s | |
MiMo-V2-Omni | Xiaomi | — | — | — | 43.4 | — | — | |
MiMo-V2-Flash (Feb 2026) | Xiaomi | — | — | — | 41.5 | 133 tok/s | 1.3s | |
Neural2 | — | — | — | — | — | — | — | |
Sarvam 30B (high) | Sarvam | — | — | — | 12.3 | 272 tok/s | 1.2s | |
KAT Coder Pro V2 | KwaiKAT | — | — | — | 43.8 | 115 tok/s | 1.8s | |
o1-preview | OpenAI | — | — | — | 23.7 | — | — | |
Olmo 3.1 32B Instruct | Allen Institute for AI | — | — | — | 12.2 | 52 tok/s | 0.3s | |
K2 Think V2 | MBZUAI Institute of Foundation Models | — | — | — | 24.1 | — | — | |
LongCat Flash Lite | LongCat | — | — | — | 23.9 | 146 tok/s | 6.0s | |
Tri-21B-Think | Trillion Labs | — | — | — | 18.6 | — | — | |
Tri-21B-think Preview | Trillion Labs | — | — | — | 20 | — | — | |
Apertus 8B Instruct | Swiss AI Initiative | — | — | — | 5.9 | — | — | |
Nanbeige4.1-3B | Nanbeige | — | — | — | 16.1 | — | — | |
Apertus 70B Instruct | Swiss AI Initiative | — | — | — | 7.7 | — | — | |
Trinity Large Thinking | Arcee AI | — | — | — | 31.9 | 126 tok/s | 0.6s | |
GLM-5 (Reasoning) | Z AI | — | — | — | 49.8 | 72 tok/s | 0.9s | |
GLM 5V Turbo (Reasoning) | Z AI | — | — | — | 42.9 | — | — | |
GLM-5.1 (Reasoning) | Z AI | — | — | — | 51.4 | 46 tok/s | 1.0s | |
Step 3.5 Flash 2603 | StepFun | — | — | — | 38.5 | 188 tok/s | 0.9s | |
GLM-5-Turbo | Z AI | — | — | — | 46.8 | — | — | |
GLM-5 (Non-reasoning) | Z AI | — | — | — | 40.6 | 55 tok/s | 1.4s | |
Tiny Aya Global | Cohere | — | — | — | 4.7 | — | — | |
Qwen3.5 2B (Non-reasoning) | Alibaba | — | — | — | 14.7 | 241 tok/s | 0.3s | |
Qwen3.5 397B A17B (Reasoning) | Alibaba | — | — | — | 45 | 52 tok/s | 1.5s | |
Qwen3.5 4B (Non-reasoning) | Alibaba | — | — | — | 22.6 | 189 tok/s | 0.3s | |
Qwen3.5 0.8B (Reasoning) | Alibaba | — | — | — | 10.5 | — | — | |
Qwen3.5 0.8B (Non-reasoning) | Alibaba | — | — | — | 9.9 | 283 tok/s | 0.3s | |
Step3 VL 10B | StepFun | — | — | — | 15.4 | — | — | |
Qwen3.5 9B (Reasoning) | Alibaba | — | — | — | 32.4 | 125 tok/s | 0.3s | |
Qwen3.6 Plus | Alibaba | — | — | — | 50 | 52 tok/s | 1.5s | |
Qwen3.5 4B (Reasoning) | Alibaba | — | — | — | 27.1 | 186 tok/s | 0.3s | |
Qwen3.5 27B (Non-reasoning) | Alibaba | — | — | — | 37.2 | 89 tok/s | 1.4s | |
Qwen3.5 Omni Flash | Alibaba | — | — | — | 25.9 | 170 tok/s | 1.0s | |
Qwen3.5 27B (Reasoning) | Alibaba | — | — | — | 42.1 | 88 tok/s | 1.4s | |
Qwen3.5 122B A10B (Reasoning) | Alibaba | — | — | — | 41.6 | 162 tok/s | 1.2s | |
Qwen3.5 122B A10B (Non-reasoning) | Alibaba | — | — | — | 35.9 | 157 tok/s | 1.2s | |
Qwen3.5 Omni Plus | Alibaba | — | — | — | 38.6 | 51 tok/s | 1.3s | |
Qwen3 Coder Next | Alibaba | — | — | — | 28.3 | 152 tok/s | 0.8s | |
Kimi K2.5 (Reasoning) | Kimi | — | — | — | 46.8 | 33 tok/s | 1.2s | |
Qwen3.5 2B (Reasoning) | Alibaba | — | — | — | 16.3 | — | — | |
Qwen3.5 35B A3B (Non-reasoning) | Alibaba | — | — | — | 30.7 | 142 tok/s | 1.1s | |
Qwen3.5 35B A3B (Reasoning) | Alibaba | — | — | — | 37.1 | 145 tok/s | 1.1s | |
Qwen3.5 397B A17B (Non-reasoning) | Alibaba | — | — | — | 40.1 | 53 tok/s | 1.5s | |
Qwen3 Max Thinking | Alibaba | — | — | — | 39.9 | 34 tok/s | 1.8s | |
Step 3.5 Flash | StepFun | — | — | — | 37.8 | 169 tok/s | 0.8s | |
Llama 65B | Meta | — | — | — | 7.4 | — | — | |
NVIDIA Nemotron 3 Nano 4B | NVIDIA | — | — | — | 14.7 | — | — | |
GPT-3.5 Turbo (0613) | OpenAI | — | — | — | — | — | — | |
o3-pro | OpenAI | — | — | — | 40.7 | 19 tok/s | 106.9s | |
GPT-5.2 Codex (xhigh) | OpenAI | — | — | — | 49 | 110 tok/s | 9.2s | |
Gemini 3.1 Flash TTS | Google DeepMind | — | — | — | — | — | — | |
GPT-4o (Aug '24) | OpenAI | — | — | — | 18.6 | 108 tok/s | 0.5s | |
GPT-5.4 mini (medium) | OpenAI | — | — | — | 37.7 | 177 tok/s | 7.4s | |
NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | — | — | — | 36 | 155 tok/s | 1.2s | |
DeepSeek-V2.5 | DeepSeek | — | — | — | 12.3 | — | — | |
o1-pro | OpenAI | — | — | — | 25.8 | — | — | |
Solar Open 100B (Reasoning) | Upstage | — | — | — | 21.7 | — | — | |
LFM2.5-VL-1.6B | Liquid AI | — | — | — | 6.2 | — | — | |
GPT-4.5 (Preview) | OpenAI | — | — | — | 20 | — | — | |
Solar Pro 3 | Upstage | — | — | — | 25.9 | — | — | |
GPT-4o Realtime (Dec '24) | OpenAI | — | — | — | — | — | — | |
MiniMax-M2.7 | MiniMax | — | — | — | 49.6 | 49 tok/s | 1.7s | |
LFM2 24B A2B | Liquid AI | — | — | — | 10.5 | 148 tok/s | 0.3s | |
GPT-4o mini Realtime (Dec '24) | OpenAI | — | — | — | — | — | — | |
GPT-5.4 nano (Non-reasoning) | OpenAI | — | — | — | 24.4 | 154 tok/s | 0.5s | |
LFM2.5-1.2B-Thinking | Liquid AI | — | — | — | 8.1 | — | — | |
Gemini 2.0 Flash-Lite (Preview) | Google DeepMind | — | — | — | 14.5 | — | — | |
Fish Audio S2 Pro | Fish Audio | — | — | — | — | — | — | |
LFM2.5-1.2B-Instruct | Liquid AI | — | — | — | 8 | — | — | |
GPT-4 | OpenAI | — | — | — | 12.8 | 37 tok/s | 0.8s | |
Gemini 2.0 Flash Thinking Experimental (Dec '24) | Google DeepMind | — | — | — | 12.3 | — | — | |
Gemini 1.0 Ultra | Google DeepMind | — | — | — | 10.1 | — | — | |
PALM-2 | Google | — | — | — | 8.6 | — | — | |
Claude 3 Haiku | Anthropic | — | — | — | 12.3 | 132 tok/s | 0.5s | |
Claude 4.1 Opus (Non-reasoning) | Anthropic | — | — | — | 36 | 36 tok/s | 1.4s | |
Grok 4.20 0309 v2 (Non-reasoning) | xAI | — | — | — | 29 | 162 tok/s | 0.4s | |
Grok 4.20 0309 v2 (Reasoning) | xAI | — | — | — | 49.3 | 225 tok/s | 14.9s | |
R1 1776 | Perplexity | — | — | — | 12 | — | — | |
Codestral | Mistral AI | $0.3 | $0.9 | 32K | — | 180 tok/s | — | |
DeepSeek-V2.5 (Dec '24) | DeepSeek | — | — | — | 12.5 | — | — | |
DeepSeek-Coder-V2 | DeepSeek | — | — | — | 10.6 | — | — | |
DeepSeek LLM 67B Chat (V1) | DeepSeek | — | — | — | 8.4 | — | — | |
Gemini 3.1 Pro Preview | Google DeepMind | — | — | — | 57.2 | 130 tok/s | 24.6s | |
Gemini 3.1 Flash-Lite Preview | Google DeepMind | — | — | — | 33.5 | 338 tok/s | 5.3s | |
Sonar Reasoning | Perplexity | — | — | — | 17.9 | — | — | |
Sonar Reasoning Pro | Perplexity | — | — | — | 24.6 | — | — | |
Grok 3 Reasoning Beta | xAI | — | — | — | 21.6 | — | — | |
Grok 4.20 0309 (Reasoning) | xAI | — | — | — | 48.5 | 215 tok/s | 18.3s | |
Grok 4.20 0309 (Non-reasoning) | xAI | — | — | — | 29.7 | 172 tok/s | 0.4s | |
Magpie-Multilingual 357M (Feb 2026) | NVIDIA | — | — | — | — | — | — | |
Solar Mini | Upstage | — | — | — | 11.9 | 92 tok/s | 1.5s | |
MiniMax-M2.5 | MiniMax | — | — | — | 41.9 | 68 tok/s | 1.8s | |
Mistral Small 4 (Reasoning) | Mistral AI | — | — | — | 27.8 | 175 tok/s | 0.5s | |
Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 51.7 | 71 tok/s | 54.0s | |
Claude Sonnet 4.6 (Non-reasoning, Low Effort) | Anthropic | — | — | — | 42.6 | 53 tok/s | 1.0s | |
Reka Flash (Sep '24) | Reka AI | — | — | — | 12 | 86 tok/s | 1.3s | |
Claude Opus 4.6 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 53 | 57 tok/s | 12.3s | |
Gemma 4 E4B (Reasoning) | Google DeepMind | — | — | — | 18.8 | — | — | |
Gemma 4 E4B (Non-reasoning) | Google DeepMind | — | — | — | 14.8 | — | — | |
Gemma 4 E2B (Reasoning) | Google DeepMind | — | — | — | 15.2 | — | — | |
Magpie Multilingual | NVIDIA | — | — | — | — | — | — | |
GLM-4.7-Flash (Reasoning) | Z AI | — | — | — | 30.1 | 88 tok/s | 0.9s | |
Gemma 4 E2B (Non-reasoning) | Google DeepMind | — | — | — | 12.1 | — | — | |
GLM-4.7-Flash (Non-reasoning) | Z AI | — | — | — | 22.1 | 139 tok/s | 1.3s | |
Grok-1 | xAI | — | — | — | 11.7 | — | — | |
Qwen1.5 Chat 110B | Alibaba | — | — | — | 9.5 | — | — | |
Gemma 4 26B A4B (Non-reasoning) | Google DeepMind | — | — | — | 27.1 | — | — | |
Gemini 3 Deep Think | Google DeepMind | — | — | — | — | — | — | |
Gemma 4 31B (Non-reasoning) | Google DeepMind | — | — | — | 32.3 | — | — | |
Gemma 4 31B (Reasoning) | Google DeepMind | — | — | — | 39.2 | 35 tok/s | 1.0s | |
Muse Spark | Meta | — | — | — | 52.1 | — | — | |
GPT-5.4 (Non-reasoning) | OpenAI | — | — | — | 35.4 | 61 tok/s | 0.7s | |
Arctic Instruct | Snowflake | — | — | — | 8.8 | — | — | |
Qwen Chat 72B | Alibaba | — | — | — | 8.8 | — | — | |
GPT-5.3 Codex (xhigh) | OpenAI | — | — | — | 53.6 | 90 tok/s | 71.3s | |
Gemini 2.5 Flash Lite TTS | Google DeepMind | — | — | — | — | — | — | |
Gemini 2.5 Flash TTS (Dec 2025) | Google DeepMind | — | — | — | — | — | — | |
GPT-5.4 nano (medium) | OpenAI | — | — | — | 38.1 | 161 tok/s | 3.1s | |
Inworld TTS 1.5 Max | Inworld | — | — | — | — | — | — | |
Eleven v3 | ElevenLabs | — | — | — | — | — | — | |
Inworld TTS 1 Max | Inworld | — | — | — | — | — | — | |
Speech 2.8 Turbo | MiniMax | — | — | — | — | — | — | |
Step TTS 2 (Mar 2026) | StepFun | — | — | — | — | — | — | |
Speech 2.6 HD | MiniMax | — | — | — | — | — | — | |
Speech 2.6 Turbo | MiniMax | — | — | — | — | — | — | |
Inworld TTS 1 | Inworld | — | — | — | — | — | — | |
Speech-02-HD | MiniMax | — | — | — | — | — | — | |
Azure HD 2.5 | Microsoft Azure | — | — | — | — | — | — | |
Multilingual v2 | ElevenLabs | — | — | — | — | — | — | |
Step Audio EditX (Mar 2026) | StepFun | — | — | — | — | — | — | |
Speech-02-Turbo | MiniMax | — | — | — | — | — | — | |
TTS-1 | OpenAI | — | — | — | — | — | — | |
TTS-1 HD | OpenAI | — | — | — | — | — | — | |
Turbo v2.5 | ElevenLabs | — | — | — | — | — | — | |
Flash v2.5 | ElevenLabs | — | — | — | — | — | — | |
Sonic 3 | Cartesia | — | — | — | — | — | — | |
OpenAudio S1 | Fish Audio | — | — | — | — | — | — | |
SIMBA 1.6 | Speechify | — | — | — | — | — | — | |
Studio | Google | — | — | — | — | — | — | |
T2A-01-HD | MiniMax | — | — | — | — | — | — | |
Kokoro 82M v1.0 | Kokoro | — | — | — | — | — | — | |
Voxtral TTS | Mistral AI | — | — | — | — | — | — | |
Polly Generative | Amazon | — | — | — | — | — | — | |
AsyncFlow V2, async | async | — | — | — | — | — | — | |
Azure Neural | Microsoft Azure | — | — | — | — | — | — | |
Maya1 | Maya Research | — | — | — | — | — | — | |
Inworld TTS 1.5 Mini | Inworld | — | — | — | — | — | — | |
Polly Long-Form | Amazon | — | — | — | — | — | — | |
Chatterbox HD | Resemble AI | — | — | — | — | — | — | |
Journey | Google | — | — | — | — | — | — | |
SIMBA 1.0 | Speechify | — | — | — | — | — | — | |
MiMo-V2-TTS | Xiaomi | — | — | — | — | — | — | |
Gemini 2.5 Pro (Dec 2025) | Google DeepMind | — | — | — | — | — | — | |
T2A-01-Turbo | MiniMax | — | — | — | — | — | — | |
Lightning v3.1 | Smallest.ai | — | — | — | — | — | — | |
Octave TTS | Hume AI | — | — | — | — | — | — | |
Fish Speech 1.5 | Fish Audio | — | — | — | — | — | — | |
MAI-Voice-1 | Microsoft Azure | — | — | — | — | — | — | |
Chatterbox | Resemble AI | — | — | — | — | — | — | |
Magpie-Multilingual 357M | NVIDIA | — | — | — | — | — | — | |
Zonos-v0.1 | Zyphra | — | — | — | — | — | — | |
LMNT | LMNT | — | — | — | — | — | — | |
VibeVoice 1.5B | Microsoft Azure | — | — | — | — | — | — | |
VibeVoice 7B | Microsoft Azure | — | — | — | — | — | — | |
Murf Speech Gen 2 | Murf AI | — | — | — | — | — | — | |
OpenVoice v2 | OpenVoice | — | — | — | — | — | — | |
Neuphonic TTS | Neuphonic | — | — | — | — | — | — | |
Qwen3 TTS Flash | Alibaba | — | — | — | — | — | — | |
Qwen3 TTS | Alibaba | — | — | — | — | — | — | |
XTTS v2 | Coqui | — | — | — | — | — | — | |
StyleTTS 2 | StyleTTS | — | — | — | — | — | — | |
WaveNet | Google | — | — | — | — | — | — | |
Polly Neural | Amazon | — | — | — | — | — | — | |
Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | Anthropic | — | — | — | 57.3 | 53 tok/s | 7.5s | |
Sonic English (Oct 2024) | Cartesia | — | — | — | — | — | — | |
Qwen3.5 9B (Non-reasoning) | Alibaba | — | — | — | 27.3 | 141 tok/s | 0.3s | |
GPT-5.4 mini (Non-reasoning) | OpenAI | — | — | — | 23.3 | 164 tok/s | 0.5s | |
Chirp 3: HD | Google | — | — | — | — | — | — | |
Falcon (Beta) | Murf AI | — | — | — | — | — | — | |
Polly Standard | Amazon | — | — | — | — | — | — | |
JT-MINI | China Mobile | — | — | — | 25.4 | — | — | |
GPT-5.4 Pro (xhigh) | OpenAI | — | — | — | — | — | — | |
Gemma 4 26B A4B (Reasoning) | Google DeepMind | — | — | — | 31.2 | — | — | |
Mistral Small 4 (Non-reasoning) | Mistral AI | — | — | — | 18.6 | 147 tok/s | 0.4s | |
GPT-5.4 nano (xhigh) | OpenAI | — | — | — | 44 | 163 tok/s | 2.8s | |
Qwen Chat 14B | Alibaba | — | — | — | 7.4 | — | — | |
GPT-5.4 (xhigh) | OpenAI | — | — | — | 56.8 | 85 tok/s | 168.3s | |
GLM-5.1 (Non-reasoning) | Z AI | — | — | — | 43.8 | 48 tok/s | 1.3s | |
MetaVoice v1 | MetaVoice | — | — | — | — | — | — | |
GPT-5.4 mini (xhigh) | OpenAI | — | — | — | 48.9 | 188 tok/s | 7.6s | |
DeepSeek-V2-Chat | DeepSeek | — | — | — | 9.1 | — | — | |
Qwen3.6 35B A3B (Reasoning) | Alibaba | — | — | — | 43.5 | 239 tok/s | 1.7s | |
Speech 2.8 HD | MiniMax | — | — | — | — | — | — |
Cost estimator: enter your expected input and output token volume to compare costs across models. As a rough guide, 1,000,000 tokens ≈ 750,000 words, and output volume is usually 30–50% of input volume.
Prices are approximate and may vary. Check provider documentation for current pricing.
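The cost comparison above reduces to simple arithmetic over the per-1M-token prices in the table. A minimal sketch, assuming linear per-token billing (the `estimate_cost` helper is illustrative, not any provider's SDK; the example rates are the Codestral row, $0.30 in / $0.90 out per 1M tokens):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated cost in dollars for a given token volume at per-1M-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: 1,000,000 input tokens, with output at 40% of input volume
# (within the typical 30-50% range noted above), at Codestral's table rates.
cost = estimate_cost(1_000_000, 400_000, 0.30, 0.90)
print(f"${cost:.2f}")  # → $0.66
```

Note that reasoning variants inflate the effective output volume (and therefore cost), since thinking tokens are usually billed at the output rate.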