∞AI

AI Model Comparison

Compare pricing, benchmarks, and capabilities across 559 AI models

559 models tracked · 0 open source
| Model | Provider | Input $/1M | Output $/1M | Context | Intelligence | Speed | Latency |
|---|---|---|---|---|---|---|---|
| DeepSeek R2 ★ | DeepSeek | $0.55 | $2.19 | 128K | 91% | 60 tok/s | — |
| GPT-4.1 ★ | OpenAI | $2 | $8 | 1M | 90.5% | 80 tok/s | — |
| Claude Opus 4.6 ★ | Anthropic | $15 | $75 | 200K | 88.7% | 60 tok/s | — |
| GPT-4o ★ | OpenAI | $5 | $15 | 128K | 87.2% | 120 tok/s | — |
| Claude Sonnet 4.6 ★ | Anthropic | $3 | $15 | 200K | 86.8% | 100 tok/s | — |
| o3 | OpenAI | $10 | $40 | 200K | 96.7% | 40 tok/s | — |
| o4-mini | OpenAI | $1.1 | $4.4 | 200K | 93.4% | 100 tok/s | — |
| Gemini 3 Ultra | Google DeepMind | $7 | $21 | 1M | 90.1% | 70 tok/s | — |
Claude Opus 4.5 (Reasoning)
Anthropic———
49.7
72 tok/s11.7s
Gemini 3 Pro Preview (low)
Google———
41.3
——
Gemini 3 Flash Preview (Reasoning)
Google———
46.4
195 tok/s5.9s
Claude Opus 4.5 (Non-reasoning)
Anthropic———
43.1
63 tok/s1.1s
Claude 4.5 Sonnet (Reasoning)
Anthropic———
43
59 tok/s10.4s
Claude 4.1 Opus (Reasoning)
Anthropic———
42
42 tok/s8.0s
MiniMax-M2.1
MiniMax———
39.4
59 tok/s2.4s
Grok 3
xAI$3$15131K
87.5%
90 tok/s—
GPT-5 Codex (high)
OpenAI———
44.6
207 tok/s11.4s
GPT-5.1 (high)
OpenAI———
47.7
118 tok/s25.1s
GPT-5.2 (xhigh)
OpenAI———
51.3
72 tok/s81.3s
GPT-5 (high)
OpenAI———
44.6
86 tok/s99.7s
Grok 4
xAI———
41.5
64 tok/s7.4s
GPT-5 (medium)
OpenAI———
42
95 tok/s40.4s
Gemini 3 Pro
Google DeepMind$3.5$10.51M
87%
100 tok/s—
Claude 4 Opus (Reasoning)
Anthropic———
39
41 tok/s8.0s
Qwen3-Max
Alibaba Cloud$0.4$1.232K
87%
90 tok/s—
GPT-5 (low)
OpenAI———
39.2
75 tok/s10.3s
Claude 4 Opus (Non-reasoning)
Anthropic———
33
37 tok/s1.4s
Gemini 2.5 Pro Preview (Mar '25)
Google———
30.3
——
DeepSeek V3.2 (Reasoning)
DeepSeek———
41.7
29 tok/s1.4s
DeepSeek V3.2 Speciale
DeepSeek———
29.4
——
GPT-5.2 (medium)
OpenAI———
46.6
——
GLM-4.7 (Reasoning)
Z AI———
42.1
109 tok/s0.7s
Gemini 2.5 Pro
Google———
34.6
127 tok/s22.0s
Claude 4.5 Sonnet (Non-reasoning)
Anthropic———
37.1
56 tok/s1.2s
GPT-5.1 Codex (high)
OpenAI———
43.1
167 tok/s6.7s
Cogito v2.1 (Reasoning)
Deep Cogito———
85%
57 tok/s0.5s
Grok 4 Fast (Reasoning)
xAI———
35.1
216 tok/s3.4s
Doubao Seed Code
ByteDance Seed———
33.5
——
DeepSeek V3.2 Exp (Reasoning)
DeepSeek———
32.9
30 tok/s1.4s
Kimi K2 Thinking
Kimi———
40.9
41 tok/s1.1s
DeepSeek V3.1 Terminus (Reasoning)
DeepSeek———
33.9
——
Grok 4.1 Fast (Reasoning)
xAI———
38.6
142 tok/s9.2s
DeepSeek R1 0528 (May '25)
DeepSeek———
27.1
——
DeepSeek V3.1 (Reasoning)
DeepSeek———
27.7
——
Qwen3 235B A22B 2507 (Reasoning)
Alibaba———
29.5
51 tok/s1.3s
MiMo-V2-Flash (Reasoning)
Xiaomi———
39.2
123 tok/s1.8s
Qwen3 Max (Preview)
Alibaba———
26.1
47 tok/s1.8s
DeepSeek V3.2 (Non-reasoning)
DeepSeek———
32.1
30 tok/s1.3s
Claude 4 Sonnet (Non-reasoning)
Anthropic———
33
52 tok/s0.8s
DeepSeek R1 (Jan '25)
DeepSeek———
18.8
——
DeepSeek V3.1 Terminus (Non-reasoning)
DeepSeek———
28.5
——
DeepSeek V3.2 Exp (Non-reasoning)
DeepSeek———
28.4
31 tok/s1.3s
K-EXAONE (Reasoning)
LG AI Research———
32.1
——
Qwen3 VL 235B A22B (Reasoning)
Alibaba———
27.6
45 tok/s1.2s
Gemini 2.5 Pro Preview (May '25)
Google———
29.5
——
Gemini 2.5 Flash Preview (Sep '25) (Reasoning)
Google———
31.1
——
Claude 4 Sonnet (Reasoning)
Anthropic———
38.7
59 tok/s8.5s
Mistral Large
Mistral AI$2$6128K
84%
90 tok/s—
Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)
Google———
25.7
——
Claude 3.7 Sonnet (Reasoning)
Anthropic———
34.7
——
GPT-5 mini (high)
OpenAI———
41.2
74 tok/s91.5s
GLM-4.5 (Reasoning)
Z AI———
26.4
38 tok/s0.9s
o1
OpenAI———
30.8
112 tok/s23.6s
Llama 3.1 Nemotron Ultra 253B v1 (Reasoning)
NVIDIA———
15
42 tok/s0.7s
Grok 3 mini Reasoning (high)
xAI———
32.1
216 tok/s0.4s
GPT-5 mini (medium)
OpenAI———
38.9
77 tok/s20.0s
Nova 2.0 Pro Preview (medium)
Amazon———
35.7
120 tok/s17.9s
GLM-4.6 (Reasoning)
Z AI———
32.5
36 tok/s0.9s
Grok 3 Mini
xAI$0.3$0.5131K
83%
160 tok/s—
DeepSeek V3.1 (Non-reasoning)
DeepSeek———
28.1
——
Gemini 2.5 Flash (Reasoning)
Google———
27
205 tok/s13.3s
Hermes 4 - Llama-3.1 405B (Reasoning)
Nous Research———
18.6
32 tok/s0.8s
Qwen3 235B A22B (Reasoning)
Alibaba———
19.8
65 tok/s1.3s
ERNIE 5.0 Thinking Preview
Baidu———
29.1
——
Qwen3 235B A22B 2507 Instruct
Alibaba———
25
70 tok/s1.2s
MiniMax M1 80k
MiniMax———
24.4
——
EXAONE 4.0 32B (Reasoning)
LG AI Research———
16.7
——
Qwen3 VL 32B (Reasoning)
Alibaba———
24.7
97 tok/s1.4s
Seed-OSS-36B-Instruct
ByteDance Seed———
25.2
42 tok/s1.8s
Qwen3 VL 235B A22B Instruct
Alibaba———
20.8
57 tok/s1.2s
Kimi K2 0905
Kimi———
30.9
22 tok/s2.1s
GLM-4.5-Air
Z AI———
23.2
65 tok/s1.3s
INTELLECT-3
Prime Intellect———
22.2
——
MiniMax-M2
MiniMax———
36.1
61 tok/s2.2s
DeepSeek V3 0324
DeepSeek———
22.3
——
Magistral Medium 1.2
Mistral———
27.1
95 tok/s0.4s
GPT-4o mini
OpenAI$0.15$0.6128K
82%
200 tok/s—
Gemini 3 Flash
Google DeepMind$0.075$0.31M
82%
250 tok/s—
Qwen3 Max Thinking (Preview)
Alibaba———
32.5
43 tok/s1.8s
Nova 2.0 Lite (high)
Amazon———
34.5
195 tok/s21.4s
Nova 2.0 Pro Preview (low)
Amazon———
31.9
143 tok/s6.8s
GPT-5 (ChatGPT)
OpenAI———
21.8
158 tok/s0.6s
Kimi K2
Kimi———
26.3
35 tok/s1.3s
GPT-5.1 Codex mini (high)
OpenAI———
38.6
197 tok/s5.9s
Qwen3 Next 80B A3B (Reasoning)
Alibaba———
26.7
164 tok/s1.1s
Ling-1T
InclusionAI———
19
——
Qwen3 Next 80B A3B Instruct
Alibaba———
20.1
166 tok/s1.0s
Ring-1T
InclusionAI———
22.8
——
gpt-oss-120B (high)
OpenAI———
33.3
215 tok/s0.5s
Qwen3 30B A3B 2507 (Reasoning)
Alibaba———
22.4
148 tok/s1.1s
GPT-5.2 (Non-reasoning)
OpenAI———
33.6
63 tok/s0.8s
Mi:dm K 2.5 Pro Preview
Korea Telecom———
81%
——
Qwen3 VL 30B A3B (Reasoning)
Alibaba———
19.7
127 tok/s1.0s
Hermes 4 - Llama-3.1 70B (Reasoning)
Nous Research———
16
62 tok/s0.6s
Llama Nemotron Super 49B v1.5 (Reasoning)
NVIDIA———
18.7
60 tok/s0.3s
Llama 4 Maverick
Meta———
18.4
115 tok/s0.6s
GPT-5 (minimal)
OpenAI———
23.9
74 tok/s1.1s
K-EXAONE (Non-reasoning)
LG AI Research———
23.4
——
Nova 2.0 Omni (medium)
Amazon———
28
——
Nova 2.0 Lite (medium)
Amazon———
29.7
177 tok/s13.8s
Gemini 2.5 Flash (Non-reasoning)
Google———
20.6
180 tok/s0.5s
Mistral Large 3
Mistral———
22.8
56 tok/s0.6s
Gemini 2.0 Pro Experimental (Feb '25)
Google———
18.1
——
Gemini 2.5 Flash-Lite Preview (Sep '25) (Reasoning)
Google———
21.6
——
KAT-Coder-Pro V1
KwaiKAT———
36
112 tok/s1.0s
Solar Pro 2 (Reasoning)
Upstage———
14.9
——
MiniMax M1 40k
MiniMax———
20.9
——
Mi:dm K 2.5 Pro
Korea Telecom———
23.1
——
Claude 3.7 Sonnet (Non-reasoning)
Anthropic———
30.8
——
Motif-2-12.7B-Reasoning
Motif Technologies———
19.1
——
Claude 4.5 Haiku (Non-reasoning)
Anthropic———
31.1
120 tok/s0.5s
Gemini 2.5 Flash-Lite Preview (Sep '25) (Non-reasoning)
Google———
19.4
——
Qwen3 32B (Reasoning)
Alibaba———
16.5
103 tok/s1.1s
Gemini 2.5 Flash Preview (Reasoning)
Google———
24.3
——
GPT-4o (March 2025, chatgpt-4o-latest)
OpenAI———
18.6
——
o3-mini (high)
OpenAI———
25.2
149 tok/s27.7s
Gemini 2.0 Flash Thinking Experimental (Jan '25)
Google———
19.6
——
DeepSeek R1 Distill Llama 70B
DeepSeek———
16
41 tok/s0.5s
Nova 2.0 Omni (low)
Amazon———
23.2
——
GPT-5.1 (Non-reasoning)
OpenAI———
27.4
108 tok/s0.8s
GLM-4.6V (Reasoning)
Z AI———
23.4
27 tok/s1.2s
Qwen3 Coder 480B A35B Instruct
Alibaba———
24.8
65 tok/s1.7s
Grok Code Fast 1
xAI———
28.7
185 tok/s5.4s
Nova 2.0 Lite (low)
Amazon———
24.6
210 tok/s5.1s
NVIDIA Nemotron 3 Nano 30B A3B (Reasoning)
NVIDIA———
24.3
133 tok/s1.3s
Llama 3.3 Nemotron Super 49B v1 (Reasoning)
NVIDIA———
18.5
——
K2-V2 (high)
MBZUAI Institute of Foundation Models———
20.6
——
HyperCLOVA X SEED Think (32B)
Naver———
23.7
——
Apriel-v1.6-15B-Thinker
ServiceNow———
27.6
——
Ring-flash-2.0
InclusionAI———
14
87 tok/s1.4s
Qwen3 Omni 30B A3B (Reasoning)
Alibaba———
15.6
93 tok/s1.0s
o3-mini
OpenAI———
25.9
151 tok/s8.1s
GLM-4.5V (Reasoning)
Z AI———
15.1
45 tok/s1.0s
GLM-4.7 (Non-reasoning)
Z AI———
34.2
106 tok/s0.7s
Qwen3 VL 32B Instruct
Alibaba———
17.2
83 tok/s1.3s
ERNIE 4.5 300B A47B
Baidu———
15
29 tok/s1.8s
GLM-4.6 (Non-reasoning)
Z AI———
30.2
67 tok/s0.9s
Command R+
Cohere$2.5$10128K
78%
80 tok/s—
GPT-4.1 mini
OpenAI———
22.9
90 tok/s0.6s
Qwen3 30B A3B 2507 Instruct
Alibaba———
15
92 tok/s1.3s
Gemini 2.0 Flash (experimental)
Google———
16.8
——
Ling-flash-2.0
InclusionAI———
15.7
94 tok/s1.5s
Gemini 2.5 Flash Preview (Non-reasoning)
Google———
17.8
——
GPT-5 mini (minimal)
OpenAI———
20.7
96 tok/s1.1s
gpt-oss-120B (low)
OpenAI———
24.5
218 tok/s0.5s
Gemini 2.0 Flash (Feb '25)
Google———
18.5
——
Qwen3 30B A3B (Reasoning)
Alibaba———
15.3
70 tok/s1.2s
GPT-5 nano (high)
OpenAI———
26.8
144 tok/s100.6s
GPT-5 nano (medium)
OpenAI———
25.9
145 tok/s50.0s
GPT-4o (ChatGPT)
OpenAI———
14.1
——
Nova 2.0 Pro Preview (Non-reasoning)
Amazon———
23.1
151 tok/s0.7s
Magistral Small 1.2
Mistral———
18.2
188 tok/s0.4s
Claude 3.5 Sonnet (Oct '24)
Anthropic———
15.9
——
Solar Pro 2 (Preview) (Reasoning)
Upstage———
18.8
——
EXAONE 4.0 32B (Non-reasoning)
LG AI Research———
11.7
——
Qwen3 14B (Reasoning)
Alibaba———
16.2
65 tok/s1.1s
Apriel-v1.5-15B-Thinker
ServiceNow———
28.3
——
Mistral Medium 3
Mistral———
18.8
62 tok/s0.5s
Sonar Pro
Perplexity———
15.2
——
Devstral 2
Mistral———
22
79 tok/s0.5s
Claude 4.5 Haiku (Reasoning)
Anthropic———
37.1
156 tok/s10.0s
Olmo 3 32B Think
Allen Institute for AI———
12.1
——
NVIDIA Nemotron Nano 12B v2 VL (Reasoning)
NVIDIA———
14.9
151 tok/s0.5s
Qwen3 235B A22B (Non-reasoning)
Alibaba———
17
63 tok/s1.2s
Gemini 2.5 Flash-Lite (Reasoning)
Google———
17.6
295 tok/s12.3s
Qwen3 VL 30B A3B Instruct
Alibaba———
16.1
123 tok/s1.0s
K2-V2 (medium)
MBZUAI Institute of Foundation Models———
18.7
——
Olmo 3.1 32B Think
Allen Institute for AI———
13.9
——
Qwen2.5 Max
Alibaba———
16.3
46 tok/s1.1s
QwQ 32B
Alibaba———
19.7
33 tok/s0.4s
Claude Haiku 4.5
Anthropic$0.8$4200K
75.2%
250 tok/s—
Claude 3.5 Sonnet (June '24)
Anthropic———
14.2
——
Magistral Small 1
Mistral———
16.8
——
GLM-4.6V (Non-reasoning)
Z AI———
17.1
23 tok/s5.9s
Magistral Medium 1
Mistral———
18.8
——
Solar Pro 2 (Non-reasoning)
Upstage———
13.6
——
Llama 4 Scout
Meta———
13.5
137 tok/s0.5s
Gemini 1.5 Pro (Sep '24)
Google———
16
——
gpt-oss-20B (high)
OpenAI———
24.5
252 tok/s0.3s
Qwen3 VL 8B (Reasoning)
Alibaba———
16.7
135 tok/s1.1s
GLM-4.5V (Non-reasoning)
Z AI———
12.7
39 tok/s29.9s
DeepSeek R1 Distill Qwen 14B
DeepSeek———
15.8
——
Qwen3 4B 2507 (Reasoning)
Alibaba———
18.2
——
MiMo-V2-Flash (Non-reasoning)
Xiaomi———
30.4
124 tok/s1.5s
GPT-4o (May '24)
OpenAI———
14.5
101 tok/s0.5s
Qwen3 8B (Reasoning)
Alibaba———
13.2
91 tok/s1.0s
NVIDIA Nemotron Nano 9B V2 (Reasoning)
NVIDIA———
14.8
117 tok/s0.3s
DeepSeek R1 Distill Qwen 32B
DeepSeek———
17.2
42 tok/s0.5s
NVIDIA Nemotron Nano 9B V2 (Non-reasoning)
NVIDIA———
13.2
153 tok/s0.7s
Nova 2.0 Lite (Non-reasoning)
Amazon———
18
182 tok/s0.8s
DeepSeek R1 0528 Qwen3 8B
DeepSeek———
16.4
——
o1-mini
OpenAI———
20.4
——
Grok 4.1 Fast (Non-reasoning)
xAI———
23.6
131 tok/s0.4s
Llama 3.1 Instruct 405B
Meta———
17.4
31 tok/s0.7s
Solar Pro 2 (Preview) (Non-reasoning)
Upstage———
16
——
Hermes 4 - Llama-3.1 405B (Non-reasoning)
Nous Research———
17.6
32 tok/s0.9s
Qwen3 32B (Non-reasoning)
Alibaba———
14.5
102 tok/s1.2s
Qwen3 Omni 30B A3B Instruct
Alibaba———
10.7
106 tok/s1.1s
Grok 4 Fast (Non-reasoning)
xAI———
23.1
196 tok/s0.4s
Falcon-H1R-7B
TII UAE———
15.8
——
Nova Premier
Amazon———
19
70 tok/s1.2s
Gemini 2.0 Flash-Lite (Feb '25)
Google———
14.7
——
Gemini 3.1 Flash-Lite
Google DeepMind$0.01$0.041M
72%
500 tok/s—
Command R
Cohere$0.15$0.6128K
72%
150 tok/s—
Mistral Small
Mistral AI$0.1$0.332K
72%
200 tok/s—
Nova 2.0 Omni (Non-reasoning)
Amazon———
16.6
227 tok/s0.9s
Qwen2.5 Instruct 72B
Alibaba———
15.6
55 tok/s1.2s
gpt-oss-20B (low)
OpenAI———
20.8
261 tok/s0.4s
Llama 3.1 Tulu3 405B
Allen Institute for AI———
14.1
——
Gemini 2.5 Flash-Lite (Non-reasoning)
Google———
12.7
260 tok/s0.4s
K2-V2 (low)
MBZUAI Institute of Foundation Models———
14.4
——
Grok 2 (Dec '24)
xAI———
13.9
——
Command A
Cohere———
13.5
40 tok/s0.6s
Qwen3 Coder 30B A3B Instruct
Alibaba———
20
113 tok/s1.4s
Qwen3 30B A3B (Non-reasoning)
Alibaba———
12.5
67 tok/s1.2s
Llama 3.3 Instruct 70B
Meta———
14.5
96 tok/s0.6s
Devstral Medium
Mistral———
18.7
145 tok/s0.5s
Llama 3.3 Nemotron Super 49B v1 (Non-reasoning)
NVIDIA———
14.3
——
Sarvam M (Reasoning)
Sarvam———
8.4
——
Mistral Large 2 (Nov '24)
Mistral———
15.1
41 tok/s0.5s
Qwen2.5 Instruct 32B
Alibaba———
13.2
——
Qwen3 4B (Reasoning)
Alibaba———
14.2
104 tok/s1.0s
Grok Beta
xAI———
13.3
——
Qwen3 VL 4B (Reasoning)
Alibaba———
13.7
——
Pixtral Large
Mistral———
14
51 tok/s0.5s
Claude 3 Opus
Anthropic———
18
——
Llama 3.1 Nemotron Instruct 70B
NVIDIA———
13.4
46 tok/s0.3s
Ministral 3 14B
Mistral———
16
99 tok/s0.3s
GPT-4 Turbo
OpenAI———
13.7
32 tok/s1.2s
Sonar
Perplexity———
15.5
——
Qwen3 VL 8B Instruct
Alibaba———
14.3
148 tok/s0.9s
Llama Nemotron Super 49B v1.5 (Non-reasoning)
NVIDIA———
14.6
58 tok/s0.3s
Nova Pro
Amazon———
13.5
——
Gemini 1.5 Flash (Sep '24)
Google———
13.8
——
Llama 3.1 Instruct 70B
Meta———
12.5
31 tok/s0.8s
Mistral Small 3.2
Mistral———
15.1
155 tok/s0.3s
Mistral Medium 3.1
Mistral———
21.3
89 tok/s0.4s
Qwen3 14B (Non-reasoning)
Alibaba———
12.8
65 tok/s1.0s
Mistral Large 2 (Jul '24)
Mistral———
13
——
Devstral Small 2
Mistral———
19.5
80 tok/s0.7s
Llama 3.2 Instruct 90B (Vision)
Meta———
11.9
42 tok/s0.5s
Qwen3 4B 2507 Instruct
Alibaba———
12.9
——
Reka Flash 3
Reka AI———
9.5
94 tok/s1.3s
Ling-mini-2.0
InclusionAI———
9.2
——
GPT-4.1 nano
OpenAI———
13
200 tok/s0.4s
Olmo 3 7B Think
Allen Institute for AI———
9.4
——
Gemini 1.5 Pro (May '24)
Google———
12
——
Mistral Small 3.1
Mistral———
14.5
153 tok/s0.5s
Hermes 4 - Llama-3.1 70B (Non-reasoning)
Nous Research———
12.6
63 tok/s0.6s
QwQ 32B-Preview
Alibaba———
15.2
43 tok/s0.5s
Mistral Small 3
Mistral———
12.7
154 tok/s0.5s
NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning)
NVIDIA———
10.1
175 tok/s0.7s
Qwen3 8B (Non-reasoning)
Alibaba———
10.6
94 tok/s0.9s
Ministral 3 8B
Mistral———
14.8
180 tok/s0.3s
Qwen2.5 Coder Instruct 32B
Alibaba———
12.9
——
Qwen2.5 Turbo
Alibaba———
12
68 tok/s1.2s
Claude 3.5 Haiku
Anthropic———
18.7
——
Qwen3 VL 4B Instruct
Alibaba———
9.6
——
Devstral Small (May '25)
Mistral———
18
——
Qwen2 Instruct 72B
Alibaba———
11.7
——
Granite 4.0 H Small
IBM———
10.8
453 tok/s8.7s
Devstral Small (Jul '25)
Mistral———
15.2
202 tok/s0.4s
Mistral Saba
Mistral———
12.1
——
Gemma 3 12B Instruct
Google———
8.8
30 tok/s10.2s
Exaone 4.0 1.2B (Reasoning)
LG AI Research———
8.3
——
Kimi Linear 48B A3B Instruct
Kimi———
14.4
——
Nova Lite
Amazon———
12.7
221 tok/s0.7s
Qwen3 4B (Non-reasoning)
Alibaba———
12.5
105 tok/s1.0s
Claude 3 Sonnet
Anthropic———
10.3
——
Jamba 1.7 Large
AI21 Labs———
10.9
49 tok/s1.1s
Jamba Reasoning 3B
AI21 Labs———
9.6
——
NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning)
NVIDIA———
13.2
78 tok/s0.3s
DeepHermes 3 - Mistral 24B Preview (Non-reasoning)
Nous Research———
10.9
——
Llama 3 Instruct 70B
Meta———
8.9
42 tok/s0.7s
Gemini 1.5 Flash-8B
Google———
11.1
——
Hermes 3 - Llama-3.1 70B
Nous Research———
10.6
28 tok/s0.4s
Qwen3 1.7B (Reasoning)
Alibaba———
8
138 tok/s1.0s
Jamba 1.5 Large
AI21 Labs———
10.7
——
Gemini 1.5 Flash (May '24)
Google———
10.5
——
Jamba 1.6 Large
AI21 Labs———
10.6
48 tok/s0.9s
Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning)
NVIDIA———
14.4
——
GPT-5 nano (minimal)
OpenAI———
13.8
142 tok/s1.0s
DeepSeek R1 Distill Llama 8B
DeepSeek———
12.1
——
Mixtral 8x22B Instruct
Mistral———
9.8
——
Nova Micro
Amazon———
10.3
314 tok/s0.6s
Olmo 3 7B Instruct
Allen Institute for AI———
8.2
——
Ministral 3 3B
Mistral———
11.2
307 tok/s0.3s
LFM2 8B A1B
Liquid AI———
7
——
OLMo 2 32B
Allen Institute for AI———
10.6
——
Claude 2.1
Anthropic———
9.3
——
Exaone 4.0 1.2B (Non-reasoning)
LG AI Research———
8.1
——
Mistral Medium
Mistral———
9
89 tok/s0.4s
Phi-4 Multimodal Instruct
Microsoft Azure———
10
16 tok/s0.4s
Claude 2.0
Anthropic———
9.1
——
Gemma 3n E4B Instruct
Google———
6.4
14 tok/s0.4s
Gemma 3n E4B Instruct Preview (May '25)
Google———
10.1
——
Llama 3.1 Instruct 8B
Meta———
11.8
170 tok/s0.4s
Phi-4 Mini Instruct
Microsoft Azure———
8.4
44 tok/s0.3s
Granite 3.3 8B (Non-reasoning)
IBM———
7
427 tok/s7.3s
Qwen2.5 Coder Instruct 7B
Alibaba———
10
——
Llama 3.2 Instruct 11B (Vision)
Meta———
8.7
79 tok/s0.5s
GPT-3.5 Turbo
OpenAI———
9
89 tok/s0.5s
Granite 4.0 Micro
IBM———
7.7
——
Phi-3 Mini Instruct 3.8B
Microsoft Azure———
10.1
——
Claude Instant
Anthropic———
7.4
——
Gemini 1.0 Pro
Google———
8.5
——
LFM 40B
Liquid AI———
8.8
——
DeepSeek Coder V2 Lite Instruct
DeepSeek———
8.5
——
Command-R+ (Apr '24)
Cohere———
8.3
——
Mistral Small (Feb '24)
Mistral———
9
154 tok/s0.5s
Gemma 3 4B Instruct
Google———
6.3
30 tok/s1.1s
Qwen3 1.7B (Non-reasoning)
Alibaba———
6.8
141 tok/s0.9s
Llama 2 Chat 13B
Meta———
8.4
——
Llama 2 Chat 70B
Meta———
8.4
——
Llama 3 Instruct 8B
Meta———
6.4
82 tok/s0.5s
Mixtral 8x7B Instruct
Mistral———
7.7
——
Jamba 1.7 Mini
AI21 Labs———
8.1
——
Gemma 3n E2B Instruct
Google———
4.8
51 tok/s0.5s
Molmo 7B-D
Allen Institute for AI———
9.2
——
Jamba 1.5 Mini
AI21 Labs———
8
——
Jamba 1.6 Mini
AI21 Labs———
7.9
178 tok/s0.8s
DeepHermes 3 - Llama-3.1 8B Preview (Non-reasoning)
Nous Research———
7.6
——
Llama 3.2 Instruct 3B
Meta———
9.7
53 tok/s0.6s
Qwen3 0.6B (Reasoning)
Alibaba———
6.5
189 tok/s0.9s
Command-R (Mar '24)
Cohere———
7.4
——
Granite 4.0 1B
IBM———
7.3
——
OpenChat 3.5 (1210)
OpenChat———
8.3
——
LFM2 2.6B
Liquid AI———
8
——
Granite 4.0 H 1B
IBM———
8
——
OLMo 2 7B
Allen Institute for AI———
9.3
——
DeepSeek R1 Distill Qwen 1.5B
DeepSeek———
9.1
——
LFM2 1.2B
Liquid AI———
6.3
——
Mistral 7B Instruct
Mistral———
7.4
190 tok/s0.3s
Qwen3 0.6B (Non-reasoning)
Alibaba———
5.7
194 tok/s0.9s
Llama 3.2 Instruct 1B
Meta———
6.3
88 tok/s0.6s
Llama 2 Chat 7B
Meta———
9.7
108 tok/s12.6s
Gemma 3 1B Instruct
Google———
5.5
48 tok/s0.6s
Granite 4.0 H 350M
IBM———
5.4
——
Granite 4.0 350M
IBM———
6.1
——
Gemma 3 270M
Google———
7.7
——
Gemini 3.1 Flash TTS
Google——————
GPT-5.4 nano (xhigh)
OpenAI———
44
157 tok/s2.5s
Mercury 2
Inception———
32.8
872 tok/s4.7s
NVIDIA Nemotron 3 Nano 4B
NVIDIA———
14.7
——
Gemma 4 31B (Non-reasoning)
Google———
32.3
——
Molmo2-8B
Allen Institute for AI———
7.3
——
MiMo-V2-Flash (Feb 2026)
Xiaomi———
41.5
127 tok/s1.5s
MiMo-V2-Omni
Xiaomi———
43.4
——
MiMo-V2-Pro
Xiaomi———
49.2
67 tok/s2.1s
KAT Coder Pro V2
KwaiKAT———
43.8
114 tok/s1.8s
MiMo-V2-Omni-0327
Xiaomi———
44.9
——
Sarvam 30B (high)
Sarvam———
12.3
294 tok/s1.2s
Sarvam 105B (high)
Sarvam———
18.2
124 tok/s1.2s
K2 Think V2
MBZUAI Institute of Foundation Models———
24.1
——
Step3 VL 10B
StepFun———
15.4
——
o1-preview
OpenAI———
23.7
——
Olmo 3.1 32B Instruct
Allen Institute for AI———
12.2
54 tok/s0.3s
LongCat Flash Lite
LongCat———
23.9
115 tok/s3.9s
Tri-21B-think Preview
Trillion Labs———
20
——
Fish Audio S2 Pro
Fish Audio——————
Nanbeige4.1-3B
Nanbeige———
16.1
——
Tri-21B-Think
Trillion Labs———
18.6
——
Apertus 70B Instruct
Swiss AI Initiative———
7.7
——
Apertus 8B Instruct
Swiss AI Initiative———
5.9
——
Trinity Large Thinking
Arcee AI———
31.9
127 tok/s0.6s
GLM-5 (Non-reasoning)
Z AI———
40.6
53 tok/s1.4s
GLM-5.1 (Reasoning)
Z AI———
51.4
43 tok/s1.2s
GLM-5-Turbo
Z AI———
46.8
——
GLM 5V Turbo (Reasoning)
Z AI———
42.9
——
Tiny Aya Global
Cohere———
4.7
——
GLM-5 (Reasoning)
Z AI———
49.8
67 tok/s0.9s
Qwen3.5 397B A17B (Reasoning)
Alibaba———
45
52 tok/s1.5s
Qwen3.5 0.8B (Reasoning)
Alibaba———
10.5
——
Qwen3.5 2B (Non-reasoning)
Alibaba———
14.7
232 tok/s0.3s
Qwen3.5 0.8B (Non-reasoning)
Alibaba———
9.9
285 tok/s0.3s
Qwen3.5 4B (Non-reasoning)
Alibaba———
22.6
178 tok/s0.3s
Kimi K2.5 (Non-reasoning)
Kimi———
37.3
32 tok/s1.4s
Qwen3 Coder Next
Alibaba———
28.3
165 tok/s0.8s
Qwen3.5 9B (Reasoning)
Alibaba———
32.4
56 tok/s0.4s
Qwen3.5 2B (Reasoning)
Alibaba———
16.3
——
Qwen3.5 35B A3B (Reasoning)
Alibaba———
37.1
149 tok/s1.2s
Qwen3.5 27B (Non-reasoning)
Alibaba———
37.2
92 tok/s1.4s
Qwen3.5 122B A10B (Reasoning)
Alibaba———
41.6
159 tok/s1.1s
Qwen3 Max Thinking
Alibaba———
39.9
36 tok/s1.7s
Step 3.5 Flash 2603
StepFun———
38.5
186 tok/s0.8s
Step 3.5 Flash
StepFun———
37.8
163 tok/s0.8s
Nemotron Cascade 2 30B A3B
NVIDIA———
28.4
——
Qwen3.5 Omni Plus
Alibaba———
38.6
55 tok/s1.3s
Qwen3.5 4B (Reasoning)
Alibaba———
27.1
177 tok/s0.3s
Arcana v3
Rime——————
Magpie Multilingual
NVIDIA——————
Qwen3.6 Plus
Alibaba———
50
53 tok/s1.6s
Qwen3.5 397B A17B (Non-reasoning)
Alibaba———
40.1
52 tok/s1.4s
Qwen3.5 122B A10B (Non-reasoning)
Alibaba———
35.9
152 tok/s1.1s
Qwen3.5 Omni Flash
Alibaba———
25.9
170 tok/s1.2s
Qwen3.5 27B (Reasoning)
Alibaba———
42.1
92 tok/s1.4s
Llama 65B
Meta———
7.4
——
Kimi K2.5 (Reasoning)
Kimi———
46.8
32 tok/s1.3s
Qwen3.5 35B A3B (Non-reasoning)
Alibaba———
30.7
153 tok/s1.1s
GPT-3.5 Turbo (0613)
OpenAI——————
DeepSeek-V2.5
DeepSeek———
12.3
——
o3-pro
OpenAI———
40.7
19 tok/s95.4s
LFM2 24B A2B
Liquid AI———
10.5
163 tok/s0.3s
o1-pro
OpenAI———
25.8
——
GPT-4o (Aug '24)
OpenAI———
18.6
108 tok/s0.6s
Solar Open 100B (Reasoning)
Upstage———
21.7
——
NVIDIA Nemotron 3 Super 120B A12B (Reasoning)
NVIDIA———
36
154 tok/s1.1s
GPT-5.2 Codex (xhigh)
OpenAI———
49
107 tok/s7.4s
GPT-4o Realtime (Dec '24)
OpenAI——————
GPT-4
OpenAI———
12.8
35 tok/s0.8s
MiniMax-M2.7
MiniMax———
49.6
47 tok/s1.6s
LFM2.5-1.2B-Thinking
Liquid AI———
8.1
——
GPT-4o mini Realtime (Dec '24)
OpenAI——————
GPT-4.5 (Preview)
OpenAI———
20
——
Gemini 2.0 Flash-Lite (Preview)
Google———
14.5
——
LFM2.5-1.2B-Instruct
Liquid AI———
8
——
Solar Pro 3
Upstage———
25.9
——
LFM2.5-VL-1.6B
Liquid AI———
6.2
——
Gemini 1.0 Ultra
Google———
10.1
——
Gemini 2.0 Flash Thinking Experimental (Dec '24)
Google———
12.3
——
PALM-2
Google———
8.6
——
Grok 4.20 0309 v2 (Reasoning)
xAI———
49.3
175 tok/s15.5s
Grok 4.20 0309 v2 (Non-reasoning)
xAI———
29
177 tok/s0.4s
Qwen3.6 Max Preview
Alibaba———
51.8
57 tok/s1.9s
Claude 3 Haiku
Anthropic———
12.3
131 tok/s0.5s
R1 1776
Perplexity———
12
——
Codestral
Mistral AI$0.3$0.932K—180 tok/s—
Claude 4.1 Opus (Non-reasoning)
Anthropic———
36
39 tok/s1.4s
Claude Opus 4.7 (Non-reasoning, High Effort)
Anthropic———
51.8
53 tok/s1.2s
DeepSeek-V2.5 (Dec '24)
DeepSeek———
12.5
——
DeepSeek-Coder-V2
DeepSeek———
10.6
——
DeepSeek LLM 67B Chat (V1)
DeepSeek———
8.4
——
Gemini 3.1 Pro Preview
Google———
57.2
124 tok/s28.7s
Magpie-Multilingual 357M (Feb 2026)
NVIDIA——————
Claude Sonnet 4.6 (Non-reasoning, Low Effort)
Anthropic———
42.6
60 tok/s1.0s
Grok 4.20 0309 (Non-reasoning)
xAI———
29.7
164 tok/s0.4s
Sonar Reasoning
Perplexity———
17.9
——
Mistral Small 4 (Reasoning)
Mistral———
27.8
173 tok/s0.5s
Claude Sonnet 4.6 (Adaptive Reasoning, Max Effort)
Anthropic———
51.7
72 tok/s46.6s
Grok 4.20 0309 (Reasoning)
xAI———
48.5
183 tok/s16.1s
Grok 3 Reasoning Beta
xAI———
21.6
——
Solar Mini
Upstage———
11.9
87 tok/s1.4s
MiniMax-M2.5
MiniMax———
41.9
59 tok/s2.1s
Gemma 4 E2B (Non-reasoning)
Google———
12.1
——
Gemma 4 E4B (Non-reasoning)
Google———
14.8
——
Claude Opus 4.6 (Adaptive Reasoning, Max Effort)
Anthropic———
53
53 tok/s11.7s
Sonar Reasoning Pro
Perplexity———
24.6
——
Reka Flash (Sep '24)
Reka AI———
12
85 tok/s1.3s
Gemma 4 E2B (Reasoning)
Google———
15.2
——
Gemini 3.1 Flash-Lite Preview
Google———
33.5
319 tok/s5.7s
GLM-4.7-Flash (Non-reasoning)
Z AI———
22.1
105 tok/s1.0s
GLM-4.7-Flash (Reasoning)
Z AI———
30.1
91 tok/s0.9s
Gemma 4 E4B (Reasoning)
Google———
18.8
——
GPT-5.4 mini (medium)
OpenAI———
37.7
181 tok/s6.3s
GPT-5.4 mini (xhigh)
OpenAI———
48.9
189 tok/s6.9s
Gemini 2.5 Flash Lite TTS
Google——————
Grok-1
xAI———
11.7
——
Gemini 3 Deep Think
Google——————
Gemma 4 26B A4B (Non-reasoning)
Google———
27.1
——
Muse Spark
Meta———
52.1
——
Qwen Chat 72B
Alibaba———
8.8
——
Gemma 4 31B (Reasoning)
Google———
39.2
35 tok/s1.0s
Arctic Instruct
Snowflake———
8.8
——
GPT-5.4 nano (medium)
OpenAI———
38.1
158 tok/s3.8s
Qwen1.5 Chat 110B
Alibaba———
9.5
——
Gemini 2.5 Flash TTS (Dec 2025)
Google——————
Inworld TTS 1.5 Max
Inworld——————
Eleven v3
ElevenLabs——————
GPT-5.4 nano (Non-Reasoning)
OpenAI———
24.4
161 tok/s0.6s
Inworld TTS 1 Max
Inworld——————
Speech 2.6 HD
MiniMax——————
Speech 2.8 Turbo
MiniMax——————
Speech 2.6 Turbo
MiniMax——————
Inworld TTS 1
Inworld——————
Speech-02-HD
MiniMax——————
Azure HD 2.5
Microsoft Azure——————
Multilingual v2
ElevenLabs——————
Speech-02-Turbo
MiniMax——————
TTS-1
OpenAI——————
Step Audio EditX (Mar 2026)
StepFun——————
Turbo v2.5
ElevenLabs——————
Flash v2.5
ElevenLabs——————
TTS-1 HD
OpenAI——————
Sonic 3
Cartesia——————
OpenAudio S1
Fish Audio——————
Studio
Google——————
Kokoro 82M v1.0
Kokoro——————
T2A-01-HD
MiniMax——————
SIMBA 1.6
Speechify——————
Polly Generative
Amazon——————
AsyncFlow V2, async
async——————
Maya1
Maya Research——————
Voxtral TTS
Mistral——————
Azure Neural
Microsoft Azure——————
Inworld TTS 1.5 Mini
Inworld——————
Step TTS 2 (Mar 2026)
StepFun——————
Chatterbox HD
Resemble AI——————
Journey
Google——————
SIMBA 1.0
Speechify——————
MAI-Voice-1
Microsoft Azure——————
Octave TTS
Hume AI——————
T2A-01-Turbo
MiniMax——————
MiMo-V2-TTS
Xiaomi——————
Fish Speech 1.5
Fish Audio——————
Lightning v3.1
Smallest.ai——————
Chatterbox
Resemble AI——————
Gemini 2.5 Pro (Dec 2025)
Google——————
Magpie-Multilingual 357M
NVIDIA——————
Zonos-v0.1
Zyphra——————
LMNT
LMNT——————
VibeVoice 7B
Microsoft Azure——————
Murf Speech Gen 2
Murf AI——————
VibeVoice 1.5B
Microsoft Azure——————
OpenVoice v2
OpenVoice——————
Neuphonic TTS
Neuphonic——————
Qwen3 TTS
Alibaba——————
XTTS v2
Coqui——————
Qwen3 TTS Flash
Alibaba——————
StyleTTS 2
StyleTTS ——————
WaveNet
Google——————
Polly Neural
Amazon——————
Claude Opus 4.7 (Adaptive Reasoning, Max Effort)
Anthropic———
57.3
57 tok/s11.6s
Sonic English (Oct 2024)
Cartesia——————
Polly Long-Form
Amazon——————
Falcon (Beta)
Murf AI——————
Polly Standard
Amazon——————
GPT-5.4 (xhigh)
OpenAI———
56.8
81 tok/s157.8s
Mistral Small 4 (Non-reasoning)
Mistral———
18.6
149 tok/s0.5s
GPT-5.4 (Non-reasoning)
OpenAI———
35.4
62 tok/s0.7s
JT-MINI
China Mobile———
25.4
——
GLM-5.1 (Non-reasoning)
Z AI———
43.8
47 tok/s2.1s
GPT-5.3 Codex (xhigh)
OpenAI———
53.6
85 tok/s60.3s
Qwen3.5 9B (Non-reasoning)
Alibaba———
27.3
143 tok/s0.3s
GPT-5.4 Pro (xhigh)
OpenAI——————
Gemma 4 26B A4B (Reasoning)
Google———
31.2
——
Qwen Chat 14B
Alibaba———
7.4
——
Chirp 3: HD
Google——————
MetaVoice v1
MetaVoice——————
GPT-5.4 mini (Non-Reasoning)
OpenAI———
23.3
176 tok/s0.6s
DeepSeek-V2-Chat
DeepSeek———
9.1
——
Kimi K2.6
Kimi———
53.9
135 tok/s0.8s
Qwen3.6 35B A3B (Reasoning)
Alibaba———
43.5
238 tok/s1.7s
Speech 2.8 HD
MiniMax——————
Standard
Google——————
Qwen3.5 Omni Flash
Alibaba——————
Qwen3.6 35B A3B (Non-reasoning)
Alibaba———
31.5
193 tok/s1.5s
Octave 2
Hume AI——————
Neural2
Google——————
Ling 2.6 Flash
InclusionAI———
26.2
202 tok/s0.8s
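The Speed and Latency columns above can be combined into a rough end-to-end estimate: if Latency is time to first token and Speed is steady-state decode throughput (an assumption — the page does not define these columns), total response time ≈ latency + tokens ÷ speed. A minimal sketch using the Claude Opus 4.5 (Reasoning) figures from the table:

```python
def response_time(num_tokens: int, latency_s: float, tokens_per_s: float) -> float:
    """Rough wall-clock time for one response.

    Assumes latency_s is time to first token and tokens_per_s is steady
    decode throughput — an assumption, since the table does not define
    its Speed/Latency columns precisely.
    """
    return latency_s + num_tokens / tokens_per_s

# Claude Opus 4.5 (Reasoning) from the table: 72 tok/s, 11.7 s latency.
print(f"{response_time(1000, 11.7, 72):.1f} s for a 1000-token answer")
```

This is why a model with low first-token latency can still feel slower on long outputs: throughput dominates once the answer runs to thousands of tokens.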
© 2026 ∞AI · everythingai.tech

Estimate Your Monthly Cost

Example comparison for a monthly usage of 1,000,000 input tokens (≈750,000 words) and 500,000 output tokens — output volume is typically 30–50% of input — across six representative models.

| Model | Provider | Input Cost | Output Cost | Total/Month | vs Cheapest |
|---|---|---|---|---|---|
| DeepSeek R2 | DeepSeek | $0.55 | $1.09 | $1.65 | ✓ Best value |
| GPT-4.1 | OpenAI | $2.00 | $4.00 | $6.00 | 3.6× more |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $7.50 | $10.50 | 6.4× more |
| GPT-4o | OpenAI | $5.00 | $7.50 | $12.50 | 7.6× more |
| o3 | OpenAI | $10.00 | $20.00 | $30.00 | 18.2× more |
| Claude Opus 4.6 | Anthropic | $15.00 | $37.50 | $52.50 | 31.9× more |

Prices are approximate and may vary. Check provider documentation for current pricing.
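The cost figures above are simple per-million-token arithmetic: monthly cost = (input tokens ÷ 1M) × input rate + (output tokens ÷ 1M) × output rate. A minimal sketch, assuming the example usage of 1,000,000 input and 500,000 output tokens per month, with the $/1M rates taken from the comparison table:

```python
# $/1M-token (input, output) rates from the comparison table above.
MODELS = {
    "DeepSeek R2": (0.55, 2.19),
    "GPT-4.1": (2.00, 8.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-4o": (5.00, 15.00),
    "o3": (10.00, 40.00),
    "Claude Opus 4.6": (15.00, 75.00),
}

def monthly_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Dollar cost for one month of usage at $/1M-token rates."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

def compare(input_tokens=1_000_000, output_tokens=500_000):
    """Cost per model for the given usage, printed cheapest-first."""
    costs = {name: monthly_cost(input_tokens, output_tokens, r_in, r_out)
             for name, (r_in, r_out) in MODELS.items()}
    cheapest = min(costs.values())
    for name, total in sorted(costs.items(), key=lambda kv: kv[1]):
        print(f"{name}: ${total:.2f} ({total / cheapest:.1f}x cheapest)")
    return costs

compare()
```

Note that the "vs Cheapest" ratios depend on the input/output mix: an output-heavy workload widens the gap for models with a large output-rate premium, such as Claude Opus 4.6 at $15 in / $75 out.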