Compare pricing, benchmarks, and capabilities across 72 AI models
| Model | Provider | Type | ELO Rank↑ | ELO Score | Released |
|---|---|---|---|---|---|
Magpie-Multilingual 357M (Feb 2026) | NVIDIA | text-to-speech | #25 | 1063 | — |
Inworld TTS 1.5 Max | Inworld | text-to-speech | #1 | 1212 | — |
Eleven v3 | ElevenLabs | text-to-speech | #3 | 1179 | — |
Inworld TTS 1 Max | Inworld | text-to-speech | #5 | 1164 | — |
Speech 2.6 HD | MiniMax | text-to-speech | #9 | 1133 | — |
Speech 2.8 Turbo | MiniMax | text-to-speech | #8 | 1149 | — |
Speech 2.6 Turbo | MiniMax | text-to-speech | #10 | 1131 | — |
Inworld TTS 1 | Inworld | text-to-speech | #12 | 1120 | — |
Speech-02-HD | MiniMax | text-to-speech | #13 | 1119 | — |
Azure HD 2.5 | Microsoft Azure | text-to-speech | #14 | 1116 | — |
Multilingual v2 | ElevenLabs | text-to-speech | #15 | 1108 | — |
Speech-02-Turbo | MiniMax | text-to-speech | #16 | 1101 | — |
TTS-1 | OpenAI | text-to-speech | #18 | 1101 | — |
Step Audio EditX (Mar 2026) | StepFun | text-to-speech | #17 | 1101 | — |
Turbo v2.5 | ElevenLabs | text-to-speech | #19 | 1099 | — |
Flash v2.5 | ElevenLabs | text-to-speech | #21 | 1089 | — |
TTS-1 HD | OpenAI | text-to-speech | #20 | 1099 | — |
Sonic 3 | Cartesia | text-to-speech | #23 | 1069 | — |
OpenAudio S1 | Fish Audio | text-to-speech | #24 | 1065 | — |
Studio | Google | text-to-speech | #26 | 1062 | — |
Kokoro 82M v1.0 | Kokoro | text-to-speech | #29 | 1056 | — |
T2A-01-HD | MiniMax | text-to-speech | #27 | 1060 | — |
SIMBA 1.6 | Speechify | text-to-speech | #28 | 1058 | — |
Polly Generative | Amazon | text-to-speech | #30 | 1055 | — |
AsyncFlow V2, async | async | text-to-speech | #31 | 1051 | — |
Maya1 | Maya Research | text-to-speech | #32 | 1051 | — |
Voxtral TTS | Mistral | text-to-speech | #35 | 1044 | — |
Azure Neural | Microsoft Azure | text-to-speech | #33 | 1049 | — |
Inworld TTS 1.5 Mini | Inworld | text-to-speech | #6 | 1159 | — |
Step TTS 2 (Mar 2026) | StepFun | text-to-speech | #7 | 1153 | — |
Chatterbox HD | Resemble AI | text-to-speech | #39 | 1036 | — |
Journey | Google | text-to-speech | #41 | 1029 | — |
SIMBA 1.0 | Speechify | text-to-speech | #42 | 1025 | — |
MAI-Voice-1 | Microsoft Azure | text-to-speech | #43 | 1024 | — |
Octave TTS | Hume AI | text-to-speech | #47 | 1019 | — |
T2A-01-Turbo | MiniMax | text-to-speech | #45 | 1022 | — |
MiMo-V2-TTS | Xiaomi | text-to-speech | #46 | 1021 | — |
Fish Speech 1.5 | Fish Audio | text-to-speech | #49 | 1014 | — |
Lightning v3.1 | Smallest.ai | text-to-speech | #48 | 1015 | — |
Chatterbox | Resemble AI | text-to-speech | #50 | 1007 | — |
Gemini 2.5 Pro (Dec 2025) | Google | text-to-speech | #44 | 1024 | — |
Magpie-Multilingual 357M | NVIDIA | text-to-speech | #51 | 1002 | — |
Zonos-v0.1 | Zyphra | text-to-speech | #52 | 1000 | — |
LMNT | LMNT | text-to-speech | #54 | 967 | — |
VibeVoice 7B | Microsoft Azure | text-to-speech | #55 | 960 | — |
Murf Speech Gen 2 | Murf AI | text-to-speech | #57 | 956 | — |
VibeVoice 1.5B | Microsoft Azure | text-to-speech | #56 | 958 | — |
OpenVoice v2 | OpenVoice | text-to-speech | #58 | 951 | — |
Neuphonic TTS | Neuphonic | text-to-speech | #60 | 938 | — |
Qwen3 TTS | Alibaba | text-to-speech | #61 | 936 | — |
XTTS v2 | Coqui | text-to-speech | #63 | 885 | — |
Qwen3 TTS Flash | Alibaba | text-to-speech | #62 | 931 | — |
StyleTTS 2 | StyleTTS | text-to-speech | #64 | 880 | — |
WaveNet | Google | text-to-speech | #65 | 872 | — |
Polly Neural | Amazon | text-to-speech | #66 | 867 | — |
Sonic English (Oct 2024) | Cartesia | text-to-speech | #34 | 1046 | — |
Polly Long-Form | Amazon | text-to-speech | #37 | 1042 | — |
Falcon (Beta) | Murf AI | text-to-speech | #69 | 816 | — |
Polly Standard | Amazon | text-to-speech | #70 | 801 | — |
Chirp 3: HD | Google | text-to-speech | #38 | 1041 | — |
MetaVoice v1 | MetaVoice | text-to-speech | #71 | 767 | — |
Speech 2.8 HD | MiniMax | text-to-speech | #4 | 1165 | — |
Standard | Google | text-to-speech | #67 | 844 | — |
Qwen3.5 Omni Flash | Alibaba | text-to-speech | #16 | 1105 | — |
Octave 2 | Hume AI | text-to-speech | #36 | 1044 | — |
Neural2 | Google | text-to-speech | #68 | 843 | — |
Gemini 3.1 Flash TTS | Google | text-to-speech | #2 | 1206 | — |
Fish Audio S2 Pro | Fish Audio | text-to-speech | #11 | 1130 | — |
Arcana v3 | Rime | text-to-speech | #53 | 975 | — |
Magpie Multilingual | NVIDIA | text-to-speech | #59 | 946 | — |
Gemini 2.5 Flash Lite TTS | Google | text-to-speech | #22 | 1079 | — |
Gemini 2.5 Flash TTS (Dec 2025) | Google | text-to-speech | #40 | 1034 | — |
Enter your expected usage to compare costs across models.
Input volume: for example, 1,000,000 tokens is roughly 750,000 words.
Output volume: usually 30–50% of input volume.
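
The usage hints above can be turned into a rough cost estimate. The sketch below is illustrative only: it assumes the 1,000,000 figure counts tokens, that providers bill per million tokens, and that output volume is a fixed fraction of input volume. The function names and the prices in the usage lines are hypothetical placeholders, not values from this page or from any provider.

```python
def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Approximate word count for a token count, using the ~0.75 ratio quoted above."""
    return int(tokens * words_per_token)


def estimate_cost(
    input_tokens: int,
    output_ratio: float,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate total cost for a given input volume.

    output_ratio is the assumed output volume as a fraction of input
    (the page suggests 0.30 to 0.50). Prices are per one million tokens.
    """
    output_tokens = input_tokens * output_ratio
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices, for illustration only.
words = tokens_to_words(1_000_000)
cost = estimate_cost(1_000_000, output_ratio=0.4,
                     input_price_per_million=2.0,
                     output_price_per_million=8.0)
print(f"{words} words, estimated cost {cost:.2f}")
```

Running the same function once per selected model, each with its own current prices from the provider's documentation, gives a side-by-side comparison.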
Prices are approximate and may vary. Check provider documentation for current pricing.