Compare pricing, benchmarks, and capabilities across 5 AI models
| Model | Provider | Input $/1M tokens | Output $/1M tokens | Context | Intelligence | Speed | Latency | API |
|---|---|---|---|---|---|---|---|---|
| Step 3.5 Flash | StepFun | — | — | — | 37.8 | 163 tok/s | 0.8s | |
| Step 3.5 Flash 2603 | StepFun | — | — | — | 38.5 | 186 tok/s | 0.8s | |
| Step3 VL 10B | StepFun | — | — | — | 15.4 | — | — | |
| Step TTS 2 (Mar 2026) | StepFun | — | — | — | — | — | — | |
| Step Audio EditX (Mar 2026) | StepFun | — | — | — | — | — | — | |
To compare costs across models, estimate your expected usage in tokens: 1,000,000 tokens is roughly 750,000 words, and output volume is usually 30–50% of input volume.
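
The usage hints above reduce to simple arithmetic. The sketch below is illustrative only: the function names and the example prices are hypothetical placeholders (the table lists no prices for these models), and the defaults of 0.75 words per token and a 40% output ratio are just the rule-of-thumb figures quoted above.

```python
def words_to_tokens(words: float, words_per_token: float = 0.75) -> int:
    """Convert a word count to an approximate token count.

    Assumes the rule of thumb that 1,000,000 tokens is about 750,000 words.
    """
    return round(words / words_per_token)


def estimate_cost(
    input_tokens: float,
    input_price_per_m: float,
    output_price_per_m: float,
    output_ratio: float = 0.4,
) -> float:
    """Estimate total cost in dollars for a given input volume.

    Prices are in dollars per 1M tokens. output_ratio is the assumed
    output volume as a fraction of input volume (0.3 to 0.5 is typical
    per the hint above; 0.4 is an arbitrary midpoint).
    """
    output_tokens = input_tokens * output_ratio
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices, NOT taken from the table (which shows none).
    tokens = words_to_tokens(750_000)
    cost = estimate_cost(tokens, input_price_per_m=0.10, output_price_per_m=0.30)
    print(f"{tokens:,} input tokens, estimated cost ${cost:.2f}")
```

Replace the placeholder prices with current figures from the provider's documentation before relying on the result.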
Prices are approximate and may vary. Check provider documentation for current pricing.