Compare pricing, benchmarks, and capabilities across 19 AI models
| Model | Provider | Input $/1M | Output $/1M | Context | Intelligence | Speed | Latency | API |
|---|---|---|---|---|---|---|---|---|
| Llama 3.1 Nemotron Ultra 253B v1 (Reasoning) | NVIDIA | — | — | — | 15 | 42 tok/s | 0.7s | |
| Llama Nemotron Super 49B v1.5 (Reasoning) | NVIDIA | — | — | — | 18.7 | 60 tok/s | 0.3s | |
| NVIDIA Nemotron 3 Nano 30B A3B (Reasoning) | NVIDIA | — | — | — | 24.3 | 133 tok/s | 1.3s | |
| Llama 3.3 Nemotron Super 49B v1 (Reasoning) | NVIDIA | — | — | — | 18.5 | — | — | |
| NVIDIA Nemotron Nano 12B v2 VL (Reasoning) | NVIDIA | — | — | — | 14.9 | 151 tok/s | 0.5s | |
| NVIDIA Nemotron Nano 9B V2 (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 153 tok/s | 0.7s | |
| NVIDIA Nemotron Nano 9B V2 (Reasoning) | NVIDIA | — | — | — | 14.8 | 117 tok/s | 0.3s | |
| Llama 3.3 Nemotron Super 49B v1 (Non-reasoning) | NVIDIA | — | — | — | 14.3 | — | — | |
| Llama 3.1 Nemotron Instruct 70B | NVIDIA | — | — | — | 13.4 | 46 tok/s | 0.3s | |
| Llama Nemotron Super 49B v1.5 (Non-reasoning) | NVIDIA | — | — | — | 14.6 | 58 tok/s | 0.3s | |
| NVIDIA Nemotron Nano 12B v2 VL (Non-reasoning) | NVIDIA | — | — | — | 10.1 | 175 tok/s | 0.7s | |
| NVIDIA Nemotron 3 Nano 30B A3B (Non-reasoning) | NVIDIA | — | — | — | 13.2 | 78 tok/s | 0.3s | |
| Llama 3.1 Nemotron Nano 4B v1.1 (Reasoning) | NVIDIA | — | — | — | 14.4 | — | — | |
| Magpie Multilingual | NVIDIA | — | — | — | — | — | — | |
| NVIDIA Nemotron 3 Super 120B A12B (Reasoning) | NVIDIA | — | — | — | 36 | 154 tok/s | 1.1s | |
| Nemotron Cascade 2 30B A3B | NVIDIA | — | — | — | 28.4 | — | — | |
| NVIDIA Nemotron 3 Nano 4B | NVIDIA | — | — | — | 14.7 | — | — | |
| Magpie-Multilingual 357M | NVIDIA | — | — | — | — | — | — | |
| Magpie-Multilingual 357M (Feb 2026) | NVIDIA | — | — | — | — | — | — | |
Enter your expected usage to compare costs across models. As a rough guide, 1,000,000 tokens is about 750,000 words, and output volume is usually 30–50% of input volume.
Prices are approximate and may vary. Check provider documentation for current pricing.
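
The cost comparison described above comes down to simple per-million-token arithmetic. The following is a minimal sketch of it, not the page's own calculator: the function name `estimate_cost` and both prices are hypothetical placeholders, since the table lists no prices for these models. The 40% output ratio is an assumed midpoint of the 30–50% range.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Estimate cost in dollars from token volumes and per-1M-token prices."""
    input_cost = (input_tokens / 1_000_000) * input_price_per_1m
    output_cost = (output_tokens / 1_000_000) * output_price_per_1m
    return input_cost + output_cost


# Hypothetical usage: 1M input tokens, output assumed to be 40% of input.
input_tokens = 1_000_000
output_tokens = 400_000

# Hypothetical prices in $/1M tokens; substitute the provider's current rates.
cost = estimate_cost(input_tokens, output_tokens,
                     input_price_per_1m=0.50, output_price_per_1m=1.50)
print(f"Estimated cost: ${cost:.2f}")
```

Running the same function over each model's input and output prices gives a side-by-side comparison for a fixed usage profile.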