OpenAI compatible API. Attested gateway. Public status.
Nebius Token Factory
Nebius Token Factory models on TrustedRouter with prices, routes, policy notes, and source links.
1 URLbase_url migration
100smodels and routes
0prompt logs by default
nebius
No logs
| Provider | Nebius Token Factory |
|---|---|
| Models | 20 public models |
| Prepaid routes | 18 |
| BYOK routes | 20 |
| Zero data retention | yes |
| Confidential compute | not claimed |
| Provider E2EE | not claimed |
| Policy note | Marked ZDR via TrustedRouter's arrangement — Nebius RETAINS inputs/outputs by default (for speculative decoding); zero retention is an opt-in control, which the deployed Nebius account has enabled. Nebius does not train on customer data. Policy source |
Measured performance
277 samplesContinuously sampled across Nebius Token Factory's routed models — p50 TTFT, throughput, and success rate. Unsupported route and probe-configuration rows are separated from provider downtime. No prompt or output content stored.
| p50 TTFT | 1191 ms |
|---|---|
| Throughput | — |
| Uptime | 52.71% |
| Model | p50 TTFT | p50 TTFB | Throughput | Uptime | Config excluded | Samples |
|---|---|---|---|---|---|---|
| nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 | 796 ms | 691 ms | — | 52.94% | — | 17 |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | 811 ms | 749 ms | — | 75.00% | — | 16 |
| NousResearch/Hermes-4-405B | 855 ms | 834 ms | — | 57.89% | — | 19 |
| openai/gpt-oss-120b | 882 ms | 828 ms | — | 68.75% | — | 16 |
| Qwen/Qwen2.5-VL-72B-Instruct | 886 ms | 885 ms | — | 57.89% | — | 19 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | 887 ms | 782 ms | — | 30.00% | — | 10 |
| google/gemma-3-27b-it | 1188 ms | 1106 ms | — | 61.90% | — | 21 |
| meta-llama/Llama-3.3-70B-Instruct | 1191 ms | 1087 ms | — | 56.52% | — | 23 |
| Qwen/Qwen3-32B | 1192 ms | 1089 ms | — | 68.00% | — | 25 |
| NousResearch/Hermes-4-70B | 1368 ms | 1265 ms | — | 53.33% | — | 15 |
| nvidia/nemotron-3-super-120b-a12b | 1405 ms | 1404 ms | — | 64.29% | — | 14 |
| deepseek-ai/DeepSeek-V4-Pro | 1540 ms | 1436 ms | — | 72.73% | — | 11 |
| Qwen/Qwen3-Next-80B-A3B-Thinking | 1681 ms | 1660 ms | — | 78.57% | — | 14 |
| zai-org/GLM-5.1 | 4112 ms | 4063 ms | — | 52.63% | — | 19 |
| Qwen/Qwen3.5-397B-A17B | — | — | — | 0.00% | 5 probe_config_error |
14 |
| MiniMaxAI/MiniMax-M2.5 | — | — | — | 0.00% | 13 probe_config_error |
9 |
| nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B | — | — | — | 0.00% | 8 probe_config_error |
7 |
| nvidia/Nemotron-3-Nano-Omni | — | — | — | 0.00% | 5 probe_config_error |
8 |
Provider models
Models served by Nebius Token Factory.
Each row links to pricing, provider, benchmark, and API pages for the model.
| Model | Context | Endpoints | Prompt | Completion | Routes |
|---|---|---|---|---|---|
MiniMaxAI/MiniMax-M2.5MiniMax M2.5 |
204,800 | 2 | $0.33/1M | $1.32/1M | prepaid BYOK |
NousResearch/Hermes-4-405BHermes 4 405B |
131,072 | 2 | $1.1/1M | $3.3/1M | prepaid BYOK |
NousResearch/Hermes-4-70BHermes 4 70B |
131,072 | 2 | $0.143/1M | $0.44/1M | prepaid BYOK |
Qwen/Qwen2.5-VL-72B-InstructQwen2.5 VL 72B Instruct |
32,768 | 2 | $0.22/1M | $0.77/1M | prepaid BYOK |
Qwen/Qwen3-235B-A22B-Instruct-2507Qwen3 235B A22B Instruct 2507 |
131,072 | 2 | $0.22/1M | $0.66/1M | prepaid BYOK |
Qwen/Qwen3-30B-A3B-Instruct-2507Qwen3 30B A3B Instruct 2507 |
131,072 | 2 | $0.11/1M | $0.33/1M | prepaid BYOK |
Qwen/Qwen3-32BQwen3 32B |
131,072 | 2 | $0.11/1M | $0.33/1M | prepaid BYOK |
Qwen/Qwen3-Next-80B-A3B-ThinkingQwen3 Next 80B A3B Thinking |
131,072 | 2 | $0.165/1M | $1.65/1M | prepaid BYOK |
Qwen/Qwen3.5-397B-A17BQwen3.5 397B A17B |
262,144 | 2 | $0.66/1M | $3.96/1M | prepaid BYOK |
deepseek-ai/DeepSeek-V4-ProDeepSeek V4 Pro |
1,048,576 | 2 | $1.859/1M | $3.718/1M | prepaid BYOK |
google/gemma-2-2b-itgemma 2 2b it |
8,192 | 1 | $0.022/1M | $0.066/1M | BYOK |
google/gemma-3-27b-itGoogle: Gemma 3 27B |
131,072 | 2 | $0.1309/1M | $0.22/1M | prepaid BYOK |
meta-llama/Llama-3.3-70B-InstructLlama 3.3 70B Instruct |
131,072 | 2 | $0.143/1M | $0.44/1M | prepaid BYOK |
meta-llama/Meta-Llama-3.1-8B-InstructMeta Llama 3.1 8B Instruct |
128,000 | 1 | $0.022/1M | $0.066/1M | BYOK |
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1Llama 3_1 Nemotron Ultra 253B v1 |
128,000 | 2 | $0.66/1M | $1.98/1M | prepaid BYOK |
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3BNVIDIA Nemotron 3 Nano 30B A3B |
131,072 | 2 | $0.11/1M | $0.33/1M | prepaid BYOK |
nvidia/Nemotron-3-Nano-OmniNemotron 3 Nano Omni |
131,072 | 2 | $0.165/1M | $0.495/1M | prepaid BYOK |
nvidia/nemotron-3-super-120b-a12bnemotron 3 super 120b a12b |
131,072 | 2 | $0.66/1M | $1.98/1M | prepaid BYOK |
openai/gpt-oss-120bOpenAI: gpt-oss-120b |
131,072 | 2 | $0.165/1M | $0.66/1M | prepaid BYOK |
zai-org/GLM-5.1GLM 5.1 |
204,800 | 2 | $1.54/1M | $4.84/1M | prepaid BYOK |