OpenAI compatible API. Attested gateway. Public status.

Nebius Token Factory

Nebius Token Factory models on TrustedRouter with prices, routes, policy notes, and source links.

Verify gateway
1 URLbase_url migration
100smodels and routes
0prompt logs by default

nebius

No logs

All providers

ProviderNebius Token Factory
Models20 public models
Prepaid routes18
BYOK routes20
Zero data retentionyes
Confidential computenot claimed
Provider E2EEnot claimed
Policy noteMarked ZDR via TrustedRouter's arrangement — Nebius RETAINS inputs/outputs by default (for speculative decoding); zero retention is an opt-in control, which the deployed Nebius account has enabled. Nebius does not train on customer data.
Policy source

Measured performance

277 samples

Continuously sampled across Nebius Token Factory's routed models — p50 TTFT, throughput, and success rate. Unsupported route and probe-configuration rows are separated from provider downtime. No prompt or output content stored.

p50 TTFT1191 ms
Throughput
Uptime52.71%
Modelp50 TTFTp50 TTFBThroughputUptimeConfig excludedSamples
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 796 ms 691 ms 52.94% 17
Qwen/Qwen3-30B-A3B-Instruct-2507 811 ms 749 ms 75.00% 16
NousResearch/Hermes-4-405B 855 ms 834 ms 57.89% 19
openai/gpt-oss-120b 882 ms 828 ms 68.75% 16
Qwen/Qwen2.5-VL-72B-Instruct 886 ms 885 ms 57.89% 19
Qwen/Qwen3-235B-A22B-Instruct-2507 887 ms 782 ms 30.00% 10
google/gemma-3-27b-it 1188 ms 1106 ms 61.90% 21
meta-llama/Llama-3.3-70B-Instruct 1191 ms 1087 ms 56.52% 23
Qwen/Qwen3-32B 1192 ms 1089 ms 68.00% 25
NousResearch/Hermes-4-70B 1368 ms 1265 ms 53.33% 15
nvidia/nemotron-3-super-120b-a12b 1405 ms 1404 ms 64.29% 14
deepseek-ai/DeepSeek-V4-Pro 1540 ms 1436 ms 72.73% 11
Qwen/Qwen3-Next-80B-A3B-Thinking 1681 ms 1660 ms 78.57% 14
zai-org/GLM-5.1 4112 ms 4063 ms 52.63% 19
Qwen/Qwen3.5-397B-A17B 0.00% 5 probe_config_error 14
MiniMaxAI/MiniMax-M2.5 0.00% 13 probe_config_error 9
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B 0.00% 8 probe_config_error 7
nvidia/Nemotron-3-Nano-Omni 0.00% 5 probe_config_error 8

Full provider & model leaderboard.

Provider models

Models served by Nebius Token Factory.

Each row links to pricing, provider, benchmark, and API pages for the model.

Model Context Endpoints Prompt Completion Routes
MiniMaxAI/MiniMax-M2.5
MiniMax M2.5
204,800 2 $0.33/1M $1.32/1M prepaid BYOK
NousResearch/Hermes-4-405B
Hermes 4 405B
131,072 2 $1.1/1M $3.3/1M prepaid BYOK
NousResearch/Hermes-4-70B
Hermes 4 70B
131,072 2 $0.143/1M $0.44/1M prepaid BYOK
Qwen/Qwen2.5-VL-72B-Instruct
Qwen2.5 VL 72B Instruct
32,768 2 $0.22/1M $0.77/1M prepaid BYOK
Qwen/Qwen3-235B-A22B-Instruct-2507
Qwen3 235B A22B Instruct 2507
131,072 2 $0.22/1M $0.66/1M prepaid BYOK
Qwen/Qwen3-30B-A3B-Instruct-2507
Qwen3 30B A3B Instruct 2507
131,072 2 $0.11/1M $0.33/1M prepaid BYOK
Qwen/Qwen3-32B
Qwen3 32B
131,072 2 $0.11/1M $0.33/1M prepaid BYOK
Qwen/Qwen3-Next-80B-A3B-Thinking
Qwen3 Next 80B A3B Thinking
131,072 2 $0.165/1M $1.65/1M prepaid BYOK
Qwen/Qwen3.5-397B-A17B
Qwen3.5 397B A17B
262,144 2 $0.66/1M $3.96/1M prepaid BYOK
deepseek-ai/DeepSeek-V4-Pro
DeepSeek V4 Pro
1,048,576 2 $1.859/1M $3.718/1M prepaid BYOK
google/gemma-2-2b-it
gemma 2 2b it
8,192 1 $0.022/1M $0.066/1M BYOK
google/gemma-3-27b-it
Google: Gemma 3 27B
131,072 2 $0.1309/1M $0.22/1M prepaid BYOK
meta-llama/Llama-3.3-70B-Instruct
Llama 3.3 70B Instruct
131,072 2 $0.143/1M $0.44/1M prepaid BYOK
meta-llama/Meta-Llama-3.1-8B-Instruct
Meta Llama 3.1 8B Instruct
128,000 1 $0.022/1M $0.066/1M BYOK
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1
Llama 3_1 Nemotron Ultra 253B v1
128,000 2 $0.66/1M $1.98/1M prepaid BYOK
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B
NVIDIA Nemotron 3 Nano 30B A3B
131,072 2 $0.11/1M $0.33/1M prepaid BYOK
nvidia/Nemotron-3-Nano-Omni
Nemotron 3 Nano Omni
131,072 2 $0.165/1M $0.495/1M prepaid BYOK
nvidia/nemotron-3-super-120b-a12b
nemotron 3 super 120b a12b
131,072 2 $0.66/1M $1.98/1M prepaid BYOK
openai/gpt-oss-120b
OpenAI: gpt-oss-120b
131,072 2 $0.165/1M $0.66/1M prepaid BYOK
zai-org/GLM-5.1
GLM 5.1
204,800 2 $1.54/1M $4.84/1M prepaid BYOK

Sign in

Choose a sign in method.