OpenAI compatible API. Attested gateway. Public status.

Nebius Token Factory

Nebius Token Factory models on TrustedRouter with prices, routes, policy notes, and source links.

Verify gateway

1 URLbase_url migration

100smodels and routes

0prompt logs by default

`nebius`

No logs

All providers

Provider	Nebius Token Factory
Models	20 public models
Prepaid routes	18
BYOK routes	20
Zero data retention	yes
Confidential compute	not claimed
Provider E2EE	not claimed
Policy note	Marked ZDR via TrustedRouter's arrangement — Nebius RETAINS inputs/outputs by default (for speculative decoding); zero retention is an opt-in control, which the deployed Nebius account has enabled. Nebius does not train on customer data. Policy source

Measured performance

277 samples

Continuously sampled across Nebius Token Factory's routed models — p50 TTFT, throughput, and success rate. Unsupported route and probe-configuration rows are separated from provider downtime. No prompt or output content stored.

p50 TTFT	1191 ms
Throughput	—
Uptime	52.71%

Model	p50 TTFT	p50 TTFB	Throughput	Uptime	Config excluded	Samples
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1	796 ms	691 ms	—	52.94%	—	17
Qwen/Qwen3-30B-A3B-Instruct-2507	811 ms	749 ms	—	75.00%	—	16
NousResearch/Hermes-4-405B	855 ms	834 ms	—	57.89%	—	19
openai/gpt-oss-120b	882 ms	828 ms	—	68.75%	—	16
Qwen/Qwen2.5-VL-72B-Instruct	886 ms	885 ms	—	57.89%	—	19
Qwen/Qwen3-235B-A22B-Instruct-2507	887 ms	782 ms	—	30.00%	—	10
google/gemma-3-27b-it	1188 ms	1106 ms	—	61.90%	—	21
meta-llama/Llama-3.3-70B-Instruct	1191 ms	1087 ms	—	56.52%	—	23
Qwen/Qwen3-32B	1192 ms	1089 ms	—	68.00%	—	25
NousResearch/Hermes-4-70B	1368 ms	1265 ms	—	53.33%	—	15
nvidia/nemotron-3-super-120b-a12b	1405 ms	1404 ms	—	64.29%	—	14
deepseek-ai/DeepSeek-V4-Pro	1540 ms	1436 ms	—	72.73%	—	11
Qwen/Qwen3-Next-80B-A3B-Thinking	1681 ms	1660 ms	—	78.57%	—	14
zai-org/GLM-5.1	4112 ms	4063 ms	—	52.63%	—	19
Qwen/Qwen3.5-397B-A17B	—	—	—	0.00%	5 `probe_config_error`	14
MiniMaxAI/MiniMax-M2.5	—	—	—	0.00%	13 `probe_config_error`	9
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B	—	—	—	0.00%	8 `probe_config_error`	7
nvidia/Nemotron-3-Nano-Omni	—	—	—	0.00%	5 `probe_config_error`	8

Full provider & model leaderboard.

Provider models

Models served by Nebius Token Factory.

Each row links to pricing, provider, benchmark, and API pages for the model.

Model	Context	Endpoints	Prompt	Completion	Routes
`MiniMaxAI/MiniMax-M2.5` MiniMax M2.5 benchmarks performance api	204,800	2	$0.33/1M	$1.32/1M	prepaid BYOK
`NousResearch/Hermes-4-405B` Hermes 4 405B benchmarks performance api	131,072	2	$1.1/1M	$3.3/1M	prepaid BYOK
`NousResearch/Hermes-4-70B` Hermes 4 70B benchmarks performance api	131,072	2	$0.143/1M	$0.44/1M	prepaid BYOK
`Qwen/Qwen2.5-VL-72B-Instruct` Qwen2.5 VL 72B Instruct benchmarks performance api	32,768	2	$0.22/1M	$0.77/1M	prepaid BYOK
`Qwen/Qwen3-235B-A22B-Instruct-2507` Qwen3 235B A22B Instruct 2507 benchmarks performance api	131,072	2	$0.22/1M	$0.66/1M	prepaid BYOK
`Qwen/Qwen3-30B-A3B-Instruct-2507` Qwen3 30B A3B Instruct 2507 benchmarks performance api	131,072	2	$0.11/1M	$0.33/1M	prepaid BYOK
`Qwen/Qwen3-32B` Qwen3 32B benchmarks performance api	131,072	2	$0.11/1M	$0.33/1M	prepaid BYOK
`Qwen/Qwen3-Next-80B-A3B-Thinking` Qwen3 Next 80B A3B Thinking benchmarks performance api	131,072	2	$0.165/1M	$1.65/1M	prepaid BYOK
`Qwen/Qwen3.5-397B-A17B` Qwen3.5 397B A17B benchmarks performance api	262,144	2	$0.66/1M	$3.96/1M	prepaid BYOK
`deepseek-ai/DeepSeek-V4-Pro` DeepSeek V4 Pro benchmarks performance api	1,048,576	2	$1.859/1M	$3.718/1M	prepaid BYOK
`google/gemma-2-2b-it` gemma 2 2b it benchmarks performance api	8,192	1	$0.022/1M	$0.066/1M	BYOK
`google/gemma-3-27b-it` Google: Gemma 3 27B benchmarks performance api	131,072	2	$0.1309/1M	$0.22/1M	prepaid BYOK
`meta-llama/Llama-3.3-70B-Instruct` Llama 3.3 70B Instruct benchmarks performance api	131,072	2	$0.143/1M	$0.44/1M	prepaid BYOK
`meta-llama/Meta-Llama-3.1-8B-Instruct` Meta Llama 3.1 8B Instruct benchmarks performance api	128,000	1	$0.022/1M	$0.066/1M	BYOK
`nvidia/Llama-3_1-Nemotron-Ultra-253B-v1` Llama 3_1 Nemotron Ultra 253B v1 benchmarks performance api	128,000	2	$0.66/1M	$1.98/1M	prepaid BYOK
`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B` NVIDIA Nemotron 3 Nano 30B A3B benchmarks performance api	131,072	2	$0.11/1M	$0.33/1M	prepaid BYOK
`nvidia/Nemotron-3-Nano-Omni` Nemotron 3 Nano Omni benchmarks performance api	131,072	2	$0.165/1M	$0.495/1M	prepaid BYOK
`nvidia/nemotron-3-super-120b-a12b` nemotron 3 super 120b a12b benchmarks performance api	131,072	2	$0.66/1M	$1.98/1M	prepaid BYOK
`openai/gpt-oss-120b` OpenAI: gpt-oss-120b benchmarks performance api	131,072	2	$0.165/1M	$0.66/1M	prepaid BYOK
`zai-org/GLM-5.1` GLM 5.1 benchmarks performance api	204,800	2	$1.54/1M	$4.84/1M	prepaid BYOK