Incorrect Provider Resolution for :cheapest Model Variant (openai/gpt-oss-20b)
We are observing an inconsistency with the :cheapest routing behavior for the openai/gpt-oss-20b model. When invoking the model using the :cheapest variant, requests are being routed to the fireworks-ai provider, even though hyperbolic is currently listed as the lowest-cost provider for this model on the Hugging Face Inference pricing page.
This can be verified at:
where Hyperbolic is shown as the cheapest available provider.
However, the following request consistently resolves to Fireworks AI:
curl https://router.huggingface.co/v1/chat/completions \
-H "Authorization: <redacted-token>" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "user", "content": "Hi there buddy" }
],
"model": "openai/gpt-oss-20b:cheapest",
"stream": false
}' -i
The response headers confirm that the request was routed to Fireworks AI (note x-inference-provider: fireworks-ai and the fireworks-* headers):
HTTP/2 200
content-type: application/json
date: Mon, 22 Dec 2025 19:04:24 GMT
x-ratelimit-remaining-tokens-generated: 600000
x-ratelimit-remaining-tokens-prompt: 59925
x-powered-by: huggingface-moon
vary: Origin
access-control-allow-origin: *
access-control-expose-headers: X-Repo-Commit,X-Request-Id,X-Error-Code,X-Error-Message,X-Total-Count,ETag,Link,Accept-Ranges,Content-Range,X-Linked-Size,X-Linked-ETag,X-Xet-Hash
x-robots-tag: none
x-request-id: 1aaebb16-ebf7-42c0-8f44-6743c40fc694
cross-origin-opener-policy: same-origin
referrer-policy: strict-origin-when-cross-origin
x-inference-provider: fireworks-ai <------- HERE
cf-cache-status: DYNAMIC
cf-ray: 9b21e27a7a858de7-IAD
fireworks-prompt-tokens: 75
fireworks-sampling-options: {"max_tokens": 2048, "temperature": 1.0, "top_k": 50, "top_p": 1.0, "min_p": 0.0, "typical_p": 1.0, "frequency_penalty": 0.0, "presence_penalty": 0.0, "repetition_penalty": 1.0, "mirostat_target": null, "mirostat_lr": 0.1}
fireworks-server-processing-time: 1.265
fireworks-server-time-to-first-token: 0.139
fireworks-speculation-prompt-matched-tokens: 0
server: cloudflare
via: 1.1 google, 1.1 a05ab23a60026e7a94dfc15016962b24.cloudfront.net (CloudFront)
x-envoy-upstream-service-time: 1271
x-ratelimit-limit-requests: 6000
x-ratelimit-limit-tokens-generated: 600000
x-ratelimit-limit-tokens-prompt: 60000
x-ratelimit-over-limit: no
x-ratelimit-remaining-requests: 5999
x-cache: Miss from cloudfront
x-amz-cf-pop: DXB53-P2
x-amz-cf-id: X6LPyjWO5rxJoeGTPaN_P-_n2ktrOQ88gXPR5sJcITbwLKgf50Jnyg==
# Response
{"id":"1aaebb16-ebf7-42c0-8f44-6743c40fc694","object":"chat.completion","created":1766430263,"model":"accounts/fireworks/models/gpt-oss-20b","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! 👋 How can I help you today?","reasoning_content":"The user says \"Hi there buddy\". This is greeting. The assistant should respond politely. According to system message: \"You are ChatGPT, a large language model trained by OpenAI.\" There's also context on how to respond when missing info: if the user requests something that the assistant cannot do: reply with \"I'm sorry, but I can't assist with that.\", but the user is just greeting. So a friendly response. Then maybe ask what they need. \nAdditionally guidelines: \"When user says Hi: Reply with a greeting. ... You can ask if you'd like me to answer any question.\" There's no conflicting instructions. So answer: \"Hello! How can I help you today?\" or something. The prompt doesn't ask to start with anything else. Just respond with greeting and ask what they want."},"finish_reason":"stop"}],"usage":{"prompt_tokens":75,"total_tokens":258,"completion_tokens":183,"prompt_tokens_details":{"cached_tokens":0}}}
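To make the check easier to repeat, here is a minimal sketch of a shell helper that extracts only the x-inference-provider value from raw response headers. The function name inference_provider is hypothetical and not part of any Hugging Face tooling; it only assumes the header appears in the output as shown above.

```shell
# Hypothetical helper (not part of any official CLI): reads raw HTTP
# response headers on stdin and prints the x-inference-provider value.
inference_provider() {
  awk -F': *' '
    # Header names are case-insensitive, so compare in lowercase.
    tolower($1) == "x-inference-provider" {
      sub(/\r$/, "", $2)   # drop the trailing carriage return, if any
      print $2
      exit                 # stop at the first match
    }
  '
}
```

Piping the output of the curl command above (with -s added to silence the progress meter) into inference_provider should print just the provider name, which makes it simple to rerun the request and see whether the resolved provider changes.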
Can someone please help me understand why :cheapest resolves to Fireworks AI here instead of Hyperbolic?
Thanks in advance!

