Hmm… Is that just how :cheapest behaves…?
What you’re seeing is consistent with how HF Inference Providers is documented to work once you include the words “available right now” in the definition of :cheapest.
1) What :cheapest is supposed to do
Hugging Face documents :cheapest as a selection policy that “selects the provider with the lowest price per output token.” (Hugging Face)
So, in the abstract, if Hyperbolic is the lowest output $/1M for openai/gpt-oss-20b, it should win.
2) Why you can still land on Fireworks even if Hyperbolic is “cheapest” on the catalog page
There are several “filters” before pricing is even considered. The catalog UI is a static comparison table. The router is a runtime system that applies eligibility and availability constraints.
A. Failover on availability and health (most common)
HF explicitly describes automatic failover: if the chosen provider is “flagged as unavailable by our validation system,” requests are routed to alternatives. (Hugging Face)
So the sequence can be:
- Compute cheapest candidate (Hyperbolic, by output token price).
- Check if Hyperbolic is currently considered available for that model + endpoint + your auth context.
- If not available, fall back to the next eligible provider (Fireworks in your case).
- Router returns x-inference-provider: fireworks-ai.
This exactly matches your symptom of “consistently resolves to Fireworks.” It suggests Hyperbolic is being excluded at runtime, not that the suffix is ignored.
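The sequence above can be sketched as a small selection function. This is only an illustration of "cheapest among currently available providers", not HF's actual router code. The Provider record, its field names, and the prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    """Hypothetical record for illustration; not an HF type."""
    name: str
    output_price_per_1m: float  # USD per 1M output tokens (made-up values)
    available: bool             # outcome of a runtime health/validation check


def pick_cheapest_available(providers: list[Provider]) -> Optional[Provider]:
    """Return the lowest-priced provider that is currently available."""
    # Walk candidates from cheapest to most expensive and stop at the first
    # one that passes the availability check; this models failover.
    for candidate in sorted(providers, key=lambda p: p.output_price_per_1m):
        if candidate.available:
            return candidate
    return None  # nothing is serving the model right now


# Illustrative numbers only; they are not real catalog prices.
providers = [
    Provider("fireworks-ai", 0.20, available=True),
    Provider("hyperbolic", 0.10, available=False),
]
chosen = pick_cheapest_available(providers)
print(chosen.name if chosen else "no provider available")
```

With Hyperbolic marked unavailable, the cheaper entry is skipped and Fireworks is returned, which is the same shape as the symptom you reported.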
B. Provider eligibility is not universal (account, org, and settings constraints)
Even when a provider is listed on the public pricing/catalog page, it may be disabled for you:
- HF routes “by default” following your Inference Provider settings preference order. (Hugging Face)
- Organizations can disable a set of Inference Providers in org settings. (Hugging Face)
- Billing mode matters. “Routed by HF” vs “Custom Provider Key” changes what HF can do on your behalf. (Hugging Face)
The docs do not spell out whether :cheapest ignores provider allow/deny lists, but the most plausible behavior is “cheapest among eligible providers.” If that is how it works, a Hyperbolic that is disabled (by you or your org) will never be selected.
C. Endpoint or feature compatibility filtering (chat vs “text generation” differences)
Your request uses the OpenAI-compatible chat endpoint (/v1/chat/completions), which HF notes is “chat tasks only.” (Hugging Face)
Even if a provider serves the model, it can still be excluded for a specific endpoint if the mapping for that provider-model pair does not support that task or required parameters (structured outputs, tools, etc.). The catalog table mixes multiple capability signals (Tools, Structured) and can differ by provider row. (Hugging Face)
In your specific payload you are not using tools or structured outputs. But task-level mapping mismatches still happen in practice.
D. Catalog price labels can lag backend routing metadata
The “cheapest” badge and displayed prices come from provider metadata used in the Hub UI. Providers register pricing metadata and HF uses it for comparison and selection features like :cheapest. (Hugging Face)
If backend routing uses a cached snapshot or temporarily overrides a provider (maintenance, incident), UI and router can diverge.
3) Confirm what is happening in your case (fast checks)
Run these in order. Each one narrows the cause.
Step 1. Force Hyperbolic explicitly
If this fails, then :cheapest is doing the right thing by skipping Hyperbolic.
curl https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b:hyperbolic",
    "messages": [{"role":"user","content":"ping"}],
    "stream": false
  }' -i
Interpretation:
- 200 OK + x-inference-provider: hyperbolic: Hyperbolic works. If :cheapest still picks Fireworks, that is likely a routing bug or an “eligibility” filter (Step 2).
- 4xx/5xx: Hyperbolic is unavailable to you for that route. That explains the Fireworks fallback.
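If you script this check, the interpretation can be written down as a small helper. This is a rough sketch: the function name and the returned messages are made up here, and it only looks at the status code and the x-inference-provider header from the response.

```python
def interpret_forced_call(status: int, headers: dict[str, str],
                          expected: str = "hyperbolic") -> str:
    """Classify the result of a request that pinned an explicit provider."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    served_by = normalized.get("x-inference-provider", "").lower()

    if 200 <= status < 300:
        if served_by == expected:
            return "provider works: check eligibility settings or suspect a routing bug"
        return f"succeeded, but served by {served_by or 'unknown'}: suffix not honored"
    if 400 <= status < 600:
        return "provider unavailable for this route: fallback is expected"
    return "unexpected status: inspect the response body"


print(interpret_forced_call(200, {"X-Inference-Provider": "hyperbolic"}))
print(interpret_forced_call(503, {}))
```

The second branch inside the 2xx case is not in the list above; it is included only so that a mismatched header is not silently read as success.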
Step 2. Check your provider allowlist / org policy
If you are in an org, confirm the org didn’t disable Hyperbolic. HF explicitly allows org admins to disable providers. (Hugging Face)
Also check your personal Inference Provider settings and whether Hyperbolic is enabled and not blocked by preference configuration. (Hugging Face)
Step 3. Ask the router what it thinks is available
HF documents that the router exposes GET /v1/models to list available models across providers. (Hugging Face)
curl https://router.huggingface.co/v1/models \
  -H "Authorization: Bearer $HF_TOKEN"
Look for:
- Whether openai/gpt-oss-20b appears once or multiple times.
- Whether the router exposes any metadata about “cheapest/fastest” or provider availability.
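To filter that output for one model, something like the following may help. It assumes the response follows the OpenAI-style list shape (a top-level "data" array whose entries carry an "id"); any other fields on the entries are not assumed and are simply passed through. The sample payload is invented for the demo.

```python
import json


def entries_for_model(models_json: str, model_id: str) -> list[dict]:
    """Return every entry in a /v1/models response whose id matches model_id."""
    payload = json.loads(models_json)
    # Assumption: OpenAI-style shape {"object": "list", "data": [{"id": ...}, ...]}
    return [entry for entry in payload.get("data", []) if entry.get("id") == model_id]


# Invented sample; pipe the real curl output into this function instead.
sample = json.dumps({
    "object": "list",
    "data": [
        {"id": "openai/gpt-oss-20b"},
        {"id": "some-org/other-model"},
    ],
})

matches = entries_for_model(sample, "openai/gpt-oss-20b")
print(len(matches))
print(json.dumps(matches, indent=2))  # shows whatever metadata each entry carries
```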
Step 4. Inspect provider mapping via Hub API (debug view)
HF’s Hub API supports an inferenceProviderMapping expansion that exposes which providers are mapped for a model and can include status-like fields (live vs staging patterns exist in HF’s provider ecosystem docs). (Hugging Face)
Try:
curl -s \
"https://huggingface.co/api/models/openai/gpt-oss-20b?expand=inferenceProviderMapping" \
| jq .
You are looking for:
- Whether Hyperbolic is present in the mapping.
- Any “status” fields or hints that it is not live for chat completion.
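A small helper can pull the Hyperbolic entry out of that JSON. Treat the field names as assumptions: the sketch accepts the mapping either as an object keyed by provider name or as a list of entries with a "provider" field, and it reads a "status" field if one is present. Check the actual response before relying on any of these.

```python
from typing import Optional


def provider_status(model_info: dict, provider: str) -> Optional[str]:
    """Return the mapping status for a provider, or None if it is not mapped."""
    mapping = model_info.get("inferenceProviderMapping") or {}

    if isinstance(mapping, dict):
        # Assumed shape A: {"hyperbolic": {"status": "live", ...}, ...}
        entry = mapping.get(provider)
    else:
        # Assumed shape B: [{"provider": "hyperbolic", "status": "live", ...}, ...]
        entry = next((m for m in mapping if m.get("provider") == provider), None)

    if entry is None:
        return None  # provider is not mapped for this model at all
    return entry.get("status", "unknown")


# Invented sample responses covering both assumed shapes.
as_dict = {"inferenceProviderMapping": {"hyperbolic": {"status": "live"}}}
as_list = {"inferenceProviderMapping": [{"provider": "fireworks-ai", "status": "live"}]}

print(provider_status(as_dict, "hyperbolic"))
print(provider_status(as_list, "hyperbolic"))
```

A None result means Hyperbolic is missing from the mapping; a status other than the live one suggests it is mapped but not yet served for that task.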
4) Practical workarounds
If you need the cheapest provider deterministically
Use an explicit provider suffix instead of :cheapest:
openai/gpt-oss-20b:hyperbolic
This bypasses price routing logic and avoids “cheapest but filtered out” ambiguity. HF documents explicit provider selection via suffix. (Hugging Face)
If you want “cheapest with safe fallback”
Set provider preference order so Hyperbolic is first, then a fallback provider second, and call without :cheapest so you get “first available” behavior. That behavior is explicitly documented. (Hugging Face)
This is not identical to :cheapest, but it is operationally stable.
5) If this is a bug, what to include in a report
To help HF reproduce the issue quickly, include:
- x-request-id (you already have it)
- exact timestamp and region POP (cf-ray hints at this)
- result of the forced :hyperbolic call
- output of /v1/models filtered for that model
- output of the Hub API call with expand=inferenceProviderMapping
That distinguishes “router bug” from “provider unavailable/disabled.”
Good references and background reading (high-signal)
- HF Inference Providers overview and routing policies (:cheapest, :fastest, explicit provider, failover): https://huggingface.co/docs/inference-providers/en/index (Hugging Face)
- HF Pricing and Billing (HF-routed vs custom provider keys, org-level provider disabling): https://huggingface.co/docs/inference-providers/en/pricing (Hugging Face)
- Inspect guide that restates the meaning of :cheapest (“lowest price per output token”): https://huggingface.co/docs/inference-providers/en/guides/evaluation-inspect-ai (Hugging Face)
- Hub API / provider mapping discussion showing expand=inferenceProviderMapping in the wild: https://github.com/crewAIInc/crewAI/issues/3038 (GitHub)
- HF forum thread mentioning the router suffixes and how users select cheapest/fastest: https://discuss.huggingface.co/t/batch-inference-with-huggingface-hub-for-serverless-providers/170649 (Hugging Face Forums)
Summary
- :cheapest means “lowest output-token price,” but only among providers the router considers eligible and available. (Hugging Face)
- If Hyperbolic is unavailable, disabled, or incompatible for that endpoint, failover can route you to Fireworks. (Hugging Face)
- Verify by forcing :hyperbolic, then check /v1/models and expand=inferenceProviderMapping. (Hugging Face)