How are you deploying models without inference providers?

I’ve noticed some models on Hugging Face don’t have an attached inference provider. For those using these models in real projects, how are you deploying them today?

1 Like

I think using a pay-as-you-go Inference Endpoint is the simplest approach. There also seem to be several similar services offered by other companies.
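For example, once you spin up an Inference Endpoint you can call it with the `huggingface_hub` client. A minimal sketch, where the endpoint URL and token are placeholders for your own:

```python
# pip install huggingface_hub
from huggingface_hub import InferenceClient

# Point the client at your own pay-as-you-go Inference Endpoint
# (placeholder URL; use the one shown in your endpoint's dashboard).
client = InferenceClient(
    model="https://your-endpoint.endpoints.huggingface.cloud",
    token="hf_xxx",  # your HF access token
)

# Run a simple text-generation request against the deployed model.
output = client.text_generation(
    "Explain model deployment in one sentence.",
    max_new_tokens=64,
)
print(output)
```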

I mean, my question was that there aren’t providers available for thousands of models on HF. Not sure how they get served unless it’s either locally or on-prem.

1 Like

For the most part, they get served locally, though it also depends greatly on the project. The average user on Hugging Face is browsing for models they can download and run themselves with llama.cpp, Ollama, LM Studio, or another local inference app.
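As a rough sketch of the local route: pull a GGUF file from the Hub and load it with llama-cpp-python. The repo and quantization below are just examples; pick whatever fits your hardware.

```python
# pip install huggingface_hub llama-cpp-python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download a GGUF quantization from the Hub (example repo/file).
model_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)

# Load the model locally and run a completion.
llm = Llama(model_path=model_path, n_ctx=2048)
out = llm("Q: What is an inference provider?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```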

If a user doesn’t have the compute on-prem to run the models they want locally, they can download the model from Hugging Face, rent their own servers, and run inference that way.
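For the rented-server route, a minimal self-hosted setup can be as simple as a `transformers` pipeline behind a small FastAPI app. This is only a sketch; the model name and port are placeholders.

```python
# pip install fastapi uvicorn transformers torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load any Hub model at startup (model name is just an example).
generator = pipeline("text-generation", model="gpt2")

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(req: GenerateRequest):
    # Run generation and return the generated text.
    out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": out[0]["generated_text"]}

# Run with: uvicorn server:app --host 0.0.0.0 --port 8000
```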

The inference providers on Hugging Face just offer a curated selection of open-source models, but you can run inference on any model however you want. That’s the beauty of open source.

2 Likes

Ah, got it! It would be really nice to have a provider for these models; there are so many of them. It’s probably not practical, but still. Thank you!

2 Likes

We’ve seen a similar split in practice. A lot of models without attached providers end up being used either locally (llama.cpp / Ollama / LM Studio) or served by teams on their own infrastructure once they move past experimentation.

In many cases the lack of a default provider is intentional. It gives teams flexibility to deploy based on their own cost, latency, and control requirements rather than a one-size-fits-all endpoint.

1 Like