
Revamp Inference Providers doc #1652


Merged
merged 20 commits into from Apr 1, 2025
Changes from 15 commits
6 changes: 3 additions & 3 deletions docs/api-inference/_redirects.yml
@@ -1,6 +1,6 @@
quicktour: index
detailed_parameters: parameters
parallelism: getting_started
usage: getting_started
detailed_parameters: tasks/index
parallelism: index
usage: index
faq: index
rate-limits: pricing
46 changes: 23 additions & 23 deletions docs/api-inference/_toctree.yml
@@ -1,27 +1,33 @@
- sections:
- title: Get Started
sections:
- local: index
title: Serverless Inference API
- local: getting-started
title: Getting Started
- local: supported-models
title: Supported Models
title: Inference Providers
- local: pricing
title: Pricing and Rate limits
title: Pricing and Billing
- local: hub-integration
title: Hub integration
- local: security
title: Security
title: Getting Started
- sections:
- local: parameters
title: Parameters
- sections:
- local: tasks/audio-classification
title: Audio Classification
- local: tasks/automatic-speech-recognition
title: Automatic Speech Recognition
- title: API Reference
sections:
- local: tasks/index
title: Index
- local: hub-api
title: Hub API
- title: Popular Tasks
sections:
- local: tasks/chat-completion
title: Chat Completion
- local: tasks/feature-extraction
title: Feature Extraction
- local: tasks/text-to-image
title: Text to Image
- title: Other Tasks
sections:
- local: tasks/audio-classification
title: Audio Classification
- local: tasks/automatic-speech-recognition
title: Automatic Speech Recognition
- local: tasks/fill-mask
title: Fill Mask
- local: tasks/image-classification
@@ -30,8 +36,6 @@
title: Image Segmentation
- local: tasks/image-to-image
title: Image to Image
- local: tasks/image-text-to-text
title: Image-Text to Text
- local: tasks/object-detection
title: Object Detection
- local: tasks/question-answering
@@ -44,13 +48,9 @@
title: Text Classification
- local: tasks/text-generation
title: Text Generation
- local: tasks/text-to-image
title: Text to Image
- local: tasks/token-classification
title: Token Classification
- local: tasks/translation
title: Translation
- local: tasks/zero-shot-classification
title: Zero Shot Classification
title: Detailed Task Parameters
title: API Reference
title: Zero Shot Classification
95 changes: 0 additions & 95 deletions docs/api-inference/getting-started.md

This file was deleted.

173 changes: 173 additions & 0 deletions docs/api-inference/hub-api.md
@@ -0,0 +1,173 @@
# Hub API

The Hub provides a few APIs to interact with Inference Providers. Here is a list of them:

## List models

To list models powered by a provider, use the `inference_provider` query parameter:

```sh
# List all models served by Fireworks AI
~ curl -s https://huggingface.co/api/models?inference_provider=fireworks-ai | jq ".[].id"
"deepseek-ai/DeepSeek-V3-0324"
"deepseek-ai/DeepSeek-R1"
"Qwen/QwQ-32B"
"deepseek-ai/DeepSeek-V3"
...
```
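
The same endpoint can be queried from Python. Here is a minimal sketch using the `requests` library (our choice here, any HTTP client works):

```py
import requests

# List all models served by Fireworks AI (same endpoint as the curl call above)
response = requests.get(
    "https://huggingface.co/api/models",
    params={"inference_provider": "fireworks-ai"},
)
response.raise_for_status()

for model in response.json():
    print(model["id"])
```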

The filter can be combined with other query parameters, for example to select only `text-to-image` models:

```sh
# List text-to-image models served by Fal AI
~ curl -s "https://huggingface.co/api/models?inference_provider=fal-ai&pipeline_tag=text-to-image" | jq ".[].id"
"black-forest-labs/FLUX.1-dev"
"stabilityai/stable-diffusion-3.5-large"
"black-forest-labs/FLUX.1-schnell"
"stabilityai/stable-diffusion-3.5-large-turbo"
...
```

Pass a comma-separated list of providers to select models served by any of them:

```sh
# List image-text-to-text models served by Novita or Sambanova
~ curl -s "https://huggingface.co/api/models?inference_provider=sambanova,novita&pipeline_tag=image-text-to-text" | jq ".[].id"
"meta-llama/Llama-3.2-11B-Vision-Instruct"
"meta-llama/Llama-3.2-90B-Vision-Instruct"
"Qwen/Qwen2-VL-72B-Instruct"
```

Finally, pass `inference_provider=all` to select every model served by at least one inference provider:

```sh
# List text-to-video models served by any provider
~ curl -s "https://huggingface.co/api/models?inference_provider=all&pipeline_tag=text-to-video" | jq ".[].id"
"Wan-AI/Wan2.1-T2V-14B"
"Lightricks/LTX-Video"
"tencent/HunyuanVideo"
"Wan-AI/Wan2.1-T2V-1.3B"
"THUDM/CogVideoX-5b"
"genmo/mochi-1-preview"
"BagOu22/Lora_HKLPAZ"
```
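
The query parameters compose, so a single request can combine a provider list with a task filter. A short sketch, again assuming the `requests` library:

```py
import requests

# Image-text-to-text models served by Novita or Sambanova (use "all" for any provider)
params = {
    "inference_provider": "sambanova,novita",
    "pipeline_tag": "image-text-to-text",
}
response = requests.get("https://huggingface.co/api/models", params=params)
response.raise_for_status()
print([model["id"] for model in response.json()])
```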

## Get model status

To check whether a specific model is currently served by at least one provider, request the `inference` attribute from the model info endpoint:

<inferencesnippet>

<curl>

```sh
# Get google/gemma-3-27b-it inference status (warm)
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inference"
{
"_id": "67c35b9bb236f0d365bf29d3",
"id": "google/gemma-3-27b-it",
"inference": "warm"
}
```
</curl>

<python>

In the `huggingface_hub` library, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inference")
>>> info.inference
'warm'
```

</python>

</inferencesnippet>

Inference status is either "warm" or undefined. When no provider serves the model, the attribute is simply omitted from the response:

<inferencesnippet>

<curl>

```sh
# Get inference status (not warm)
~ curl -s "https://huggingface.co/api/models/manycore-research/SpatialLM-Llama-1B?expand[]=inference"
{
"_id": "67d3b141d8b6e20c6d009c8b",
"id": "manycore-research/SpatialLM-Llama-1B"
}
```

</curl>

<python>

In the `huggingface_hub` library, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("manycore-research/SpatialLM-Llama-1B", expand="inference")
>>> info.inference
None
```

</python>

</inferencesnippet>
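
Putting both cases together, a small helper (the name `is_warm` is ours, not part of any library) could treat a missing or undefined status as "not deployed by any provider":

```py
from huggingface_hub import model_info

def is_warm(repo_id: str) -> bool:
    # `inference` is "warm" when at least one provider serves the model,
    # and undefined otherwise (see the two examples above)
    info = model_info(repo_id, expand="inference")
    return getattr(info, "inference", None) == "warm"

print(is_warm("google/gemma-3-27b-it"))                 # True (warm at the time of writing)
print(is_warm("manycore-research/SpatialLM-Llama-1B"))  # False
```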

## Get model providers

If you are interested in a specific model and want to check the list of providers serving it, request the `inferenceProviderMapping` attribute from the model info endpoint:

<inferencesnippet>

<curl>

```sh
# List google/gemma-3-27b-it providers
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inferenceProviderMapping"
{
"_id": "67c35b9bb236f0d365bf29d3",
"id": "google/gemma-3-27b-it",
"inferenceProviderMapping": {
"hf-inference": {
"status": "live",
"providerId": "google/gemma-3-27b-it",
"task": "conversational"
},
"nebius": {
"status": "live",
"providerId": "google/gemma-3-27b-it-fast",
"task": "conversational"
}
}
}
```
</curl>

<python>

In the `huggingface_hub` library, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")
>>> info.inference_provider_mapping
{
'hf-inference': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it', task='conversational'),
'nebius': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it-fast', task='conversational'),
}
```

</python>

</inferencesnippet>


Each provider serving the model reports a status (`staging` or `live`), the related task (here, `conversational`), and a `providerId`, which is the model's identifier on the provider's side. In practice, this information is relevant for the JS and Python clients.
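
As an illustration, here is a minimal sketch of how a Python client could keep only the providers that currently serve the model with a `live` status (reusing the objects shown above):

```py
from huggingface_hub import model_info

info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")
mapping = info.inference_provider_mapping or {}

# Keep only providers whose deployment is live (as opposed to staging)
live_providers = [name for name, m in mapping.items() if m.status == "live"]
print(live_providers)  # e.g. ['hf-inference', 'nebius']
```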