
Revamp Inference Providers doc #1652


Merged
merged 20 commits into from Apr 1, 2025
Changes from 15 commits
6 changes: 3 additions & 3 deletions docs/api-inference/_redirects.yml
@@ -1,6 +1,6 @@
quicktour: index
detailed_parameters: parameters
parallelism: getting_started
usage: getting_started
detailed_parameters: tasks/index
parallelism: index
usage: index
faq: index
rate-limits: pricing
46 changes: 23 additions & 23 deletions docs/api-inference/_toctree.yml
@@ -1,27 +1,33 @@
- sections:
- title: Get Started
sections:
- local: index
title: Serverless Inference API
- local: getting-started
title: Getting Started
- local: supported-models
title: Supported Models
title: Inference Providers
- local: pricing
title: Pricing and Rate limits
title: Pricing and Billing
- local: hub-integration
title: Hub integration
- local: security
title: Security
title: Getting Started
- sections:
- local: parameters
title: Parameters
- sections:
- local: tasks/audio-classification
title: Audio Classification
- local: tasks/automatic-speech-recognition
title: Automatic Speech Recognition
- title: API Reference
sections:
- local: tasks/index
title: Index
- local: hub-api
title: Hub API
- title: Popular Tasks
sections:
- local: tasks/chat-completion
title: Chat Completion
- local: tasks/feature-extraction
title: Feature Extraction
- local: tasks/text-to-image
title: Text to Image
- title: Other Tasks
sections:
- local: tasks/audio-classification
title: Audio Classification
- local: tasks/automatic-speech-recognition
title: Automatic Speech Recognition
- local: tasks/fill-mask
title: Fill Mask
- local: tasks/image-classification
@@ -30,8 +36,6 @@
title: Image Segmentation
- local: tasks/image-to-image
title: Image to Image
- local: tasks/image-text-to-text
title: Image-Text to Text
- local: tasks/object-detection
title: Object Detection
- local: tasks/question-answering
@@ -44,13 +48,9 @@
title: Text Classification
- local: tasks/text-generation
title: Text Generation
- local: tasks/text-to-image
title: Text to Image
- local: tasks/token-classification
title: Token Classification
- local: tasks/translation
title: Translation
- local: tasks/zero-shot-classification
title: Zero Shot Classification
title: Detailed Task Parameters
title: API Reference
title: Zero Shot Classification
95 changes: 0 additions & 95 deletions docs/api-inference/getting-started.md

This file was deleted.

173 changes: 173 additions & 0 deletions docs/api-inference/hub-api.md
@@ -0,0 +1,173 @@
# Hub API

The Hub provides a few APIs to interact with Inference Providers. Here is a list of them:

## List models

To list models powered by a provider, use the `inference_provider` query parameter:

```sh
# List all models served by Fireworks AI
~ curl -s https://huggingface.co/api/models?inference_provider=fireworks-ai | jq ".[].id"
"deepseek-ai/DeepSeek-V3-0324"
"deepseek-ai/DeepSeek-R1"
"Qwen/QwQ-32B"
"deepseek-ai/DeepSeek-V3"
...
```
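
The same endpoint can be queried from Python. Here is a minimal sketch using the `requests` library (our choice here, any HTTP client works):

```py
import requests

# List all models served by Fireworks AI (same endpoint as the curl call above)
response = requests.get(
    "https://huggingface.co/api/models",
    params={"inference_provider": "fireworks-ai"},
)
response.raise_for_status()

for model in response.json():
    print(model["id"])
```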

The filter can be combined with other query parameters, for example to select only `text-to-image` models:

```sh
# List text-to-image models served by Fal AI
~ curl -s "https://huggingface.co/api/models?inference_provider=fal-ai&pipeline_tag=text-to-image" | jq ".[].id"
"black-forest-labs/FLUX.1-dev"
"stabilityai/stable-diffusion-3.5-large"
"black-forest-labs/FLUX.1-schnell"
"stabilityai/stable-diffusion-3.5-large-turbo"
...
```

Pass a comma-separated list of providers to select models served by any of them:

```sh
# List image-text-to-text models served by Novita or Sambanova
~ curl -s "https://huggingface.co/api/models?inference_provider=sambanova,novita&pipeline_tag=image-text-to-text" | jq ".[].id"
"meta-llama/Llama-3.2-11B-Vision-Instruct"
"meta-llama/Llama-3.2-90B-Vision-Instruct"
"Qwen/Qwen2-VL-72B-Instruct"
```

Finally, pass `inference_provider=all` to select every model served by at least one inference provider:

```sh
# List text-to-video models served by any provider
~ curl -s "https://huggingface.co/api/models?inference_provider=all&pipeline_tag=text-to-video" | jq ".[].id"
"Wan-AI/Wan2.1-T2V-14B"
"Lightricks/LTX-Video"
"tencent/HunyuanVideo"
"Wan-AI/Wan2.1-T2V-1.3B"
"THUDM/CogVideoX-5b"
"genmo/mochi-1-preview"
"BagOu22/Lora_HKLPAZ"
```
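
The query parameters compose, so a single request can combine a provider list with a task filter. A short sketch, again assuming the `requests` library:

```py
import requests

# Image-text-to-text models served by Novita or Sambanova (use "all" for any provider)
params = {
    "inference_provider": "sambanova,novita",
    "pipeline_tag": "image-text-to-text",
}
response = requests.get("https://huggingface.co/api/models", params=params)
response.raise_for_status()
print([model["id"] for model in response.json()])
```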

## Get model status

To check whether a specific model is currently served by at least one provider, request the `inference` attribute from the model info endpoint:

<inferencesnippet>

<curl>

```sh
# Get google/gemma-3-27b-it inference status (warm)
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inference"
{
"_id": "67c35b9bb236f0d365bf29d3",
"id": "google/gemma-3-27b-it",
"inference": "warm"
}
```
</curl>

<python>

In the `huggingface_hub` library, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inference")
>>> info.inference
'warm'
```

</python>

</inferencesnippet>

Inference status is either "warm" or undefined. When no provider serves the model, the attribute is simply omitted from the response:

<inferencesnippet>

<curl>

```sh
# Get inference status (not warm)
~ curl -s "https://huggingface.co/api/models/manycore-research/SpatialLM-Llama-1B?expand[]=inference"
{
"_id": "67d3b141d8b6e20c6d009c8b",
"id": "manycore-research/SpatialLM-Llama-1B"
}
```

</curl>

<python>

In the `huggingface_hub` library, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("manycore-research/SpatialLM-Llama-1B", expand="inference")
>>> info.inference
None
```

</python>

</inferencesnippet>
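
Putting both cases together, a small helper (the name `is_warm` is ours, not part of any library) could treat a missing or undefined status as "not deployed by any provider":

```py
from huggingface_hub import model_info

def is_warm(repo_id: str) -> bool:
    # `inference` is "warm" when at least one provider serves the model,
    # and undefined otherwise (see the two examples above)
    info = model_info(repo_id, expand="inference")
    return getattr(info, "inference", None) == "warm"

print(is_warm("google/gemma-3-27b-it"))                 # True (warm at the time of writing)
print(is_warm("manycore-research/SpatialLM-Llama-1B"))  # False
```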

## Get model providers

If you are interested in a specific model and want to check the list of providers serving it, request the `inferenceProviderMapping` attribute from the model info endpoint:

<inferencesnippet>

<curl>

```sh
# List google/gemma-3-27b-it providers
~ curl -s "https://huggingface.co/api/models/google/gemma-3-27b-it?expand[]=inferenceProviderMapping"
{
"_id": "67c35b9bb236f0d365bf29d3",
"id": "google/gemma-3-27b-it",
"inferenceProviderMapping": {
"hf-inference": {
"status": "live",
"providerId": "google/gemma-3-27b-it",
"task": "conversational"
},
"nebius": {
"status": "live",
"providerId": "google/gemma-3-27b-it-fast",
"task": "conversational"
}
}
}
```
</curl>

<python>

In the `huggingface_hub` library, use `model_info` with the `expand` parameter:

```py
>>> from huggingface_hub import model_info

>>> info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")
>>> info.inference_provider_mapping
{
'hf-inference': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it', task='conversational'),
'nebius': InferenceProviderMapping(status='live', provider_id='google/gemma-3-27b-it-fast', task='conversational'),
}
```

</python>

</inferencesnippet>


Each provider serving the model reports a status (`staging` or `live`), the related task (here, `conversational`), and a `providerId`, which is the model's identifier on the provider's side. In practice, this information is relevant for the JS and Python clients.
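
As an illustration, here is a minimal sketch of how a Python client could keep only the providers that currently serve the model with a `live` status (reusing the objects shown above):

```py
from huggingface_hub import model_info

info = model_info("google/gemma-3-27b-it", expand="inferenceProviderMapping")
mapping = info.inference_provider_mapping or {}

# Keep only providers whose deployment is live (as opposed to staging)
live_providers = [name for name, m in mapping.items() if m.status == "live"]
print(live_providers)  # e.g. ['hf-inference', 'nebius']
```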