Commit 075a266

Update docs about inference endpoints (#432)
* Delete type and rename model in endpoint docs (see the before/after sketch below)
* Explain to pass either model_name, or endpoint_name plus reuse_existing
* Fix legacy instance type and size in docs
* Minor fix
Parent: c0966cf
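
Concretely, the docs now drop the `type` key and take the checkpoint under `model_name` (or, alternatively, reference an existing endpoint via `endpoint_name` plus `reuse_existing`). A minimal before/after sketch, abridged from the diffs below:

```yaml
# Before (legacy form shown in the docs)
model:
  type: "endpoint"
  base_params:
    model: "meta-llama/Llama-2-7b-hf"

# After: no type key; pass either model_name, or endpoint_name and reuse_existing
model:
  base_params:
    model_name: "meta-llama/Llama-2-7b-hf"
```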

3 files changed: +13, -12 lines


docs/source/evaluate-the-model-on-a-server-or-container.mdx

Lines changed: 7 additions & 8 deletions
@@ -26,22 +26,22 @@ __configuration file example:__
 
 ```yaml
 model:
-  type: "endpoint"
   base_params:
-    endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
-    model: "meta-llama/Llama-2-7b-hf"
+    # Pass either model_name, or endpoint_name and true reuse_existing
+    # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
+    # reuse_existing: true # defaults to false; if true, ignore all params in instance, and don't delete the endpoint after evaluation
+    model_name: "meta-llama/Llama-2-7b-hf"
     revision: "main"
     dtype: "float16" # can be any of "awq", "eetq", "gptq", "4bit' or "8bit" (will use bitsandbytes), "bfloat16" or "float16"
-    reuse_existing: false # if true, ignore all params in instance, and don't delete the endpoint after evaluation
   instance:
     accelerator: "gpu"
     region: "eu-west-1"
     vendor: "aws"
-    instance_size: "medium"
-    instance_type: "g5.2xlarge"
+    instance_type: "nvidia-a10g"
+    instance_size: "x1"
     framework: "pytorch"
     endpoint_type: "protected"
-    namespace: null # The namespace under which to launch the endopint. Defaults to the current user's namespace
+    namespace: null # The namespace under which to launch the endpoint. Defaults to the current user's namespace
     image_url: null # Optionally specify the docker image to use when launching the endpoint model. E.g., launching models with later releases of the TGI container with support for newer models.
     env_vars:
       null # Optional environment variables to include when launching the endpoint. e.g., `MAX_INPUT_LENGTH: 2048`
@@ -58,7 +58,6 @@ __configuration file example:__
 
 ```yaml
 model:
-  type: "tgi"
   instance:
     inference_server_address: ""
     inference_server_auth: null
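
For reference, the alternative form described by the new comments — reusing an already-deployed Inference Endpoint instead of creating one from `model_name` — would look roughly like the sketch below. This is an illustration assembled from the commented-out lines above, not text from the commit itself; the endpoint name is only an example.

```yaml
model:
  base_params:
    # Point lighteval at an existing endpoint instead of deploying a new one
    endpoint_name: "llama-2-7B-lighteval"  # needs to be lower case without special characters
    reuse_existing: true  # instance params are ignored and the endpoint is not deleted after evaluation
    revision: "main"
    dtype: "float16"
```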

examples/model_configs/endpoint_model.yaml

Lines changed: 5 additions & 3 deletions
@@ -1,15 +1,17 @@
 model:
   base_params:
-    model_name: "meta-llama/Llama-2-7b-hf" # the model name or the endpoint name if reuse_existing is true
+    # Pass either model_name, or endpoint_name and true reuse_existing
+    # endpoint_name: "llama-2-7B-lighteval" # needs to be lower case without special characters
+    # reuse_existing: true # defaults to false; if true, ignore all params in instance, and don't delete the endpoint after evaluation
+    model_name: "meta-llama/Llama-2-7b-hf"
     revision: "main"
     dtype: "float16" # can be any of "awq", "eetq", "gptq", "4bit' or "8bit" (will use bitsandbytes), "bfloat16" or "float16"
-    reuse_existing: false # if true, ignore all params in instance, and don't delete the endpoint after evaluation
   instance:
     accelerator: "gpu"
     region: "eu-west-1"
     vendor: "aws"
-    instance_size: "x1"
     instance_type: "nvidia-a10g"
+    instance_size: "x1"
     framework: "pytorch"
     endpoint_type: "protected"
     namespace: null # The namespace under which to launch the endopint. Defaults to the current user's namespace

src/lighteval/models/endpoints/endpoint_model.py

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ class InferenceEndpointModelConfig:
     endpoint_type: str = "protected"
     add_special_tokens: bool = True
     revision: str = "main"
-    namespace: str = None # The namespace under which to launch the endopint. Defaults to the current user's namespace
+    namespace: str = None # The namespace under which to launch the endpoint. Defaults to the current user's namespace
     image_url: str = None
     env_vars: dict = None
