Feature request
The Prometheus metrics exposed by TGI lack labels such as model_name, so there is no way to filter or group metrics by model. This becomes a problem when multiple models are deployed with TGI: the metrics from all deployments are aggregated together and cannot be told apart.
Motivation
The vLLM runtime, for example, attaches a model_name label to its LLM metrics:
```
vllm:time_per_output_token_seconds_bucket{le="0.01",model_name="/mnt/models"} 80.0
vllm:generation_tokens_total{model_name="/mnt/models"} 102.0
vllm:prompt_tokens_total{model_name="/mnt/models"} 231.0
```
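For illustration, here is a minimal sketch of how a counter with a model_name label can be exposed, using the Python prometheus_client library. The metric name, label value, and port are assumptions made for the example and do not reflect TGI's actual metric names or implementation.

```python
from prometheus_client import Counter, start_http_server

# Illustrative metric: the name and label value are assumptions, not TGI's real metrics.
generation_tokens = Counter(
    "generation_tokens_total",
    "Total number of tokens generated",
    ["model_name"],  # the label this feature request asks TGI to add
)

if __name__ == "__main__":
    # Expose /metrics on port 8000 and record a sample carrying the model_name label.
    start_http_server(8000)
    generation_tokens.labels(model_name="/mnt/models").inc(102)
```

Scraping that endpoint yields a series like generation_tokens_total{model_name="/mnt/models"} 102.0, which Prometheus can then filter or group per model.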
Your contribution
We can submit a PR for the requested change.