Ability to retrieve the protocol response headers in InferenceClient #2281

Closed
fxmarty opened this issue May 14, 2024 · 4 comments

fxmarty commented May 14, 2024

As per the title, it would be helpful to be able to retrieve the response headers, as is possible with curl --include.

Example of useful response headers:

(base) felix@azure-amd-mi300-dev-01:~$ curl 0.0.0.0:80/generate -X POST -d '{"inputs":"Today I am in Paris and","parameters":{"max_new_tokens": 3, "details": true}}' -H 'Content-Type: application/json' --include
HTTP/1.1 200 OK
content-type: application/json
x-compute-type: gpu+optimized
x-compute-time: 0.111191439
x-compute-characters: 23
x-total-time: 111
x-validation-time: 0
x-queue-time: 0
x-inference-time: 110
x-time-per-token: 36
x-prompt-tokens: 7
x-generated-tokens: 3
content-length: 318
access-control-allow-origin: *
vary: origin
vary: access-control-request-method
vary: access-control-request-headers
date: Tue, 14 May 2024 12:57:59 GMT
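
For reference, a minimal Python equivalent of the curl call above, using requests directly against a local TGI instance (same URL and payload as in the example; not something InferenceClient exposes today):

```python
import requests

# Same request as the curl example above, against a local TGI instance.
resp = requests.post(
    "http://0.0.0.0:80/generate",
    json={
        "inputs": "Today I am in Paris and",
        "parameters": {"max_new_tokens": 3, "details": True},
    },
)
resp.raise_for_status()

# The x-* headers carry the compute/timing stats shown above.
for name, value in resp.headers.items():
    if name.lower().startswith("x-"):
        print(f"{name}: {value}")
```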

Wauplin commented May 21, 2024

Hi @fxmarty, thanks for the feature request. Do you have a suggestion for how this information could be returned in the current InferenceClient framework? Open to ideas.


Wauplin commented Jun 11, 2024

I'm closing this issue since no new details have been provided. @fxmarty, happy to reopen it if you want. Just let me know what your use case for such a feature would be, so that we can figure out the best way to support it.

@Wauplin Wauplin closed this as not planned Jun 11, 2024

fxmarty commented Jun 12, 2024

@Wauplin the feature this enables is tracking the response time of TGI from the client side, with stats like:

x-compute-type: gpu+optimized
x-compute-time: 0.111191439
x-compute-characters: 23
x-total-time: 111
x-validation-time: 0
x-queue-time: 0
x-inference-time: 110
x-time-per-token: 36
x-prompt-tokens: 7
x-generated-tokens: 3

For example, in https://huggingface.co/spaces/fxmarty/tgi-mi300-demo-chat/blob/main/app.py, I wanted to use client.text_generation and surface these stats to the user as well, but I couldn't without calling the REST API myself. Note that TGI does not return these stats on the generate_stream endpoint.
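
A rough sketch of the workaround I mean, calling the /generate REST endpoint directly with requests instead of client.text_generation (the endpoint URL is just an example of a local TGI instance):

```python
import requests

TGI_URL = "http://0.0.0.0:80/generate"  # example local TGI endpoint

def generate_with_stats(prompt: str, max_new_tokens: int = 3):
    """Call TGI's /generate endpoint directly and return the text plus the x-* timing headers."""
    resp = requests.post(
        TGI_URL,
        json={"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens, "details": True}},
    )
    resp.raise_for_status()
    stats = {k: v for k, v in resp.headers.items() if k.lower().startswith("x-")}
    return resp.json()["generated_text"], stats

text, stats = generate_with_stats("Today I am in Paris and")
print(text)
print(f"total time: {stats.get('x-total-time')} ms, per token: {stats.get('x-time-per-token')} ms")
```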


Wauplin commented Jun 12, 2024

@fxmarty thanks for the explanation! Do you have a suggestion for how you would like this information to be returned in the current InferenceClient framework?
