
[Question]: Reproduce end2end latency results of LLMLingua-2 #193

Open
cornzz opened this issue Oct 23, 2024 · 3 comments
Comments

cornzz commented Oct 23, 2024

Describe the issue

@pzs19
I would like to reproduce and extend the end-to-end latency benchmark results from the LLMLingua-2 paper, and was wondering whether you could provide more details on your experiment setup. Specifically:

  • Which target LLM was evaluated, and how was it deployed? Was vLLM or something similar used?
  • For the result in Table 5, which prompt length was used, and what was the prompt?
  • What is the definition of end2end latency: from the beginning of compression until the first token is generated, or until the full response is generated?
  • What was max_token set to, and did you enforce the generation of a minimum number of tokens?

Thanks a lot!

@cornzz cornzz added the question Further information is requested label Oct 23, 2024
@cornzz cornzz changed the title [Question]: Reproduce end2end benchmarking of LLMLingua-2 [Question]: Reproduce end2end latency results of LLMLingua-2 Oct 23, 2024
pzs19 (Contributor) commented Nov 11, 2024

Thank you for raising these questions. Here is a point-by-point response:

  • The target LLM is GPT-3.5-Turbo-0613, so vLLM is not used.
  • The latency experiment is conducted on the summarization task of MeetingBank; the prompt follows the main experiment.
  • End2end latency counts from the beginning of compression until the full response is generated.
  • We set "max_token" to 400, following the main experiment (a rough measurement sketch is shown below).
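For reference, here is a minimal sketch of how these pieces fit together when measuring end-to-end latency (compression plus full generation). It assumes the llmlingua Python package and the OpenAI chat completions client; the transcript, instruction, and compression rate are placeholders, not the paper's exact settings.

```python
# Sketch of the end-to-end latency measurement described above.
# Assumptions (not confirmed in this thread): llmlingua is used for compression,
# the OpenAI client for generation; transcript, instruction, and rate are placeholders.
import time

from llmlingua import PromptCompressor
from openai import OpenAI

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",  # LLMLingua-2 compressor
    use_llmlingua2=True,
)
client = OpenAI()

transcript = "..."  # a MeetingBank transcript (placeholder)
instruction = "Summarize the meeting transcript above."  # placeholder instruction

start = time.perf_counter()

# 1) Compress the prompt (rate is a placeholder value).
compressed = compressor.compress_prompt(transcript, rate=0.33)

# 2) Generate the full response with the target LLM named above.
response = client.chat.completions.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": compressed["compressed_prompt"] + "\n\n" + instruction}],
    max_tokens=400,  # as stated above
)

# End-to-end latency: from the start of compression until the full
# (non-streamed) response has been returned.
latency = time.perf_counter() - start
print(f"end2end latency: {latency:.2f}s")
```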

cornzz (Author) commented Nov 11, 2024

Thank you very much! 🙂

@cornzz cornzz closed this as completed Nov 11, 2024
cornzz (Author) commented Nov 13, 2024

@pzs19 @iofu728 Sorry, a follow-up question: which LLM was used for compression in the end-to-end latency benchmark of the original LLMLingua paper? Under "Implementation Details" it says

In our experiments, we utilize either Alpaca-7B or GPT2-Alpaca as the small pre-trained language model Mₛ for compression.

However, as far as I can see, it is not specified which of these two models was used for the end-to-end latency benchmark.
It is also not specified which compressor was used for the other benchmarks (GSM8K etc.), so that is a second question.

@cornzz cornzz reopened this Nov 14, 2024