@pzs19
I would like to reproduce and extend the end-to-end latency benchmark results of the LLMLingua-2 paper and was therefore wondering if you could provide more details on your experiment setup. Specifically:

- Which target LLM was evaluated, and how was it set up (was vLLM or something similar used)?
- For the result in Table 5, which prompt length was used, and what was the prompt?
- What is the definition of end-to-end latency: from the beginning of compression until the generation of the first token, or until the full response is generated? (See the sketch below for the two conventions I have in mind.)
- What was `max_token` set to, and did you enforce the generation of a minimum number of tokens?

Thanks a lot!
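For context, this is roughly how I am measuring it at the moment, a minimal sketch assuming the LLMLingua-2 compressor from the `llmlingua` package and a vLLM server exposing an OpenAI-compatible endpoint. The checkpoint, endpoint URL, `rate`, and `"target-llm"` model name are my own placeholders, not values from the paper:

```python
import time
from llmlingua import PromptCompressor
from openai import OpenAI

# LLMLingua-2 compressor; checkpoint is an assumption, not from the paper.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

# vLLM's OpenAI-compatible server; URL and model name are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def e2e_latency(prompt: str, rate: float = 0.33, max_tokens: int = 256):
    t0 = time.perf_counter()

    # Compression time is included in both measurements below.
    compressed = compressor.compress_prompt(prompt, rate=rate)["compressed_prompt"]

    stream = client.chat.completions.create(
        model="target-llm",  # placeholder for the target LLM under test
        messages=[{"role": "user", "content": compressed}],
        max_tokens=max_tokens,
        stream=True,
    )
    t_first = None
    for chunk in stream:
        if t_first is None and chunk.choices and chunk.choices[0].delta.content:
            t_first = time.perf_counter()  # convention A: until first token
    t_full = time.perf_counter()           # convention B: until full response
    return t_first - t0, t_full - t0
```

Depending on which of the two return values you used, the numbers in Table 5 would mean quite different things, hence the question.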
cornzz changed the title from "[Question]: Reproduce end2end benchmarking of LLMLingua-2" to "[Question]: Reproduce end2end latency results of LLMLingua-2" on Oct 23, 2024.
@pzs19 @iofu728 Sorry, a follow-up question: which LLM was used for compression in the end-to-end latency benchmark of the original LLMLingua paper? Under "Implementation Details" it says:

> In our experiments, we utilize either Alpaca-7B or GPT2-Alpaca as the small pre-trained language model M_s for compression.

However, as far as I can see, it is not specified which of those two models was used for the end-to-end latency benchmark. In fact, it is not specified which compressor was used for the other benchmarks (GSM8K etc.) either, so that would be another question.