[Question]: Compressing the context and question when testing on the LongBench dataset #163

Diana303068 · 2024-06-05T10:41:13Z

Describe the issue

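Some of the datasets in LongBench have very long questions, so compressing only the context to 2,000 tokens can still exceed the 4,096-token limit. How is this case handled?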

iofu728 · 2024-06-06T08:35:58Z

Hi @Diana303068, thanks for your support of LLMLingua.

In our experiments, if the prompt exceeds 4k tokens, we use the 16K API. You can also refer to the LongBench approach, which truncates the middle of the prompt:

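def truncate_input(input: list, max_length: int, manner="middle"):
    # A negative max_length disables truncation.
    if max_length < 0:
        return input
    # Nothing to do if the input already fits.
    if len(input) <= max_length:
        return input
    if manner == "middle":
        # Keep the first and last `split` items and drop the middle.
        split = max_length // 2
        # Slice the tail from an explicit index: input[-0:] would return the whole list when split == 0.
        return input[:split] + input[len(input) - split:]
    else:
        # Only "middle" truncation is supported.
        return None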
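
As a rough illustration of the middle-truncation idea quoted above, here is a minimal, self-contained sketch. The helper name `truncate_middle` is hypothetical and not part of LLMLingua or LongBench, and whitespace-separated words stand in for real tokenizer output. Unlike the quoted function, this sketch gives the head the extra item when `max_length` is odd, so the result has exactly `max_length` items.

```python
def truncate_middle(tokens, max_length):
    """Keep the head and tail of `tokens` and drop the middle (hypothetical helper)."""
    # A negative budget disables truncation; short inputs are returned unchanged.
    if max_length < 0 or len(tokens) <= max_length:
        return list(tokens)
    head = (max_length + 1) // 2  # the head gets the extra item when max_length is odd
    tail = max_length - head
    # Index the tail from the front so that tail == 0 yields an empty slice.
    return list(tokens[:head]) + list(tokens[len(tokens) - tail:])


# Whitespace "tokens" are an assumption standing in for a real tokenizer.
prompt = " ".join(f"w{i}" for i in range(10))
tokens = prompt.split()
kept = truncate_middle(tokens, 4)
print(kept)  # ['w0', 'w1', 'w8', 'w9']
```

With a real tokenizer, the same call would be applied to the token IDs of the full prompt (compressed context plus question), and the result decoded back to text before sending it to the model.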

Diana303068 · 2024-06-06T13:16:35Z

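Thanks for the reply.
I have a follow-up question: is the few-shot performance reported for LLMLingua-2 on LongBench the average of the scores on the "samsum", "trec", "triviaqa", and "lsht" test datasets?
Also, is there a recorded score for LLMLingua-2 on the trec dataset?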

Diana303068 added the "question" label (Further information is requested) on Jun 5, 2024

iofu728 self-assigned this Jun 6, 2024
