[Bug]: Prompts smaller than iterative_size are not compressed #196

Open
cornzz opened this issue Nov 14, 2024 · 0 comments · May be fixed by #198
Labels
bug Something isn't working


cornzz commented Nov 14, 2024

Describe the bug

This concerns only LLMLingua / LongLLMLingua.

As part of #195 I noticed prompts smaller than `iterative_size` were not being compressed. The `iterative_size` parameter is 200 by default, causing the algorithm to ignore all tokens at certain prompt lengths below 200.

To be exact, the default settings effectively mean the following:

  • Prompts below 66 tokens are not compressed at all
  • For prompts between 66 and 98 tokens, compression increasingly takes effect
  • At 99 tokens, compression drops back to none, then increases again as the prompt length approaches 200

Token lengths here are those produced by the compression model tokenizer.

Not sure if this is a bug or intended behaviour, @iofu728?
The exact behaviour can be seen in the graph linked below, together with an explanation.

--
(From #61 I understand that `iterative_size` determines the length of the segments $s \in S$ from Eq. (5)?)

Why this is likely a bug:

Assume a 50-token prompt. First, `end` is set to the length of the prompt (`compressed_input_ids` is the original prompt here):

```python
end = min(iterative_size + start, compressed_input_ids.shape[1])
```

In the `get_compressed_input()` call, the `end` parameter is set to `end - iterative_size + delta_end` (`delta_end` being the prompt length + 2 here), resulting in a value of 50 - 200 + 52 = -98.

```python
) = self.get_compressed_input(
    loss,
    compressed_input_ids,
    compressed_attention_mask,
    end - iterative_size + delta_end,
    iterative_size=delta_end,
```

In `get_compressed_input()`, the `need_idx[end:] = 1` operation then causes all tokens to be kept (since `end` is -98), ignoring the result of the thresholding (`need_idx` signifies which tokens should be kept).

```python
else:
    need_idx = torch.concat([loss > threshold, loss[:1] > 0])
    need_idx[end:] = 1
    need_idx[: end - iterative_size] = 1
    loss = loss[need_idx[:-1]]
```

Normally, the operations in lines 1426 and 1427 limit compression to the segment currently being processed by the Iterative Token-level Prompt Compression algorithm. As demonstrated, this breaks the algorithm when the prompt is smaller than `iterative_size`.
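For illustration, here is a self-contained sketch of that failure mode, plugging in the values from the 50-token walkthrough above (`end = -98`, `iterative_size = delta_end = 52`); the `loss` tensor is random stand-in data, not real model output:

```python
import torch

loss = torch.rand(51)                  # stand-in per-token losses for a 50-token prompt
threshold = 0.5
end, iterative_size = -98, 52          # values as passed by iterative_compress_prompt()

need_idx = torch.concat([loss > threshold, loss[:1] > 0])
need_idx[end:] = 1                     # need_idx[-98:] on a 52-element tensor selects
                                       # *everything*, so every token is marked "keep"
need_idx[: end - iterative_size] = 1   # need_idx[:-150] selects nothing here
print(need_idx.all().item())           # True -> the thresholding result is discarded
```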

Graph explanation

The exact behaviour, i.e. how much of the prompt is considered for compression at different prompt lengths, can be seen here:
https://www.desmos.com/calculator/d7dbbqsdbv

The x-axis is prompt length, the y-axis signifies how many tokens of the prompt are considered for compression.

  • The $i$ variable is `iterative_size`
  • $s(x)$ is the initial value of `end` in `iterative_compress_prompt()`
  • $d$ is `delta_end` (and `iterative_size` inside `get_compressed_input()`)
  • $f(x)$ is the `end` parameter in `get_compressed_input()`
  • $g(x)$ yields the number of tokens considered for compression after `need_idx[end:] = 1`

Only $n(x)$ is displayed here.

Steps to reproduce

You can reproduce this using the official LLMLingua demo by trying to compress the following context of length 100 tokens, with `target_token` set to -1 and `ratio` to 0.5 (question and instruction left empty):

This report provides background information and issues for Congress regarding China's actions in the South China Sea (SCS) and East China Sea (ECS), with a focus on implications for U.S. strategic and policy interests. Other CRS reports focus on other aspects of maritime territorial disputes involving China. The issue for Congress is how the United States should respond to China's actions in the SCS and ECS—particularly China's island-building and base-construction activities in the Spratly 

The actual compression ratio will be 1.0x.
If you now remove the last word, "Spratly", compression suddenly works and the result is 2x compressed.
This is because the token count drops to 98, where, as mentioned above, compression fully works. If you further reduce the number of words, the compression ratio falls again, reaching 1.0x at a prompt length of around 66 tokens.
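For reference, a minimal reproduction sketch, assuming the `llmlingua` package as used by the demo (the compression keyword is `ratio` in older releases and `rate` in newer ones, and the result-dict keys may vary between versions):

```python
from llmlingua import PromptCompressor

# the 100-token passage quoted above
context = "This report provides background information and issues for Congress ..."

compressor = PromptCompressor()  # loads the default compression model
result = compressor.compress_prompt(
    context, instruction="", question="", ratio=0.5, target_token=-1
)
print(result["origin_tokens"], result["compressed_tokens"])
# expected: identical token counts (1.0x) until the context drops to 98 tokens
```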

How to fix:

I suppose a possible fix would be adding the following at the beginning of `get_compressed_input()`:

```python
if end < iterative_size:
    end = iterative_size
```

This way, prompts shorter than `iterative_size` are still compressed. I don't think this introduces side effects for other cases, as `end` shouldn't be smaller than `iterative_size` other than in this specific case.
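Re-running the earlier sketch with this clamp in place (again with random stand-in data) shows the slice boundaries becoming sane:

```python
import torch

loss = torch.rand(51)
threshold = 0.5
end, iterative_size = -98, 52

if end < iterative_size:               # proposed clamp
    end = iterative_size               # end is now 52, i.e. the whole short prompt

need_idx = torch.concat([loss > threshold, loss[:1] > 0])
need_idx[end:] = 1                     # need_idx[52:] selects nothing
need_idx[: end - iterative_size] = 1   # need_idx[:0] selects nothing
print(need_idx.all().item())           # almost certainly False -> compression happens
```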

This graph shows the behaviour with this fix, $n(x)$ being the new `end`:

https://www.desmos.com/calculator/69cm0iasqz


Semi-related (bug?)

This line

```python
while end <= compressed_input_ids.shape[1]:
```

means that, if the prompt length is not divisible by `iterative_size`, a remaining segment at the end of the prompt is ignored during compression. This is because `end` is incremented by `iterative_size` after each iteration, and if the remaining segment is smaller than `iterative_size`, it is never processed.

An example for a prompt of length 500 at rate 0.5 (at least 3 iterations would be needed to process all tokens):

  • First iteration: the first 200 tokens are compressed to 100 tokens, so the prompt length is now 400; the `get_compressed_input()` call sets `end` to 100, which is then incremented to 300 in line 1742.
  • Second iteration: the next 200 tokens are compressed to 100 tokens, so the prompt length is now 300; the `get_compressed_input()` call sets `end` to 200, which is then incremented to 400.
  • Third iteration: there is no third iteration, since the prompt length (300) is now smaller than `end` (400). The last 100 tokens are therefore never processed and remain uncompressed.

In this case, 20% of the prompt is ignored completely. This graph shows how the ignored percentage changes with prompt size: https://www.desmos.com/calculator/8kohofzyb5
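A quick simulation of the loop confirms this (a sketch: it assumes each full 200-token window is compressed exactly 2x, as in the walkthrough above):

```python
iterative_size, rate = 200, 0.5
length = 500                      # compressed_input_ids.shape[1]
end = min(iterative_size, length)
kept = 0                          # tokens already processed and kept before `end`
while end <= length:
    segment = end - kept              # size of the current window
    compressed = int(segment * rate)  # stand-in for the token-level filtering
    length -= segment - compressed    # the prompt shrinks in place
    kept += compressed                # `end` as set by get_compressed_input()
    end = kept + iterative_size       # the increment in line 1742
print(length - kept)              # -> 100 tokens never processed (20% of 500)
```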

The effect of this can also be seen in the results of #195, where the achieved compression ratio for prompts between 250 and 750 tokens deviates quite a bit from the expected ratio, presumably because significant portions of the original prompts were ignored during compression.

Of course this can be mitigated by setting a smaller `iterative_size`, but even with the default value there should be a way to process the remaining tokens at the end of the prompt?

I don't have a solution here, as the algorithm breaks if you simply run one more iteration, and I don't have time to look into a proper solution...


Expected Behavior

Prompts should be compressed even when smaller than `iterative_size`.

Logs

No response

Additional Information

No response

@cornzz cornzz added the bug Something isn't working label Nov 14, 2024
@cornzz cornzz changed the title [Bug]: Prompts smaller than iterative_size are not compressed at all [Bug]: Prompts smaller than iterative_size are not compressed Nov 14, 2024
cornzz added a commit to cornzz/LLMLingua that referenced this issue Nov 14, 2024
@cornzz cornzz linked a pull request Nov 14, 2024 that will close this issue