Support traceable DynamicCache #36311
base: main
Conversation
cc @gante @zucchini-nlp as well!
@tugsbayasgalan This is great! I think it can apply to StaticCache as well, right? By making it pytree friendly, we get the option to trace the cache as an input arg instead of forcing it to be wrapped inside the module. Of course, to make HF models consumable with llama_runner out of the box, we would still wrap the cache during export, but if we apply this to StaticCache too, users get the new option of composing the cache through the model IO during inference.
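For illustration only, here is a minimal sketch of what "pytree friendly" means in practice. ToyCache is a made-up stand-in, not this PR's code; the real change would target DynamicCache/StaticCache, and the registration API is assumed to be torch.utils._pytree.register_pytree_node as in recent PyTorch releases.

import torch
import torch.utils._pytree as pytree


class ToyCache:
    """Hypothetical stand-in for an HF cache holding per-layer key/value tensors."""

    def __init__(self, key_cache=None, value_cache=None):
        self.key_cache = list(key_cache or [])
        self.value_cache = list(value_cache or [])


def _flatten_toy_cache(cache):
    # Return the tensor-bearing children plus a (here empty) rebuild context.
    return [cache.key_cache, cache.value_cache], None


def _unflatten_toy_cache(children, _context):
    key_cache, value_cache = children
    return ToyCache(key_cache, value_cache)


# Once registered, export/trace machinery can flatten the cache into plain
# tensor leaves on the way in and rebuild the container on the way out.
pytree.register_pytree_node(ToyCache, _flatten_toy_cache, _unflatten_toy_cache)

cache = ToyCache([torch.zeros(1, 2, 0, 4)], [torch.zeros(1, 2, 0, 4)])
leaves, spec = pytree.tree_flatten(cache)
rebuilt = pytree.tree_unflatten(leaves, spec)
assert isinstance(rebuilt, ToyCache) and len(leaves) == 2

With a registration like this, the cache behaves like any other nested tensor input/output rather than a closed-over module attribute.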
past_key_values = DynamicCache()
ep = torch.export.export(
    model,
    (),
    {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "past_key_values": past_key_values,
        "use_cache": True,
    },
    strict=False,
)
@tugsbayasgalan Curious how DynamicCache would work differently from StaticCache when it comes to export. When we export with DynamicCache, isn't its size specialized to a fixed number (its capacity)?
Yep, it is specialized to the number of layers. As a result, the input and output specs for DynamicCache are a bit different: the former holds 0 tensors while the latter holds 2*num_layers tensors. To properly enable dynamic shapes for the concat dimension, I think we need to initialize zero-size tensors at the start of DynamicCache.
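To make the spec mismatch concrete, a rough sketch of the zero-size seeding idea, assuming the DynamicCache API at the time of this PR (key_cache/value_cache lists and an update(key_states, value_states, layer_idx) method); the shapes are arbitrary examples:

import torch
from transformers import DynamicCache

num_layers, batch, num_heads, head_dim = 2, 1, 4, 8

# A brand-new DynamicCache holds no tensors, so its flattened input spec is empty.
fresh = DynamicCache()
print(len(fresh.key_cache), len(fresh.value_cache))  # 0 0

# Hypothetical workaround from the comment above: pre-seed every layer with
# zero-length tensors along the sequence dimension so the input spec already
# contains 2 * num_layers tensors and the concat dimension can be made dynamic.
seeded = DynamicCache()
for layer_idx in range(num_layers):
    zeros = torch.zeros(batch, num_heads, 0, head_dim)
    seeded.update(zeros, zeros.clone(), layer_idx)

print(len(seeded.key_cache), len(seeded.value_cache))  # num_layers each

Seeding like this would make the flattened input spec line up with the output spec and give export an actual sequence dimension to mark dynamic.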
What does this PR do?
Recently, #35873 landed to support static KV cache with export. As a side effect, it also made it easy to support dynamic KV cache. Since all HF models accept a KV cache as an extra input, we can represent the dynamic KV cache as a pytree container. Credit goes to @xadupre, who attempted this approach in pytorch/pytorch#147326.
cc @guangy10 @IlyasMoutawwakil
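For reference, a rough end-to-end sketch of the intended usage. This assumes the pytree registration from this PR is in place; the tiny test checkpoint is only an example, and the output attribute access assumes the model's output pytree is preserved by export.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

inputs = tokenizer("Hello, world", return_tensors="pt")

# Export with an empty DynamicCache passed through the kwargs, as in the diff above.
ep = torch.export.export(
    model,
    (),
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
        "past_key_values": DynamicCache(),
        "use_cache": True,
    },
    strict=False,
)

# The exported program can be called like the original module; the returned
# past_key_values should now carry 2 * num_layers tensors.
out = ep.module()(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    past_key_values=DynamicCache(),
    use_cache=True,
)
print(out.logits.shape)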