Hello,
I tried following your example shown here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVA-NeXT-Video/Fine_tune_LLaVa_NeXT_Video_with_HFTrainer.ipynb
Without changing a line of code, the tutorial currently emits the following error: ValueError: Video features and video tokens do not match: tokens: 1004, features 4608
I googled around but couldn't find whether anyone has identified a clean solution for this yet. Have you had any luck with it?
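
In case it helps narrow things down, here is a rough sanity check on the two counts from the error message. It assumes each video frame contributes 144 features; that figure is an assumption on my side and is not confirmed by the notebook. The helper name `describe_mismatch` is made up for illustration.

```python
# Counts copied from the error message above.
tokens_in_prompt = 1004
features_from_model = 4608

# ASSUMPTION: 144 features per video frame. Not confirmed by the notebook.
ASSUMED_FEATURES_PER_FRAME = 144


def describe_mismatch(tokens, features, per_frame):
    """Express both counts in units of frames to see where they diverge."""
    frames, feature_remainder = divmod(features, per_frame)
    whole_frames, token_remainder = divmod(tokens, per_frame)
    return {
        "frames_implied_by_features": frames,
        "features_divide_evenly": feature_remainder == 0,
        "whole_frames_covered_by_tokens": whole_frames,
        "leftover_tokens": token_remainder,
    }


report = describe_mismatch(
    tokens_in_prompt, features_from_model, ASSUMED_FEATURES_PER_FRAME
)
for key, value in report.items():
    print(f"{key}: {value}")
```

Under that assumption the feature count is a whole number of frames while the token count is not, which might mean the prompt was cut short (for example by a maximum length setting) after the video placeholders were expanded. That is only a guess, not a confirmed cause.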
Thanks