Haocheng lu2 #37350

LuHC409 · 2025-04-07T17:19:38Z

What does this PR do?

This PR fixes an issue in the _preprocess function of the Qwen2VLImageProcessor class, located in:
transformers/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py

Previously, when the number of patches was not divisible by temporal_patch_size, the code always repeated the last patch temporal_patch_size - 1 times. That is correct only when the remainder is 1, which is always the case for temporal_patch_size = 2. For any other remainder it over-pads, and the padded total is no longer divisible by temporal_patch_size. This PR corrects the padding logic by computing the exact number of repeats needed:

    pad_len = temporal_patch_size - (patches.shape[0] % temporal_patch_size)
    repeats = np.repeat(patches[-1][np.newaxis], pad_len, axis=0)

Motivation and context

This change ensures that the total number of temporal patches is always divisible by temporal_patch_size, without introducing unnecessary extra patches. It avoids shape mismatch or over-padding problems in the later reshape steps.
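
For illustration, the padding rule described above can be sketched outside the library. This is a minimal, standalone sketch and not the actual body of _preprocess: the helper names pad_temporal and pad_temporal_old are hypothetical, the old behaviour is reconstructed only from the description in this PR, and the explicit remainder check is an assumption about how the surrounding code guards the padding.

```python
import numpy as np


def pad_temporal(patches: np.ndarray, temporal_patch_size: int) -> np.ndarray:
    """Repeat the last frame until the frame count is a multiple of temporal_patch_size.

    Hypothetical helper mirroring the corrected logic described in this PR.
    """
    remainder = patches.shape[0] % temporal_patch_size
    if remainder == 0:
        # Already divisible: without this guard pad_len would equal temporal_patch_size.
        return patches
    pad_len = temporal_patch_size - remainder
    repeats = np.repeat(patches[-1][np.newaxis], pad_len, axis=0)
    return np.concatenate([patches, repeats], axis=0)


def pad_temporal_old(patches: np.ndarray, temporal_patch_size: int) -> np.ndarray:
    """Previous behaviour as described in the PR text: always pad temporal_patch_size - 1 frames."""
    if patches.shape[0] % temporal_patch_size == 0:
        return patches
    repeats = np.repeat(patches[-1][np.newaxis], temporal_patch_size - 1, axis=0)
    return np.concatenate([patches, repeats], axis=0)


if __name__ == "__main__":
    frames = np.zeros((6, 3, 28, 28))  # 6 frames; the other dimensions are arbitrary
    print(pad_temporal_old(frames, 4).shape[0], pad_temporal(frames, 4).shape[0])
```

With 6 frames and temporal_patch_size = 4, the old rule yields 9 frames (not divisible by 4), while the corrected rule yields 8. With temporal_patch_size = 2 the two rules agree.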

Local Testing
✅ I have tested this change locally and confirmed that all tests pass.

Fixes #37064

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
[yes] Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

github-actions · 2025-04-07T17:19:49Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

qubvel

Thanks, looks good to me!

zucchini-nlp

Thanks! Can you also update the test here, so it checks that video processing works for any input shape?

Existing test:

transformers/tests/models/qwen2_vl/test_image_processing_qwen2_vl.py

Lines 277 to 288 in e2b0224

    
    def test_video_inputs(self):
        for image_processing_class in self.image_processor_list:
            image_processing = image_processing_class(**self.image_processor_dict)
            expected_dims_by_frames = {1: 34300, 2: 34300, 3: 68600, 4: 68600, 5: 102900, 6: 102900}

            for num_frames, expected_dims in expected_dims_by_frames.items():
                image_processor_tester = Qwen2VLImageProcessingTester(self, num_frames=num_frames)
                video_inputs = image_processor_tester.prepare_video_inputs(equal_resolution=True)
                prcocess_out = image_processing(None, videos=video_inputs, return_tensors="pt")
                encoded_video = prcocess_out.pixel_values_videos
                expected_output_video_shape = (expected_dims, 1176)
                self.assertEqual(tuple(encoded_video.shape), expected_output_video_shape)
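
One possible way to generalize the hard-coded table in the existing test is to derive the expected row count from the number of frames. The sketch below is only an illustration: expected_video_rows and ROWS_PER_TEMPORAL_GROUP are hypothetical names, the value 34300 is read off the existing test's numbers, and the sketch assumes the row count per temporal group does not depend on temporal_patch_size, which this thread does not confirm.

```python
import math

# Read off the existing test: with temporal_patch_size=2, each temporal group
# of the tester's video batch contributes 34300 rows to pixel_values_videos.
ROWS_PER_TEMPORAL_GROUP = 34300


def expected_video_rows(num_frames: int, temporal_patch_size: int = 2) -> int:
    """Expected first dimension after padding num_frames up to a multiple of temporal_patch_size."""
    num_groups = math.ceil(num_frames / temporal_patch_size)
    return num_groups * ROWS_PER_TEMPORAL_GROUP


# Reproduces the table used in the existing test for temporal_patch_size=2.
expected_dims_by_frames = {n: expected_video_rows(n) for n in range(1, 7)}
```

A test built this way could loop over several frame counts (and, if the assumption above holds, several temporal_patch_size values) instead of listing each expected value by hand.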

HuggingFaceDocBuilderDev · 2025-04-07T18:45:27Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ailunc and others added 5 commits March 17, 2025 13:21

rewrite main method in Qwen2, making it more clear

7099188

Update broken link

363636f

Fix temporal patch padding logic in _preprocess of Qwen2VLImageProcessor

878cefb

Fix temporal patch padding logic in _preprocess of Qwen2VLImageProcessor

6411e2a

Fix temporal patch padding logic in _preprocess function

b8e1e1c

github-actions bot marked this pull request as draft April 7, 2025 17:19

Merge branch 'main' into HaochengLu2

e4d6801

LuHC409 marked this pull request as ready for review April 7, 2025 17:20

github-actions bot requested review from molbap and qubvel April 7, 2025 17:20

qubvel approved these changes Apr 7, 2025

zucchini-nlp reviewed Apr 7, 2025

JJJYmmm mentioned this pull request Apr 9, 2025

Temporal padding issue when temporal_patch_size != 2 QwenLM/Qwen2.5-VL#1027

Closed

zucchini-nlp mentioned this pull request May 8, 2025

Potential bug in Qwen 2/2.5 VL Image Preprocessor #38003

Closed
