[LMM] Implement merged multimodal processor for whisper #13278

Isotr0py · 2025-02-14T10:13:43Z

TODO

Fix profiling issue
Add processor test

Signed-off-by: isotr0py <[email protected]>

github-actions · 2025-02-14T10:13:55Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


DarkLight1337 · 2025-02-18T10:48:27Z

vllm/model_executor/models/whisper.py

+        prompt: Union[str, list[int]],
+        mm_data: MultiModalDataDict,
+    ) -> Union[str, list[int]]:
+        return [0]


cc @ywang96 is it safe to change the encoder prompt given the change in vllm/multimodal/profiling.py?
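The hunk above makes Whisper's encoder prompt a single dummy token. As a rough illustration of why that can still yield correctly sized dummy data for profiling, here is a minimal, self-contained sketch. It does not use vLLM's actual classes; it assumes that the processor later expands the placeholder token `0` into one token per encoder position. `make_encoder_prompt`, `expand_placeholder`, and `NUM_AUDIO_TOKENS` are hypothetical names, and 1500 is the number of encoder positions Whisper produces for a 30-second input window.

```python
# Hypothetical constant: Whisper's encoder emits 1500 positions for a
# 30-second window (3000 mel frames downsampled 2x by the conv stem).
NUM_AUDIO_TOKENS = 1500


def make_encoder_prompt(prompt, mm_data):
    """Hypothetical stand-in for the hunk above: the encoder only consumes
    audio features, so the text prompt is ignored and one dummy token is
    returned as a placeholder for the audio."""
    return [0]


def expand_placeholder(token_ids, target, replacement):
    """Replace every non-overlapping occurrence of `target` with `replacement`."""
    if not target:
        return list(token_ids)
    out, i, n = [], 0, len(target)
    while i < len(token_ids):
        if token_ids[i:i + n] == target:
            out.extend(replacement)
            i += n
        else:
            out.append(token_ids[i])
            i += 1
    return out


# Assumed profiling flow: build the dummy encoder prompt, then pad the single
# placeholder out to the full encoder length.
encoder_ids = expand_placeholder(
    make_encoder_prompt("<|startoftranscript|>", {"audio": None}),
    target=[0],
    replacement=[0] * NUM_AUDIO_TOKENS,
)
print(len(encoder_ids))  # 1500
```

Keeping the encoder prompt to a single placeholder means its final length is determined entirely by the replacement step, independent of the decoder-side text.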


DarkLight1337

LGTM. Just need @ywang96 to comment on the profiling changes.

DarkLight1337 · 2025-02-19T15:55:55Z

After this PR, can you update the model development docs with some info on how to implement merged multi-modal processor for encoder-decoder models?

Isotr0py · 2025-02-19T16:38:20Z

can you update the model development docs with some info on how to implement merged multi-modal processor for encoder-decoder models?

Yes, I will add a doc using Whisper and Florence-2 as end-to-end examples after this PR and #13320 are merged, because that PR also makes a minor change to EncDecMultimodalProcessor to support text-only cross-attention VLMs. (That PR should be ready before the end of this weekend.)


Isotr0py added 3 commits February 14, 2025 12:54

init mm_processor

5f67ca0

Signed-off-by: isotr0py <[email protected]>

Merge branch 'vllm-project:main' into whisper-processor

f63cd62

fix multimodal processor

7b88db2

Signed-off-by: isotr0py <[email protected]>

Isotr0py added 2 commits February 14, 2025 18:15

clean up

0165f91

Signed-off-by: isotr0py <[email protected]>

add processor test

5b61f53

Signed-off-by: isotr0py <[email protected]>

This was referenced Feb 14, 2025

[RFC]: Multi-modality Support on vLLM #4194

Open

[RFC]: Merge input processor and input mapper for multi-modal models #10114

Open

Isotr0py added 4 commits February 18, 2025 13:09

Merge branch 'vllm-project:main' into whisper-processor

5961261

fix profiling

951801b

Signed-off-by: Isotr0py <[email protected]>

cleanup model impl

64ff420

Signed-off-by: Isotr0py <[email protected]>

clean profiling messages

bedbbd4

Signed-off-by: Isotr0py <[email protected]>

Isotr0py marked this pull request as ready for review February 18, 2025 10:42

Isotr0py requested review from DarkLight1337 and ywang96 as code owners February 18, 2025 10:42

DarkLight1337 reviewed Feb 18, 2025


vllm/model_executor/models/whisper.py (outdated, resolved)

Isotr0py and others added 2 commits February 18, 2025 19:10

Update vllm/model_executor/models/whisper.py

981700f

Co-authored-by: Cyrus Leung <[email protected]>

Merge branch 'main' into whisper-processor

b4ff004

DarkLight1337 reviewed Feb 19, 2025


tests/models/multimodal/processing/test_common.py (outdated, resolved)

Update tests/models/multimodal/processing/test_common.py

12486f9

Co-authored-by: Cyrus Leung <[email protected]>

DarkLight1337 approved these changes Feb 19, 2025


code format

b36312a

Signed-off-by: Isotr0py <[email protected]>
