
possibly use MLX for MacOS users with WhisperSpeech #111

Open
BBC-Esq opened this issue Feb 20, 2024 · 19 comments

@BBC-Esq (Contributor) commented Feb 20, 2024

The purpose of this issue is to discuss possibly implementing MLX support for macOS users. For example, PyTorch's MPS backend currently doesn't support the FFT operation, whereas MLX does. This means that WhisperSpeech must put certain models and/or tensors on the CPU for macOS users, whereas CUDA users get the full speedup.
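
To make the gap concrete, here's a minimal sketch (my own illustration, not WhisperSpeech code) of what the CPU fallback looks like with PyTorch on Apple Silicon versus MLX's native FFT:

```python
import torch
import mlx.core as mx

x = torch.randn(16000)  # one second of 16 kHz audio

# PyTorch on Apple Silicon: torch.stft has no MPS kernel at the time of
# writing, so the tensor must be moved to the CPU for the FFT...
spec = torch.stft(x.cpu(), n_fft=400, window=torch.hann_window(400),
                  return_complex=True)
# ...and copied back to MPS for the rest of the pipeline (magnitudes only
# here, since complex tensors on MPS are only partially supported).
spec = spec.abs().to("mps")

# MLX: the FFT runs natively on the GPU, no device shuffling needed.
spec_mx = mx.fft.rfft(mx.array(x.numpy()))
```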

Possibly compatible with the operators WhisperSpeech uses:

[screenshot of MLX's supported-operations list omitted]

Option 1 - implement MLX only where MPS can't be used.
Option 2 - completely replace MPS with MLX.
Option 3 - replace MPS with MLX as much as possible, based on the multiple models involved in WhisperSpeech and whether each one can specifically be run with MLX.
Option 4 - offer MLX in addition to MPS for all macOS users.

In most of the benchmarks below, MLX provides a 2-3x speedup over MPS.
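
For anyone who wants to sanity-check numbers like that locally, a toy harness along these lines should do (my own sketch, not the article's code); the one subtlety is that both backends defer work, so you have to synchronize or evaluate before reading the clock:

```python
import time
import torch
import mlx.core as mx

N, ITERS = 4096, 100

# PyTorch on the MPS backend
a = torch.randn(N, N, device="mps")
b = torch.randn(N, N, device="mps")
torch.mps.synchronize()
t0 = time.perf_counter()
for _ in range(ITERS):
    c = a @ b
torch.mps.synchronize()  # MPS kernels run asynchronously
print(f"MPS: {time.perf_counter() - t0:.3f}s")

# MLX
a_mx = mx.random.normal((N, N))
b_mx = mx.random.normal((N, N))
mx.eval(a_mx, b_mx)
t0 = time.perf_counter()
for _ in range(ITERS):
    c_mx = a_mx @ b_mx
    mx.eval(c_mx)  # MLX is lazy; force each iteration to actually run
print(f"MLX: {time.perf_counter() - t0:.3f}s")
```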

Here are some snippets from the Medium article; the benchmark screenshots are not reproduced here. They cover: Benchmark Setup, Linear Layer, Softmax, Sigmoid, Concatenation, Binary Cross Entropy, Sort, Conv2D, and "Unified Memory Gamechanger".

MLX:

https://github.com/ml-explore/mlx

MLX Examples:

https://github.com/ml-explore/mlx-examples/tree/main/llms/llama

MLX Community:

https://huggingface.co/mlx-community

MLX Bark:

https://huggingface.co/mlx-community/mlx_bark (speed-wise, this would currently beat all of WhisperSpeech's existing MPS implementations)

Sample MLX Whisper Script:

https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/transcribe.py

Example MLX Whisper model:

https://huggingface.co/mlx-community/whisper-large-v2-mlx

@signalprime (Contributor):

Great initiative, @BBC-Esq! I'll definitely circle back to this one as soon as possible.

@BBC-Esq (Contributor, Author) commented Feb 20, 2024

@signalprime It would take someone with more programming experience than me to implement this, especially since I don't own a Mac, but I thought I'd start the discussion anyway. Interested as always in what you find out.

@BBC-Esq (Contributor, Author) commented Feb 22, 2024

UPDATE: Looks like PyTorch might be getting support sooner rather than later...

pytorch/pytorch@53bfae2

@signalprime (Contributor):

I'm definitely looking into it. Reviewing the Vocos model today

@BBC-Esq (Contributor, Author) commented Feb 24, 2024

> I'm definitely looking into it. Reviewing the Vocos model today

I'd love to learn if you want to keep me posted and teach me along the way, just FYI. This is not my profession but a hobby.

@signalprime (Contributor) commented Feb 25, 2024

Absolutely @BBC-Esq, I will keep you in the loop about it. MLX mimics the PyTorch API in most ways. I've been building models since before we had frameworks like TF and Torch, and in this case I'll be rebuilding the Vocos model using the MLX library. It just depends on time constraints.

I recently finished a long project with ML/RL in the finance domain and put in an application with Collabora last week. Would you put in a nice word for me @jpc?

@signalprime (Contributor) commented Feb 25, 2024

I'm getting closer... almost reached the end of the rabbit hole. We already have a standard Whisper model for MLX established.

I was able to convert the Vocos model and weights to MLX, but I ran into many issues with its feature extractor. MLX doesn't have weight_norm implemented yet. I've dug into the code and am debating adding the _weight_norm primitive to the C++ MLX library when I have time:

https://github.com/pytorch/pytorch/blob/834c7a1d3ea07878ad87d127ee28606fc140b552/aten/src/ATen/native/WeightNorm.cpp#L50

I'd like to do a little more research before trying that, because it could perhaps be handled another way, or not be needed at all, kind of like a quick initial pass-through. I removed those references and there are some other issues; kinda out of energy for this today.
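
For context, weight normalization is just a re-parameterization of a weight tensor into a direction v and a magnitude g (w = g * v / ||v||), so it may well be expressible with existing MLX ops. A minimal sketch; the helper name and the reduction axes are my assumptions, not MLX API:

```python
import mlx.core as mx

def weight_norm(v: mx.array, g: mx.array, axes=(1, 2)) -> mx.array:
    # Recompose w = g * v / ||v||, mirroring what torch's _weight_norm does.
    # For a Conv1d weight of shape (out_channels, in_channels, kernel),
    # torch normalizes over every axis except dim 0, hence axes=(1, 2).
    norm = mx.sqrt(mx.sum(v * v, axis=axes, keepdims=True))
    return g * v / norm
```

And since the weights are fixed at inference time, a converter could bake g * v / ||v|| into a plain weight once and drop weight_norm from the MLX graph entirely.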

@BBC-Esq (Contributor, Author) commented Feb 25, 2024

Interesting...

@signalprime (Contributor):

Good thing I waited. I got a response that it should be possible using existing ops.

Here is the whisper model in MLX format, which is used during voice cloning.

I was working with MLX conversions for all the parts of the Vocos model. Transferring weights wasn't an issue, but components used in the functions likely also need to be updated. I'm still becoming familiar with it, but it seems parts can be mixed and matched... as in, a tensor can be converted to an MLX array, passed to an MLX component, and converted back to a tensor later. That would appear necessary, since I wouldn't want to keep going further and further into torchaudio, for example. Ideally we'd just put in a replacement for the components where torch doesn't yet support the ops.
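
A minimal sketch of that mix-and-match idea (the function name is hypothetical, and the torch-to-MLX bridge here goes through numpy):

```python
import numpy as np
import torch
import mlx.core as mx

def rfft_via_mlx(x: torch.Tensor) -> torch.Tensor:
    # torch -> MLX: the simplest bridge today goes through numpy.
    x_mx = mx.array(x.detach().cpu().numpy())
    # Run the one op that PyTorch's MPS backend is missing.
    y_mx = mx.fft.rfft(x_mx)
    # MLX -> torch; the caller decides which device to move the result to.
    return torch.from_numpy(np.array(y_mx))
```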

@BBC-Esq (Contributor, Author) commented Feb 25, 2024

> Good thing I waited. I got a response that it should be possible using existing ops. […]

That's what my intuition was telling me based on what I read about MLX, but I am far from an expert and would have no way to verify it. My initial hypothesis was that it might be possible to use MLX for some (but not all) of the necessary operations, mixing and matching like you were saying. Math is math... but again, this is totally a novice-intuition kind of thing.

Let me know if I can help out at all...

@BBC-Esq (Contributor, Author) commented Feb 27, 2024

Not sure if it's relevant, but apparently aten::upsample_linear1d has been implemented in PyTorch's development branch (not included in a release yet, though):

pytorch/pytorch#116630 (comment)

@BBC-Esq (Contributor, Author) commented Feb 29, 2024

@signalprime how's it going? Any updates?

@signalprime (Contributor):

Hi @BBC-Esq, I haven't had an opportunity to resume work on this, unfortunately, my friend.

@BBC-Esq (Contributor, Author) commented Mar 5, 2024

Hey @signalprime, I hope you don't stop working on this kind of stuff even if you don't get the job with Collabora. I enjoy working with ya and look forward to improving this all-around kick-ass library. Just throwing that out there!

@signalprime (Contributor):

Likewise @BBC-Esq, I'll keep it in mind and make time to return to the effort. It's definitely not related to Collabora; rather, the launch of another project, meetings, and the occasional things that pull us away from our desks. On the next go, I'll try the mixed approach: rather than converting everything to MLX, we just use MLX ops where coverage is still missing in torch. If that works, it should keep things simpler. I've been spending a lot of time working with autonomous agents, and giving them a good voice, in whatever style we prefer, is an important feature.

@jpc (Contributor) commented Mar 6, 2024

@signalprime Sure, I'll see what I can do :)

@jpc (Contributor) commented Mar 13, 2024

@signalprime Btw, do you have a Discord? Maybe we could have a chat there?

@signalprime (Contributor):

@jpc yes absolutely, I sent you an email with details. Looking forward to it!

@touhi99 commented May 6, 2024

Is it still working with MPS? I couldn't get the current main branch to run with it; it uses CPU only.
