
possibly use MLX for MacOS users with WhisperSpeech #111

Open
BBC-Esq opened this issue Feb 20, 2024 · 19 comments

@BBC-Esq (Contributor) commented Feb 20, 2024

The purpose of this issue is to discuss possibly implementing MLX support for macOS users. For example, PyTorch's MPS backend currently doesn't support the FFT operation, whereas MLX does. This means that WhisperSpeech must put certain models and/or tensors on the CPU for macOS users, whereas CUDA users get the full speedup.
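
To make the gap concrete, here's a minimal sketch (my own illustration, not WhisperSpeech code) of what the CPU fallback looks like with PyTorch on Apple Silicon versus MLX's native FFT:

```python
import torch
import mlx.core as mx

x = torch.randn(16000)  # one second of 16 kHz audio

# PyTorch on Apple Silicon: torch.stft has no MPS kernel at the time of
# writing, so the tensor must be moved to the CPU for the FFT...
spec = torch.stft(x.cpu(), n_fft=400, window=torch.hann_window(400),
                  return_complex=True)
# ...and copied back to MPS for the rest of the pipeline (magnitudes only
# here, since complex tensors on MPS are only partially supported).
spec = spec.abs().to("mps")

# MLX: the FFT runs natively on the GPU, no device shuffling needed.
spec_mx = mx.fft.rfft(mx.array(x.numpy()))
```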

Possibly compatible with the operators WhisperSpeech uses:

[screenshot of MLX's supported-operations list omitted]

Option 1 - implement MLX only where MPS can't be used.
Option 2 - completely replace MPS with MLX.
Option 3 - replace MPS with MLX as much as possible, based on the multiple models involved in WhisperSpeech and whether each one can specifically be run with MLX.
Option 4 - offer MLX in addition to MPS for all macOS users.

In most of the benchmarks below, MLX provides a 2-3x speedup over MPS.
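
For anyone who wants to sanity-check numbers like that locally, a toy harness along these lines should do (my own sketch, not the article's code); the one subtlety is that both backends defer work, so you have to synchronize or evaluate before reading the clock:

```python
import time
import torch
import mlx.core as mx

N, ITERS = 4096, 100

# PyTorch on the MPS backend
a = torch.randn(N, N, device="mps")
b = torch.randn(N, N, device="mps")
torch.mps.synchronize()
t0 = time.perf_counter()
for _ in range(ITERS):
    c = a @ b
torch.mps.synchronize()  # MPS kernels run asynchronously
print(f"MPS: {time.perf_counter() - t0:.3f}s")

# MLX
a_mx = mx.random.normal((N, N))
b_mx = mx.random.normal((N, N))
mx.eval(a_mx, b_mx)
t0 = time.perf_counter()
for _ in range(ITERS):
    c_mx = a_mx @ b_mx
    mx.eval(c_mx)  # MLX is lazy; force each iteration to actually run
print(f"MLX: {time.perf_counter() - t0:.3f}s")
```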

Here are some snippets from the Medium article; the benchmark screenshots are not reproduced here. They cover: Benchmark Setup, Linear Layer, Softmax, Sigmoid, Concatenation, Binary Cross Entropy, Sort, Conv2D, and "Unified Memory Gamechanger".

MLX:

https://github.com/ml-explore/mlx

MLX Examples:

https://github.com/ml-explore/mlx-examples/tree/main/llms/llama

MLX Community:

https://huggingface.co/mlx-community

MLX Bark:

https://huggingface.co/mlx-community/mlx_bark (speed-wise, this would currently beat all of WhisperSpeech's existing MPS implementations)

Sample MLX Whisper Script:

https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/transcribe.py

Example MLX Whisper model:

https://huggingface.co/mlx-community/whisper-large-v2-mlx

@signalprime (Contributor):

Great initiative, @BBC-Esq! I'll definitely circle back to this one as soon as possible.

@BBC-Esq (Contributor, Author) commented Feb 20, 2024

@signalprime It would take someone with more programming experience than me to implement this, especially since I don't own a Mac, but I thought I'd start the discussion anyway. Interested as always in what you find out.

@BBC-Esq (Contributor, Author) commented Feb 22, 2024

UPDATE: Looks like PyTorch might be getting support sooner rather than later...

pytorch/pytorch@53bfae2

@signalprime (Contributor):

I'm definitely looking into it. Reviewing the Vocos model today

@BBC-Esq (Contributor, Author) commented Feb 24, 2024

> I'm definitely looking into it. Reviewing the Vocos model today

I'd love to learn if you want to keep me posted and teach me along the way, just FYI. This is not my profession but a hobby.

@signalprime (Contributor) commented Feb 25, 2024

Absolutely @BBC-Esq, I will keep you in the loop about it. MLX mimics the PyTorch API in most ways. I've been building models since before we had frameworks like TF and Torch, and in this case I'll be rebuilding the Vocos model using the MLX library. It just depends on time constraints.

I recently finished a long project with ML/RL in the finance domain and put in an application with Collabora last week. Would you put in a nice word for me @jpc?

@signalprime (Contributor) commented Feb 25, 2024

I'm getting closer... almost reached the end of the rabbit hole. We already have a standard Whisper model for MLX established.

I was able to convert the Vocos model and weights to MLX, but I ran into many issues with its feature extractor. MLX doesn't have weight_norm implemented yet. I've dug into the code and am debating adding the _weight_norm primitive to the C++ MLX library when I have time:

https://github.com/pytorch/pytorch/blob/834c7a1d3ea07878ad87d127ee28606fc140b552/aten/src/ATen/native/WeightNorm.cpp#L50

I'd like to do a little more research before trying that, because it could perhaps be handled another way, or not be needed at all, kind of like a quick initial pass-through. I removed those references and there are some other issues; kinda out of energy for this today.
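
For context, weight normalization is just a re-parameterization of a weight tensor into a direction v and a magnitude g (w = g * v / ||v||), so it may well be expressible with existing MLX ops. A minimal sketch; the helper name and the reduction axes are my assumptions, not MLX API:

```python
import mlx.core as mx

def weight_norm(v: mx.array, g: mx.array, axes=(1, 2)) -> mx.array:
    # Recompose w = g * v / ||v||, mirroring what torch's _weight_norm does.
    # For a Conv1d weight of shape (out_channels, in_channels, kernel),
    # torch normalizes over every axis except dim 0, hence axes=(1, 2).
    norm = mx.sqrt(mx.sum(v * v, axis=axes, keepdims=True))
    return g * v / norm
```

And since the weights are fixed at inference time, a converter could bake g * v / ||v|| into a plain weight once and drop weight_norm from the MLX graph entirely.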

@BBC-Esq (Contributor, Author) commented Feb 25, 2024

Interesting...

@signalprime (Contributor):

Good thing I waited. I got a response that it should be possible using existing ops.

Here is the whisper model in MLX format, which is used during voice cloning.

I was working with MLX conversions for all the parts of the Vocos model. Transferring weights wasn't an issue, but components used in the functions likely also need to be updated. I'm still becoming familiar with it, but it seems parts can be mixed and matched... as in, a tensor can be converted to an MLX array, passed to an MLX component, and converted back to a tensor later. That would appear necessary, since I wouldn't want to keep going further and further into torchaudio, for example. Ideally we'd just put in a replacement for the components where torch doesn't yet support the ops.
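
A minimal sketch of that mix-and-match idea (the function name is hypothetical, and the torch-to-MLX bridge here goes through numpy):

```python
import numpy as np
import torch
import mlx.core as mx

def rfft_via_mlx(x: torch.Tensor) -> torch.Tensor:
    # torch -> MLX: the simplest bridge today goes through numpy.
    x_mx = mx.array(x.detach().cpu().numpy())
    # Run the one op that PyTorch's MPS backend is missing.
    y_mx = mx.fft.rfft(x_mx)
    # MLX -> torch; the caller decides which device to move the result to.
    return torch.from_numpy(np.array(y_mx))
```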

@BBC-Esq (Contributor, Author) commented Feb 25, 2024

> Good thing I waited. I got a response that it should be possible using existing ops. […]

That's what my intuition was telling me based on what I read about MLX, but I am far from an expert and would have no way to verify it. My initial hypothesis was that it might be possible to use MLX for some (but not all) of the necessary operations, mixing and matching like you were saying. Math is math... but again, this is totally a novice-intuition kind of thing.

Let me know if I can help out at all...

@BBC-Esq (Contributor, Author) commented Feb 27, 2024

Not sure if it's relevant, but apparently aten::upsample_linear1d has been implemented in PyTorch's development branch (not included in a release yet, though):

pytorch/pytorch#116630 (comment)

@BBC-Esq (Contributor, Author) commented Feb 29, 2024

@signalprime how's it going? Any updates?

@signalprime (Contributor):

Hi @BBC-Esq, I haven't had an opportunity to resume work on this, unfortunately, my friend.

@BBC-Esq (Contributor, Author) commented Mar 5, 2024

Hey @signalprime, I hope you don't stop working on this kind of stuff even if you don't get the job with Collabora. I enjoy working with ya and look forward to improving this all-around kick-ass library. Just throwing that out there!

@signalprime (Contributor):

Likewise @BBC-Esq, I'll keep it in mind and make time to return to the effort. It's definitely not related to Collabora; rather, the launch of another project, meetings, and the occasional things that pull us away from our desks. On the next go, I'll try the mixed approach: rather than converting everything to MLX, we just use MLX ops where coverage is still missing in torch. If that works, it should keep things simpler. I've been spending a lot of time working with autonomous agents, and giving them a good voice, in whatever style we prefer, is an important feature.

@jpc (Contributor) commented Mar 6, 2024

@signalprime Sure, I'll see what I can do :)

@jpc (Contributor) commented Mar 13, 2024

@signalprime Btw, do you have a Discord? Maybe we could have a chat there?

@signalprime (Contributor):

@jpc yes absolutely, I sent you an email with details. Looking forward to it!

@touhi99 commented May 6, 2024

Is it still working with MPS? I couldn't get the current main branch to run with it; it uses CPU only.
