Skip to content

Adds support for MacOS Arm CPU builds #15

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 27, 2024

Conversation

baileyheading
Copy link
Contributor

@baileyheading baileyheading commented Jul 14, 2024

  • Modification to main.cpp as per discussion EXC_BAD_ACCESS on run on macOS CPU only #12
  • Changes to CMakeLists to include metal (work in-progress) and to metal logging function import (commented out as not available, not sure where it's from)
  • New build instructions for mac os metal (does not fully build yet) -> to be reorganised as desired

@baileyheading baileyheading changed the title first commit of macos metal branch Beginning of work on mac os metal Jul 14, 2024
@balisujohn
Copy link
Owner

balisujohn commented Jul 14, 2024

Thanks for your contribution; lmk once this compiles on metal (op not implemented errors are fine), and compiles and passes tests on cpu on MacOS (ideally for both intel and arm), and I'll verify on Linux and merge.

Since it looks like the first missing op is in the diffusion model, you should be able to run the autoregressive test on metal, which might be of interest.

@baileyheading
Copy link
Contributor Author

Thanks for your contribution; lmk once this compiles on metal (op not implemented errors are fine), and compiles and passes tests on cpu on MacOS (ideally for both intel and arm), and I'll verify on Linux and merge.

Since it looks like the first missing op is in the diffusion model, you should be able to run the autoregressive test on metal, which might be of interest.

I typed the wrong wrong thing somehow. It compiles on metal! it just doesn't run tortoise yet because of the missing ops. You could probably tell this based on my progress in the issue. I'll confirm it runs on cpu for mac intel and mac arm yes. I'l give these tests a go to.

As for adding the missing ops, I found the files very hard to navigate and was getting confused with how things were referenced in the main.cpp versus in the ggml code. I would like to at least establish which ones are missing and how to put in the skeleton code for them so I can compile and test them + build an itemised list so that it's easier for others to see what needs to be done if they want to chip in / take over

@balisujohn
Copy link
Owner

balisujohn commented Jul 15, 2024

Sounds good to me. Worth noting that all remaining changes for metal support should be in the ggml submodule and you shouldn't need to modify main.cpp at all beyond the changes already in this PR.

@baileyheading
Copy link
Contributor Author

baileyheading commented Jul 15, 2024

The main confusion I had was working out the exact name of the operation. The message it gave back for:

cur = ggml_upscale_ext(ctx0, cur, output_sequence_length, 1024, 1, 1);

was that the operation was not supported, but metal appeared to have an upscaling operation. So it seems that there is a "ext" version and a not "ext" version, but then I couldn't find a differently named upscaling operation in CUDA either? (was looking in CUDA to find what the required one should be named for metal)

@baileyheading
Copy link
Contributor Author

baileyheading commented Jul 15, 2024

Did you add ggml_uscale_ext yourself? (referring to the comments on it)

    // nearest interpolate
    // multiplies ne0 and ne1 by scale factor
    // used in stable-diffusion
    GGML_API struct ggml_tensor * ggml_upscale(
            struct ggml_context * ctx,
            struct ggml_tensor  * a,
            int                   scale_factor);

    // nearest interpolate
    // nearest interpolate to specified dimensions
    // used in tortoise.cpp
    GGML_API struct ggml_tensor * ggml_upscale_ext(
            struct ggml_context * ctx,
            struct ggml_tensor  * a,
            int                   ne0,
            int                   ne1,
            int                   ne2,
            int                   ne3);

There's only one op but two function definitions for upscale?
image

@balisujohn
Copy link
Owner

balisujohn commented Jul 15, 2024

I did yeah but I think ggerganov added the metal changes. ggml-org/ggml#814

You can check ggml.c ggml_upscale_ext is an alternate wrapper to ggml_upscale for the backend op ggml_upscale_impl (which corresponds to the op GGML_OP_UPSCALE across backends.

So it shouldnt need a custom backend op as i think GGML_OP_UPSCALE already has metal implementation. Maybe this is a ggml bug.

@baileyheading
Copy link
Contributor Author

baileyheading commented Jul 16, 2024

Mac OS ARM (CPU)
I ran the tests individually from the main.cpp() for mac os ARM on cpu. Could the change of the tensor stuff from {1,1} to {1,1,1,1} affect the test? not exactly sure how the tests were setup
Note: It does generate decent audio, just failing those tests

test_autoregressive();

 44032  45056  46080  46080 AUTOREGRESSIVE TEST SUCCESS!

test_diffusion();

i: 0===============================================] 98%
-0.598114:-0.518837
mel mismatch

test_vocoder();

i: 2104
0.01153:0.000646724
mel mismatch

Mac OS intel (x86) (CPU)
It compiles, but it fails near the start on. No idea why

  gpt_vocab_init("../models/tokenizer.json", vocab);
illegal hardware instruction  ./tortoise

@balisujohn
Copy link
Owner

The illegal instruction on intel may mean it needs a compile flag related to intel architecture extensions changed. Maybe worth looking at this file to see commands used to build a the similar project stable-diffusion.cpp https://github.com/leejet/stable-diffusion.cpp/blob/master/.github/workflows/build.yml

The test mismatch is something I noticed on windows as well, and is fine as long as legible audio comes out of the full process

@baileyheading
Copy link
Contributor Author

I tried one of those commands. I had no idea you could do a universal compilation for ARM and x86 wow

./tortoise: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
./tortoise (for architecture x86_64):	Mach-O 64-bit executable x86_64
./tortoise (for architecture arm64):	Mach-O 64-bit executable arm64

haven't found the right build command for intel yet but that's a cool finding for me

@baileyheading
Copy link
Contributor Author

I tried mac os intel with

cmake -DCMAKE_OSX_ARCHITECTURES=x86_64 ..

but no success. something else seems to be the issue.

@balisujohn
Copy link
Owner

maybe try the avx related ones, those are features that are missing on some intel chips but available on others, so could cause an illegal instruction.

@baileyheading
Copy link
Contributor Author

baileyheading commented Jul 16, 2024

maybe try the avx related ones, those are features that are missing on some intel chips but available on others, so could cause an illegal instruction.

possibly Rosetta 2 doesn't support them

@baileyheading
Copy link
Contributor Author

baileyheading commented Jul 24, 2024

@balisujohn did you determine whether the change of {1,1} was {1,1,1,1} was correct / required and functioned on other OS?

The only other changes in here is the CMake for Metal and commenting out a logging function in metal that didn't seem to exist. We can also change the readme info for Mac build to say that it builds for Mac ARM metal but does not run fully and that Mac Intel currently doesn't work as this will encourage other mac users dropping by to attempt builds and allows them to run it on cpu no issues (for mac arm)

@baileyheading
Copy link
Contributor Author

As for the Metal ops, that work will need to be done through GGML repo as discussed and currently it seems a bit out of my ability with my current testing setup

@balisujohn
Copy link
Owner

Ok yeah we can merge this as "support for ARM CPU only." The patch did not fix the issues on Windows, but I am very confident it is correct because without it you are reading to and writing from unallocated memory locations, which just happens silently on Linux(c++ moment).

If you could add a readme section detailing the extent of current Mac support and how to build on Mac, I think this will be ready to merge.

@balisujohn
Copy link
Owner

(I noticed the README is already updated; I'll handle this PR from here. Thanks for your contribution!)

@balisujohn
Copy link
Owner

alright LGTM :^)

@balisujohn balisujohn changed the title Beginning of work on mac os metal Adds support for MacOS Arm CPU builds Jul 27, 2024
@balisujohn balisujohn merged commit 81110c2 into balisujohn:master Jul 27, 2024
0 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants