
MPS support #790

Open · wants to merge 4 commits into main
Conversation

@maximegmd (Author)

Context

  • For testing purposes, it can be useful to run directly on a local Mac.

Changelog

  • Checks for BF16 support on the MPS device (see the sketch after this list).
  • Added a configuration targeting MPS; changes to paths were required because macOS resolves symlinks in /tmp as /private/ActualPath.
  • Set the optimizer to Adam instead of AdamW to fit in memory on 64 GB devices.
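A minimal sketch of the first two points; the helper name mps_bf16_supported is my own and may differ from torchtune's actual utility:

```python
import os
import torch

def mps_bf16_supported() -> bool:
    """Probe BF16 support by attempting a small allocation on the MPS device."""
    if not torch.backends.mps.is_available():
        return False
    try:
        torch.zeros(1, dtype=torch.bfloat16, device="mps")
        return True
    except (TypeError, RuntimeError):
        return False

# The path quirk from the second bullet: on macOS, /tmp is a symlink into
# /private, so a resolved path comes back with the /private prefix.
print(os.path.realpath("/tmp"))  # "/private/tmp" on macOS, "/tmp" elsewhere
```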

Test plan

  • Ran a full-finetune training job on Mistral 7B.
  • The current test jobs are very CUDA-specific; maybe this could be changed as well?
  • We may need to integrate Mac runners into the pipeline.
  • Some dependencies such as bitsandbytes are not yet Mac compatible.


pytorch-bot bot commented Apr 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/790

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label (authors need to sign the CLA before a PR can be reviewed) on Apr 18, 2024.
@joecummings (Contributor)

@maximegmd This is awesome! Can you post some loss curves for the finetune you ran?

@maximegmd (Author)

> @maximegmd This is awesome! Can you post some loss curves for the finetune you ran?

I will complete a run over the weekend; losses looked fine, but the Llama 3 release changed my priorities ^^

@kartikayk (Contributor) left a comment

Thanks so much for making this change; supporting MPS has been on our TODO list!

I'm a bit confused about how this is working: the device param is used to fetch the device via this utility function, which in turn depends on this function. We seemingly never actually return mps as the device, so how is this working? I think this will just default to CPU.

@maximegmd (Author)

device is not None when this function is called, so it just passes 'mps' to torch.device(), which is the expected PyTorch device name.

But you are correct that there is room for improvement: we could automatically return mps when device is not manually specified in the config (roughly as sketched below).
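A rough sketch of the resolution logic being discussed; the function name and fallback order are assumptions, not torchtune's exact implementation:

```python
from typing import Optional

import torch

def get_device(device: Optional[str] = None) -> torch.device:
    # An explicit device string such as "mps" is handed straight to
    # torch.device(); auto-detection only runs when device is None.
    if device is None:
        # Currently the fallback is CUDA-or-CPU; checking
        # torch.backends.mps.is_available() here is the improvement
        # mentioned above for Apple silicon.
        device = "cuda" if torch.cuda.is_available() else "cpu"
    return torch.device(device)
```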

@kartikayk (Contributor)

Oh, good point; I totally glossed over the fact that device = torch.device(device) is outside the if block. Yup, sounds good. What's the iter/sec you're getting with this on a Mac?

@maximegmd (Author)

If I recall correctly, it was around 20 s/it, but I suspect I was swapping a bit, so I can probably improve the speed. The main issue is bitsandbytes not supporting MPS, so the optimizer state uses quite a bit of memory (see the rough estimate below).
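A back-of-the-envelope estimate (my own numbers, not from the PR) of why the optimizer state is heavy without bitsandbytes' 8-bit Adam:

```python
# Plain Adam keeps two fp32 state tensors (exp_avg, exp_avg_sq) per parameter.
n_params = 7e9           # Mistral 7B, full finetune
state_bytes = 2 * 4      # two fp32 tensors per parameter
print(f"{n_params * state_bytes / 2**30:.0f} GiB")  # ~52 GiB for Adam state alone
```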

I will try to push a Llama 3 config tomorrow with some numbers, now that my Llama 3 finetune is running :)

@maximegmd (Author)

Here is a training run on Gemma 2B. Sadly, the laptop went to sleep right before the end, but this is a 14-hour run, so it should be representative enough.
log_1715964526.txt
