
Add trainer integration test for llava to ensure accelerate autocasting works correctly #30489


Open · wants to merge 5 commits into base: main

Conversation

frasermince
Contributor

What does this PR do?

This PR adds a new integration test to ensure that accelerate autocasting works correctly. It came out of a discussion found here; that PR should probably be merged first (or this one merged into it).

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@@ -0,0 +1,120 @@
import unittest
Contributor Author

This file should be the only new change; that will be clearer once the other PR is merged. Usually I would set it up to merge this one into the first, but this feels a bit different since the changes are on a fork.

@frasermince frasermince force-pushed the frasermince/trainer-integration-test branch from b723f1f to 908ff93 Compare April 25, 2024 17:32

output = model(**inputs)
expected_slice = torch.tensor(
[[-3.5664, -3.5625, -0.4309], [-5.8242, -5.6914, -1.3242], [-5.4805, -5.9375, 1.1465]],
Contributor Author
@frasermince frasermince Apr 25, 2024

I saw this pattern in the llava-next tests. These values came from running this model before training. I'm not quite sure this is correct, so please let me know if there is something else we want to test. Perhaps instead we want the trained model before applying the downcasting change?

Contributor Author

Also note these do not yet pass allclose. I wanted to go ahead and open this PR to generate discussion around what the right thing to test is.

Contributor Author

@muellerzr How are these allclose values found? Is it literally what the model is outputting now, to detect whether the logits change in the future? Or is it based on some original implementation?
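For reference: in the transformers test suite, such expected slices are typically recorded from a trusted run of the model and act as regression anchors. The tolerance rule behind `torch.allclose` is `|a - b| <= atol + rtol * |b|`. A minimal pure-Python sketch of that check (illustrative only; the real tests call `torch.allclose` on tensors):

```python
# Illustrative sketch of the closeness check used when comparing a logits
# slice against hard-coded expected values. The real tests use torch.allclose;
# the tolerance rule is the same: |a - b| <= atol + rtol * |b|.

def allclose(actual, expected, rtol=1e-5, atol=1e-8):
    """Element-wise closeness check over nested lists of floats."""
    flat_a = [x for row in actual for x in row]
    flat_e = [x for row in expected for x in row]
    if len(flat_a) != len(flat_e):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(flat_a, flat_e))

# Expected slice taken from the diff above.
expected_slice = [
    [-3.5664, -3.5625, -0.4309],
    [-5.8242, -5.6914, -1.3242],
    [-5.4805, -5.9375, 1.1465],
]

# A matching slice passes; a drifted one fails.
print(allclose(expected_slice, expected_slice))  # True
drifted = [[v + 0.1 for v in row] for row in expected_slice]
print(allclose(expected_slice, drifted))         # False
```

If a code change shifts the logits beyond the tolerance, the test fails, which is exactly the regression signal wanted here.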

"llava-hf/bakLlava-v1-hf", quantization_config=bits_and_bytes_config
)
adapter_name = "lora_default"
peft_config = LoraConfig(
Contributor Author

I am somewhat unclear on where the slow tests run, but I assume there is some limit on memory, so I tried to pick a reasonable LoRA config for this test. If you think there is a simpler or more idiomatic way to do this test, please let me know.
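For context, a memory-conscious LoRA setup for a test like this might look as follows. This is a sketch assuming peft's `LoraConfig`; the rank, alpha, and target modules shown are illustrative choices, not the values in this PR:

```python
from peft import LoraConfig

# Illustrative, memory-conscious LoRA configuration for a slow CI test.
# Rank/alpha/target_modules here are assumptions, not the PR's actual values.
peft_config = LoraConfig(
    r=8,                                  # small rank keeps the adapter tiny
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
```

Restricting `target_modules` to a couple of attention projections keeps the number of trainable parameters, and therefore optimizer state, small, which matters on memory-limited CI runners.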

Contributor

I think this is fine for what we're doing!

@frasermince frasermince force-pushed the frasermince/trainer-integration-test branch from 908ff93 to 1cd13e0 Compare April 25, 2024 17:35
@slow
@require_bitsandbytes
def test_model_trainer_integration_test(self):
def image_prompt_generator():
Contributor Author

Not entirely sure this is the simplest or most idiomatic way to create this test dataset, so please let me know if there is a better way.
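One common approach is to feed a small generator of prompt/image records to `datasets.Dataset.from_generator`. A minimal pure-Python sketch of the generator shape (the field names and prompt text here are hypothetical, not the PR's actual test data):

```python
# Hypothetical sketch of a generator-backed test dataset: a handful of
# prompt/image records, of the kind one might pass to
# datasets.Dataset.from_generator in a real test.
def image_prompt_generator(num_examples=4):
    for i in range(num_examples):
        yield {
            "prompt": f"USER: <image>\nWhat is in this picture? ASSISTANT: example {i}",
            "image_id": i,  # stand-in for an actual PIL image in a real test
        }

examples = list(image_prompt_generator())
print(len(examples))               # 4
print(sorted(examples[0].keys()))  # ['image_id', 'prompt']
```

Keeping the dataset tiny (a handful of examples) bounds both runtime and memory for a slow CI test while still exercising the full training path.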

@frasermince frasermince force-pushed the frasermince/trainer-integration-test branch 4 times, most recently from 7ce15e3 to 9440dd6 Compare April 25, 2024 18:13
@amyeroberts
Collaborator

cc @muellerzr for first review

@frasermince frasermince force-pushed the frasermince/trainer-integration-test branch 2 times, most recently from e9e3feb to bc31529 Compare May 2, 2024 22:07
@frasermince
Contributor Author

Updated this now that the previous PR is merged! However, I am very concerned that OOMs will be an issue here. I think there are some open questions around:

  1. How we ensure models are compatible with the Trainer and accelerate
  2. How we test training a model in CI, given how memory-intensive it can be

@frasermince frasermince force-pushed the frasermince/trainer-integration-test branch from 8f35687 to 3c7e7e1 Compare May 5, 2024 19:10
@huggingface huggingface deleted a comment from github-actions bot May 30, 2024
@amyeroberts
Collaborator

Gentle ping @muellerzr, or possibly @SunMarc?

@huggingface huggingface deleted a comment from github-actions bot Jun 24, 2024
@huggingface huggingface deleted a comment from github-actions bot Jul 19, 2024
@amyeroberts
Collaborator

Another ping @muellerzr @SunMarc

Contributor
@muellerzr muellerzr left a comment

Overall this is a very good test. Since autocasting is done "automagically" via accelerate, this tests it exactly how you should!

@muellerzr muellerzr requested a review from amyeroberts July 31, 2024 20:12
Collaborator
@amyeroberts amyeroberts left a comment

Looks great - thanks for adding this!

Once the conflicts are resolved I think we're good to go

cc @zucchini-nlp for reference

@@ -0,0 +1,116 @@
import gc
Collaborator

missing copyright header


def tearDown(self):
gc.collect()
torch.cuda.empty_cache()
Collaborator

My (somewhat sparse) knowledge of empty_cache is that it's not meant to be used manually and can cause unintended / surprising behaviour: #31372 (comment)

Contributor

While normally I’d agree, if it’s in the tests it should be fine. That comment was in reference to it being in the actual Trainer code.

Member
@zucchini-nlp zucchini-nlp left a comment

Interesting finding, thanks for fixing and adding tests!

Out of curiosity: if I understand correctly, training llava and llava-next with the HF Trainer and fp16/bf16 flags failed with dtype errors before the fix was done. I am wondering about other llava-based models; this should mean that vipllava, llava-next-video and video-llava fail with the same error, because they all follow a similar architecture. But for llava-next-video I had a script with fp16 running without errors, so I would love to know your opinion on this.

@frasermince
Contributor Author

> Interesting finding, thanks for fixing and adding tests!
>
> Out of curiosity: if I understand correctly, training llava and llava-next with the HF Trainer and fp16/bf16 flags failed with dtype errors before the fix was done. I am wondering about other llava-based models; this should mean that vipllava, llava-next-video and video-llava fail with the same error, because they all follow a similar architecture. But for llava-next-video I had a script with fp16 running without errors, so I would love to know your opinion on this.

Interesting, it's possible that this error has since been fixed in those other models. It's been a couple of months since I looked at this, but I can definitely check whether there are any nuances that would let those models work with a half-precision flag while llava and llava-next do not. I would have to do a bit more research on this.

@frasermince frasermince force-pushed the frasermince/trainer-integration-test branch from 3c7e7e1 to 56bb12e Compare August 8, 2024 16:21
@frasermince frasermince force-pushed the frasermince/trainer-integration-test branch from 56bb12e to cdad616 Compare August 8, 2024 16:24
4 participants