
[RFC][fix test] missing .item() in frozen nf4 test #609

Closed
Wants to merge 1 commit

Conversation

skcoirz

@skcoirz skcoirz commented Mar 28, 2024

Context

  • As titled: the test is missing a `.item()` call.
  • There is another thing, though: I found that the error from frozen NF4 is larger than the one from bnb linear.
Screenshot 2024-03-28 at 2 28 58 PM

Changelog

  • updated nf4 test

Test plan

  • pytest tests/torchtune/modules/low_precision/test_nf4_linear.py
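For context, a minimal sketch of the comparison being fixed. The values and names here are illustrative, not the actual test code; in the real test they are reconstruction errors of the frozen NF4 linear and the bnb linear against a reference:

```python
import torch

# Illustrative error values only; the real test computes reconstruction
# errors for the frozen NF4 linear and the bnb linear.
err = torch.tensor(0.02)
err_bnb = torch.tensor(0.03)

# Before the fix the assertion mixed a Python float with a 0-dim tensor
# (err.item() < err_bnb); that still evaluates to a boolean tensor, so it
# can pass silently. The fix calls .item() on both sides so two plain
# Python floats are compared.
assert err.item() < err_bnb.item()
```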


pytorch-bot bot commented Mar 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/609

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 28, 2024
@rohan-varma
Member

Thanks for your contribution @skcoirz! This was definitely an oversight on my part. Though do we understand why the current test (err.item() < err_bnb) is passing in CI?

@skcoirz
Author

skcoirz commented Mar 28, 2024

Good question. Let me dig into it.

@skcoirz
Author

skcoirz commented Mar 28, 2024

@rohan-varma, I found the cause.

  • The unit test is able to catch this bug (confirmed on my local server).
  • But in CI, the two accuracy tests are bypassed because there is no CUDA (confirmed in the logging below).

It looks like integration tests are handled on AWS. I would assume we have GPUs there? Should we mark these as integration_tests?


Screenshot 2024-03-28 at 2 54 09 PM

https://github.com/pytorch/torchtune/actions/runs/8472914261/job/23216077574
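A hedged sketch of the situation described above. The function names are hypothetical and the decorators are illustrative, not torchtune's actual test code: a `skipif` guard hides the accuracy tests on CPU-only runners, and the proposal is to tag them so a GPU-backed job picks them up.

```python
import pytest
import torch

# Guarded accuracy test: on CPU-only CI runners torch.cuda.is_available()
# is False, so the test is skipped and the bad assertion is never executed.
@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA required")
def test_nf4_reconstruction_error():
    ...

# Proposed alternative: mark the test so the GPU-backed (AWS) integration
# job runs it instead of the CPU-only unit-test job.
@pytest.mark.integration_test
def test_nf4_reconstruction_error_integration():
    ...
```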

@ebsmothers
Contributor

ebsmothers commented Mar 28, 2024

@skcoirz thanks for debugging this. Yeah, this is a weird edge case. We can run this GPU test in CI, but right now GPU jobs are meant more for E2E runs of fine-tuning scripts. So we can mark this test as an integration test via @pytest.mark.integration_test like you suggested, but it is a bit unintuitive (e.g. it'll be our only integration test outside of tests/recipes). That said, we do need a way to catch this. Out of curiosity, @rohan-varma, why is this a CUDA test anyways? Is NF4 only supported on CUDA? Or is this more of a bnb-side thing?

Edit: alternatively we could just modify our multi-GPU CI job to run on all unit tests and recipe tests. We do have some other distributed-type unit tests, so while it might be a bit of overkill it's probably not a terrible idea to do this anyways.

@ebsmothers
Contributor

We don't have NF4Linear in torchtune anymore, so this PR is no longer relevant.

@ebsmothers ebsmothers closed this May 21, 2024