
[RFC][fix test] missing .item() in frozen nf4 test #609

Closed
Wants to merge 1 commit

Conversation

skcoirz

@skcoirz skcoirz commented Mar 28, 2024

Context

  • As titled: the test is missing a `.item()` call.
  • There is another thing, though: I found that the error from frozen NF4 is larger than the one from bnb linear.
Screenshot 2024-03-28 at 2 28 58 PM

Changelog

  • updated nf4 test

Test plan

  • pytest tests/torchtune/modules/low_precision/test_nf4_linear.py
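For context, a minimal sketch of the comparison being fixed. The values and names here are illustrative, not the actual test code; in the real test they are reconstruction errors of the frozen NF4 linear and the bnb linear against a reference:

```python
import torch

# Illustrative error values only; the real test computes reconstruction
# errors for the frozen NF4 linear and the bnb linear.
err = torch.tensor(0.02)
err_bnb = torch.tensor(0.03)

# Before the fix the assertion mixed a Python float with a 0-dim tensor
# (err.item() < err_bnb); that still evaluates to a boolean tensor, so it
# can pass silently. The fix calls .item() on both sides so two plain
# Python floats are compared.
assert err.item() < err_bnb.item()
```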


pytorch-bot bot commented Mar 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/609

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 28, 2024
@rohan-varma
Member

Thanks for your contribution @skcoirz! This was definitely an oversight on my part. Though do we understand why the current test (err.item() < err_bnb) is passing in CI?

@skcoirz
Author

skcoirz commented Mar 28, 2024

Good question. Let me dig into it.

@skcoirz
Author

skcoirz commented Mar 28, 2024

@rohan-varma, I found the cause.

  • The unit test is able to catch this bug (confirmed on my local server).
  • But in CI, the two accuracy tests are bypassed because there is no CUDA (confirmed in the logging below).

It looks like integration tests are handled on AWS. I would assume we have GPUs there? Should we mark these as integration_tests?


Screenshot 2024-03-28 at 2 54 09 PM

https://github.com/pytorch/torchtune/actions/runs/8472914261/job/23216077574
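A hedged sketch of the situation described above. The function names are hypothetical and the decorators are illustrative, not torchtune's actual test code: a `skipif` guard hides the accuracy tests on CPU-only runners, and the proposal is to tag them so a GPU-backed job picks them up.

```python
import pytest
import torch

# Guarded accuracy test: on CPU-only CI runners torch.cuda.is_available()
# is False, so the test is skipped and the bad assertion is never executed.
@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA required")
def test_nf4_reconstruction_error():
    ...

# Proposed alternative: mark the test so the GPU-backed (AWS) integration
# job runs it instead of the CPU-only unit-test job.
@pytest.mark.integration_test
def test_nf4_reconstruction_error_integration():
    ...
```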

@ebsmothers
Contributor

ebsmothers commented Mar 28, 2024

@skcoirz thanks for debugging this. Yeah, this is a weird edge case. We can run this GPU test in CI, but right now GPU jobs are meant more for E2E runs of fine-tuning scripts. So we can mark this test as an integration test via @pytest.mark.integration_test like you suggested, but it is a bit unintuitive (e.g. it'll be our only integration test outside of tests/recipes). That said, we do need a way to catch this. Out of curiosity, @rohan-varma, why is this a CUDA test anyways? Is NF4 only supported on CUDA? Or is this more of a bnb-side thing?

Edit: alternatively we could just modify our multi-GPU CI job to run on all unit tests and recipe tests. We do have some other distributed-type unit tests, so while it might be a bit of overkill it's probably not a terrible idea to do this anyways.

@ebsmothers
Contributor

We don't have NF4Linear in torchtune anymore, so this PR is no longer relevant.

@ebsmothers ebsmothers closed this May 21, 2024