Improve testing coverage across all device hw types #8669
Comments
Thank you for your suggestion. I would argue that we actually benefit from having different test sets for different XLA backends. That's for a couple of reasons:
Could you give examples of when you have to add the same tests for these 3 backends? Maybe we could have a file that implements something like PyTorch's device-parameterized tests.
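For reference, a minimal sketch of what PyTorch's device-parameterized test machinery looks like. It relies on the internal `torch.testing._internal.common_device_type` helpers, which are not a stable public API, and PyTorch/XLA would presumably need to register its own device type to plug into it:

```python
# Sketch only: uses PyTorch's *internal* test helpers, which may change
# between releases. A PyTorch/XLA analogue would need to register an
# 'xla' device type with this machinery.
import torch
from torch.testing._internal.common_utils import TestCase, run_tests
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    onlyCPU,
)


class TestAddGeneric(TestCase):
    # `device` is injected per registered device type by the harness.
    def test_add(self, device):
        a = torch.ones(2, 2, device=device)
        b = torch.ones(2, 2, device=device)
        self.assertEqual(a + b, torch.full((2, 2), 2.0, device=device))

    @onlyCPU  # analogous to the @onlyCPU/@onlyTPU decorators discussed in this thread
    def test_cpu_specific(self, device):
        self.assertTrue(device.startswith("cpu"))


# Generates TestAddGenericCPU, TestAddGenericCUDA, ... in this module's scope.
instantiate_device_type_tests(TestAddGeneric, globals())

if __name__ == "__main__":
    run_tests()
```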
Thanks for the feedback! Based on our recent Neuron test migration effort, we found that most of the CPU tests (at least) were compatible with our backend. Several of the tests were commented out as needing RCA to mitigate/resolve backend limitations, but this gives us room to cover those cases. I believe that most PyTorch/XLA functionality is intended to be consistent and agnostic to the XLA backend, and if tests (which, granted, can be written to be backend-specific) are disabled or not captured as part of a backend-specific test, we want to make sure we are aware of these limitations. Today they are not tracked and lack sufficient visibility, which is the main concern.

A unified test structure with skip decorators that require a reason parameter (as an example, not necessarily decorators, e.g. @skipCPU, @skipTPU) offers several advantages. It provides clear visibility into backend-specific limitations and lets us track technical debt through known references. When tests are skipped, the reasoning is documented directly in the code, making it easier for teams to understand intentional behavioral differences. For backend-specific tests we can still use @onlyCPU/@onlyTPU decorators, and if there are broader topics to capture, we can group them in dedicated test classes that only exercise a specific backend; that, again, is a better way for us to document our coverage (we have these for Neuron as well). In my opinion, decorators with a concise reason behind each skip are an incremental improvement over maintaining distinct test files. Note that the current standard is for the developer to copy the same test across all of the per-backend test files.

PyTorch's device-parameterized tests are an interesting suggestion. I am not particularly attached to skip decorators specifically; my main interest is in avoiding having to maintain different test files. From a maintenance perspective, consolidating tests to run with any XLA backend unless granted an exemption mitigates the risk of tests being accidentally omitted from specific backends. Besides the better visibility into cross-backend compatibility, it will help us identify areas where we can improve consistency across implementations. IMHO, there is added benefit in treating XLA-backend-agnostic solutions as an additional criterion where possible: surfacing that X/Y/Z doesn't work for a specific backend lets those teams/contributors investigate and close that gap. I think we could first survey which tests are XLA-backend-specific and see what coverage we might be missing at the moment. I'll see if I can provide the examples, but started with the motivation first.
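To make the decorator idea concrete, here is a rough sketch; the decorator names, the required reason argument, and the PJRT_DEVICE-based backend detection are illustrative assumptions rather than an existing PyTorch/XLA API:

```python
# Hypothetical helpers sketching the proposal: every skip must carry a reason
# (ideally pointing at a tracking issue). Backend detection via the
# PJRT_DEVICE environment variable is one possible mechanism, not an
# established convention.
import functools
import os
import unittest


def _current_backend():
    return os.environ.get("PJRT_DEVICE", "CPU").upper()


def _skip_on(backend):
    def with_reason(reason):
        if not reason:
            raise ValueError(f"skipping on {backend} requires a reason")

        def decorator(test_fn):
            @functools.wraps(test_fn)
            def wrapper(self, *args, **kwargs):
                if _current_backend() == backend:
                    raise unittest.SkipTest(f"[{backend}] {reason}")
                return test_fn(self, *args, **kwargs)

            return wrapper

        return decorator

    return with_reason


skipCPU, skipTPU, skipNeuron = _skip_on("CPU"), _skip_on("TPU"), _skip_on("NEURON")


class TestExample(unittest.TestCase):
    @skipTPU(reason="dynamic shapes not yet supported; tracked in a follow-up issue")
    def test_dynamic_shapes(self):
        self.assertTrue(True)
```

With something like this, grepping for the skip decorators doubles as the report of known backend gaps, since every entry carries its own reason.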
I did a short skim, and I believe we have ~26% overlap between TPU & CPU, 86% overlap between Neuron & CPU, and around ~26% between Neuron & TPU. One example I see is that a large number of CPU tests are not present on TPU (see below), or vice versa, but it's unclear whether all of those omissions are intentional.
It seems that the data has revealed we're missing a lot of test coverage on TPU, and it's worth addressing that regardless of whether we use decorators or separate files to isolate backend-specific tests.
I agree. This matches my expectation, since I think this is usually unavoidable with the current test structure, but +1 to prioritizing closing the coverage gap. I made it a bit clearer above, but the test files listed are the non-overlapping ones: they are missing on either CPU or TPU, mostly the latter.
I think this highlights exactly what I am thinking. I know that there is some backend-specific code inside PyTorch/XLA, which definitely should be tested. Since it is backend-specific, IMHO it makes more sense that those tests live inside their own directories. In the end, I think it mostly depends on what kind of tests we wish to have in this repository:
Also, if tests are the same for all backends, we could just run them while setting the target device accordingly. That said, I agree that we shouldn't be copy-and-pasting tests for testing on different backends. And if whole files are meant to be identical, we could just run the same files under different device settings.
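As a rough sketch of that split, a backend-specific suite could be gated as a whole while still living in the shared tree; the backend-detection mechanism below is an assumption, mirroring the earlier sketch:

```python
# Sketch: backend-specific tests grouped into their own class and gated with a
# standard-library skip, so the same file can be collected under any backend.
import os
import unittest


def _current_backend():
    # Assumed detection mechanism; see the earlier sketch.
    return os.environ.get("PJRT_DEVICE", "CPU").upper()


def only_on(backend):
    # unittest.skipUnless applies to classes too, so a whole backend-specific
    # suite can be gated while remaining in the shared test tree.
    return unittest.skipUnless(_current_backend() == backend, f"only runs on {backend}")


@only_on("TPU")
class TestTpuSpecific(unittest.TestCase):
    def test_tpu_topology(self):
        ...  # TPU-only behaviour would be exercised here


class TestShared(unittest.TestCase):
    def test_runs_everywhere(self):
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```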
🐛 Bug
We're currently maintaining `test/run_tests.sh` (CPU), `test/tpu/run_tests.sh` (TPU), and `test/neuron/run_tests.sh` (Trainium). Consequently, anyone adding a test is required to add it in all three locations, and can also skip (intentionally or not) a specific hardware device type. As a result, not all tests are present in all three files, and there is no visibility into whether a test was deliberately omitted, or why.

Expected behavior
We should move all of the tests to a single location and use fixtures to mark tests to skip for a specific device/hardware type. In addition, we can group and split tests to run on a specific runner when distributing the tests across CI. The expected behavior is that every skipped test has a reason/motivation that is concisely described or tracked under a bug/feature item.
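One possible shape for this, sketched with pytest-style markers; the marker names and the PJRT_DEVICE-based detection are illustrative assumptions, not an existing convention:

```python
# Sketch of the proposed single-location layout using pytest markers.
# Custom markers (e.g. "tpu") would be registered in pytest.ini/pyproject.toml;
# the reason string documents the motivation and can reference a tracking issue.
import os

import pytest

BACKEND = os.environ.get("PJRT_DEVICE", "CPU").upper()  # assumed detection


@pytest.mark.skipif(
    BACKEND == "TPU",
    reason="flaky on TPU; tracking issue referenced here",
)
def test_shared_behavior():
    assert True


@pytest.mark.tpu  # CI can route these to the TPU runner with: pytest -m tpu
def test_tpu_topology():
    assert BACKEND == "TPU"
```

Splitting across CI runners then becomes a collection concern (selecting with `-m`/`-k` on the same tree) rather than three separately maintained `run_tests.sh` lists.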