
Updated test_graph_optims and test_graph_scaling_fused_optimizers to use new OptimizerInfo infrastructure #125127

Open

jayanthd04 wants to merge 36 commits into main
Conversation


@jayanthd04 jayanthd04 commented Apr 28, 2024

This PR addresses issue #123451: the test_graph_optims and test_graph_scaling_fused_optimizers functions in test_cuda.py have been updated so that they now use the new OptimizerInfo infrastructure.

Lintrunner passed:

$ lintrunner test/test_cuda.py
ok No lint issues.

Tests passed:

>python test_cuda.py -k test_graph_optims
Ran 19 tests in 7.463s

OK (skipped=9)

>python test_cuda.py -k test_graph_scaling_fused_optimizers
Ran 6 tests in 2.800s

OK (skipped=3)

Both functions have been moved to the newly created TestCase class TestCudaOptims. The tests are mostly the same, except that the @optims decorator at the top of each function now implicitly runs it once per optimizer listed in the decorator, instead of explicitly iterating over the optimizers with a for loop.
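For illustration, the new structure looks roughly like the sketch below. This is a minimal sketch, not the PR's exact code: the optimizer filter, dtypes, and test body are placeholders, and only the decorator-driven parametrization is the point.

import torch
from torch.testing._internal.common_device_type import instantiate_device_type_tests
from torch.testing._internal.common_optimizers import optim_db, optims
from torch.testing._internal.common_utils import TestCase, run_tests


class TestCudaOptims(TestCase):
    # @optims generates one test instance per OptimizerInfo in the list,
    # replacing the old explicit for-loop over optimizer classes.
    @optims(
        [optim for optim in optim_db if "foreach" in optim.supported_impls],  # placeholder filter
        dtypes=[torch.float32],
    )
    def test_graph_optims(self, device, dtype, optim_info):
        optim_cls = optim_info.optim_cls
        # ... build the CUDA-graph capture/replay steps for optim_cls here ...


instantiate_device_type_tests(TestCudaOptims, globals(), only_for="cuda")

if __name__ == "__main__":
    run_tests()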

I was unable to use _get_optim_inputs_including_global_cliquey_kwargs to get all kwargs for each optimizer, because some of the kwargs used in the original test_graph_optims function are not returned by the new OptimizerInfo infrastructure. More specifically, for the torch.optim.rmsprop.RMSprop optimizer, the following kwargs are not returned when _get_optim_inputs_including_global_cliquey_kwargs is called:

{'foreach': False, 'maximize': True, 'weight_decay': 0}
{'foreach': True, 'maximize': True, 'weight_decay': 0}

I ran into the same issue for test_graph_scaling_fused_optimizers: for the torch.optim.adamw.AdamW optimizer, the following kwarg was not returned when optim_info.optim_inputs_func(device=device) was called:

{'amsgrad': True}

Due to this issue, I resorted to using a dictionary to store the kwargs for each optimizer; I am aware that this is less than ideal. I was wondering whether I should instead use the OptimizerInfo infrastructure to get all the kwargs, even though it lacks some of them.
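Roughly, that workaround looks like the following sketch (the keys and values here are illustrative, not the exact dictionary in the PR):

import torch

# Hand-written kwargs per optimizer, standing in for the inputs that
# _get_optim_inputs_including_global_cliquey_kwargs does not currently return.
# Illustrative contents only.
optimizer_kwargs = {
    torch.optim.RMSprop: [
        {"foreach": False, "maximize": True, "weight_decay": 0},
        {"foreach": True, "maximize": True, "weight_decay": 0},
    ],
    torch.optim.AdamW: [
        {"amsgrad": True},
    ],
}

for optim_cls, kwargs_list in optimizer_kwargs.items():
    for kwargs in kwargs_list:
        # ... construct optim_cls(params, **kwargs) and run the graph-capture checks ...
        pass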


pytorch-bot bot commented Apr 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125127

Note: Links to docs will display an error until the docs builds have been completed.

❌ 10 New Failures

As of commit 236f2fe with merge base 1a28f73:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


linux-foundation-easycla bot commented Apr 28, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@pytorch-bot pytorch-bot bot added the topic: not user facing label Apr 28, 2024
@cpuhrsch cpuhrsch requested a review from janeyx99 April 30, 2024 19:52
@cpuhrsch cpuhrsch added the triaged label Apr 30, 2024
Contributor

@janeyx99 janeyx99 left a comment

Thanks for your thorough look at the kwargs. For RMSprop, feel free to add a maximize with no weight_decay option here:

supports_param_groups: bool = True,

For Adam/W, it's okay to not have amsgrad alone--we test it sufficiently with the "capturable, amsgrad" and the "amsgrad" described inputs.

This way we can use the helper function to get the kwargs instead of needing a separate dictionary.
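Concretely, the suggestion amounts to adding an entry along these lines to optim_inputs_func_rmsprop in torch/testing/_internal/common_optimizers.py (a sketch; the desc string and exact placement are assumptions):

# Sketch of the extra RMSprop input being suggested: maximize=True with no
# weight_decay, so the graph tests can rely on the OptimizerInfo inputs alone.
OptimizerInput(
    params=None,
    kwargs={"maximize": True, "weight_decay": 0},
    desc="maximize without weight_decay",  # illustrative description
),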

@jayanthd04
Author

Thank you for the review! To add a maximize with no weight_decay option in RMSprop, should I just edit optim_inputs_func_rmsprop? Also, I've noticed that some other optimizers, such as torch.optim.Adadelta, are missing kwargs such as maximize with no weight_decay. Should I also add those options in common_optimizers.py, or should I just use the current options for those optimizers?

@janeyx99
Contributor

janeyx99 commented May 2, 2024

Thank you for the review! To add a maximize with no weight_decay option in RMSprop, should I just edit optim_inputs_func_rmsprop?

Yes

Also, I've noticed that some other optimizers, such as torch.optim.Adadelta, are missing kwargs such as maximize with no weight_decay. Should I also add those options in common_optimizers.py?

Yes

Contributor

@janeyx99 janeyx99 left a comment

Two more changes! And we should be good

Contributor

@janeyx99 janeyx99 left a comment

Approving, but the lr change should be removed and CI should be green before landing.

@janeyx99
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label May 14, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team
Raised by workflow job

Failing merge rule: Core Maintainers

@janeyx99
Contributor

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/125127/head returned non-zero exit code 1

Rebasing (1/23)
Auto-merging test/test_cuda.py
CONFLICT (content): Merge conflict in test/test_cuda.py
error: could not apply fe29e607709... Adding TestCudaOptims class to move test_graph_optims function and test_graph_scaling_fused_optimizers to
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply fe29e607709... Adding TestCudaOptims class to move test_graph_optims function and test_graph_scaling_fused_optimizers to

Raised by https://github.com/pytorch/pytorch/actions/runs/9083238583

@janeyx99
Contributor

Could you rebase locally and see if any of the CI failures are due to this PR? They shouldn't be, except the compiled optimizer one, which may require adding a customization in test_compiled_optimizers.py.

@jayanthd04
Author

jayanthd04 commented May 14, 2024

I believe this section of my branch in test_cuda.py may have been the issue. After deleting the decorator, I was able to merge upstream into my branch:

@torch.testing._internal.common_utils.markDynamoStrictTest 
class TestCudaOptims(TestCase):

@janeyx99
Contributor

That should not be the case; the decorator is strictly better and more deterministic. Could you add it back?

@jayanthd04
Author

I've just added it back.

@janeyx99
Contributor

Yes, if the compiled optimizer test fails again, you may need to add a new entry to

"test_sgd_weight_decay_maximize_cpu": 4,

@jayanthd04
Author

jayanthd04 commented May 14, 2024

I believe 'test/inductor/test_compiled_optimizers.py::CompiledOptimizerTests::test_sgd_weight_decay_cpu' is the test that is failing.

@janeyx99
Contributor

Yep, looks like you need to add an entry to the code snippet I linked above, since you added new configs.

@jayanthd04
Author

Should I just add this to pytorch/test/inductor/test_compiled_optimizers.py?

test_sgd_weight_decay_cpu=4,
test_sgd_weight_decay_cuda=4,

@janeyx99
Contributor

That looks reasonable--does that pass the test?
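For clarity, expressed in the same "test_name": count style as the linked entry, the addition would look roughly like this (a sketch; the dictionary name is assumed from the context of the linked line and may differ in the current source):

# Sketch: kernel-count overrides for the newly added SGD configs in
# test/inductor/test_compiled_optimizers.py. Dict name assumed; only the two
# new entries are the point here.
KERNEL_COUNT_OVERRIDES = {
    # ... existing entries, e.g. ...
    "test_sgd_weight_decay_maximize_cpu": 4,
    # new entries for the added configs:
    "test_sgd_weight_decay_cpu": 4,
    "test_sgd_weight_decay_cuda": 4,
}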

Labels: ciflow/trunk, open source, topic: not user facing, triaged

5 participants