Updated test_graph_optims and test_graph_scaling_fused_optimizers to use new OptimizerInfo infrastructure #125127
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125127
Note: Links to docs will display an error until the docs builds have been completed.
❌ 10 New Failures
As of commit 236f2fe with merge base 1a28f73, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for your thorough look at the kwargs. For RMSprop, feel free to add a `maximize` with no `weight_decay` option here:
supports_param_groups: bool = True,
For Adam/W, it's okay to not have amsgrad alone--we test it sufficiently with the "capturable, amsgrad" and the "amsgrad" described inputs.
This way we can use the helper function to get the kwargs instead of needing a separate dictionary.
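As a rough sketch of what that suggestion might look like (hypothetical, simplified field names — the real `OptimizerInput` entries live in PyTorch's `torch.testing._internal.common_optimizers`, and a stand-in dataclass is used here so the snippet is self-contained):

```python
# Hedged sketch: a stand-in dataclass mimicking PyTorch's OptimizerInput.
# The actual class and its optim_inputs_func lists are defined in
# torch.testing._internal.common_optimizers; field names here are assumptions.
from dataclasses import dataclass, field


@dataclass
class OptimizerInput:
    params: object = None
    kwargs: dict = field(default_factory=dict)
    desc: str = ""


# A "maximize with no weight_decay" input for RMSprop, per the review comment:
# maximize is set, and the weight_decay key is deliberately absent.
rmsprop_maximize_only = OptimizerInput(
    params=None,
    kwargs={"maximize": True},  # note: no weight_decay key
    desc="maximize",
)
```

With an entry like this in the optimizer's input list, the helper that collects inputs would return the `maximize`-without-`weight_decay` configuration alongside the existing ones, avoiding a separate hand-written kwargs dictionary.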
Thank you for the review! To add a `maximize` with no `weight_decay` option in RMSprop, should I just edit
Yes
Two more changes! And we should be good
Approving, but the lr change should be removed and CI should be green before landing.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Rebase failed due to Command
Raised by https://github.com/pytorch/pytorch/actions/runs/9083238583
Could you rebase locally and see if any of the CI failures are due to this PR? They shouldn't be, except the compiled optimizer one, which may require adding a customization in test_compiled_optimizers.py.
I believe this section of my branch in test_cuda.py may have been the issue. After deleting the decorator, I was able to merge the upstream branch into mine.
That should not be the case; the decorator is strictly better and more deterministic. Could you add it back?
I've just added it back. |
Yes, if the compiled optimizer test fails again, you may need to add a new entry to
I believe
Yep, looks like you need to add an entry to the code snippet I linked above as you added new configs. |
Should I just add this to
That looks reasonable--does that pass the test? |
This PR is meant to address issue #123451; more specifically, the `test_graph_optims` and `test_graph_scaling_fused_optimizers` functions in `test_cuda.py` have been updated so that they now use the new OptimizerInfo infrastructure.

Lintrunner passed:
Tests passed:

Both functions have been moved to the newly created TestCase class `TestCudaOptims`. The test is mostly the same, except that the `@optims` decorator is used at the top of the function to implicitly call the function with each of the optimizers mentioned in the decorator, instead of explicitly iterating through the optimizers with a for loop.

I was unable to use `_get_optim_inputs_including_global_cliquey_kwargs` to get all the kwargs for each of the optimizers, since some of the kwargs used in the original `test_graph_optims` function are not returned by the new OptimizerInfo infrastructure. More specifically, for the `torch.optim.rmsprop.RMSprop` optimizer, the following kwargs are not returned whenever `_get_optim_inputs_including_global_cliquey_kwargs` is called:

I ran into the same issue for `test_graph_scaling_fused_optimizers`: for the `torch.optim.adamw.AdamW` optimizer, whenever `optim_info.optim_inputs_func(device=device)` was called, the following kwarg was not returned:

Due to this issue, I resorted to using a dictionary to store the kwargs for each of the optimizers; I am aware that this is less than ideal. I was wondering whether I should use the OptimizerInfo infrastructure to get all the kwargs regardless of the fact that it lacks some kwargs.