Updated test_graph_optims and test_graph_scaling_fused_optimizers to use new OptimizerInfo infrastructure #125127
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125127
Note: Links to docs will display an error until the docs builds have been completed.
❌ 10 New Failures
As of commit 236f2fe with merge base 1a28f73, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for your thorough look at the kwargs. For RMSprop, feel free to add a `maximize` with no `weight_decay` option here:
supports_param_groups: bool = True,
For Adam/W, it's okay to not have amsgrad alone--we test it sufficiently with the "capturable, amsgrad" and the "amsgrad" described inputs.
This way we can use the helper function to get the kwargs instead of needing a separate dictionary.
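As a rough sketch of what that suggestion might look like (hypothetical, simplified field names — the real `OptimizerInput` entries live in PyTorch's `torch.testing._internal.common_optimizers`, and a stand-in dataclass is used here so the snippet is self-contained):

```python
# Hedged sketch: a stand-in dataclass mimicking PyTorch's OptimizerInput.
# The actual class and its optim_inputs_func lists are defined in
# torch.testing._internal.common_optimizers; field names here are assumptions.
from dataclasses import dataclass, field


@dataclass
class OptimizerInput:
    params: object = None
    kwargs: dict = field(default_factory=dict)
    desc: str = ""


# A "maximize with no weight_decay" input for RMSprop, per the review comment:
# maximize is set, and the weight_decay key is deliberately absent.
rmsprop_maximize_only = OptimizerInput(
    params=None,
    kwargs={"maximize": True},  # note: no weight_decay key
    desc="maximize",
)
```

With an entry like this in the optimizer's input list, the helper that collects inputs would return the `maximize`-without-`weight_decay` configuration alongside the existing ones, avoiding a separate hand-written kwargs dictionary.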
Thank you for the review! To add a `maximize` with no `weight_decay` option in RMSprop, should I just edit
Yes
Two more changes! And we should be good
Approving, but the lr change should be removed and CI should be green before landing.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud.
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Rebase failed due to Command
Raised by https://github.com/pytorch/pytorch/actions/runs/9083238583
Could you rebase locally and see if any of the CI failures are due to this PR? They shouldn't be, except the compiled optimizer one, which may require adding a customization in test_compiled_optimizers.py.
I believe this section of my branch in test_cuda.py may have been the issue. After deleting the decorator, I was able to merge the upstream branch into mine.
That should not be the case; the decorator is strictly better and more deterministic. Could you add it back?
I've just added it back. |
Yes, if the compiled optimizer test fails again, you may need to add a new entry to
I believe
Yep, looks like you need to add an entry to the code snippet I linked above as you added new configs. |
Should I just add this to
That looks reasonable--does that pass the test? |
This PR is meant to address issue #123451; more specifically, the `test_graph_optims` and `test_graph_scaling_fused_optimizers` functions in `test_cuda.py` have been updated so that they now use the new OptimizerInfo infrastructure.

Lintrunner passed:
Tests passed:

Both functions have been moved to the newly created TestCase class `TestCudaOptims`. The test is mostly the same, except that the `@optims` decorator is used at the top of the function to implicitly call the function with each of the optimizers mentioned in the decorator, instead of explicitly iterating through the optimizers with a for loop.

I was unable to use `_get_optim_inputs_including_global_cliquey_kwargs` to get all the kwargs for each of the optimizers, since some of the kwargs used in the original `test_graph_optims` function are not returned by the new OptimizerInfo infrastructure. More specifically, for the `torch.optim.rmsprop.RMSprop` optimizer, the following kwargs are not returned whenever `_get_optim_inputs_including_global_cliquey_kwargs` is called:

I ran into the same issue for `test_graph_scaling_fused_optimizers`: for the `torch.optim.adamw.AdamW` optimizer, whenever `optim_info.optim_inputs_func(device=device)` was called, the following kwarg was not returned:

Due to this issue, I resorted to using a dictionary to store the kwargs for each of the optimizers; I am aware that this is less than ideal. I was wondering whether I should use the OptimizerInfo infrastructure to get all the kwargs regardless of the fact that it lacks some kwargs.