fix(optim/meta): torch tensor memory not released due to gradient link #219
Description
When using torchopt.MetaAdam and stepping several times, GPU memory usage increases continuously. It should not: once the next step executes, the tensors created in the former step are no longer needed and should be released. I found the reason: MetaOptimizer does not detach the gradient link inside the optimizer, so the former tensors are kept alive by torch through that dependency and are never released.
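For background, this is the general PyTorch mechanism at play, shown as a minimal plain-PyTorch sketch (not TorchOpt code): any tensor reachable from the autograd graph of a live tensor cannot be freed, and detaching breaks that chain.

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# A tensor stays allocated as long as it is reachable from the autograd
# graph of a live tensor: here `out` keeps `w` and the multiplication's
# intermediate buffers alive.
w = torch.randn(1024, 1024, device=device, requires_grad=True)
out = (w * 2).sum()

# detach() returns a tensor with the same data but no graph link; once the
# graphed reference is dropped, the intermediates become collectible.
kept = out.detach()
del out
```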
You can run the test code: in the first run, memory usage grows as the step count increases; in the second run (with the code changed to detach the gradient link), memory stays stable as the step count increases:
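The original test script is not reproduced in this description; below is a hedged sketch of such a reproduction. It assumes the public TorchOpt API (torchopt.MetaAdam(module, lr) with a differentiable optim.step(loss)); the model, sizes, and the torchopt.stop_gradient workaround mentioned in the comments are illustrative, not the exact code from the report.

```python
import torch
import torch.nn as nn
import torchopt

device = 'cuda' if torch.cuda.is_available() else 'cpu'
net = nn.Linear(512, 512).to(device)
optim = torchopt.MetaAdam(net, lr=1e-3)
x = torch.randn(64, 512, device=device)

for step in range(50):
    loss = net(x).square().mean()
    optim.step(loss)  # differentiable step: keeps a graph across iterations

    # Variant 1 (leaks): do nothing here; each step's tensors stay reachable
    # through the gradient link, so allocated memory grows with `step`.
    # Variant 2 (stable): cut the link after each step, e.g. with
    #     torchopt.stop_gradient(net)
    #     torchopt.stop_gradient(optim)
    # and memory stays flat as `step` increases.
    if device == 'cuda':
        print(step, torch.cuda.memory_allocated(device))
```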
Motivation and Context
Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #218 if this solves the issue #218.

Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:

Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!
make format. (required)
make lint. (required)
make test pass. (required)