
Add gpu pickleable module to torch #2265

Closed · wants to merge 1 commit

Conversation

ziyiwu9494 (Contributor)

A potential fix for #2079: create a garage.torch.Module wrapper class that implements pickling by moving the module to the CPU on pickle and back to its original device on unpickle, if that device still exists.
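A minimal sketch of the approach described above, using `__getstate__`/`__setstate__`. The `global_device()` helper exists in garage, but here it is stubbed with a plain string (and `.to()` only records the device) so the example runs without torch; in the real class, `.to()` would be `nn.Module.to()` moving the parameters.

```python
import pickle

# Hypothetical stub for garage's global_device(); in garage this returns
# the torch.device selected by set_gpu_mode().
_CURRENT_DEVICE = 'cuda:0'

def global_device():
    return _CURRENT_DEVICE

class PickleableModule:
    """Sketch of the wrapper: pickle on CPU, restore to the live device."""

    def __init__(self):
        self.device = global_device()

    def to(self, device):
        # A real garage.torch.Module would call nn.Module.to() here.
        self.device = device
        return self

    def __getstate__(self):
        save_from_device = self.device
        self.to('cpu')                   # move everything to CPU before pickling
        state = self.__dict__.copy()
        state['garage.global_device'] = save_from_device  # namespaced key
        return state

    def __setstate__(self, state):
        state = dict(state)
        state.pop('garage.global_device', None)
        self.__dict__.update(state)
        # Restore to whatever device is available on the unpickling machine,
        # rather than blindly trusting the saved device to exist.
        self.to(global_device())

m = PickleableModule()
restored = pickle.loads(pickle.dumps(m))
# restored.device == 'cuda:0'; note that pickling moved the original
# object to 'cpu' as a side effect, as in the snippet under review.
```

Note the side effect: `__getstate__` mutates `self` by moving it to the CPU, which mirrors the diff under review but may surprise callers who keep using the original object after saving it.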

@ziyiwu9494 ziyiwu9494 requested a review from a team as a code owner April 15, 2021 08:30
@ziyiwu9494 ziyiwu9494 requested review from krzentner and removed request for a team April 15, 2021 08:30
@mergify mergify bot requested review from a team, zequnyu and maliesa96 and removed request for a team April 15, 2021 08:31
@ziyiwu9494 ziyiwu9494 closed this Apr 15, 2021
@ziyiwu9494 ziyiwu9494 reopened this Apr 15, 2021
codecov bot commented Apr 15, 2021

Codecov Report

Merging #2265 (3654cf0) into master (90b6090) will decrease coverage by 0.03%.
The diff coverage is 82.60%.


@@            Coverage Diff             @@
##           master    #2265      +/-   ##
==========================================
- Coverage   91.20%   91.17%   -0.03%     
==========================================
  Files         199      199              
  Lines       11627    10996     -631     
  Branches     1557     1392     -165     
==========================================
- Hits        10604    10026     -578     
+ Misses        754      703      -51     
+ Partials      269      267       -2     
Impacted Files Coverage Δ
src/garage/torch/_functions.py 90.84% <80.95%> (-3.49%) ⬇️
src/garage/torch/policies/policy.py 100.00% <100.00%> (ø)
src/garage/torch/optimizers/optimizer_wrapper.py 93.75% <0.00%> (-6.25%) ⬇️
src/garage/sampler/env_update.py 82.92% <0.00%> (-3.62%) ⬇️
src/garage/torch/optimizers/differentiable_sgd.py 86.66% <0.00%> (-3.34%) ⬇️
src/garage/torch/policies/stochastic_policy.py 88.63% <0.00%> (-3.17%) ⬇️
src/garage/torch/algos/trpo.py 91.66% <0.00%> (-3.08%) ⬇️
src/garage/torch/algos/dqn.py 93.07% <0.00%> (-2.09%) ⬇️
src/garage/np/_functions.py 77.39% <0.00%> (-1.74%) ⬇️
... and 8 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 90b6090...3654cf0.

krzentner (Contributor) left a comment


This looks pretty good. There are a few minor improvements I've noted in the inline comments. We should also add this new class to q_functions and value_functions (as well as probably everything in garage.torch.modules), but that can happen in another PR.

save_from_device = global_device()
self.to('cpu')
state = self.__dict__.copy()
state['device'] = save_from_device
Contributor

This key should probably be called 'garage.global_device' or similar, so that it can't conflict with any sub-field names.

system_device = global_device()
save_from_device = state['device']
if save_from_device == system_device:
module_state_to(state, system_device)
Contributor

I think you can use the other state-dict moving function you wrote here (even though this kinda isn't a state dict).

Contributor Author

Will every movable parameter of an nn.Module be a tensor or a dict?
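A recursive helper can cover both cases: treat tensors as leaves and descend into dicts (and lists/tuples). The sketch below reuses the `module_state_to` name from the diff, but takes a generic `move` callable standing in for something like `lambda t: t.to(device)` applied to tensor leaves, so it runs without torch; the real function's signature is an assumption.

```python
def module_state_to(obj, move):
    """Recursively apply `move` to leaf values, descending into dicts,
    lists, and tuples (the containers module state can plausibly hold).
    In garage, `move` would be roughly `lambda t: t.to(device)` and
    would be applied only to torch.Tensor leaves."""
    if isinstance(obj, dict):
        return {k: module_state_to(v, move) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(module_state_to(v, move) for v in obj)
    return move(obj)

# Toy state with nested containers; multiplying stands in for a device move.
state = {'w': 1.0, 'buffers': {'b': 2.0}, 'extras': [3.0, 4.0]}
moved = module_state_to(state, lambda v: v * 10)
# moved == {'w': 10.0, 'buffers': {'b': 20.0}, 'extras': [30.0, 40.0]}
```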

system_device = global_device()
save_from_device = state['device']
if save_from_device == system_device:
Contributor

I'm not sure if this if statement is really needed. I think the idea here is to pre-move everything, as an optimization, but I'm not sure if that's actually faster. If you're going to do this, please use timeit.timeit to measure the performance difference.

Contributor Author

I did it this way because I didn't know whether any of the other attributes in the dict needed to move along with the module itself, so I move all internal attributes first and then move the module.

Contributor

Oh, I see. `.to()` should already know how to handle that (since that's how the module was moved to the device in the first place). If a module needs to move something beyond the default behavior, it should override `.to()` itself.
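The override pattern described above looks like the sketch below. It uses tensor and module stand-ins so it runs without torch; in a real nn.Module subclass you would call `super().to(*args, **kwargs)` and move the extra attribute with the same arguments.

```python
class FakeTensor:
    """Stand-in for torch.Tensor, tracking only a device string."""
    def __init__(self, device='cpu'):
        self.device = device
    def to(self, device):
        return FakeTensor(device)

class BaseModule:
    """Stand-in for nn.Module: .to() moves registered parameters."""
    def __init__(self):
        self.weight = FakeTensor()
    def to(self, device):
        self.weight = self.weight.to(device)
        return self

class ModuleWithExtra(BaseModule):
    """A module holding a tensor that the base .to() wouldn't see."""
    def __init__(self):
        super().__init__()
        self.lookup_table = FakeTensor()   # not a registered parameter
    def to(self, device):
        super().to(device)                 # default behavior first
        self.lookup_table = self.lookup_table.to(device)  # then the extra
        return self

m = ModuleWithExtra().to('cuda:0')
# Both the registered parameter and the extra tensor end up on 'cuda:0'.
```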

@ziyiwu9494 ziyiwu9494 closed this May 7, 2021