Error about using a grad transform with in-place operation is inconsistent with and without DDP #1112

Open
XuchanBao opened this issue Feb 24, 2023 · 1 comment

@XuchanBao

Hi,

I was using torch.func in PyTorch 2.0 to compute the Hessian-vector product (HVP) of a neural network.

I first used torch.func.functional_call to define a functional version of the model, and then used torch.func.jvp and torch.func.grad to compute the HVP.
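
For reference, here is a minimal sketch of this kind of HVP computation (the model, data, and loss below are placeholders rather than my actual code):

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, jvp

# Placeholder model and data, just to make the sketch self-contained.
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
params = dict(model.named_parameters())
x, y = torch.randn(64, 10), torch.randn(64, 1)

def loss_fn(p):
    # Stateless call of the model with an explicit parameter dict.
    pred = functional_call(model, p, (x,))
    return nn.functional.mse_loss(pred, y)

# Tangent "vector" with the same structure as the parameters.
v = {name: torch.randn_like(t) for name, t in params.items()}

# Forward-over-reverse HVP: jvp of grad(loss_fn) along v.
_, hvp_result = jvp(grad(loss_fn), (params,), (v,))
```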

The above worked when I was using a single GPU without parallel processing. However, when I wrapped the model with DistributedDataParallel (DDP), I got the following error:

*** RuntimeError: During a grad (vjp, jvp, grad, etc) transform, the function provided attempted to call in-place operation (aten::copy_) that would mutate a captured Tensor. This is not supported; please rewrite the function being transformed to explicitly accept the mutated Tensor(s) as inputs.

I am confused by this error: if there really were such in-place operations (I couldn't find any in my model.forward() code), I'd expect the error to occur regardless of DDP. Given this inconsistent behaviour, can I still trust the HVP result computed without DDP?

My torch version is 2.0.0.dev20230119+cu117.

zou3519 commented Mar 14, 2023

@XuchanBao do you have a script that reproduces the problem that we could take a look at?

DistributedDataParallel does some extra things to the model, so it's likely that your HVP result is correct but those extra things are interacting badly with vmap.
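
One thing that may be worth trying is to run the transforms against the plain module that DDP wraps, via ddp_model.module, so the grad/jvp transforms don't capture DDP's forward-time machinery. A rough sketch (assuming model, x, y, rank, and the process group already exist; these names are placeholders):

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, jvp
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes `model`, `x`, `y`, `rank`, and the process group already exist.
ddp_model = DDP(model, device_ids=[rank])

# Run the functional call against the module that DDP wraps, not the DDP
# wrapper itself, so DDP's forward-time bookkeeping stays outside the
# grad/jvp transforms.
inner = ddp_model.module
params = dict(inner.named_parameters())

def loss_fn(p):
    pred = functional_call(inner, p, (x,))
    return nn.functional.mse_loss(pred, y)

v = {name: torch.randn_like(t) for name, t in params.items()}
_, hvp_result = jvp(grad(loss_fn), (params,), (v,))
```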
