Model.to_gpu is not usable #713

Open
frobnitzem opened this issue Jul 1, 2022 · 3 comments
Labels
feat / ops Backends and maths

Comments

@frobnitzem
Contributor

I am attempting to assign individual layers to separate GPUs in order to conserve memory. However, the Model.to_gpu function takes an all-or-nothing approach, which prevents this from working.
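For context, a minimal sketch of what I'm trying to do (the layers are just stand-ins, and I'm assuming `Model.to_gpu` takes the target device id and that `model.layers` exposes the sublayers):

```python
from thinc.api import Relu, chain

# Stand-in model; the real one is larger, which is why I want to
# spread it over several GPUs.
model = chain(Relu(512), Relu(512), Relu(512))

# What I'd like: alternate the layers between GPU 0 and GPU 1.
# to_gpu currently moves everything to one device, so this only
# "works" until the first forward pass touches a mismatched array.
for i, layer in enumerate(model.layers):
    layer.to_gpu(i % 2)
```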

While diagnosing the origin of a memory access error during training (cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered), I noticed that CupyOps.device_id is never set or used.

Ideally, all of the CupyOps methods would run inside a cp.cuda.Device(device_id) context, but that is not the case. Instead, the xp attribute is (ab)used in many places. That tries to run everything through GPU 0, so errors don't appear until something has been moved to another GPU.
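A sketch of the pattern I'd expect (this is not the real thinc.backends.CupyOps, just an illustration):

```python
import cupy as cp

class CupyOpsSketch:
    """Illustration only: every allocation and kernel launch happens
    inside the device context that belongs to this Ops instance."""

    def __init__(self, device_id: int = 0):
        self.device_id = device_id
        self.xp = cp

    def alloc_f(self, shape):
        # Allocating inside the context pins the buffer to self.device_id
        # instead of whatever GPU happens to be current (usually GPU 0).
        with cp.cuda.Device(self.device_id):
            return cp.zeros(shape, dtype="float32")
```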

Two other difficulties are the initialization step, which doesn't allocate memory on the right devices, and the finish_update step, where the optimizer does arithmetic on parameters outside of a device context.
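For finish_update, I'd expect the update arithmetic to follow the parameter's device, roughly like this (hypothetical helper, not thinc's actual optimizer code):

```python
import cupy as cp

def sgd_step(param, grad, lr=0.001):
    # Hypothetical sketch: run the update on whichever GPU holds the
    # parameter (and its colocated gradient), rather than on whatever
    # device happens to be current.
    with cp.cuda.Device(param.device.id):
        param -= lr * grad
```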

@frobnitzem
Copy link
Contributor Author

Update: the culprit in my case is:

https://github.com/explosion/thinc/blob/c7b0d6759645babe94315a36c84d56ec877252f2/thinc/backends/cupy_ops.py#L67:L70

which should always call cupy.asarray, because that function copies between devices only when it has to.
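In other words, something along these lines (a sketch of the intent, assuming cp.asarray behaves as described above; not the actual patch, and it needs two GPUs to run):

```python
import cupy as cp

def to_current_device(data):
    # cp.asarray is a no-op when the array already lives on the current
    # device and performs a device-to-device copy when it does not.
    return cp.asarray(data)

with cp.cuda.Device(1):
    on_gpu1 = cp.zeros((4,), dtype="float32")

with cp.cuda.Device(0):
    on_gpu0 = to_current_device(on_gpu1)  # copied from GPU 1 to GPU 0
    same = to_current_device(on_gpu0)     # already here, no copy
```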

@danieldk
Contributor

danieldk commented Jul 7, 2022

Thanks for reporting this issue! We currently only support using a single GPU with require_gpu(gpu_id=N), but multi-GPU support is on our todo list. Of course PRs to improve multi-GPU support are welcome!
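(For context, a minimal sketch of the single-GPU setup mentioned above:)

```python
from thinc.api import require_gpu

# Run everything on a single GPU (GPU 1 here); raises if no GPU is available.
require_gpu(gpu_id=1)
```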

@frobnitzem mentioned this issue Jul 14, 2022
@danieldk
Contributor

danieldk commented Jul 15, 2022

Edit: sorry, posted a reply in the wrong issue.

@shadeMe added the feat / ops Backends and maths label Jul 29, 2022