[Bug]: Memory leak with minimal example #2728
Comments
Hi @AdrianSosic. Thanks for sharing the simple repro. This does reproduce for me on both 0.12.0 and 0.13.0. The memory usage climbs by a few GB per replication until the process is killed. I'll investigate.
I think I've identified the part that causes the memory leak but I don't yet know why. Reduced the repro all the way to evaluations of
If we replace
That can be simplified further. No need for the acqf.
And we can reproduce directly with a single gpytorch context manager:
Interesting, thanks for sharing. So it seems that some of the backprop graphs keep lying around, right? I mean – if I understand correctly what
That's my guess as well. The context manager seems to control
I've isolated the issue to a specific
What happened?
Hi @esantorella & @saitcakmak 👋🏼 After a long time, I've finally had a moment to get back to #641 because I now have an actual minimal reproducing example.
For me, the process gets consistently killed after ~20 iterations. Until that point, it keeps allocating memory/swap and eventually crashes. Could you perhaps confirm if this is also the case for you?
I haven't yet tried the gc.get_objects method suggested by @esantorella here to verify whether it's an actual leak or just over-allocation. But in any case, a crash is unexpected since the code obviously should not allocate any long-term resources for the independent optimizations happening in the loop.
Please provide a minimal, reproducible example of the unexpected behavior.
Adapted from the landing page code:
Please paste any relevant traceback/logs produced by the example provided.
BoTorch Version
0.12.0
Python Version
3.10
Operating System
macOS