[Bug] Bladebit v3.1.0 multi-gpu error during plotcheck #16677
Labels
bug
Comments
We only run one processor, and the GPU code is multithreaded (not Python multiprocessing), so there would be only one PID. Harold will look at the other issue.
What happened?
When running two BB plotters on one machine with two GPUs, the one on device 1 gives an intermittent error when performing the plotcheck at the end. It doesn't happen with every plot. When it does happen, I see device 1's PID listed as a duplicate on device 0 with a smaller memory footprint. It seems there might not be complete isolation between the GPUs for all aspects of the plotter when using more than one device.
Check options used: --check 50 --check-threshold 0.8
It looks like the plotter for device 1 is using device 0 when it does the check portion.
Dev0 memory is 8 GB, dev1 memory is 12 GB, system memory is 512 GB, Ubuntu 22.04, kernel 6.2.0-34
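For anyone trying to catch the duplicate PID when it shows up, here is a rough sketch of how the check could be scripted. It assumes the per-process listing comes from `nvidia-smi --query-compute-apps=pid,gpu_uuid,used_memory --format=csv,noheader,nounits`. The function name `pids_on_multiple_gpus` and the sample rows are made up for illustration and are not output from the affected machine.

```python
import csv
import io
from collections import defaultdict


def pids_on_multiple_gpus(csv_text):
    """Return {pid: {gpu_uuid: used_mib}} for PIDs that appear on more than one GPU.

    Assumes rows of the form "pid, gpu_uuid, used_memory" (memory in MiB), as
    produced by the nvidia-smi query named above. Hypothetical helper.
    """
    usage = defaultdict(dict)
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        pid, gpu_uuid, used = (field.strip() for field in row)
        if not pid.isdigit():
            continue
        # Memory may be reported as a non-numeric placeholder; store None then.
        usage[int(pid)][gpu_uuid] = int(used) if used.isdigit() else None
    return {pid: gpus for pid, gpus in usage.items() if len(gpus) > 1}


# Made-up sample: PID 2222 mainly uses GPU-bbbb but also holds a small
# allocation on GPU-aaaa, which is the pattern described in this report.
SAMPLE = """\
1111, GPU-aaaa, 6100
2222, GPU-bbbb, 9800
2222, GPU-aaaa, 310
"""

if __name__ == "__main__":
    for pid, gpus in pids_on_multiple_gpus(SAMPLE).items():
        print(f"PID {pid} is present on {len(gpus)} GPUs: {gpus}")
```

Running this against the live query output periodically during the check phase would show whether the second plotter's PID picks up an allocation on the other GPU.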
Version
Chia version 2.1.1, Bladebit CUDA version 3.1.0. Also tested a new build on the develop branch, with the same result.
What platform are you using?
Linux
What ui mode are you using?
CLI
Relevant log output
No response