Issues with running sanity check for OpenMPI #3541
Comments
I tried doing a re-install and the above-mentioned issue popped up again.
|
I had to disable the sanity-check in order to have the installation complete without errors. Then when I just try to run the sanity-check with
From stdout:
From end of log:
Indeed, I can't find the files |
You should probably review the build logs; skipping the sanity check is usually not a good idea. You never included the error from your original comment, but it would be good to know what that was. |
Looking back at the Slack thread, I see it was related to the "Hello world" MPI program hanging. That has been known to happen when OpenMPI uses UCX or libfabric. Do you have InfiniBand in the system where you are doing the builds? I would make sure that you can successfully compile and run an MPI code with the module. You may need to tune OpenMPI a little; for example, in the test cluster for easyconfig PRs we set
due to hangs similar to this (and there is no InfiniBand). |
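For checking by hand whether an MPI test program hangs rather than fails, a small wrapper with a timeout can help. The following is only a sketch, not part of EasyBuild or of the thread: the function name `run_with_timeout` and the stand-in commands are assumptions made for illustration, and plain Python child processes are used in place of a real MPI launch.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s=60):
    """Run cmd and classify the outcome as 'ok', 'failed', or 'hung'.

    Hypothetical helper for illustration; not an EasyBuild function.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception
        return "hung"
    return "ok" if proc.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in commands: a quick success and a process that outlives the timeout
    print(run_with_timeout([sys.executable, "-c", "print('hello')"]))
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=1))
```

With the OpenMPI module loaded, the command list could be replaced by something like `["mpirun", "-np", "2", "./hello_mpi"]`, where `hello_mpi` is a hypothetical compiled hello-world binary. Processes started by the launcher may need separate cleanup after a timeout.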
Hi @ocaisa. Thanks for your input. I just managed to get the eb-installation to complete without errors (and without skipping the sanity check). I had to add the following
to the recipe |
Hmm, that should already be covered by
Was the PRRTE dependency included in your easyconfig recipe? |
I tried doing a little test of
And I load
By the way, I previously had another error message which I got rid of by introducing
Anyways, I can get rid of the above mentioned error by setting |
yes, in
|
Strange, the exact option you added should have been there already from what I see in the easyblock. Can you compare the configure command with and without the new |
Let's see if I understood that correctly. I did:
|
In those files it should say the configure command used, and indeed they seem to be identical...so I am not sure what changed |
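Comparing two long configure commands by eye is error-prone, so a small script can confirm whether they really are identical. This is a minimal sketch under stated assumptions: it assumes each log contains a line with `./configure` followed by its options, and the helper names `configure_options` and `diff_configure` as well as the example option `--enable-example` are made up for illustration.

```python
import shlex


def configure_options(log_text):
    """Return the set of arguments following the first ./configure invocation."""
    for line in log_text.splitlines():
        if "./configure" not in line:
            continue
        try:
            tokens = shlex.split(line)
        except ValueError:
            # Unbalanced quotes in a log line: fall back to whitespace splitting
            tokens = line.split()
        idx = next((i for i, tok in enumerate(tokens) if "./configure" in tok), None)
        if idx is not None:
            return set(tokens[idx + 1:])
    return set()


def diff_configure(log_a, log_b):
    """Return (options only in log_a, options only in log_b), each sorted."""
    opts_a, opts_b = configure_options(log_a), configure_options(log_b)
    return sorted(opts_a - opts_b), sorted(opts_b - opts_a)


if __name__ == "__main__":
    # Made-up log snippets; real logs would be read from the two log files
    first = "some output\n./configure --prefix=/opt/a --enable-example\n"
    second = "./configure --prefix=/opt/a\n"
    print(diff_configure(first, second))
```

Two empty lists mean the option sets match, in which case the difference in behaviour has to come from somewhere other than the configure step.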
Strange indeed. You reckon that |
I would say yes, if OpenMPI is passing its sanity check, then things are fine, just not sure what triggered the difference (perhaps the memlock limits?). If you have a fast interconnect you can install the OSU benchmarks and check ping-pong. |
EasyBuild couldn't run the sanity check for `OpenMPI-5.0.3-GCC-13.3.0.eb`.
I ran EasyBuild 4.9.4 (framework: 4.9.4, easyblocks: 4.9.4) on Rocky Linux v9.4 with Python v3.9.18.
I did manage to run `eb OpenMPI-5.0.3-GCC-13.3.0.eb --robot --skip-sanity-check`, and then afterwards I ran `eb OpenMPI-5.0.3-GCC-13.3.0.eb --robot --sanity-check-only`, which gave me the following error message:
The problem seems to be that `self.cfg['parallel']` in line 222 evaluates to `None`. I tried to add `--parallel` to the eb command, that is, `eb OpenMPI-5.0.3-GCC-13.3.0.eb --sanity-check-only --parallel=10`, but that didn't help.
Hence I changed line 222 in `openmpi.py` to
That got the sanity check running a bit further:
I hacked my way around that by changing the code around line 234 in `openmpi.py` into
and then `OpenMPI-5.0.3-GCC-13.3.0.eb` passed the sanity check.
Details are also described on EasyBuild Slack.
Thanks to @sassy-crick for helping out with the debug.
I'm very new to EasyBuild but I'd be happy to try and make a PR with the changes as listed above - if you agree that there is an issue with `openmpi.py`.
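For readers wondering why a `None` value for `self.cfg['parallel']` can break things: in Python 3, comparing `None` with an integer (for example `min(None, 8)`) raises a `TypeError`. The sketch below shows one defensive way to derive a rank count from a possibly unset value. It is not the code from `openmpi.py`, nor the change described above; the name `safe_ranks` and the defaults `max_ranks=8` and `default=1` are assumptions chosen for illustration.

```python
def safe_ranks(parallel, max_ranks=8, default=1):
    """Pick a number of MPI ranks even when `parallel` is None.

    Hypothetical helper; the names and default values are illustrative only.
    """
    if parallel is None:
        # Unset parallelism: fall back to a conservative default
        parallel = default
    # Clamp to the range [1, max_ranks]
    return max(1, min(int(parallel), max_ranks))


if __name__ == "__main__":
    for value in (None, 4, 10):
        print(value, "->", safe_ranks(value))
```

Falling back to a single rank keeps a test run small; whether that is the right default for a sanity check is a decision for the easyblock maintainers.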