Removing PRRTE daemons #2128
Comments
That is correct - PRRTE currently does not support loss of a daemon. The DVM will shut down in that circumstance. There are some folks who have talked now-and-then about extending it, but there has been no progress made in that direction so far. Unsure if/when that may happen.
Hmm... would you expect that passing the --add-hosts option to prun, with a hostfile that subtracts all the slots from one of the hosts currently in the DVM, would result in termination of the daemon on the node being removed?
Offhand, I would say "no" - I wouldn't expect the daemon to terminate. However, I've never tried something like that and honestly have no idea what the code would do in that case. Is that what you are attempting? If so, then that's a different issue.
Hi, Yes, this is what I am attempting, i.e. voluntarily shutting down a daemon. I just had a go at testing the approach suggested by @hppritcha and I can confirm that it does not terminate the daemon.
Shutting down a daemon is very different from setting its available slots to zero. The latter simply removes it from mapping operations. Shutting it down kills any executing procs on that node, breaks the communication tree, etc.
I see, thanks for the clarification!
Background information
What version of the PMIx Reference RTE (PRRTE) are you using? (e.g., v2.0, v3.0, git master @ hash, etc.)
Compiled with the --enable-prte-ft flag.
What version of PMIx are you using? (e.g., v4.2.0, git branch name and hash, etc.)
Please describe the system on which you are running
Details of the problem
Hello,
I am wondering to what extent it is possible to remove PRRTE daemons that have already been added to the DVM's allocation. Here's an example program ("say_hello.c") that just says hello 120 times:
Launching the DVM:
Then running it with PRRTE on host1 only:
However, say I now want to shut down host2 (while say_hello is running). I have tried and failed to manage this without shutting down the whole DVM, which kills say_hello in the middle of execution (see below). So my question is: is it possible to remove daemons from a running DVM?
These are the things I have tried.
prte_max_recon_attempts (tried setting to -1 to try forever, still shut down immediately) and prte_allowed_exit_without_sync (set to 1, had no effect)