HELP WANTED: availableWorkers() #16
HenrikBengtsson added the labels enhancement (New feature or request) and help wanted (Extra attention is needed) on Oct 20, 2020.
Background
When submitting a job to TORQUE / PBS that requests 3 nodes with 2 processors per node ("ppn"), e.g. with something like qsub -l nodes=3:ppn=2 myjob.sh, the scheduler will allocate 3 nodes with 2 cores each (= 6 cores in total) for myjob.sh when it is launched. Exactly which 3 nodes are allocated is known to myjob.sh only at run time. This information is available in the file $PBS_NODEFILE written by TORQUE / PBS, e.g.

    $ cat $PBS_NODEFILE
    n1
    n1
    n8
    n8
    n9
    n9

Other HPC job schedulers use other files / environment variables for this.
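
As a rough illustration of the TORQUE / PBS case described above, the node file can be read into a character vector with one element per allocated core. This is only a sketch: read_nodefile() is a hypothetical helper name, not an existing function in the package, and the temporary file merely mimics the example output shown earlier.

```r
# Hypothetical helper (not part of any package): read a PBS-style node file
# into a character vector, one element per allocated core.
read_nodefile <- function(path = Sys.getenv("PBS_NODEFILE")) {
  # No file name given, or the file does not exist: nothing to report
  if (!nzchar(path) || !file.exists(path)) return(character(0L))
  nodes <- trimws(readLines(path, warn = FALSE))
  nodes[nzchar(nodes)]  # drop blank lines
}

# Usage with a temporary file that mimics the example above
tmp <- tempfile()
writeLines(c("n1", "n1", "n8", "n8", "n9", "n9"), tmp)
nodes <- read_nodefile(tmp)
print(nodes)
print(table(nodes))  # number of cores allocated per node
```

Each node name appears once per allocated core, so the length of the vector equals the total number of cores in the allocation.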
Actions
Add an availableNodes() function that searches for common environment variables and returns a vector of node names, e.g. n1, n1, n8, n8, n9, n9 for the job above. If no known environment variables are found, the default fallback could be to return rep("localhost", times = availableCores()).

The above would allow us to make workers = availableNodes() the new default for cluster futures (currently workers = availableCores()).

Identify these settings for the following schedulers:
- PBS_NODEFILE (the name of a file containing one node per line, where each node is repeated "ppn" times).
- PE_HOSTFILE (a file, format unclear), cf. https://www.ace-net.ca/wiki/Sun_Grid_Engine
- SLURM_JOB_NODELIST (a list of nodes in a compressed format, e.g. instead of "tux1,tux3,tux4" it is stored as "tux[1,3-4]". Note that multiple "compressions" may exist, e.g. "compute-[0-6]-[0-15]". The number of nodes can be verified with SLURM_JOB_NUM_NODES. The "ppn" information is stored in SLURM_TASKS_PER_NODE).
- LSB_HOSTS
- PJM_O_NODEINF - "Path of the allocated node list file. For a job to which virtual nodes are allocated, the IP addresses of the nodes where the virtual nodes are placed are written one per line."
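
To make the proposal more concrete, here is a minimal sketch of how such a lookup might be structured, covering only PBS_NODEFILE and SLURM_JOB_NODELIST. All function names (split_top_level(), expand_range(), expand_node(), expand_nodelist(), available_nodes_sketch()) are hypothetical and chosen for illustration. The sketch assumes the compressed Slurm format behaves as in the two examples above, lists each Slurm node once (it ignores SLURM_TASKS_PER_NODE), and uses parallel::detectCores() as a stand-in for availableCores() in the fallback.

```r
# Split a node list on commas that are not inside [...]
split_top_level <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)[[1]]
  depth <- cumsum(chars == "[") - cumsum(chars == "]")
  is_sep <- chars == "," & depth == 0L
  group <- cumsum(is_sep)
  parts <- split(chars[!is_sep], group[!is_sep])
  unname(vapply(parts, paste, collapse = "", FUN.VALUE = character(1L)))
}

# Expand the content of one bracket pair, e.g. "1,3-4" -> "1", "3", "4"
expand_range <- function(spec) {
  items <- strsplit(spec, ",", fixed = TRUE)[[1]]
  unlist(lapply(items, function(item) {
    bounds <- strsplit(item, "-", fixed = TRUE)[[1]]
    if (length(bounds) == 1L) return(bounds)
    # Keep any zero padding of the lower bound, e.g. "001-003"
    formatC(seq(as.integer(bounds[1]), as.integer(bounds[2])),
            width = nchar(bounds[1]), flag = "0", format = "d")
  }))
}

# Expand one expression; handles several bracket groups recursively
expand_node <- function(expr) {
  m <- regexpr("\\[[^]]*\\]", expr)
  if (m == -1L) return(expr)
  len <- attr(m, "match.length")
  prefix <- substr(expr, 1L, m - 1L)
  inner <- substr(expr, m + 1L, m + len - 2L)
  rest <- expand_node(substr(expr, m + len, nchar(expr)))
  heads <- paste0(prefix, expand_range(inner))
  unlist(lapply(heads, function(h) paste0(h, rest)))
}

expand_nodelist <- function(x) {
  unlist(lapply(split_top_level(x), expand_node))
}

# Hypothetical lookup: PBS first, then Slurm, then the localhost fallback
available_nodes_sketch <- function() {
  pbs <- Sys.getenv("PBS_NODEFILE")
  if (nzchar(pbs) && file.exists(pbs)) {
    nodes <- trimws(readLines(pbs, warn = FALSE))
    return(nodes[nzchar(nodes)])
  }
  slurm <- Sys.getenv("SLURM_JOB_NODELIST")
  if (nzchar(slurm)) return(expand_nodelist(slurm))
  # Stand-in for availableCores(); detectCores() may return NA
  ncores <- parallel::detectCores()
  if (is.na(ncores)) ncores <- 1L
  rep("localhost", times = ncores)
}

print(expand_nodelist("tux[1,3-4]"))
print(length(expand_nodelist("compute-[0-6]-[0-15]")))
```

The order of the checks is arbitrary here. A real implementation would also need to handle PE_HOSTFILE, LSB_HOSTS and PJM_O_NODEINF, and to repeat each Slurm node according to SLURM_TASKS_PER_NODE.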