HELP WANTED: Agility of availableCores() #17
For Slurm, I believe we should use `SLURM_CPUS_PER_TASK`, which reflects requests such as:

```sh
sbatch --cpus-per-task=1 hello.sh
sbatch --cpus-per-task=2 hello.sh
```
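As a minimal sketch of how such a request could be read from within R, assuming Slurm's documented behavior of exporting `SLURM_CPUS_PER_TASK` only when `--cpus-per-task` is given (the helper name is just illustrative):

```r
## Sketch: number of CPUs Slurm granted per task; NA when the job
## was submitted without an explicit --cpus-per-task.
slurm_cpus_per_task <- function() {
  value <- Sys.getenv("SLURM_CPUS_PER_TASK", unset = NA_character_)
  if (is.na(value)) NA_integer_ else as.integer(value)
}
```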
I just found http://slurm.schedmd.com/rosetta.pdf (added reference to top post).
@veseshan, you've mentioned that you work with a Sun Grid Engine (SGE) cluster. Do you know which environment variable SGE sets on the compute node indicating how many tasks/cores/processes it has allotted to the job?
On TORQUE / PBS, there's an alternative to `qsub -l nodes=1:ppn=32 foo.pbs`, which sets `PBS_NUM_PPN=32`, namely:

```sh
qsub -l procs=32 foo.pbs
```

which instead sets `PBS_NP=32` (leaving `PBS_NUM_PPN=1`; see the tests below). In other words, the allotted core count surfaces via different environment variables depending on how the job was requested.
Actually, it could be that my TORQUE / Moab test system is faulty when it comes to interpreting `-l procs`. My own tests give:

```sh
$ echo "export | grep PBS_" | qsub -l nodes=1:ppn=4 -
## => 700801: PBS_NP=4, PBS_NUM_NODES=1, PBS_NUM_PPN=4
$ echo "export | grep PBS_" | qsub -l nodes=2:ppn=4 -
## => 700802: PBS_NP=8, PBS_NUM_NODES=2, PBS_NUM_PPN=4
$ echo "export | grep PBS_" | qsub -l procs=8 -
## => 700803: PBS_NP=8, PBS_NUM_NODES=1, PBS_NUM_PPN=1
$ echo "export | grep PBS_" | qsub -l nodes=2 -l procs=3 -
## => 700804: PBS_NP=3, PBS_NUM_NODES=2, PBS_NUM_PPN=1
$ echo "export | grep PBS_" | qsub -l procs=3 -l nodes=2 -
## => 700806: PBS_NP=3, PBS_NUM_NODES=2, PBS_NUM_PPN=1
```

BTW, here is what `qstat` reports for these jobs:

```sh
$ qstat -n -1 -u $USER
Job ID  Job name         PID    NDS   TSK   RAM    Time      S Since     Nodes/cores
------- ---------------- ------ ----- ----- ------ --------- - --------- -----------
700801  STDIN            --     1     4     --     99:23:59  Q --        --
700802  STDIN            --     2     8     --     99:23:59  Q --        --
700803  STDIN            --     1     1     --     99:23:59  Q --        --
700804  STDIN            --     2     2     --     99:23:59  Q --        --
700806  STDIN            --     2     2     --     99:23:59  Q --        --
```

See also the torqueusers thread 'only one processor is used when using qsub -l procs flag', Jan 2012, http://www.supercluster.org/pipermail/torqueusers/2012-January/013959.html. In that thread it's suggested that it could be a configuration issue in Maui (open-source replacement for Moab) or a bug in Maui. From this I conclude that it's best to ignore the `-l procs` results here.
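As a hedged sketch of what these findings suggest for a lookup, with the precedence being my reading of the tests above (not necessarily what `availableCores()` actually does):

```r
## Sketch: infer the per-node core count from TORQUE/PBS variables.
pbs_cores <- function() {
  ppn <- suppressWarnings(as.integer(Sys.getenv("PBS_NUM_PPN", NA_character_)))
  ## Reliable for -l nodes=<n>:ppn=<p> requests
  if (!is.na(ppn) && ppn > 1L) return(ppn)
  np <- suppressWarnings(as.integer(Sys.getenv("PBS_NP", NA_character_)))
  nnodes <- suppressWarnings(as.integer(Sys.getenv("PBS_NUM_NODES", NA_character_)))
  ## For -l procs=<p> requests, PBS_NUM_PPN stays at 1; PBS_NP then
  ## holds the total across nodes, so only trust it for a single node
  if (!is.na(np) && !is.na(nnodes) && nnodes == 1L) return(np)
  if (!is.na(ppn)) return(ppn)
  NA_integer_
}
```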
I think newer versions of PBSPro are not supported correctly.
Thanks @Phhere. I don't have access to PBSPro myself. Do you have access to a PBSPro environment? If so, would you mind submitting a simple job that prints the environment variables set on the compute node?
Hello, no problem.

This was a job with one chunk of 10 cores, so you could use `$NCPUS` or `$OMP_NUM_THREADS`. Do you support submitting jobs to multiple servers? PBSPro allows requesting multiple servers / chunks within one job, and then you can use ssh / pdsh or something else to distribute your job.
Thanks. Yes.
Referenced commit: "…ecific to PBSPro [#22] [ci skip]"
@Phhere, I've updated the develop branch to pick up `NCPUS`. Could you install it with `remotes::install_github("HenrikBengtsson/future@develop")` and see if `availableCores()` now gives the correct number of cores? Also, it would help if you could play with various multinode requests, e.g.:

```sh
echo 'echo "NCPUS=$NCPUS"' | qsub -l nodes=1:ppn=2  ## => NCPUS=2
echo 'echo "NCPUS=$NCPUS"' | qsub -l nodes=2:ppn=2  ## => NCPUS=2
echo 'echo "NCPUS=$NCPUS"' | qsub -l nodes=3:ppn=1  ## => NCPUS=1
```
On Linux, you may want to consult a process's CPU affinity mask. This would cover cases where a process may only use a subset of the system's available cores, e.g. when its affinity has been restricted. The mask is available from `/proc/self/status`:

```r
count_allowed_cpus <- function() {
  ## Read the affinity mask for the current process
  self_stat <- readLines("/proc/self/status")
  mask_line <- grep("^Cpus_allowed:", self_stat, value = TRUE)
  mask_raw <- gsub("^Cpus_allowed:[[:space:]]+([0-9a-f,]+)$", "\\1", mask_line)
  ## The mask may be split with commas
  mask_clean <- gsub(",", "", mask_raw)
  ## The mask contains a binary 1 for each CPU we're allowed to use,
  ## so we can get a total CPU count by summing the binary digits
  mask_int <- strtoi(mask_clean, base = 16)
  sum(as.integer(intToBits(mask_int)))
}
```

On my four-core system, this gets the correct count:
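```r
count_allowed_cpus()
## [1] 4
```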
On the systems I've been able to test, an unbound process's mask has only as many ones as there are cores on the machine. It will return nonsense on other systems where the default mask is a wordful of ones.
@leitec, thanks again for these pointers. I'll see if I can incorporate it. I might start by adding it as an internal function for people to try out until everything has been figured out. Do you have any official references/manuals where the `Cpus_allowed` entry is documented? Having said this, this seems like something that would be worth bringing up on the R-devel list.

EDIT: Here's a tweaked version:

```r
#' @importFrom utils file_test
countAllowedCPUs <- function() {
  pathname <- "/proc/self/status"
  if (!file_test("-f", pathname)) return(NA_integer_)

  ## Read the affinity mask for the current process
  self_stat <- readLines(pathname, warn = FALSE)

  ## Identify the affinity-mask entry
  pattern <- "^Cpus_allowed:[[:space:]]+([0-9a-f,]+)$"
  mask_line <- grep(pattern, self_stat, value = TRUE)
  if (length(mask_line) == 0L) return(NA_integer_)
  if (length(mask_line) > 1L) {
    warning("Detected more than one 'Cpus_allowed' entry in ",
            sQuote(pathname), ", but will only use the first one: ",
            paste(sQuote(mask_line), collapse = "; "))
    mask_line <- mask_line[1L]
  }

  ## Extract the affinity-mask value
  mask_raw <- gsub(pattern, "\\1", mask_line)

  ## The mask may be separated by commas
  mask_clean <- gsub(",", "", mask_raw, fixed = TRUE)

  ## Each available CPU corresponds to a binary '1' in the mask
  mask_int <- strtoi(mask_clean, base = 16L)
  mask_bits <- intToBits(mask_int)
  sum(mask_bits == as.raw(1L))
}
```
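One caveat worth noting with both versions: `strtoi()` parses into R's 32-bit signed integer range, so a mask with 32 or more bits set overflows and comes back as `NA`, which ties into the "wordful of ones" problem mentioned earlier:

```r
strtoi("f", base = 16L)         ## 4 low bits set => 15
strtoi("ffffffff", base = 16L)  ## exceeds .Machine$integer.max => NA
```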
Thanks for the tweaked version. My R is quite poor. I can't find an exact reference on the mask. Perhaps this is a sign that it's not a good idea to use it. I did eventually find a system that I can test this on.

I'm looking at another approach that would be appropriate either for inclusion in the `parallel` package itself or as a standalone package.
Your R code looked just fine to me; it's just me adding a few, sometimes overly conservative, tweaks. It also helps me to work through someone else's code.

Just a wild guess, but it could be that a mask with all 1s means "no restriction". If the mask is all 1s, one could AND it with a mask that has one 1 per core on the machine. R code example with a four (4) core machine:

```r
n <- 4L
mask_all <- rep(1L, times = n)
int_mask_all <- sum(2^(seq_along(mask_all) - 1) * mask_all)
print(int_mask_all)
## [1] 15
```

With `mask <- c(1, 0, 1, 1)`:

```r
mask <- c(1, 0, 1, 1)
int_mask <- sum(2^(seq_along(mask) - 1) * mask)
print(int_mask)
## [1] 13

int_mask_avail <- bitwAnd(int_mask_all, int_mask)
print(int_mask_avail)
## [1] 13

mask_avail <- intToBits(int_mask_avail)
sum(mask_avail == as.raw(1L))
## [1] 3
```

and with an all-1s 16-bit mask:

```r
mask <- rep(1, times = 16)
int_mask <- sum(2^(seq_along(mask) - 1) * mask)
print(int_mask)
## [1] 65535

int_mask_avail <- bitwAnd(int_mask_all, int_mask)
print(int_mask_avail)
## [1] 15

mask_avail <- intToBits(int_mask_avail)
sum(mask_avail == as.raw(1L))
## [1] 4
```

Again, just a wild guess.
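Putting the two pieces together, a minimal sketch along these lines (my own combination, under the same "all 1s means unrestricted" guess; like `strtoi()`, it only works for machines with at most 31 cores):

```r
library(parallel)

## Sketch: cap a parsed affinity mask by AND-ing it with a mask
## holding one 1 per core reported by detectCores(), so that an
## all-1s "unrestricted" default collapses to the real core count.
count_capped_cpus <- function(int_mask) {
  n <- detectCores()
  int_mask_all <- sum(2^(seq_len(n) - 1L))  ## n low bits set
  int_mask_avail <- bitwAnd(int_mask_all, int_mask)
  sum(intToBits(int_mask_avail) == as.raw(1L))
}

count_capped_cpus(strtoi("d", base = 16L))  ## mask 1101 => 3
```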
Yeah, that's probably the way to do it. I believe your interpretation of "all 1s" is correct.

However, while looking into this, I came across the `nproc` module from Gnulib. I made a trivial R wrapper around the nproc module and it works fine. I just need to clean it up, once I figure out how to do Gnulib's autoconf/automake stuff properly.

I'll follow your advice and solicit discussion on R-devel. A cross-platform solution might be more palatable for the developers. If not, I can turn this into a generic package that just does this one thing, and perhaps that could then include the various job schedulers and other systems that don't necessarily use cpusets or set affinity for their processes.
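As a rough cross-check (not the Gnulib wrapper itself): GNU coreutils ships an `nproc` command built on that same module, and it honors the affinity mask, so one can shell out to it from R:

```r
## Sketch: ask coreutils' nproc, which is built on Gnulib's nproc
## module and respects the CPU affinity mask; NA if unavailable.
nproc_cores <- function() {
  out <- tryCatch(system("nproc", intern = TRUE),
                  error = function(e) NA_character_,
                  warning = function(w) NA_character_)
  as.integer(out[1L])
}
```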
Top post:

`parallelly::availableCores()` returns the number of cores available for multicore processing. R itself provides `parallel::detectCores()` for this. There is also the `mc.cores` option (defaulting to environment variable `MC_CORES`), set when `parallel` is loaded. Beyond this, various systems/setups set specific environment variables to reflect the number of available/allocated cores. For instance, the resource manager PBS sets environment variable `PBS_NUM_PPN` on the compute node, specifying the number of allotted cores.

Currently, `availableCores()` defaults to returning the first valid value of (in order):

1. `PBS_NUM_PPN`
2. `mc.cores` (and `MC_CORES`)
3. `parallel::detectCores()`
I would like to add support for more resource/workload managers and other distributed processing environments (a sketch of such a cascade follows the list below). For instance:

- `PBS_NUM_NODES`, `PBS_NUM_PPN`, `PBS_NP` (see comment below)
- `NCPUS`
- `SLURM_CPUS_PER_TASK`, e.g. `--cpus-per-task=3` (or short `-c 3`)
- `SLURM_CPUS_ON_NODE`, e.g. `--ntasks=3` (or short `-n 3`): only trustworthy when `--nodes=1`; with e.g. `--nodes=2 --ntasks=3`, one has to identify the number of cores to run on the current node
- `NSLOTS` (?), cf. http://cc.in2p3.fr/docenligne/969
- `LSB_DJOB_NUMPROC` - "The number of processors (slots) allocated to the job." (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_config_ref/lsf_envars_ref.html), cf. PR 'availableCores: add LSF/OpenLava' (future#360)
- `LSB_MAX_NUM_PROCESSORS` (?) - "The maximum number of processors requested when the job is submitted. [...] Set during job execution based on bsub options. For example, for a job submitted with -n 2,4, the maximum number of processors requested is 4." (https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_config_ref/lsf_envars_ref.html). This variable is recommended by https://grid.rcs.hbs.org/parallel-r
- `PJM_VNODE_CORE`
- `PJM_PROC_BY_NODE` (not always)

What else am I missing? A `./.clustersize` file that specifies the number of workers?

I appreciate any feedback on what environment variables or commands are available on a compute node for querying the number of allotted cores, if at all. Please try to provide links to documentation if you can.
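To make the intended lookup order concrete, here is a hedged sketch of such a cascade (the set and order of variables are illustrative; `available_cores_sketch` is not the package's actual implementation):

```r
## Sketch: return the first valid core count from scheduler
## variables, then the mc.cores option / MC_CORES, and finally
## fall back to parallel::detectCores().
available_cores_sketch <- function() {
  vars <- c("PBS_NUM_PPN", "NCPUS", "SLURM_CPUS_PER_TASK",
            "NSLOTS", "LSB_DJOB_NUMPROC")
  for (name in vars) {
    value <- suppressWarnings(as.integer(Sys.getenv(name, unset = NA_character_)))
    if (!is.na(value) && value > 0L) return(value)
  }
  value <- getOption("mc.cores", Sys.getenv("MC_CORES", unset = NA_character_))
  value <- suppressWarnings(as.integer(value))
  if (!is.na(value) && value > 0L) return(value)
  parallel::detectCores()
}
```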
References:

- http://slurm.schedmd.com/rosetta.pdf