
availableCores(): Add support for HTCondor #50

Open
@HenrikBengtsson


HTCondor users, I need your help to add support for HTCondor to availableCores():

HPC schedulers such as Slurm, SGE, and Torque/PBS set environment variables that can be queried to figure out how many CPU cores the scheduler has allotted to the job. This allows the job script to adapt to the resources it is allowed to use. For example, when submitting an SGE job to use four (4) cores:

$ qsub -pe smp 4 my_script.sh

the my_script.sh script can find out how many cores it was given with:

ncores=${NSLOTS:-1}
echo "I am allowed to use $ncores cores on this machine"
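
For comparison, here is a minimal sketch of how a job script could check several schedulers in turn. The function name scheduler_cores is made up for illustration, and the choice of SLURM_CPUS_PER_TASK (Slurm) and PBS_NUM_PPN (Torque/PBS) alongside NSLOTS (SGE) is my assumption about which variables are the relevant ones; a given cluster may set others or none of them.

```shell
# Hypothetical helper: print the number of cores granted by the scheduler,
# falling back to 1 when no known variable holds a positive integer.
scheduler_cores() {
  for n in "${NSLOTS:-}" "${SLURM_CPUS_PER_TASK:-}" "${PBS_NUM_PPN:-}"; do
    case "$n" in
      ''|*[!0-9]*|0*) ;;            # skip empty, non-numeric, or zero-prefixed values
      *) echo "$n"; return 0 ;;     # first valid value wins
    esac
  done
  echo 1
}

echo "I am allowed to use $(scheduler_cores) cores on this machine"
```

The fallback to 1 mirrors the ${NSLOTS:-1} idiom above, so the script still runs sensibly outside of any scheduler.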

Question: How do you achieve the same on HTCondor? Does HTCondor set environment variables in a similar way, or are there other ways to query the number of cores you've been assigned?


FWIW, I tried to search the web for how to do it, but I failed to find anything useful. The closest I found is in Section 2.5.11 of https://www.mn.uio.no/ifi/tjenester/it/hjelp/beregninger/htcondor/condor-manual.pdf:

HTCondor sets several additional environment variables for each executing job that may be useful for the job to reference.

  • _CONDOR_SCRATCH_DIR gives the directory where the job may place temporary data files. This directory is unique for every job that is run, and its contents are deleted by HTCondor when the job stops running on a machine, no matter how the job completes.

  • _CONDOR_SLOT gives the name of the slot (for SMP machines), on which the job is run. On machines with only a single slot, the value of this variable will be 1, just like the SlotID attribute in the machine's ClassAd. This setting is available in all universes. See section 3.7.1 for more details about SMP machines and their configuration.

  • CONDOR_VM is equivalent to _CONDOR_SLOT described above, except that it is only available in the standard universe. NOTE: As of HTCondor version 6.9.3, this environment variable is no longer used. It will only be defined if the ALLOW_VM_CRUFT configuration variable is set to True.

  • X509_USER_PROXY gives the full path to the X.509 user proxy file if one is associated with the job. Typically, a user will specify x509userproxy in the submit description file. This setting is currently available in the local, java, and vanilla universes.

  • _CONDOR_JOB_AD is the path to a file in the job's scratch directory which contains the job ad for the currently running job. The job ad is current as of the start of the job, but is not updated during the running of the job. The job may read attributes and their values out of this file as it runs, but any changes will not be acted on in any way by HTCondor. The format is the same as the output of the condor_q -l command. This environment variable may be particularly useful in a USER_JOB_WRAPPER.

  • _CONDOR_MACHINE_AD is the path to a file in the job's scratch directory which contains the machine ad for the slot the currently running job is using. The machine ad is current as of the start of the job, but is not updated during the running of the job. The format is the same as the output of the condor_status -l command.

  • _CONDOR_JOB_IWD is the path to the initial working directory the job was born with.

  • _CONDOR_WRAPPER_ERROR_FILE is only set when the administrator has installed a USER_JOB_WRAPPER. If this file exists, HTCondor assumes that the job wrapper has failed and copies the contents of the file to the StarterLog for the administrator to debug the problem.

  • CONDOR_IDS overrides the value of configuration variable CONDOR_IDS, when set in the environment.

  • CONDOR_ID is set for scheduler universe jobs to be the same as the ClusterId attribute.
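
Of the variables listed above, _CONDOR_MACHINE_AD looks like the most promising lead. Below is a rough sketch of what I imagine a lookup could be, if the machine ad file contains a line of the form "Cpus = 4". That attribute name is an assumption on my part and is not mentioned in the excerpt above, and condor_cpus is just a made-up helper name. I have not been able to verify this on a real HTCondor system.

```shell
# Hypothetical helper: read the slot's CPU count from the machine ad file.
# ASSUMPTION: the ad has a line "Cpus = <integer>" in condor_status -l format.
condor_cpus() {
  ad="${_CONDOR_MACHINE_AD:-}"
  cpus=""
  if [ -n "$ad" ] && [ -r "$ad" ]; then
    # Match only an attribute named exactly "Cpus" at the start of a line,
    # so that names such as "TotalCpus" are ignored; keep the first match.
    cpus=$(sed -n 's/^[[:space:]]*Cpus[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$ad" | head -n 1)
  fi
  # Fall back to 1 when the file is missing or has no such attribute
  echo "${cpus:-1}"
}

echo "I am allowed to use $(condor_cpus) cores on this machine"
```

If this is roughly right, it would give the HTCondor counterpart of ${NSLOTS:-1}, but I would need an HTCondor user to confirm which attribute actually holds the number of cores assigned to the job.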

Labels: help wanted
