Hi!
I have successfully compiled and linked a program with Intel MPI. When I run it interactively or in the background, it is very fast and works without any problems on our new server (ProLiant DL580 Gen10: 1 node, 4 processors with 18 cores each, 72 cores in total, hyperthreading disabled). When I submit it through Torque (version 4), strange things happen. For example:
if I submit 2 jobs, each asking for 8 cores, both run fine
if I submit a third job (8 cores), it is 4 times slower because its 8 processes run on only two cores!
if I submit a fourth job, it runs properly, but if I qdel all four jobs, they all disappear from qstat -a while the fourth keeps running!
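One way to confirm the core oversubscription described above is to ask the kernel which CPUs each process is allowed to run on. The following is only a minimal sketch, assuming a Linux node where /proc is available; the helper name allowed_cpus is hypothetical and not part of Torque or Intel MPI.

```shell
#!/bin/sh
# Hypothetical helper: print the list of CPUs that the kernel allows
# a given process to run on (Linux only, reads /proc/<pid>/status).
allowed_cpus() {
    # $1 is the process ID to inspect.
    awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status"
}

# Example: show the allowed CPU list of the current shell.
allowed_cpus "$$"
```

Called from inside a job for each MPI rank (for example by looping over the PIDs returned by pgrep for the program name), overlapping lists between ranks would be consistent with several processes sharing the same cores.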
I suspect an integration problem between Intel MPI and Torque, so I set the following:
export I_MPI_PIN=off
export I_MPI_PIN_DOMAIN=socket
To run the program, I called mpirun as follows:
/opt/intel/compilers_and_libraries_2019.1.144/linux/mpi/intel64/bin/mpirun -d -rmk pbs -bootstrap pbsdsh .................
I have checked that PBS_ENVIRONMENT is correctly set to PBS_BATCH.
The Torque configuration also appears to be correct; the file
/var/lib/torque/server_priv/nodes contains the following line:
dscfbeta1.units.it np=72 num_node_boards=1
This is a severe problem for me: the machine is shared, so we need a scheduler like Torque (PBS) to run jobs compiled and linked against Intel MPI. Any help or suggestion is welcome!
Thank you in advance,
Mauro