No GPU activity when the job is running #432

Open
patrick-douglas opened this issue Sep 7, 2017 · 0 comments

First, I'm not using cgroups (is this required?).
My issue is:
I'm using Torque version 5.1.3 (newer versions failed to install on Linux Mint),
configured with the following commands:
#me@root:

cd torque-5.1.3-1462984387_205d70d
./configure --with-debug --enable-nvidia-gpus --with-sendmail
make
make install

cp contrib/init.d/debian.pbs_server /etc/init.d/pbs_server
cp contrib/init.d/debian.pbs_sched /etc/init.d/pbs_sched
cp contrib/init.d/debian.trqauthd /etc/init.d/trqauthd

sysv-rc-conf pbs_server on
sysv-rc-conf trqauthd on
sysv-rc-conf pbs_sched on

echo '/usr/local/lib'>/etc/ld.so.conf.d/torque.conf
ldconfig

service trqauthd restart
echo '/usr/local/lib'>/etc/ld.so.conf.d/torque.conf
echo "master.lbn.com">/var/spool/torque/server_name
./torque.setup root
echo "node01.lbn.com np=12 gpus=1" > /var/spool/torque/server_priv/nodes

service trqauthd restart
service pbs_server restart
service pbs_sched start

qmgr -c 'set server auto_node_np = True'

make packages
#Then I ssh into node01.lbn.com and run the following:
#root@node02
apt-get update
apt-get install g++ libssl-dev libxml2-dev sysv-rc-conf libboost-all-dev -y
cd torque-5.1.3-1462984387_205d70d
./configure --with-debug --enable-nvidia-gpus
make -j 2
make install -j 2

./torque-package-clients-linux-x86_64.sh --install
./torque-package-devel-linux-x86_64.sh --install
./torque-package-doc-linux-x86_64.sh --install
./torque-package-mom-linux-x86_64.sh --install
./torque-package-server-linux-x86_64.sh --install

echo '/usr/local/lib'>/etc/ld.so.conf.d/torque.conf
ldconfig

cp contrib/init.d/debian.pbs_mom /etc/init.d/pbs_mom
cp contrib/init.d/debian.trqauthd /etc/init.d/trqauthd

sysv-rc-conf trqauthd on
sysv-rc-conf pbs_mom on

service trqauthd restart

echo '$pbsserver master'>/var/spool/torque/mom_priv/config
echo '$logevent 225'>>/var/spool/torque/mom_priv/config
echo '$usercp *:/home /home'>>/var/spool/torque/mom_priv/config

service pbs_mom start

After running this, I run the "pbsnodes" command and node01 looks OK: I can see all the GPU info. However, when I submit a job, nvidia-smi shows the GPU's status changed to Exclusive-process while GPU activity stays at 0%, even though the task is still running (according to "top").
My GPU is an NVIDIA Tesla K40c, but I also tested with a GeForce GT 430, without success.
NOTE: I'm running CUDA 8.0 and the latest NVIDIA drivers.
Please help me!
Thank you in advance.
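
To narrow this down, it may help to confirm which GPU Torque actually handed to the job. The following is only a minimal sketch, not a diagnosis. It assumes the job was submitted with a GPU request (for example `-l nodes=1:ppn=1:gpus=1`; the qsub line is not shown above) and that Torque's `$PBS_GPUFILE` lists one `<hostname>-gpu<index>` entry per line. `gpus_from_file` is a hypothetical helper name, not part of Torque.

```shell
#!/bin/sh
# Hypothetical helper: turn a Torque GPU file into a comma-separated
# list of GPU indices suitable for CUDA_VISIBLE_DEVICES.
# Assumption: each line looks like "<hostname>-gpu<index>", e.g. "node01-gpu0".
gpus_from_file() {
    # $1: path to the GPU file (normally "$PBS_GPUFILE" inside a job)
    # Keep only the trailing index of each line, then join the lines with commas.
    sed -n 's/.*-gpu\([0-9][0-9]*\)$/\1/p' "$1" | paste -sd, -
}

# Inside a job script, export the assigned GPUs only when Torque provided the file.
if [ -n "${PBS_GPUFILE:-}" ] && [ -r "$PBS_GPUFILE" ]; then
    CUDA_VISIBLE_DEVICES=$(gpus_from_file "$PBS_GPUFILE")
    export CUDA_VISIBLE_DEVICES
    echo "Assigned GPUs: $CUDA_VISIBLE_DEVICES"
fi
```

If nothing is printed inside the job, `$PBS_GPUFILE` was not set, which would suggest the job did not actually request a GPU. If an index is printed but activity stays at 0%, the application itself may not be using CUDA.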
