How to work with Bridges (PSC)
The advantage of using the DBMI cluster is that we won't be charged, but it is much smaller in terms of the number of nodes available (not computing power). Please use DBMI for development purposes.
Project id: bi561ip
This is how you can get an interactive session on a CPU node:
interact -A bi561ip -p DBMI -t 08:00:00 --ntasks-per-node=28 -N 1
- -A : project id
- -p : partition name
- -t : time (8 hours max limit)
- -N : number of nodes
- --ntasks-per-node : number of cores to allocate per node [set this value high if you are running into memory issues]
Each DBMI CPU node has 128 GB of memory, 28 cores, and 8 TB of on-node storage.
In the DBMI partition we have 8 such nodes.
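Before requesting a session, you can check how busy those nodes are with a standard SLURM query (a generic sketch; the exact output columns depend on the SLURM setup on Bridges):
# list the DBMI nodes with their current state (idle, mixed, alloc, ...)
sinfo -p DBMI -N -l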
This is how you can get an interactive GPU session on the DBMI cluster:
interact -A bi561ip -p DBMI-GPU --gres=gpu:p100:1 -N 1 -t 08:00:00
Note that the argument for -p has changed.
- --gres=gpu:p100:1 [gpu : type of GPU : number of GPUs]: here, it requests a p100 GPU node with 1 GPU.
Each p100 GPU node has 2 GPUs, each with 16 GB of GPU memory, and 32 cores.
In the DBMI-GPU partition we have 2 such nodes (dgpu001 and dgpu002).
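Once the session starts, a quick way to confirm which GPU(s) you were given is the standard NVIDIA tool (assuming the usual driver install on the node):
nvidia-smi   # lists the allocated GPU(s), driver version, and free GPU memory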
This is how you can get an interactive session on a titan-xp GPU node in the DBMI cluster:
interact -A bi561ip -p DBMI-GPU --gres=gpu:titan-xp:2 -N 1 -t 08:00:00
- --gres=gpu:titan-xp:2 : requests a titan-xp GPU node with 2 GPUs.
Each titan-xp GPU node has 8 GPUs, each with 12 GB of GPU memory.
In the DBMI-GPU partition we have 2 such nodes (dgpu003 and dgpu004).
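To see whether the DBMI-GPU nodes are already occupied before you request one, a standard SLURM query works (a sketch; the output layout is configurable):
squeue -p DBMI-GPU   # jobs currently running or queued on the DBMI-GPU partition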
Project id: ac5616p (general XSEDE grant)
Here is how you can get an allocation with GPUs:
interact -A ac5616p -p GPU --gres=gpu:p100:2 -N 1 -t 08:00:00
- The value after -A is our grant number. It may change from year to year.
- The -p specifies the partition; for example, if you want a session from the shared GPUs you can change it to -p GPU-shared, and similarly for small GPUs, -p GPU-small (see the example after this list).
- There are 2 kinds of GPUs in this partition:
  - p100: [--gres=gpu:p100:2] maximum 2 GPUs per node
  - k80: [--gres=gpu:k80:4] maximum 4 GPUs per node
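As an example of the shared-partition variant mentioned above, a request like the following mirrors the command at the top of this section (a sketch; adjust the GPU count and time to your needs):
interact -A ac5616p -p GPU-shared --gres=gpu:p100:1 -N 1 -t 08:00:00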
Use volta16 if you want to use a Jupyter notebook, for example:
interact -A ac5616p -p GPU-AI --gres=gpu:volta16:1 -N 1 -t 08:00:00
Otherwise, you can use volta32 if needed. (Reference: https://www.psc.edu/bridges-ai-early-users)
- There are 2 kinds of GPUs in this partition:
  - volta32: [--gres=gpu:volta32:2] maximum 16 GPUs per node, with 32 GB memory/GPU. There is 1 such node in partition GPU-AI.
  - volta16: [--gres=gpu:volta16:4] maximum 8 GPUs per node, with 16 GB memory/GPU. There are 8 such nodes in partition GPU-AI.
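If you do need the 32 GB GPUs, the analogous request would look like this (a sketch mirroring the volta16 command above):
interact -A ac5616p -p GPU-AI --gres=gpu:volta32:2 -N 1 -t 08:00:00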
To run a Jupyter notebook on a GPU node, start an interactive session, point the Jupyter runtime directory at your home directory, and launch the notebook:
interact -A bi561ip -p DBMI-GPU --gres=gpu:p100:1
export JUPYTER_RUNTIME_DIR=~/.jupyter
jupyter notebook
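Compute nodes are generally not reachable directly from your laptop, so to open the notebook in a local browser you typically forward the port through the login node. A sketch, assuming the login host bridges.psc.edu, a placeholder compute node dgpu001, your own PSC username, and Jupyter's default port 8888:
# run on your local machine, then browse to http://localhost:8888
ssh -L 8888:dgpu001:8888 your_username@bridges.psc.edu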
To submit a batch job instead, put the corresponding #SBATCH directives at the top of your job script, e.g.:
#SBATCH -p DBMI-GPU
#SBATCH -N 1
#SBATCH -A bi561ip
#SBATCH --ntasks-per-node 28
#SBATCH -t 48:00:00 # HH:MM:SS
#SBATCH --gres=gpu:p100:2
#SBATCH --time-min=00:30:00
#SBATCH --mail-type=ALL
#SBATCH --mail-user=singla
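Putting it together, a minimal job script might look like the sketch below; the module name, conda environment, and train.py are placeholders for your own setup:
#!/bin/bash
#SBATCH -p DBMI-GPU
#SBATCH -N 1
#SBATCH -A bi561ip
#SBATCH --gres=gpu:p100:1
#SBATCH -t 08:00:00
# load your environment (placeholder names)
module load anaconda3
source activate my_env
# run the actual work
python train.py
Save it as job.sh (or any name) and submit it with:
sbatch job.sh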
For large jobs, and to avoid long wait times in the queues, PSC allows us to reserve nodes. Reserved nodes are explicitly allocated to the user for a designated amount of time.
To request a reservation, send an email to [email protected] with the details below:
- Name:
- Email:
- Account id: Example ac5616p
- Why do you need this reservation:
- RM nodes (128 GB):
- GPU (K80):
- GPU (P100):
- GPU-AI (volta16):
- GPU-AI (volta32):
- LSM (3TB):
- ESM (12TB):
- Start date:
- Start time:
- End date:
- End time:
To use the reservation in interactive mode:
interact -p GPU-AI -t 8:00:00 --egress --gres=gpu:volta16:1 -A ac5616p -R reservation_name
To use the reservation in batch mode, add the following flag to your job submission:
--res reservation_name
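For example, appending it to the sbatch submission of the hypothetical job.sh from the batch example above (the flag spelling follows this page; plain SLURM also accepts the long form --reservation=reservation_name):
sbatch --res reservation_name job.sh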