
How to work with Bridges (PSC)


The advantage of using the DBMI cluster is that we won't be charged, but it is much smaller in terms of the number of nodes available (not computing power). Please use DBMI for development purposes.

DBMI Cluster

Project id: bi561ip

CPU Node

This is how you can get an interactive session on a CPU node:

interact -A bi561ip -p DBMI  -t 08:00:00 --ntasks-per-node=28 -N 1
  • -A: project id
  • -p: partition name
  • -t: wall time (8 hours maximum)
  • -N: number of nodes
  • --ntasks-per-node: number of cores to allocate per node (set this value high if you are running into memory issues)

Each DBMI CPU node has 128 GB of memory, 28 cores, and 8 TB of on-node storage.

The DBMI partition has 8 such nodes.
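Once the session starts, you can sanity-check what was actually granted with standard Slurm commands (these are generic Slurm tools, not DBMI-specific; SLURM_JOB_ID is set automatically inside the job):

squeue -u $USER

scontrol show job $SLURM_JOB_ID

The second command prints the cores, memory, and time limit attached to the current interactive job.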

p100 GPU Node

This is how you can get an interactive GPU session on the DBMI cluster:

interact -A bi561ip -p DBMI-GPU --gres=gpu:p100:1 -N 1 -t 08:00:00 

Note that the argument to -p has changed.

  • --gres=gpu:p100:1 [gpu : type of GPU : number of GPUs]: this requests a p100 GPU node with 1 GPU.

Each p100 GPU node has 2 GPUs, each with 16 GB of GPU memory, and 32 cores.

The DBMI-GPU partition has 2 such nodes (dgpu001 and dgpu002).
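Once the GPU session starts, a quick way to confirm the GPU is visible is nvidia-smi (the standard NVIDIA tool; Slurm also normally exports CUDA_VISIBLE_DEVICES with the GPU indices assigned to your job):

nvidia-smi

echo $CUDA_VISIBLE_DEVICES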

titan-xp GPU node

This is how you can get an interactive GPU session on the DBMI cluster:

interact -A bi561ip -p DBMI-GPU --gres=gpu:titan-xp:2 -N 1 -t 08:00:00 
  • --gres=gpu:titan-xp:2: requests a titan-xp GPU node with 2 GPUs.

Each titan-xp GPU node has 8 GPUs, each with 12 GB of GPU memory.

The DBMI-GPU partition has 2 such nodes (dgpu003 and dgpu004).

Bridges cluster

Project id: ac5616p (general XSEDE grant)

GPU Node

Here is how you can get an allocation with GPUs:

interact -A ac5616p -p GPU --gres=gpu:p100:2 -N 1 -t 08:00:00 
  • The number after -A is our grant number. It may change from year to year.
  • -p specifies the partition; for example, if you want a session on the shared GPUs, change it to -p GPU-shared, and similarly use -p GPU-small for small-GPU jobs (see the example after this list).
  • There are 2 kinds of GPUs in this partition:
  1. p100: [--gres=gpu:p100:2] maximum 2 GPUs per node
  2. k80: [--gres=gpu:k80:4] maximum 4 GPUs per node
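For example, a shared-GPU session with a single p100 could be requested as below. This is a sketch built by swapping the partition name and GPU count into the command above; check the PSC documentation for the exact limits of GPU-shared:

interact -A ac5616p -p GPU-shared --gres=gpu:p100:1 -N 1 -t 08:00:00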

GPU-AI Node

Use volta16 if you want to use a Jupyter notebook, for example:

interact -A ac5616p -p GPU-AI --gres=gpu:volta16:1 -N 1 -t 08:00:00

Otherwise, you can use volta32 if needed. (Reference: https://www.psc.edu/bridges-ai-early-users)

  • There are 2 kinds of GPUs in this partition:
  1. volta32: [--gres=gpu:volta32:2] maximum 16 GPUs per node, with 32 GB of memory per GPU. There is 1 such node in the GPU-AI partition (see the example after this list).
  2. volta16: [--gres=gpu:volta16:4] maximum 8 GPUs per node, with 16 GB of memory per GPU. There are 8 such nodes in the GPU-AI partition.
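For example, a volta32 session with 2 GPUs would look like the volta16 command above with the gres spec swapped in (a sketch, not a verified command):

interact -A ac5616p -p GPU-AI --gres=gpu:volta32:2 -N 1 -t 08:00:00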

Run jupyter notebook on an interactive node

Log in to an interactive GPU node:

interact -A bi561ip -p DBMI-GPU --gres=gpu:p100:1

Set the runtime directory. Do not forget this step, otherwise you will get a runtime error:

export JUPYTER_RUNTIME_DIR=~/.jupyter

Start the Jupyter notebook:

jupyter notebook
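The notebook now runs on the compute node, so to open it in a local browser you generally need an SSH tunnel through the login node. A minimal sketch, where the node name dgpu001, notebook port 8888, local port 8000, the username, and the bridges.psc.edu login host are all assumptions to replace with your actual session details and the host you normally ssh to:

ssh -L 8000:dgpu001:8888 [email protected]

Run this on your local machine, then open http://localhost:8000 in your browser and paste the token that jupyter printed.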

Setting up an sbatch file

#SBATCH -p DBMI-GPU              # partition
#SBATCH -N 1                     # number of nodes
#SBATCH -A bi561ip               # project id
#SBATCH --ntasks-per-node 28     # cores per node
#SBATCH -t 48:00:00              # wall time, HH:MM:SS
#SBATCH --gres=gpu:p100:2        # GPU type and count
#SBATCH --time-min=00:30:00      # minimum acceptable wall time
#SBATCH --mail-type=ALL          # email on job begin/end/fail
#SBATCH --mail-user=singla       # username (or email address) to notify
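The directives above only form the header; a full job script also needs a shebang and the commands to run. A minimal sketch, where the module name, conda environment, and run_training.py are placeholders for whatever you actually use (check module avail for the exact module names on the cluster):

#!/bin/bash
#SBATCH -p DBMI-GPU
#SBATCH -N 1
#SBATCH -A bi561ip
#SBATCH --gres=gpu:p100:2
#SBATCH -t 48:00:00

# site-specific environment setup (placeholder names)
module load anaconda3
source activate myenv

python run_training.py

Save it as, for example, myjob.sbatch and submit with:

sbatch myjob.sbatch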

Bridges Reservation

For large jobs, and to avoid the long wait times in the queues, PSC allows us to reserve nodes. Reserved nodes are explicitly allocated to the user for a designated amount of time.

To request a reservation, send an email to [email protected] with the details below:

  • Name:
  • Email:
  • Account id: Example ac5616p
  • Why do you need this reservation:
  • RM nodes (128 GB):
  • GPU (K80):
  • GPU (P100):
  • GPU-AI (volta16):
  • GPU-AI (volta32):
  • LSM (3TB):
  • ESM (12TB):
  • Start date:
  • Start time:
  • End date:
  • End time:

To use the reservation in interactive mode:

interact -p GPU-AI  -t 8:00:00 --egress --gres=gpu:volta16:1 -A ac5616p -R reservation_name

To use the reservation in batch mode, add this flag to your sbatch command or script:

--res reservation_name
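For example, the flag can sit in the job script header alongside the usual directives (a sketch mirroring the interactive command above; reservation_name is whatever name PSC sends back, and if --res is only accepted on the command line, pass it to sbatch directly instead):

#SBATCH -p GPU-AI
#SBATCH -A ac5616p
#SBATCH --gres=gpu:volta16:1
#SBATCH --res reservation_name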