SCITAS Kuma User Guide

This guide explains how to use the SCITAS Kuma GPU cluster (Spec) with:

  • Python & Micromamba (a faster alternative to Conda)
  • VS Code Remote Window
  • PyTorch + CUDA

Table of Contents

  1. Connect to the EPFL Network
  2. Join the HPC-LAPD Group
  3. SSH Access to Kuma
  4. File System on Kuma
  5. Install Micromamba & Python
  6. Set Up VS Code and run PyTorch (no GPU yet)
  7. Set Up Passwordless SSH
  8. Interactive GPU Access in VS Code
  9. Running Jobs with GPU on Kuma

1. Connect to the EPFL Network

Ensure you are on the EPFL network or connected via VPN to access Kuma.

2. Join the HPC-LAPD Group

  1. Visit EPFL Groups
  2. Search for hpc-lapd.
  3. If you are not a member, contact the administrators.

3. SSH Access to Kuma

  1. Open PowerShell (Windows) or Terminal (Linux/macOS).
  2. Run: ssh <username>@kuma.hpc.epfl.ch
  3. Enter your password (characters will not be displayed as you type).
  4. You are now in the Kuma frontend (kuma1 or kuma2), but GPU access is not yet available.
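To double-check which frontend you were assigned, you can run hostname after logging in (a minimal check):

    hostname  # Prints the frontend you landed on, e.g. kuma1 or kuma2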

4. File System on Kuma

Kuma has different storage locations:

Home Directory (/home/<username>)

  • Limited to 100 GB per user.
  • Data is deleted after 2 years of inactivity or 6 months after leaving EPFL.
  • Useful commands:
    pwd                  # Show current directory
    cd /home/<username>  # Change to home directory (or `cd ~`)
    ls                   # List files/folders (use `ls -l`, `ll`, `ls -la`, or `ll -a` for details)
    du -sh <file/folder> # Check disk usage

Scratch Directory (/scratch/<username>)

  • 435 TB of shared high-speed storage.
  • Suitable for computation-heavy tasks.
  • Files older than 30 days are automatically deleted.
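Because of the 30-day purge, it is worth checking occasionally which files are close to expiring. A minimal sketch using standard find and du (the 25-day threshold is just an example):

    find /scratch/$USER -type f -mtime +25  # List files not modified in the last 25 days
    du -sh /scratch/$USER                   # Show total scratch usage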

5. Install Micromamba & Python

Micromamba is a fast alternative to Conda. Follow these steps to install it in your scratch directory:

  1. Install Micromamba:

    "${SHELL}" <(curl -L micro.mamba.pm/install.sh)
    • Binary folder: /scratch/<username>/.local/bin
    • Shell initialization: y
    • Configure conda-forge: y
    • Installation prefix: /scratch/<username>/micromamba
  2. Activate Micromamba:

    source ~/.bashrc
    micromamba activate
    • You should see (base) in your terminal.
  3. Create a Python environment:

    micromamba create -n my_env python=3.13
    micromamba activate my_env
    which python  # Note this path for VS Code (Example: `/scratch/jlhsieh/micromamba/envs/my_env/bin/python`)
  4. Install packages:

    • SCITAS Lmod: a tool for managing scientific software modules.
    • uv: an extremely fast Python package installer. Leo prefers uv pip install over pip install, conda install, or micromamba install.
    • PyTorch: use the PyTorch build that matches the CUDA version on Kuma.
    pip install uv  # An extremely fast Python package installer
    module load gcc/13.2.0 cuda/12.4.1  # SCITAS Lmod tool
    module list  # Check which modules have been loaded
    nvcc --version  # Check that the CUDA compiler is available
    uv pip install torch torchvision torchaudio  # Check the PyTorch website to match the CUDA version if needed
    uv pip install ipykernel  # For Jupyter
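As a quick sanity check of the install (a minimal sketch; the exact version strings depend on what was installed), verify that PyTorch imports and reports its CUDA build:

    python -c "import torch; print(torch.__version__, torch.version.cuda)"
    # On the frontend this should print something like `2.6.0+cu124 12.4`;
    # the GPU itself is only usable on a GPU node (see sections 8 and 9).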

6. Set Up VS Code and run PyTorch (no GPU yet)

  1. Open VS Code on your local computer.
  2. Click the blue Open a Remote Window button (bottom left corner).
  3. Click Connect to Host > Add New SSH Host.
  4. Enter: ssh <username>@kuma.hpc.epfl.ch.
  5. Choose the current user SSH configuration file.
    (Example: /home/leohsieh/.ssh/config or C:\Users\leohsieh\.ssh\config)
  6. Rename the Host in the SSH config file.
  7. Connect VS Code to the Remote Host.
(Demo video: Connect.VS.code.to.Remote.Host.mp4)
  8. Open the /scratch/<username> folder in VS Code.
  9. Create and test a Python script (test.py) to check PyTorch.
    # %% test.py
    import sys

    import torch

    def check_gpu() -> None:
        print(f"Python version: {sys.version}")
        print(f"PyTorch version: {torch.__version__}")
        if torch.cuda.is_available():
            print("CUDA is available. Here are the details of the CUDA devices:")
            for i in range(torch.cuda.device_count()):
                props = torch.cuda.get_device_properties(i)
                print(f"Device {i}: {torch.cuda.get_device_name(i)}")
                print(f"  CUDA Capability: {props.major}.{props.minor},  Multiprocessors: {props.multi_processor_count}")
                print("  Memory")
                print(f"    {props.total_memory / (1024 ** 3):.2f} GB: Total Memory")
                print(f"    {torch.cuda.memory_reserved(i) / (1024 ** 3):.2f} GB: PyTorch current Reserved Memory")
                print(f"    {torch.cuda.memory_allocated(i) / (1024 ** 3):.2f} GB: PyTorch current Allocated Memory")
                print(f"    {torch.cuda.max_memory_reserved(i) / (1024 ** 3):.2f} GB: PyTorch max ever Reserved Memory")
                print(f"    {torch.cuda.max_memory_allocated(i) / (1024 ** 3):.2f} GB: PyTorch max ever Allocated Memory")
        else:
            print("CUDA is NOT available")

    check_gpu()
(Demo video: Create.test.py.mp4)
  10. If your Micromamba Python interpreter is not detected, enter the Python path manually.
    Example: /scratch/jlhsieh/micromamba/envs/my_env/bin/python
  11. Try running test.py. You should see PyTorch version: 2.6.0+cu124.
    • (No GPU access yet, so you'll see CUDA is NOT available.)
(Demo video: Select.Python.interpreter.mp4)
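If you also want to run Jupyter notebooks in VS Code, the ipykernel package installed in step 5 lets you register the environment as a kernel (a minimal sketch; the kernel name is just an example):

    micromamba activate my_env
    python -m ipykernel install --user --name my_env  # Register the environment as a Jupyter kernel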

7. Set Up Passwordless SSH

This is required because we will use ProxyJump to reach a GPU node later.

  1. Open PowerShell (Windows) or Terminal (Linux/macOS).

  2. Generate an SSH key pair in your local computer:

    • Both Linux/macOS and Windows PowerShell
      ssh-keygen -t ed25519 -f ${HOME}/.ssh/my_ssh_key
  3. Copy the public key to Kuma:

    • Linux/macOS
      ssh-copy-id -i ${HOME}/.ssh/my_ssh_key.pub <username>@kuma.hpc.epfl.ch
    • Windows PowerShell
      type $HOME\.ssh\my_ssh_key.pub | ssh <username>@kuma.hpc.epfl.ch "mkdir -p .ssh && tee -a .ssh/authorized_keys"
  4. Modify the SSH config file as follows:

    Host my-kuma-frontend
      HostName kuma.hpc.epfl.ch
      User jlhsieh
      IdentityFile ~/.ssh/my_ssh_key
      IdentitiesOnly yes
    
    Host my-kuma-node
      HostName kh???
      User jlhsieh
      IdentityFile ~/.ssh/my_ssh_key
      IdentitiesOnly yes
      ProxyJump my-kuma-frontend
    
  5. Test the connection without password:

    ssh my-kuma-frontend
    • If successful, no password is required.
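To confirm that the key is actually being used (a minimal check; BatchMode makes ssh fail instead of falling back to a password prompt):

    ssh -o BatchMode=yes my-kuma-frontend hostname  # Succeeds only if passwordless login works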

8. Interactive GPU Access in VS Code

Use interactive sessions for testing/debugging:

  1. Open PowerShell (Windows) or Terminal (Linux/macOS) and connect to my-kuma-frontend:

    ssh my-kuma-frontend
    micromamba activate my_env
    module load gcc/13.2.0 cuda/12.4.1
  2. Request an interactive GPU node:

    Sinteract -p h100 -q debug -m 4G -g gpu:1 -t 0-00:10:00
    • -p h100 (Use H100 GPU) or -p l40s (Use L40S GPU)
    • -q debug (free but 1 hour max) or -q normal or -q long (see QOS details and Kuma Pricing & QOS)
    • -m 4G (4 GB RAM)
    • -g gpu:1 (1 GPU) or -g gpu:2 (2 GPUs)
    • -t 0-00:10:00 (time limit; format: D-HH:MM:SS)
    • For more details: Sinteract --help or link
  3. A Kuma GPU node will be assigned (Example: kh029).

  4. Modify the SSH config accordingly, connect to the GPU node in VS Code, and start using the GPU.
    (Demo video: https://github.com/user-attachments/assets/30150cce-c39e-49b3-9236-986e57b2fcc7)

    • Your GPU session ends when the time limit is up or when the Terminal/PowerShell running Sinteract is closed.
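Instead of editing the config by hand for every new session, you can update the kh??? placeholder with a one-liner (a sketch, assuming the my-kuma-node entry from section 7 and an assigned node kh029; uses GNU sed, so on macOS replace -i with -i ''):

    sed -i 's/^\([[:space:]]*HostName \)kh.*/\1kh029/' ~/.ssh/config  # Point my-kuma-node at the assigned node
    ssh my-kuma-node hostname                                         # Should print kh029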

9. Running Jobs with GPU on Kuma

For long-running jobs, submit batch scripts instead of using interactive mode. This way, you don't need to keep your Terminal/PowerShell open.

  1. Connect to the Kuma frontend node in VS Code (GPU node not needed).

  2. Create myjob.run in /scratch/<username>:

    #!/bin/bash
    #SBATCH --partition h100
    #SBATCH --qos debug
    #SBATCH --mem 4G
    #SBATCH --gpus 1
    #SBATCH --time 0-00:01:00
    
    echo "==== Start job ==================================="
    module load gcc/13.2.0 cuda/12.4.1
    micromamba run -n my_env python /scratch/jlhsieh/test.py
    echo "sleep 10 seconds"
    sleep 10
    micromamba run -n my_env python /scratch/jlhsieh/test.py
    echo "==== End job====================================="
    • For more details: link
  3. Open a terminal in VS Code and submit the job:

    sbatch myjob.run
  4. Check job status:

    Squeue
    (Demo video: Submit.job.mp4)
  5. You can override job parameters when submitting:

    sbatch --partition=l40s --qos normal --mem=8G --gpus=2 --time=0-00:05:00 myjob.run  # overrides the corresponding #SBATCH parameters in `myjob.run`
  6. Now your job runs independently, even after you disconnect from Kuma.
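By default, Slurm writes each job's output to slurm-<jobid>.out in the directory you submitted from, and sbatch prints the job ID on submission. A minimal sketch for following and cancelling a job (the job ID 12345 is just an example):

    sbatch myjob.run         # Prints e.g. `Submitted batch job 12345`
    Squeue                   # Check the job state (PD = pending, R = running)
    tail -f slurm-12345.out  # Follow the job's output as it runs
    scancel 12345            # Cancel the job if needed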
