
Commit bea3dd2

Merge duplicated compilation instructions
1 parent abce8ed commit bea3dd2

2 files changed: +108 -165 lines changed


Exercises_Instructions.md

Lines changed: 107 additions & 59 deletions
@@ -115,71 +115,114 @@ When we execute `module load cuda`, it will effectively modify the above environ
115115
SYCL is not part of the module system at the moment. The SYCL compilers were built for this training. We recommend that you use one of the two SYCL implementations.
116116

117117
### Intel oneAPI compilers
118+
118119
oneAPI is a collection of tools and libraries supporting a wide range of programming languages and parallel programming paradigms. It includes a SYCL implementation that supports all Intel devices (CPUs, FPGAs, and GPUs) and provides SYCL plug-ins for targeting NVIDIA and AMD GPUs.
119-
In order to use the Intel SYCL compiler one has to set the environment variables first:
120120

121-
on Mahti:
122-
```
123-
. /projappl/project_2012125/intel/oneapi/setvars.sh --include-intel-llvm
124-
module load cuda # This is needed for compiling sycl code for nvidia gpus
125-
module load openmpi/4.1.2-cuda # This is neeeded for using CUDA aware MPI
126-
```
121+
#### oneAPI on Mahti
127122

128-
on LUMI:
129-
```
130-
. /projappl/project_462000752/intel/oneapi/setvars.sh --include-intel-llvm
123+
Set up the environment:
131124

132-
module load LUMI
133-
module load partition/G
134-
module load rocm/6.0.3
135-
export MPICH_GPU_SUPPORT_ENABLED=1 # Needed for GPU aware MPI
136-
```
137-
After this one can load other modules that might be needed for compiling the codes. With the environment set-up we can compile and run the SYCL codes.
125+
source /projappl/project_2012125/intel/oneapi/setvars.sh --include-intel-llvm
126+
module load cuda/11.5.0 # Needed for compiling to NVIDIA GPUs
127+
module load openmpi/4.1.2-cuda # Needed for using GPU-aware MPI
138128

139-
On Mahti:
140-
```
141-
icpx -fuse-ld=lld -fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64_x86_64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80 <sycl_code>.cpp
142-
```
143-
on LUMI
144-
```
145-
icpx -fsycl -fsycl-targets=amdgcn-amd-amdhsa,spir64_x86_64 -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a <sycl_code>.cpp
146-
```
147-
Where `-fsycl` flag indicates that a sycl code is compiled and `-fsycl-targets` is used to instruct the compiler to generate optimized code for both CPU and GPU SYCL devices.
129+
Compile sycl code:
130+
131+
icpx -fuse-ld=lld -std=c++20 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64_x86_64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80 <sycl_code>.cpp
132+
133+
Here the `-fsycl` flag indicates that SYCL code is being compiled, and `-fsycl-targets` instructs the compiler to generate optimized code for both CPU and GPU devices.
134+
135+
#### oneAPI on LUMI
136+
137+
Set up the environment:
138+
139+
source /projappl/project_462000752/intel/oneapi/setvars.sh --include-intel-llvm
140+
module load craype-x86-trento craype-accel-amd-gfx90a rocm/6.0.3 # Needed for compiling to AMD GPUs
141+
export MPICH_GPU_SUPPORT_ENABLED=1 # Needed for using GPU-aware MPI
142+
143+
Compile sycl code:
144+
145+
icpx -fuse-ld=lld -std=c++20 -O3 -fsycl -fsycl-targets=amdgcn-amd-amdhsa,spir64_x86_64 -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a <sycl_code>.cpp
146+
147+
Here the `-fsycl` flag indicates that SYCL code is being compiled, and `-fsycl-targets` instructs the compiler to generate optimized code for both CPU and GPU devices.
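For reference, the snippet below is a minimal sketch of what a `<sycl_code>.cpp` compiled with the commands above could contain. It is a generic vector-addition illustration, not one of the course exercise files.

```cpp
// vector_add.cpp -- minimal SYCL illustration (not a course file)
#include <sycl/sycl.hpp>
#include <iostream>
#include <vector>

int main() {
  constexpr size_t n = 1 << 20;
  std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

  sycl::queue q;  // default selector: picks a GPU if one is available
  std::cout << "Running on: "
            << q.get_device().get_info<sycl::info::device::name>() << "\n";

  {
    // Buffers hand the host data over to the SYCL runtime for this scope
    sycl::buffer<float> A(a.data(), sycl::range<1>(n));
    sycl::buffer<float> B(b.data(), sycl::range<1>(n));
    sycl::buffer<float> C(c.data(), sycl::range<1>(n));

    q.submit([&](sycl::handler &h) {
      sycl::accessor accA(A, h, sycl::read_only);
      sycl::accessor accB(B, h, sycl::read_only);
      sycl::accessor accC(C, h, sycl::write_only, sycl::no_init);
      h.parallel_for(sycl::range<1>(n),
                     [=](sycl::id<1> i) { accC[i] = accA[i] + accB[i]; });
    });
  }  // buffer destructors copy the result back into c

  std::cout << "c[0] = " << c[0] << " (expected 3)\n";
}
```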
148148

149149
### AdaptiveCpp
150+
150151
This is another SYCL implementation with support for many types of devices. No special set-up is needed, except for loading the modules related to the backend (CUDA or ROCm).
151152

152-
on Mahti:
153-
```
154-
module purge
155-
module use /scratch/project_2012125/cristian/spack/share/spack/modules/linux-rhel8-x86_64_v3/
156-
module load hipsycl/24.06.0-gcc-10.4.0-4nny2ja
157-
module load gcc/10.4.0
158-
```
159-
```
160-
acpp -fuse-ld=lld -O3 -L/appl/spack/v020/install-tree/gcc-8.5.0/gcc-10.4.0-2oazqj/lib64/ --acpp-targets="omp.accelerated;cuda:sm_80" vector_add_buffer.cpp vector_add_buffer.cpp
161-
```
162-
on LUMI:
163-
```
164-
module load LUMI
165-
module load partition/G
166-
module load rocm/6.0.3
167-
export MPICH_GPU_SUPPORT_ENABLED=1
168-
#export LD_LIBRARY_PATH=/appl/lumi/SW/LUMI-22.08/G/EB/Boost/1.79.0-cpeCray-22.08/lib:$LD_LIBRARY_PATH ???
169-
#export LD_PRELOAD=/pfs/lustrep4/appl/lumi/SW/LUMI-22.08/G/EB/rocm/5.3.3/llvm/lib/libomp.so ??????
170-
```
153+
#### AdaptiveCpp on Mahti
154+
155+
Set up the environment:
156+
157+
module purge
158+
module use /scratch/project_2012125/cristian/spack/share/spack/modules/linux-rhel8-x86_64_v3/
159+
module load hipsycl/24.06.0-gcc-10.4.0-4nny2ja
160+
module load gcc/10.4.0
161+
162+
Compile sycl code:
163+
164+
acpp -fuse-ld=lld -O3 -L/appl/spack/v020/install-tree/gcc-8.5.0/gcc-10.4.0-2oazqj/lib64/ --acpp-targets="omp.accelerated;cuda:sm_80" <sycl_code>.cpp
165+
166+
#### AdaptiveCpp on LUMI
167+
168+
Set up the environment:
169+
170+
module load LUMI/24.03
171+
module load partition/G
172+
module load rocm/6.0.3
173+
export PATH=/projappl/project_462000752/ACPP/bin/:$PATH
174+
export LD_LIBRARY_PATH=/appl/lumi/SW/LUMI-24.03/G/EB/Boost/1.83.0-cpeGNU-24.03/lib64/:$LD_LIBRARY_PATH
175+
export LD_PRELOAD=/opt/rocm-6.0.3/llvm/lib/libomp.so
176+
177+
Compile sycl code:
178+
179+
acpp -O3 --acpp-targets="omp.accelerated;hip:gfx90a" <sycl_code>.cpp
180+
181+
### NVIDIA HPC on Mahti for stdpar
182+
183+
Set up the environment:
184+
185+
ml purge
186+
ml use /appl/opt/nvhpc/modulefiles
187+
ml nvhpc/24.3
188+
ml gcc/11.2.0
189+
export PATH=/appl/spack/v017/install-tree/gcc-8.5.0/binutils-2.37-ed6z3n/bin:$PATH
190+
191+
Compile stdpar code:
192+
193+
nvc++ -O4 -std=c++20 -stdpar=gpu -gpu=cc80 --gcc-toolchain=$(dirname $(which g++)) code.cpp
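As an illustration, `code.cpp` above can be any C++ program that uses the standard parallel algorithms. Below is a minimal sketch (a hypothetical SAXPY, not one of the course files).

```cpp
// stdpar_saxpy.cpp -- minimal C++ stdpar illustration (not a course file)
#include <algorithm>
#include <execution>
#include <iostream>
#include <vector>

int main() {
  const size_t n = 1 << 20;
  std::vector<float> x(n, 1.0f), y(n, 2.0f);
  const float a = 3.0f;

  // With -stdpar=gpu (nvc++) or --hipstdpar (hipcc) this parallel algorithm is
  // offloaded to the GPU; otherwise it still runs in parallel on the CPU.
  std::transform(std::execution::par_unseq, x.begin(), x.end(), y.begin(),
                 y.begin(), [=](float xi, float yi) { return a * xi + yi; });

  std::cout << "y[0] = " << y[0] << " (expected 5)\n";
}
```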
194+
195+
### LUMI container with ROCm 6.2.4, hipstdpar, and AdaptiveCpp
196+
197+
Set up the environment with container:
198+
199+
export CONTAINER_EXEC="singularity exec /projappl/project_462000752/rocm_6.2.4_stdpar_acpp.sif"
200+
export HIPSTDPAR_PATH="/opt/rocm-6.2.4/include/thrust/system/hip/hipstdpar"
201+
export SINGULARITY_BIND="/pfs,/scratch,/projappl,/project,/flash,/appl"
202+
export SINGULARITYENV_LC_ALL=C
203+
export HSA_XNACK=1 # needed for stdpar
204+
205+
Compile stdpar code with hipcc:
206+
207+
$CONTAINER_EXEC hipcc -std=c++20 -O3 --hipstdpar --hipstdpar-path=$HIPSTDPAR_PATH --offload-arch=gfx90a:xnack+ code.cpp
208+
209+
Compile stdpar code with acpp:
210+
211+
$CONTAINER_EXEC acpp -std=c++20 -O3 --acpp-stdpar --acpp-targets=hip:gfx90a -ltbb code.cpp
212+
213+
Compile sycl code with acpp:
214+
215+
$CONTAINER_EXEC acpp -std=c++20 -O3 --acpp-targets=hip:gfx90a code.cpp
171216

172-
```
173-
acpp -O3 --acpp-targets="omp.accelerated;hip:gfx90a" <sycl_code>.cpp
174-
```
175217
### MPI
218+
176219
MPI (Message Passing Interface) is a standardized and portable message-passing standard designed for parallel computing architectures. It allows communication between processes running on separate nodes in a distributed-memory environment. MPI plays a pivotal role in High-Performance Computing (HPC), which is why it is important to know how to combine SYCL and MPI.
177220

178221
The SYCL implementations do not know anything about MPI. Intel oneAPI contains MPI wrappers, but they were not configured for Mahti and LUMI. Both Mahti and LUMI provide wrappers that can compile applications using MPI, but they cannot compile SYCL code. We can, however, extract the MPI-related flags and pass them to the SYCL compilers.
179222

180223
For example, on Mahti, in order to use CUDA-aware MPI we would first load the modules:
181224
```
182-
module load cuda
225+
module load cuda/11.5.0
183226
module load openmpi/4.1.2-cuda
184227
```
185228
The environment is now set up for compiling CUDA code that uses GPU-to-GPU communication. We can inspect the `mpicxx` wrapper:
@@ -190,7 +233,7 @@ $ mpicxx -showme
190233
We note that underneath, `mpicxx` calls `g++` with a lot of MPI-related flags. We can obtain and use these programmatically with `mpicxx --showme:compile` and `mpicxx --showme:link`
191234
for compiling the SYCL+MPI codes:
192235
```
193-
icpx -fuse-ld=lld -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80 `mpicxx --showme:compile` `mpicxx --showme:link` <sycl_mpi_code>.cpp
236+
icpx -fuse-ld=lld -std=c++20 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80 `mpicxx --showme:compile` `mpicxx --showme:link` <sycl_mpi_code>.cpp
194237
```
195238
or
196239
```
@@ -201,20 +244,18 @@ module load gcc/10.4.0
201244
acpp -fuse-ld=lld -O3 -L/appl/spack/v020/install-tree/gcc-8.5.0/gcc-10.4.0-2oazqj/lib64/ --acpp-targets="omp.accelerated;cuda:sm_80" `mpicxx --showme:compile` `mpicxx --showme:link` <sycl_mpi_code>.cpp
202245
```
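For reference, below is a minimal sketch of what such a `<sycl_mpi_code>.cpp` could look like: each rank creates a SYCL queue, allocates device memory with USM, and passes the device pointer directly to MPI, which works because the MPI library is GPU-aware. This is a hypothetical two-rank illustration, not one of the course files.

```cpp
// sycl_mpi_ping.cpp -- minimal SYCL + GPU-aware MPI illustration (not a course file)
#include <mpi.h>
#include <sycl/sycl.hpp>
#include <iostream>

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  sycl::queue q;  // default selector: picks a GPU if one is available
  constexpr int n = 1024;
  float *buf = sycl::malloc_device<float>(n, q);  // device memory (USM)

  if (rank == 0) {
    q.fill(buf, 1.0f, n).wait();                        // fill the buffer on the GPU
    MPI_Send(buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);  // send straight from device memory
  } else if (rank == 1) {
    MPI_Recv(buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    float first = 0.0f;
    q.memcpy(&first, buf, sizeof(float)).wait();        // copy one element back to the host
    std::cout << "Rank 1 received buf[0] = " << first << "\n";
  }

  sycl::free(buf, q);
  MPI_Finalize();
}
```

Run it with two MPI ranks and one GPU per rank.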
203246

204-
Similarly on LUMI. First we set up the envinronment and load the modules as indicated above
205-
```
206-
. /projappl/project_462000752/intel/oneapi/setvars.sh --include-intel-llvm
207-
208-
module load LUMI
209-
module load partition/G
210-
module load rocm/6.0.3
247+
Similarly on LUMI, we first set up the environment and load the modules as indicated above:
248+
```bash
249+
source /projappl/project_462000752/intel/oneapi/setvars.sh --include-intel-llvm
250+
module load craype-x86-trento craype-accel-amd-gfx90a rocm/6.0.3
211251
export MPICH_GPU_SUPPORT_ENABLED=1
212252
```
213-
Now compile with intel compilers:
214253

254+
Now compile with the Intel compiler:
255+
```bash
256+
icpx -fuse-ld=lld -std=c++20 -O3 -fsycl -fsycl-targets=amdgcn-amd-amdhsa,spir64_x86_64 -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a `CC --cray-print-opts=cflags` <sycl_mpi_code>.cpp `CC --cray-print-opts=libs`
215257
```
216-
icpx -fsycl -fsycl-targets=amdgcn-amd-amdhsa,spir64_x86_64 -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a `CC --cray-print-opts=cflags` <sycl_mpi_code>.cpp `CC --cray-print-opts=libs`
217-
```
258+
218259
Or with AdaptiveCpp:
219260
```
220261
module load LUMI/24.03
@@ -333,3 +374,10 @@ srun my_gpu_exe
333374
Similarly to Mahti, on LUMI we have 2 CPU nodes reserved for us, as well as 2 GPU nodes.
334375

335376
**NOTE** Some exercises have additional instructions on how to run!
377+
378+
#### Container
379+
380+
Running works as usual except that the code needs to be executed through the container:
381+
382+
srun -A project_462000752 -p dev-g --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 --gpus-per-node=1 --time=00:15:00 $CONTAINER_EXEC ./a.out
383+

README_setup.md

Lines changed: 1 addition & 106 deletions
@@ -1,111 +1,6 @@
1-
# Usage
2-
3-
## C++ stdpar on Mahti
4-
5-
Set up the environment:
6-
7-
ml purge
8-
ml use /appl/opt/nvhpc/modulefiles
9-
ml nvhpc/24.3
10-
ml gcc/11.2.0
11-
export PATH=/appl/spack/v017/install-tree/gcc-8.5.0/binutils-2.37-ed6z3n/bin:$PATH
12-
13-
Compile:
14-
15-
nvc++ -O4 -std=c++20 -stdpar=gpu -gpu=cc80 --gcc-toolchain=$(dirname $(which g++)) code.cpp
16-
17-
Run on one GPU:
18-
19-
srun -A project_2012125 -p gputest --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 --gres=gpu:a100:1 --time=00:15:00 ./a.out
20-
21-
## C++ stdpar on LUMI
22-
23-
Set up the environment with container:
24-
25-
export CONTAINER_EXEC="singularity exec /projappl/project_462000752/rocm_6.2.4_stdpar_acpp.sif"
26-
export HIPSTDPAR_PATH="/opt/rocm-6.2.4/include/thrust/system/hip/hipstdpar"
27-
export SINGULARITY_BIND="/pfs,/scratch,/projappl,/project,/flash,/appl"
28-
export SINGULARITYENV_LC_ALL=C
29-
export HSA_XNACK=1
30-
31-
Compile:
32-
33-
$CONTAINER_EXEC hipcc -std=c++20 -O3 --hipstdpar --hipstdpar-path=$HIPSTDPAR_PATH --offload-arch=gfx90a:xnack+ code.cpp
34-
35-
Compile using AdaptiveCpp:
36-
37-
$CONTAINER_EXEC acpp -std=c++20 -O3 --acpp-stdpar --acpp-targets=hip:gfx90a -ltbb code.cpp
38-
39-
Run on one GPU through container:
40-
41-
srun -A project_462000752 -p dev-g --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 --gpus-per-node=1 --time=00:15:00 $CONTAINER_EXEC ./a.out
42-
43-
## OneAPI on Mahti
44-
45-
Set up the environment:
46-
47-
source /projappl/project_2012125/intel/oneapi/setvars.sh --include-intel-llvm
48-
ml cuda/11.5.0 openmpi/4.1.2-cuda
49-
50-
Compile:
51-
52-
icpx -fuse-ld=lld -std=c++20 -O3 -fsycl -fsycl-targets=nvptx64-nvidia-cuda,spir64_x86_64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80 code.cpp
53-
54-
Run on one GPU:
55-
56-
srun -A project_2012125 -p gputest --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 --gres=gpu:a100:1 --time=00:15:00 ./a.out
57-
58-
## OneAPI on LUMI
59-
60-
Set up the environment:
61-
62-
source /projappl/project_462000752/intel/oneapi/setvars.sh --include-intel-llvm
63-
ml craype-x86-trento craype-accel-amd-gfx90a rocm/6.0.3
64-
export MPICH_GPU_SUPPORT_ENABLED=1
65-
66-
Compile:
67-
68-
icpx -fuse-ld=lld -std=c++20 -O3 -fsycl -fsycl-targets=amdgcn-amd-amdhsa,spir64_x86_64 -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a code.cpp
69-
70-
Run on one GPU:
71-
72-
srun -A project_462000752 -p dev-g --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 --gpus-per-node=1 --time=00:15:00 ./a.out
73-
74-
## AdaptiveCpp on Mahti
75-
76-
Load the modules needed:
77-
```
78-
module purge
79-
module use /scratch/project_2012125/cristian/spack/share/spack/modules/linux-rhel8-x86_64_v3/
80-
module load hipsycl/24.06.0-gcc-10.4.0-4nny2ja
81-
```
82-
Compile for cpu and nvidia targets:
83-
84-
```
85-
acpp -fuse-ld=lld -O3 -L/appl/spack/v020/install-tree/gcc-8.5.0/gcc-10.4.0-2oazqj/lib64/ --acpp-targets="omp.accelerated;cuda:sm_80" vector_add_buffer.cpp vector_add_buffer.cpp
86-
```
87-
## AdaptiveCpp on LUMI
88-
89-
Set up the environment:
90-
91-
module load LUMI/24.03
92-
module load partition/G
93-
module load rocm/6.0.3
94-
export PATH=/projappl/project_462000752/ACPP/bin/:$PATH
95-
export LD_LIBRARY_PATH=/appl/lumi/SW/LUMI-24.03/G/EB/Boost/1.83.0-cpeGNU-24.03/lib64/:$LD_LIBRARY_PATH
96-
export LD_PRELOAD=/opt/rocm-6.0.3/llvm/lib/libomp.so
97-
98-
Compile for amd and cpu targets:
99-
100-
acpp -O3 --acpp-targets="omp.accelerated;hip:gfx90a" <sycl_code>.cpp
101-
102-
Run as a normal gpu program:
103-
104-
srun -A project_462000752 -p dev-g --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 --gpus-per-node=1 --time=00:15:00 ./a.out
105-
1061
# Installations
1072

108-
*Here are instructions how the modules used above were installed.*
3+
*Here are instructions on how the compilation environments used in the course were created.*
1094

1105
## LUMI ROCm container with hipstdpar and AdaptiveCpp
1116
