Suitable example for In-situ techniques and GPU Utilization #761

mingshuai-li · 2023-01-16T13:26:12Z

mingshuai-li
Jan 16, 2023

Hello, can you recommend a suitable example for in-situ techniques?

In my specific case, I need to link Neko with ADIOS2 and Catalyst to do synchronous and asynchronous image generation. I tried this on the tgv case, but it doesn't work well because the case is rather small. I'd like an example that runs on 4 GPUs for about 10 minutes and runs faster on the GPU than on the CPU (which is not the case for tgv).

Besides, I'm a little confused about how Neko uses GPUs, for example which devices are visible. If you have many nodes and many GPUs, how do you control which devices your application uses?

Thanks for your work.

njansson · 2023-01-16T14:54:27Z

njansson
Jan 16, 2023
Maintainer

Hi,

You can easily increase the amount of work for the tgv example by using one of the larger meshes in the example folder (set through mesh_file in the case file), and the run length can be shortened by reducing T_end. However, when increasing the problem size, you might need to reduce dt as well to avoid divergence (see for example https://github.com/ExtremeFLOW/MSA-tests/blob/main/tgv/tgv.case). The 32768.nmsh mesh should be enough to keep two to four GPUs busy, while 262144.nmsh can be used to scale out from 8 to 16 GPUs up to a couple of hundred devices.

For multiple GPUs per node, you need to set the visible devices (CUDA_VISIBLE_DEVICES or ROCR_VISIBLE_DEVICES) in the job script, for example to $SLURM_LOCALID, $OMPI_COMM_WORLD_LOCAL_RANK or $MPI_LOCALRANKID. Neko assumes that each MPI rank has its own device, and that this is controlled by the environment. See for example run.sh in https://github.com/ExtremeFLOW/flettner_rotor, a wrapper script that replaces the direct call to neko in a job script.
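To illustrate the idea, here is a minimal sketch of such a wrapper. It is not the run.sh from the flettner_rotor repository; the file name select_gpu.sh and the function names are made up for this example, and it assumes that local rank N should use device N on its node.

```shell
#!/bin/sh
# select_gpu.sh (hypothetical name): pin each MPI rank to one GPU.
# Assumption: node-local rank N uses device N on that node.

# Print the node-local rank from whichever launcher variable is set
# (Slurm, Open MPI, or MPI_LOCALRANKID), falling back to 0.
local_rank() {
    if [ -n "${SLURM_LOCALID:-}" ]; then
        echo "$SLURM_LOCALID"
    elif [ -n "${OMPI_COMM_WORLD_LOCAL_RANK:-}" ]; then
        echo "$OMPI_COMM_WORLD_LOCAL_RANK"
    elif [ -n "${MPI_LOCALRANKID:-}" ]; then
        echo "$MPI_LOCALRANKID"
    else
        echo 0
    fi
}

# Export the visible-device variables so the rank sees a single GPU.
# Both are set here for simplicity; only the one matching your
# hardware (NVIDIA or AMD) is actually needed.
select_device() {
    rank=$(local_rank)
    export CUDA_VISIBLE_DEVICES="$rank"
    export ROCR_VISIBLE_DEVICES="$rank"
}

# When called with a command, select the device and then replace
# this shell with that command.
if [ "$#" -gt 0 ]; then
    select_device
    exec "$@"
fi
```

In a job script, the wrapper would then go in front of the usual command, for example `srun ./select_gpu.sh ./neko tgv.case` (assuming neko is normally launched as `./neko tgv.case`).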
