Test the GPU bandwidth of collective operators such as the all-reduce, all-gather, broadcast, and all-to-all primitives on single-node multi-GPU and multi-node multi-GPU (16-card) setups, using only PyTorch and Python built-in packages.
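Collective bandwidth figures are conventionally reported as bus bandwidths: the measured algorithm bandwidth (bytes moved per second) scaled by a per-collective factor, following the convention popularized by nccl-tests. Whether test_bandwidth.py uses exactly this formula is an assumption; the sketch below shows the standard factors for the four primitives listed above.

```python
# Sketch of the standard bus-bandwidth factors (nccl-tests convention).
# That test_bandwidth.py uses exactly these factors is an assumption.
def bus_bandwidth_gbps(collective: str, size_bytes: int, seconds: float,
                       world_size: int) -> float:
    n = world_size
    factors = {
        "all_reduce": 2 * (n - 1) / n,  # each rank sends and receives (n-1)/n of the data
        "all_gather": (n - 1) / n,      # each rank receives (n-1)/n of the full buffer
        "broadcast": 1.0,               # the root's data crosses the bus once
        "all_to_all": (n - 1) / n,      # each rank exchanges (n-1)/n of its buffer
    }
    algo_bw = size_bytes / seconds      # algorithm bandwidth in bytes/s
    return algo_bw * factors[collective] / 1e9  # GB/s
```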
Single-node multi-GPU:
You can run torchrun --standalone --nproc_per_node=2 test_bandwidth.py, or use the shell script run_test_bandwidth.sh.
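The contents of test_bandwidth.py are not shown here; a minimal sketch of an all-reduce bandwidth test that would run under the torchrun command above, using only torch and the Python standard library, might look like this:

```python
import os
import time

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, and MASTER_PORT.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    world_size = dist.get_world_size()

    size_bytes = 256 * 1024 * 1024                       # 256 MiB payload
    tensor = torch.ones(size_bytes // 4, device="cuda")  # float32 elements

    # Warm up so NCCL communicators and buffers are initialized first.
    for _ in range(5):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()

    iters = 20
    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(tensor)
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters

    # Bus bandwidth for all-reduce: algorithm bandwidth times 2*(n-1)/n.
    busbw = size_bytes / elapsed * 2 * (world_size - 1) / world_size / 1e9
    if dist.get_rank() == 0:
        print(f"all_reduce, {size_bytes} bytes: {busbw:.2f} GB/s")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```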
Multi-node multi-GPU:
Run the run_test_bandwidth.sh script on every node. For example, run sh run_test_bandwidth.sh 2 2 0 10.20.1.81 22 on the first node and sh run_test_bandwidth.sh 2 2 1 10.20.1.81 22 on the second node.
$NNODES (the first argument) is the number of nodes you want to use, the second argument is the number of GPUs to use per node, and $NODE_RANK (the third argument) is the rank of the current node, counted from 0 (0 on the first node, 1 on the second). The remaining arguments are the master node's IP address and port.
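Assuming run_test_bandwidth.sh simply forwards these positional arguments to torchrun (the script body is not reproduced here), the first command above would correspond roughly to torchrun --nnodes=2 --nproc_per_node=2 --node_rank=0 --master_addr=10.20.1.81 --master_port=22 test_bandwidth.py.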