Commit ac659d5

tocean and wkcn authored
[Bug Fixed] Install nccl to system path (#57)
**Description**

Closes #53. If we use `make install` to install NCCL, it is installed to /usr/local/lib. For PyTorch or an extension to link against the NCCL we build, we would then need to change LD_LIBRARY_PATH and persist it. An alternative is to install NCCL to the system path and suggest that users use Docker.

**Major Revision**

- Install NCCL to the system path.
- Recommend using the PyTorch container.

---------

Co-authored-by: JackieWu <[email protected]>
1 parent 14fe8e2 commit ac659d5
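
The linking concern in the description can be inspected with a small diagnostic. The following is a minimal sketch, not part of this commit: the helper names `ld_library_path_entries` and `nccl_resolution_report` are hypothetical, and it assumes a Linux host. It only reports what the standard library's `ctypes.util.find_library` can locate and whether /usr/local/lib appears in LD_LIBRARY_PATH. It does not prove which NCCL build PyTorch actually loads at runtime.

```python
import ctypes.util
import os


def ld_library_path_entries(env=None):
    """Return the non-empty directories listed in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    raw = env.get("LD_LIBRARY_PATH", "")
    # Empty segments (e.g. from "a::b") are skipped; trailing slashes are normalized.
    return [os.path.normpath(entry) for entry in raw.split(":") if entry]


def nccl_resolution_report(env=None):
    """Summarize how the library search would look for libnccl (hypothetical helper)."""
    entries = ld_library_path_entries(env)
    return {
        # find_library returns a library name string, or None if nothing is found.
        "resolved_name": ctypes.util.find_library("nccl"),
        "ld_library_path": entries,
        # True only if /usr/local/lib is explicitly listed in LD_LIBRARY_PATH.
        "usr_local_lib_listed": "/usr/local/lib" in entries,
    }


if __name__ == "__main__":
    for key, value in nccl_resolution_report().items():
        print(f"{key}: {value}")
```

If `resolved_name` is `None` after a plain `make install`, that is consistent with the problem described above; after installing the Debian package to the system path, the lookup would be expected to succeed without touching LD_LIBRARY_PATH.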

1 file changed: +13 -9 lines changed

README.md

Lines changed: 13 additions & 9 deletions
@@ -22,6 +22,13 @@ Features:
 - CUDA version 11 or later (which can be checked by running `nvcc --version`).
 - PyTorch version 1.13 or later (which can be checked by running `python -c "import torch; print(torch.__version__)"`).
 
+We strongly recommend using the [PyTorch NGC Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). For example, to start a PyTorch 1.13 container, run the following commands:
+
+```
+sudo docker run -it -d --name=msamp --privileged --net=host --ipc=host --gpus=all nvcr.io/nvidia/pytorch:22.09-py3 bash
+sudo docker exec -it msamp bash
+```
+
 ### Install MS-AMP
 
 You can clone the source from GitHub.
@@ -44,13 +51,18 @@ make -j src.build NVCC_GENCODE="-gencode=arch=compute_80,code=sm_80"
 # H100
 make -j src.build NVCC_GENCODE="-gencode=arch=compute_90,code=sm_90"
 
-sudo make install
+apt-get update
+apt install build-essential devscripts debhelper fakeroot
+make pkg.debian.build
+dpkg -i build/pkg/deb/libnccl2_*.deb
+
 cd -
 ```
 
 Then, you can install MS-AMP from source.
 
 ```
+python3 -m pip install --upgrade pip
 python3 -m pip install .
 make postinstall
 ```
@@ -61,14 +73,6 @@ After that, you can verify the installation by running:
 python3 -c "import msamp; print(msamp.__version__)"
 ```
 
-### Run unit tests
-
-You can execute the following command to run unit tests.
-
-```
-python3 setup.py test
-```
-
 ### Usage
 
 Enabling MS-AMP is very simple when training a model on a single GPU: you only need to add one line of code, `msamp.initialize(model, optimizer, opt_level)`, after defining the model and optimizer.
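
The one-line hook quoted in the last context line can be sketched as follows. This is a hedged illustration, not taken from the commit: the wrapper `enable_msamp`, its `initialize` injection parameter, and `demo` are hypothetical names, and the `"O2"` default and the `(model, optimizer)` return value are assumptions about `msamp.initialize` that should be checked against the MS-AMP documentation for the installed version.

```python
def enable_msamp(model, optimizer, opt_level="O2", initialize=None):
    """Apply the single MS-AMP call described in the README's Usage section.

    `initialize` is a hypothetical injection point so the wrapper can be
    exercised without a GPU; by default the real `msamp.initialize` is used.
    """
    if initialize is None:
        import msamp  # requires MS-AMP to be installed from source first
        initialize = msamp.initialize
    # Assumption: the call returns the wrapped (model, optimizer) pair.
    return initialize(model, optimizer, opt_level)


def demo():
    """Single-GPU sketch; needs PyTorch, MS-AMP, and a CUDA device."""
    import torch

    # Define the model and optimizer first, then add the MS-AMP line.
    model = torch.nn.Linear(16, 16).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    return enable_msamp(model, optimizer)
```

Calling `demo()` inside the recommended container, after installation, would exercise the real library; the wrapper itself only forwards its three arguments in the order the README gives.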