
Commit 279d079

beginner_source/ddp_series_multigpu.rst translation (#912)
* Translate beginner_source/ddp_series_multigpu.rst
1 parent 9cb5af4 commit 279d079

1 file changed: +61 -69 lines

beginner_source/ddp_series_multigpu.rst

Lines changed: 61 additions & 69 deletions
@@ -6,61 +6,59 @@
 `minGPT Training <../intermediate/ddp_series_minGPT.html>`__


-Multi GPU training with DDP
+Multi-GPU training with DDP
 ===========================

-Authors: `Suraj Subramanian <https://github.com/suraj813>`__
+Author: `Suraj Subramanian <https://github.com/suraj813>`__
+Translator: `Nathan Kim <https://github.com/NK590>`__

 .. grid:: 2

-   .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn
+   .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn

-      - How to migrate a single-GPU training script to multi-GPU via DDP
-      - Setting up the distributed process group
-      - Saving and loading models in a distributed setup
+      - How to convert a single-GPU training script to a multi-GPU script with DDP
+      - How to set up the distributed process group
+      - How to save and load models in a distributed setup

 .. grid:: 1

    .. grid-item::

-      :octicon:`code-square;1.0em;` View the code used in this tutorial on `GitHub <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/multigpu.py>`__
+      :octicon:`code-square;1.0em;` The code used in this tutorial is available on `GitHub <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/multigpu.py>`__

-   .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
+   .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
+
+      * A general understanding of `how DDP works <ddp_series_theory.html>`__
+      * Hardware with multiple GPUs (this tutorial uses an AWS p3.8xlarge instance)
+      * PyTorch `installed <https://pytorch.org/get-started/locally/>`__ with CUDA

-      * High-level overview of `how DDP works <ddp_series_theory.html>`__
-      * A machine with multiple GPUs (this tutorial uses an AWS p3.8xlarge instance)
-      * PyTorch `installed <https://pytorch.org/get-started/locally/>`__ with CUDA
-
-Follow along with the video below or on `youtube <https://www.youtube.com/watch/-LAtx9Q6DA8>`__.
+Follow along with the video below or on `YouTube <https://www.youtube.com/watch/-LAtx9Q6DA8>`__.

 .. raw:: html

    <div style="margin-top:10px; margin-bottom:10px;">
     <iframe width="560" height="315" src="https://www.youtube.com/embed/-LAtx9Q6DA8" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
    </div>

-In the `previous tutorial <ddp_series_theory.html>`__, we got a high-level overview of how DDP works; now we see how to use DDP in code.
-In this tutorial, we start with a single-GPU training script and migrate that to running it on 4 GPUs on a single node.
-Along the way, we will talk through important concepts in distributed training while implementing them in our code.
+In the `previous tutorial <ddp_series_theory.html>`__ we got a general sense of how DDP works, so now it is time to see how to actually use DDP in code.
+In this tutorial, we start from a single-GPU training script and make it run on 4 GPUs on a single node.
+Along the way, we will cover the important concepts of distributed training while implementing them directly in our code.

 .. note::
-   If your model contains any ``BatchNorm`` layers, it needs to be converted to ``SyncBatchNorm`` to sync the running stats of ``BatchNorm``
-   layers across replicas.
+   If your model contains any ``BatchNorm`` layers, they all need to be converted to ``SyncBatchNorm`` to synchronize the running statistics of those layers across replicas.

-   Use the helper function
-   `torch.nn.SyncBatchNorm.convert_sync_batchnorm(model) <https://pytorch.org/docs/stable/generated/torch.nn.SyncBatchNorm.html#torch.nn.SyncBatchNorm.convert_sync_batchnorm>`__ to convert all ``BatchNorm`` layers in the model to ``SyncBatchNorm``.
+   Use the helper function
+   `torch.nn.SyncBatchNorm.convert_sync_batchnorm(model) <https://pytorch.org/docs/stable/generated/torch.nn.SyncBatchNorm.html#torch.nn.SyncBatchNorm.convert_sync_batchnorm>`__ to convert the ``BatchNorm`` layers in the model to ``SyncBatchNorm`` layers.
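
As a quick illustration of that conversion, here is a minimal sketch; the model below is a placeholder, not part of the tutorial code:

.. code-block:: python

    import torch.nn as nn

    # Placeholder model containing BatchNorm layers; any nn.Module works here.
    model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())

    # Replace every BatchNorm layer with SyncBatchNorm so that running statistics
    # are synchronized across replicas once the model is wrapped in DDP.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)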

-Diff for `single_gpu.py <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/single_gpu.py>`__ v/s `multigpu.py <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/multigpu.py>`__
+Diff between `single_gpu.py <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/single_gpu.py>`__ and `multigpu.py <https://github.com/pytorch/examples/blob/main/distributed/ddp-tutorial-series/multigpu.py>`__

-These are the changes you typically make to a single-GPU training script to enable DDP.
+Comparing these two files shows the changes you typically make to a single-GPU training script to enable DDP.

-Imports
+Imports
 ~~~~~~~
-- ``torch.multiprocessing`` is a PyTorch wrapper around Python's native
-  multiprocessing
-- The distributed process group contains all the processes that can
-  communicate and synchronize with each other.
+- ``torch.multiprocessing`` is a wrapper around Python's native multiprocessing module.
+
+- The distributed process group contains all the processes that can exchange information and synchronize with each other.

 .. code-block:: diff

@@ -75,18 +73,15 @@ Imports
    + import os
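
For reference, a minimal sketch of the imports the multi-GPU script needs on top of the single-GPU version; the grouping below is illustrative, not the file's verbatim contents:

.. code-block:: python

    import os

    import torch
    import torch.multiprocessing as mp  # wrapper around Python's native multiprocessing
    from torch.utils.data.distributed import DistributedSampler
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.distributed import init_process_group, destroy_process_group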

-Constructing the process group
+Constructing the process group
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- First, before initializing the group process, call `set_device <https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html?highlight=set_device#torch.cuda.set_device>`__,
-  which sets the default GPU for each process. This is important to prevent hangs or excessive memory utilization on `GPU:0`
-- The process group can be initialized by TCP (default) or from a
-  shared file-system. Read more on `process group
-  initialization <https://pytorch.org/docs/stable/distributed.html#tcp-initialization>`__
-- `init_process_group <https://pytorch.org/docs/stable/distributed.html?highlight=init_process_group#torch.distributed.init_process_group>`__
-  initializes the distributed process group.
-- Read more about `choosing a DDP
-  backend <https://pytorch.org/docs/stable/distributed.html#which-backend-to-use>`__
+- First, before initializing the process group, call `set_device <https://pytorch.org/docs/stable/generated/torch.cuda.set_device.html?highlight=set_device#torch.cuda.set_device>`__
+  to assign a GPU to each process. This is important to prevent hangs or excessive memory usage on `GPU:0`.
+- The process group can be initialized over TCP (the default) or via a shared file system.
+  See `process group initialization <https://pytorch.org/docs/stable/distributed.html#tcp-initialization>`__ for details.
+- `init_process_group <https://pytorch.org/docs/stable/distributed.html?highlight=init_process_group#torch.distributed.init_process_group>`__ initializes the distributed process group.
+- For more, see `choosing a DDP backend <https://pytorch.org/docs/stable/distributed.html#which-backend-to-use>`__.

 .. code-block:: diff

@@ -103,21 +98,21 @@ Constructing the process group
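
A minimal sketch of a setup helper along these lines, assuming a single-node job; the master address and port values are placeholders chosen for illustration:

.. code-block:: python

    import os

    import torch
    from torch.distributed import init_process_group

    def ddp_setup(rank: int, world_size: int):
        """Initialize the distributed process group for one process.

        rank: unique index of this process; world_size: total number of processes.
        """
        os.environ["MASTER_ADDR"] = "localhost"  # placeholder rendezvous address
        os.environ["MASTER_PORT"] = "12355"      # placeholder free port
        # Pin this process to its own GPU before creating the group, which avoids
        # hangs or excessive memory use on GPU:0.
        torch.cuda.set_device(rank)
        # NCCL is the usual backend choice for GPU training.
        init_process_group(backend="nccl", rank=rank, world_size=world_size)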

-Constructing the DDP model
+Constructing the DDP model
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

 .. code-block:: diff

    - self.model = model.to(gpu_id)
    + self.model = DDP(model, device_ids=[gpu_id])
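
In context, the wrapping might look like the sketch below; the ``Trainer`` class name and attributes follow the tutorial series but are assumptions here, not the file's exact code:

.. code-block:: python

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    class Trainer:
        def __init__(self, model: torch.nn.Module, gpu_id: int):
            self.gpu_id = gpu_id
            # Move the model to this process's GPU, then wrap it in DDP so that
            # gradients are synchronized across replicas during backward().
            self.model = DDP(model.to(gpu_id), device_ids=[gpu_id])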

-Distributing input data
+Distributing input data
 ~~~~~~~~~~~~~~~~~~~~~~~

-- `DistributedSampler <https://pytorch.org/docs/stable/data.html?highlight=distributedsampler#torch.utils.data.distributed.DistributedSampler>`__
-  chunks the input data across all distributed processes.
-- Each process will receive an input batch of 32 samples; the effective
-  batch size is ``32 * nprocs``, or 128 when using 4 GPUs.
+- `DistributedSampler <https://pytorch.org/docs/stable/data.html?highlight=distributedsampler#torch.utils.data.distributed.DistributedSampler>`__
+  splits the input data across all distributed processes.
+- Each process receives an input batch of 32 samples;
+  the effective batch size is ``32 * nprocs``, or 128 when using 4 GPUs.

 .. code-block:: diff

@@ -129,8 +124,8 @@ Distributing input data
    +     sampler=DistributedSampler(train_dataset),
      )

-- Calling the ``set_epoch()`` method on the ``DistributedSampler`` at the beginning of each epoch is necessary to make shuffling work
-  properly across multiple epochs. Otherwise, the same ordering will be used in each epoch.
+- Calling the ``set_epoch()`` method of the ``DistributedSampler`` at the start of every epoch is necessary to shuffle the data properly across multiple epochs.
+  Otherwise, the same ordering is used in every epoch.

 .. code-block:: diff

@@ -142,12 +137,12 @@ Distributing input data
          self._run_batch(source, targets)
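
A minimal sketch of how the sampler typically fits into the data loader and the epoch loop; ``train_dataset``, ``max_epochs``, and the loop body are illustrative assumptions:

.. code-block:: python

    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    sampler = DistributedSampler(train_dataset)  # train_dataset: any map-style Dataset
    train_loader = DataLoader(
        train_dataset,
        batch_size=32,    # per-process batch; effective batch = 32 * world_size
        shuffle=False,    # shuffling is delegated to the sampler
        sampler=sampler,
    )

    for epoch in range(max_epochs):
        sampler.set_epoch(epoch)  # reshuffle differently in every epoch
        for source, targets in train_loader:
            ...  # forward / backward / optimizer step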

-Saving model checkpoints
+Saving model checkpoints
 ~~~~~~~~~~~~~~~~~~~~~~~~
-- We only need to save model checkpoints from one process. Without this
-  condition, each process would save its copy of the identical model. Read
-  more on saving and loading models with
-  DDP `here <https://tutorials.pytorch.kr/intermediate/ddp_tutorial.html#save-and-load-checkpoints>`__
+- We only need to save model checkpoints from a single process. Without this
+  condition, every process would save its own copy of the identical model.
+  Read more about saving and loading models with DDP
+  `here <https://tutorials.pytorch.kr/intermediate/ddp_tutorial.html#save-and-load-checkpoints>`__.

 .. code-block:: diff

@@ -160,21 +155,19 @@ Saving model checkpoints
          self._save_checkpoint(epoch)

 .. warning::
-   `Collective calls <https://pytorch.org/docs/stable/distributed.html#collective-functions>`__ are functions that run on all the distributed processes,
-   and they are used to gather certain states or values to a specific process. Collective calls require all ranks to run the collective code.
-   In this example, `_save_checkpoint` should not have any collective calls because it is only run on the ``rank:0`` process.
-   If you need to make any collective calls, it should be before the ``if self.gpu_id == 0`` check.
-
+   `Collective calls <https://pytorch.org/docs/stable/distributed.html#collective-functions>`__ are functions that run on all the distributed processes
+   and are used to gather certain states or values to a specific process. Collective calls require all ranks to run the collective code.
+   In this example, `_save_checkpoint` must not contain any collective calls because it only runs on the ``rank:0`` process.
+   If you need to make any collective calls, they should come before the ``if self.gpu_id == 0`` check.
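
A sketch of a rank-0-only checkpoint path under these rules; the method name ``_on_epoch_end`` and the ``self.model`` / ``self.gpu_id`` attributes are assumptions consistent with the tutorial's ``Trainer``, not its exact code:

.. code-block:: python

    import torch

    class Trainer:
        # Only the checkpointing path is sketched here; other methods are omitted.

        def _save_checkpoint(self, epoch: int):
            # DDP wraps the original model, so its weights live under .module.
            ckp = self.model.module.state_dict()
            torch.save(ckp, "checkpoint.pt")
            print(f"Epoch {epoch} | training checkpoint saved at checkpoint.pt")

        def _on_epoch_end(self, epoch: int):
            # Only rank 0 writes the checkpoint. Any collective call must happen
            # before this guard, because the other ranks never enter this branch.
            if self.gpu_id == 0:
                self._save_checkpoint(epoch)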

-Running the distributed training job
+Running the distributed training job
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-- Include new arguments ``rank`` (replacing ``device``) and
-  ``world_size``.
-- ``rank`` is auto-allocated by DDP when calling
-  `mp.spawn <https://pytorch.org/docs/stable/multiprocessing.html#spawning-subprocesses>`__.
-- ``world_size`` is the number of processes across the training job. For GPU training,
-  this corresponds to the number of GPUs in use, and each process works on a dedicated GPU.
+- Introduce the new arguments ``rank`` (replacing ``device``) and ``world_size``.
+- ``rank`` is allocated automatically by DDP when calling
+  `mp.spawn <https://pytorch.org/docs/stable/multiprocessing.html#spawning-subprocesses>`__.
+- ``world_size`` is the number of processes used in the training job. For GPU training,
+  this corresponds to the number of GPUs in use, with each process working on its own dedicated GPU.

 .. code-block:: diff

@@ -199,11 +192,10 @@ Running the distributed training job
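
A minimal sketch of the launch path described above; the ``main`` body, the epoch count, and the ``ddp_setup`` helper (from the earlier sketch) are placeholders consistent with the series, not the file's exact contents:

.. code-block:: python

    import torch
    import torch.multiprocessing as mp
    from torch.distributed import destroy_process_group

    def main(rank: int, world_size: int, total_epochs: int):
        ddp_setup(rank, world_size)  # initialize the process group (earlier sketch)
        # ... build the dataset, model, optimizer, and Trainer here, then train ...
        destroy_process_group()      # clean shutdown of the process group

    if __name__ == "__main__":
        world_size = torch.cuda.device_count()  # one process per GPU
        # mp.spawn passes the process index as the first argument, which serves as the rank.
        mp.spawn(main, args=(world_size, 10), nprocs=world_size)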

-Further Reading
+Further Reading
 ---------------

-- `Fault Tolerant distributed training <ddp_series_fault_tolerance.html>`__ (next tutorial in this series)
-- `Intro to DDP <ddp_series_theory.html>`__ (previous tutorial in this series)
-- `Getting Started with DDP <https://tutorials.pytorch.kr/intermediate/ddp_tutorial.html>`__
-- `Process Group
-  initialization <https://pytorch.org/docs/stable/distributed.html#tcp-initialization>`__
+- `Fault-tolerant distributed training <ddp_series_fault_tolerance.html>`__ (next tutorial in this series)
+- `Intro to DDP <ddp_series_theory.html>`__ (previous tutorial in this series)
+- `Getting Started with Distributed Data Parallel (DDP) <https://tutorials.pytorch.kr/intermediate/ddp_tutorial.html>`__
+- `Process group initialization <https://pytorch.org/docs/stable/distributed.html#tcp-initialization>`__
