## Training
### 1. Requirements

python==3.6

cuda==10.1
```shell
sudo mkdir /train_tmp
mount -t tmpfs -o size=140G tmpfs /train_tmp
```
## Our Method

### 1. Model parallel classification layer
The class centers are evenly partitioned across the GPUs, so only three collective communications are needed to compute the softmax exactly, with no loss of accuracy.

#### 1. Synchronize the features
Gather the local features from every GPU so that each GPU holds the features of the whole global batch, as in `AllGather(x_i)`.
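A minimal sketch of this step with `torch.distributed` (assuming one process per GPU; `x_i` is the local feature batch and the function name is only illustrative):

```python
import torch
import torch.distributed as dist

def gather_features(x_i: torch.Tensor) -> torch.Tensor:
    """AllGather(x_i): after this call, every GPU holds the features from all GPUs."""
    world_size = dist.get_world_size()
    gathered = [torch.zeros_like(x_i) for _ in range(world_size)]
    dist.all_gather(gathered, x_i)      # one collective communication
    return torch.cat(gathered, dim=0)   # shape: (world_size * local_batch_size, embedding_dim)
```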
#### 2. Synchronize the denominator of the softmax
Each GPU first computes the local sum of `exp(logits_i)` over its own shard of classes; a single communication then yields the global sum, as in `Allreduce(sum(exp(logits_i)))`.
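Under the same assumptions, a sketch of this step (`total_features` are the gathered features and `local_centers` is this GPU's shard of the class-center matrix; the usual max-subtraction for numerical stability is omitted for brevity):

```python
import torch
import torch.distributed as dist

def sharded_softmax(total_features: torch.Tensor, local_centers: torch.Tensor) -> torch.Tensor:
    """Softmax over this GPU's shard of classes, normalized by the global denominator."""
    logits_i = total_features @ local_centers.t()   # (N, local_num_classes)
    exp_logits = torch.exp(logits_i)
    denom = exp_logits.sum(dim=1, keepdim=True)     # local sum of exp(logits_i)
    dist.all_reduce(denom, op=dist.ReduceOp.SUM)    # Allreduce(sum(exp(logits_i))): global denominator
    return exp_logits / denom                       # probabilities over all classes, stored shard by shard
```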
#### 3. Synchronize the feature gradients
The gradient of the logits can be computed independently on each GPU, and so can each GPU's partial gradient of the features. Finally, these partial gradients are summed across all GPUs and sent back to the backbone, as in `Allreduce(delta(X))`.
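Again under the same assumptions, a sketch of this step (`grad_logits_i` is the locally computed gradient of the loss with respect to this GPU's logits, e.g. `prob_i - one_hot_i` for cross-entropy):

```python
import torch
import torch.distributed as dist

def reduce_feature_gradient(grad_logits_i: torch.Tensor, local_centers: torch.Tensor) -> torch.Tensor:
    """Allreduce(delta(X)): sum the partial feature gradients contributed by every class-center shard."""
    grad_x = grad_logits_i @ local_centers          # partial gradient from this shard, (N, embedding_dim)
    dist.all_reduce(grad_x, op=dist.ReduceOp.SUM)   # full gradient of the gathered features
    return grad_x                                   # each GPU then slices out its own samples for the backbone
```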
### 2. Softmax approximation

Only a subset of the class centers is needed to approximate the softmax computation, as long as every positive class center is included in the subset. This can be expressed with the following pseudo code:
```python
centers_p = func_positive(label)              # positive class centers, selected by the labels of the batch
centers_n = func_negative(centers_p)          # negative class centers, randomly sampled after excluding the positive ones
centers_final = concat(centers_n, centers_p)  # class centers that take part in the softmax computation
```
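The helpers above (`func_positive`, `func_negative`, `concat`) are pseudo code. A runnable sketch in plain PyTorch could look like the following, assuming the class centers are stored as one `(num_classes, embedding_dim)` tensor and a fixed number of centers is sampled:

```python
import torch

def sample_centers(weight: torch.Tensor, label: torch.Tensor, num_sample: int):
    """Keep all positive class centers and fill up with randomly sampled negative ones."""
    positive = torch.unique(label)                            # classes present in this batch
    mask = torch.ones(weight.size(0), dtype=torch.bool)
    mask[positive] = False
    negative_pool = mask.nonzero(as_tuple=False).squeeze(1)   # classes absent from this batch
    num_negative = max(num_sample - positive.numel(), 0)
    perm = torch.randperm(negative_pool.numel())[:num_negative]
    index = torch.cat([positive, negative_pool[perm]])        # centers that take part in the softmax
    return weight[index], index
```

Note that the labels also have to be re-mapped to positions inside `index` before the cross-entropy loss is computed.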