# VoxCeleb trainer
This repository contains the framework for training speaker recognition models described in the paper '_In defence of metric learning for speaker recognition_'.
### Dependencies
```
pip install -r requirements.txt
```
In addition to the Python dependencies, `wget` and `ffmpeg` must be installed on the system.
### Pretrained models
A larger model trained with online data augmentation, described in [2], can be downloaded from [here](http://www.robots.ox.ac.uk/~joon/data/baseline_v2_ap.model).
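For example, the model can be fetched with `wget` (one of the system dependencies noted above):

```
wget http://www.robots.ox.ac.uk/~joon/data/baseline_v2_ap.model
```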
`--augment True` enables online data augmentation, described in [2].
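As a sketch, online augmentation can then be switched on at training time as below; treating `trainSpeakerNet.py` as the training entry point is an assumption, and all other options are left at their defaults.

```
# --augment True enables online data augmentation, described in [2]
python trainSpeakerNet.py --augment True
```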
### Adding new models and loss functions
You can add new models and loss functions to `models` and `loss` directories respectively. See the existing definitions for examples.
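As a hedged sketch, a new loss module might look like the following; the `LossFunction` class name, constructor arguments and `forward` signature are assumptions modelled on the bundled definitions, so check the existing files in `loss/` for the exact interface the trainer expects.

```
# loss/softmax_example.py -- illustrative only; mirror the structure of the
# existing loss definitions rather than this sketch.
import torch.nn as nn

class LossFunction(nn.Module):
    """A plain softmax classification loss over speaker identities."""

    def __init__(self, nOut=512, nClasses=1000, **kwargs):
        # nOut: embedding dimension; nClasses: number of training speakers
        # (both argument names are assumptions; match the existing definitions)
        super().__init__()
        self.fc = nn.Linear(nOut, nClasses)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x, label=None):
        # x: speaker embeddings of shape (batch, nOut); label: speaker indices
        return self.criterion(self.fc(x), label)
```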
### Accelerating training
- Use the `--mixedprec` flag to enable mixed precision training. This is recommended for Tesla V100, GeForce RTX 20 series or later GPUs.

- Use the `--distributed` flag to enable distributed training; see the launch sketch after this list.

- GPU indices should be set using the command `export CUDA_VISIBLE_DEVICES=0,1,2,3`.

- Evaluation is not performed between epochs during training.

- If you are running more than one distributed training session, you need to change the port.

- At every epoch, the whole dataset is passed through **each** GPU once. Therefore `test_interval` and `max_epochs` must be divided by the number of GPUs to keep the number of forward passes the same as in single-GPU training. For example, `--test_interval 10` using 1 GPU should be equivalent to `--test_interval 2` using 5 GPUs.
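A minimal launch sketch for a single distributed session; treating `trainSpeakerNet.py` as the training entry point and `--port` as the option that sets the port are assumptions, so verify both against the argument parser.

```
# make four GPUs visible to this training session
export CUDA_VISIBLE_DEVICES=0,1,2,3

# --port is an assumed option name; use a different value per concurrent session
python trainSpeakerNet.py --distributed --mixedprec --port 8000
```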
### Data
The [VoxCeleb](http://www.robots.ox.ac.uk/~vgg/data/voxceleb/) datasets are used for these experiments.
### Replicating the results from the paper
1. Model definitions
- `VGG-M-40` in [1] is `VGGVox` in the repository.

- `Thin ResNet-34` in [1] is `ResNetSE34` in the repository.

- `Fast ResNet-34` in [1] is `ResNetSE34L` in the repository.

- `H / ASP` in [2] is `ResNetSE34V2` in the repository.
2. For metric learning objectives, the batch size in the paper is `nPerSpeaker` multiplied by `batch_size` in the code. For the batch size of 800 in the paper, use `--nPerSpeaker 2 --batch_size 400`, `--nPerSpeaker 3 --batch_size 266`, etc.
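For instance, an effective batch size of 800 with the `Fast ResNet-34` model could be requested as below; treating `trainSpeakerNet.py` as the entry point is an assumption, and the remaining options are left at their defaults.

```
# nPerSpeaker x batch_size = 2 x 400 = 800 utterances per step
python trainSpeakerNet.py --model ResNetSE34L --nPerSpeaker 2 --batch_size 400
```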
4. You can get a good balance between speed and performance using the configuration below.