PyTorch implementation of SEGAN (Pascual et al., 2017). The original TensorFlow version can be found here.
- Python v3.5.2 or higher
- PyTorch v0.3.0 (other versions not tested)
- CUDA preferred
- noisy speech dataset downloaded from here
- libraries specified in requirements.txt
pip install -r requirements.txt
Use data_preprocess.py to preprocess the downloaded data.
Adjust the file paths at the beginning of the file to properly locate the data files, output folder, etc.
Uncomment the functions in __main__ to perform the desired preprocessing stage.
Data preprocessing consists of three main stages:
- Downsampling - downsample the original audio files (48 kHz) to a sampling rate of 16 kHz.
- Serialization - split the audio files into snippets of 2^14 samples (about 1 second at 16 kHz).
- Verification - check whether each serialized snippet contains the proper number of samples.
Note that the second stage takes a fairly long time - more than an hour.
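As a rough illustration of the serialization and verification stages, here is a minimal sketch in plain Python. It is not the code from data_preprocess.py: the names slice_signal and verify_snippets are hypothetical, and it assumes non-overlapping windows with the trailing remainder dropped, which may differ from what the script actually does. Only the window size (2^14) and the 16 kHz rate come from the description above.

```python
# Hypothetical sketch; none of these names are taken from data_preprocess.py.
SAMPLE_RATE = 16000    # sampling rate after downsampling (from the text)
WINDOW_SIZE = 2 ** 14  # 16384 samples per snippet (from the text)


def slice_signal(samples, window_size=WINDOW_SIZE):
    """Split a sequence of samples into consecutive full-length snippets.

    Assumption: windows do not overlap and a trailing remainder shorter
    than window_size is dropped.
    """
    snippets = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        snippets.append(samples[start:start + window_size])
    return snippets


def verify_snippets(snippets, window_size=WINDOW_SIZE):
    """Return the indices of snippets whose length is not window_size."""
    return [i for i, s in enumerate(snippets) if len(s) != window_size]


if __name__ == "__main__":
    # Three seconds of silence at 16 kHz stand in for a real recording.
    signal = [0.0] * (3 * SAMPLE_RATE)
    snippets = slice_signal(signal)
    print(len(snippets), "snippets of", WINDOW_SIZE / SAMPLE_RATE, "seconds each")
    print("indices of bad snippets:", verify_snippets(snippets))
```

Each snippet lasts 16384 / 16000 = 1.024 seconds, which is where the "about 1 second" figure comes from.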
python model.py
Again, adjust the data paths in model.py according to your needs.
In particular, provide the correct path to the folder where the serialized data are stored.