This repository describes feature extraction methods for speech signals.
- OpenSLR: OpenSLR is a site devoted to hosting speech and language resources, such as training corpora for speech recognition, and software related to speech recognition.
- VoxForge: VoxForge is now mirroring the LT and the Telecooperation group Open Speech Data Corpus for German, with 35 hours of speech from about 180 speakers.
- TIMIT: The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
- Mozilla Speech: Mozilla released the world's second-largest public voice data set on Nov 29, 2017.
- Open Data for Deep Learning
- feature_extraction_functions.py: a set of feature extraction functions from RDShi-SpeakerCount.
- MFCC: Mel-frequency cepstral coefficients calculation.
- MFCC.py, MFCCTest.py: Compute the MFCC feature.
- FeatureExtraction.ipynb: Speech preprocessing, including loading data, pre-emphasis, framing, windowing, Fourier transform, power spectrum, filter banks, MFCCs and mean normalization.
- Volume: volume calculation.
- ZeroCR: Zero-Crossing Rate calculation.
- Pitch: Pitch calculation and pitch tracking.
- Timbre: spectrogram drawing.
- VAD: EPD (End-Point Detection), also called Speech Detection or VAD (Voice Activity Detection).
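
The first preprocessing steps listed above (pre-emphasis, framing, windowing, power spectrum), together with the volume and zero-crossing rate features, can be sketched in a few lines of NumPy. This is a minimal illustration and not the code in `MFCC.py` or the notebook: the function names are hypothetical, the parameter values (pre-emphasis coefficient 0.97, 25 ms frames, 10 ms hop, 512-point FFT) are commonly used defaults assumed here, and volume is assumed to mean the sum of absolute sample values per frame.

```python
import numpy as np

def pre_emphasis(signal, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; the first sample is kept as is."""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into overlapping frames, zero-padding the tail."""
    num_frames = 1 + int(np.ceil(max(len(signal) - frame_len, 0) / hop_len))
    padded_len = (num_frames - 1) * hop_len + frame_len
    padded = np.append(signal, np.zeros(padded_len - len(signal)))
    # Row i holds samples [i * hop_len, i * hop_len + frame_len).
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(num_frames)[:, None]
    return padded[idx]

def power_spectrum(frames, n_fft=512):
    """Hamming-window each frame and return its power spectrum."""
    windowed = frames * np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(windowed, n_fft)) ** 2 / n_fft

def volume(frames):
    """Per-frame volume, assumed here to be the sum of absolute values."""
    return np.sum(np.abs(frames), axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    crossings = frames[:, :-1] * frames[:, 1:] < 0
    return np.sum(crossings, axis=1) / (frames.shape[1] - 1)

# Demo on a synthetic 440 Hz tone (assumed input; replace with real speech).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

frame_len = int(0.025 * sample_rate)  # 25 ms -> 400 samples
hop_len = int(0.010 * sample_rate)    # 10 ms -> 160 samples

frames = frame_signal(tone, frame_len, hop_len)
spectrum = power_spectrum(frame_signal(pre_emphasis(tone), frame_len, hop_len))
vol = volume(frames)
zcr = zero_crossing_rate(frames)
```

The power spectrum is the input to the mel filter bank stage; volume and zero-crossing rate are computed on the raw (not pre-emphasized) frames, since pre-emphasis changes the amplitude of the signal.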
Anaconda3 (Python 3.x)
- http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
- https://github.com/wiseman/py-webrtcvad
- https://github.com/jameslyons/python_speech_features
- https://github.com/ZhihaoDU/speech_feature_extractor
- http://ibillxia.github.io/blog/archives/
- http://stevemorphet.weebly.com/speech-and-audio-processing
- MFCC
- Git tutorial
- Building and calling shared libraries with CMake on Linux