- Description
Implement a classifier able to predict which digit is pronounced in a short audio excerpt.
- Input
The dataset used is the Free Spoken Digit Dataset (FSDD). In the folder recordings you will find the audio files, named in a specific format. Please read the README distributed with the dataset. The results of the classification must be reported as a confusion matrix and, optionally, other metrics of your choice.
- Output
- a brief presentation of your work (max 5 minutes) that will be given to the class
- a more detailed report (max 8 pages), to be delivered by May 17th, in which you illustrate and explain every step of your classification system and present and comment on the results.
- a link to a repository containing the code (e.g. on GitHub) with minimal comments.
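The file-naming format mentioned under Input is what carries the labels, so a first step is usually to parse it. The sketch below assumes the pattern `<digit>_<speaker>_<index>.wav` (for example `7_jackson_32.wav`); this pattern, the `Recording` type and the `parse_filename` helper are illustrative assumptions, so check them against the README distributed with the dataset before relying on them.

```python
import re
from pathlib import Path
from typing import NamedTuple, Optional

# Assumed naming pattern: <digit>_<speaker>_<index>.wav (verify against the README).
FILENAME_RE = re.compile(r"^(\d)_([A-Za-z0-9]+)_(\d+)\.wav$")


class Recording(NamedTuple):
    """Label fields recovered from one file name (hypothetical helper type)."""
    digit: int
    speaker: str
    index: int


def parse_filename(name: str) -> Optional[Recording]:
    """Return the fields encoded in a file name, or None if it does not match."""
    match = FILENAME_RE.match(Path(name).name)
    if match is None:
        return None
    digit, speaker, index = match.groups()
    return Recording(int(digit), speaker, int(index))
```

Calling `parse_filename` on every entry of the recordings folder and discarding the `None` results gives the list of (digit, speaker, index) labels that the rest of the system can work from.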
- Preprocessing
- Feature selection
- Dataset split
- Feature extraction
- Feature selection
- Classification
- Performance evaluation
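Two of the steps listed above, the dataset split and the performance evaluation, can be sketched without any external library. The following is a minimal illustration, not a prescribed method: the function names, the 20% test fraction and the fixed seed are assumptions chosen for the example. It splits sample indices so that every digit keeps roughly the same share in both sets, then builds the required confusion matrix from true and predicted labels.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling the test set separately per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        # At least one test sample per class (assumed policy for this sketch).
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def confusion_matrix(y_true, y_pred, n_classes=10):
    """Rows are true digits, columns are predicted digits."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

A random split like this one mixes recordings of the same speaker across training and test sets. Holding out whole speakers instead is a stricter alternative, and which of the two is appropriate is a design choice worth justifying in the report.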
Mel-frequency cepstral coefficients
Linear Predictive Coding
Phoneme detection
Hidden Markov Models
Image processing on spectrogram
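As an illustration of one of the techniques listed above, Linear Predictive Coding can be computed per frame with the autocorrelation method and the Levinson-Durbin recursion. The sketch below is a plain-Python version under stated assumptions: the function names are hypothetical, the frame is assumed to be already cut out (and, in practice, windowed) from the signal, and the prediction order is left to the caller.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + k] for i in range(n - k))
        for k in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation.

    Returns (a, error) with a[0] == 1, where the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    and error is the remaining prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a, error
```

For a decaying exponential such as `frame = [0.9 ** n for n in range(200)]`, each sample is 0.9 times the previous one, so `lpc(frame, 2)` yields `a[1]` close to -0.9 and `a[2]` close to 0. The coefficients `a[1:]` of each frame (or quantities derived from them) could then serve as a feature vector for the classifier.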