DT2119 Speech and Speaker Recognition

Lab1: Feature extraction

• compute MFCC features step by step
• examine the features
• evaluate the correlation between features
• compare utterances with Dynamic Time Warping (DTW)
• illustrate the discriminative power of the features with respect to words
• perform hierarchical clustering of utterances
• train and analyze a Gaussian Mixture Model of the feature vectors
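The DTW comparison listed above can be sketched as follows. This is a minimal illustration, not the lab's reference solution: the function name `dtw`, the Euclidean local distance, and the three allowed steps (vertical, horizontal, diagonal) are assumptions made for this example.

```python
import numpy as np

def dtw(x, y):
    """Accumulated DTW distance between two feature sequences.

    x: array of shape (N, D), y: array of shape (M, D),
    e.g. MFCC frames of two utterances (hypothetical inputs).
    """
    n, m = len(x), len(y)
    # Local Euclidean distance between every pair of frames.
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    # acc[i, j] holds the cheapest accumulated cost of aligning
    # the first i frames of x with the first j frames of y.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j],      # step along x only
                acc[i, j - 1],      # step along y only
                acc[i - 1, j - 1],  # step along both
            )
    return acc[n, m]
```

A matrix of pairwise `dtw` distances between utterances could then serve as input to the hierarchical clustering step.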

Lab2: Hidden Markov Models with Gaussian Emissions

• combine phonetic HMMs into word HMMs using a lexicon
• implement the forward-backward algorithm
• use it to compute the log likelihood of spoken utterances given a Gaussian HMM
• perform isolated word recognition
• implement the Viterbi algorithm, and use it to compute the Viterbi path and likelihood
• compare and comment on the Viterbi and forward likelihoods
• implement the Baum-Welch algorithm to update the parameters of the emission probability distributions
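The forward and Viterbi recursions from the list above can be sketched in the log domain as follows. This is a simplified illustration under stated assumptions: the names `forward`, `viterbi`, `log_emlik`, `log_pi` and `log_A` are chosen for this example, and the Gaussian emission log likelihoods are assumed to be precomputed as a (T, S) array.

```python
import numpy as np

def forward(log_emlik, log_pi, log_A):
    """Log likelihood of an observation sequence under an HMM.

    log_emlik: (T, S) log emission likelihoods per frame and state
    log_pi:    (S,)   log initial state probabilities
    log_A:     (S, S) log transition probabilities, row = from-state
    """
    T, S = log_emlik.shape
    log_alpha = np.empty((T, S))
    log_alpha[0] = log_pi + log_emlik[0]
    for t in range(1, T):
        # log of sum_i alpha[t-1, i] * A[i, j], computed stably
        log_alpha[t] = np.logaddexp.reduce(
            log_alpha[t - 1][:, None] + log_A, axis=0
        ) + log_emlik[t]
    return np.logaddexp.reduce(log_alpha[-1])

def viterbi(log_emlik, log_pi, log_A):
    """Log likelihood of the best state path, and the path itself."""
    T, S = log_emlik.shape
    delta = np.empty((T, S))
    psi = np.zeros((T, S), dtype=int)  # best predecessor per state
    delta[0] = log_pi + log_emlik[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A
        psi[t] = np.argmax(scores, axis=0)
        delta[t] = scores[psi[t], np.arange(S)] + log_emlik[t]
    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return delta[-1, path[-1]], path
```

Because the forward recursion sums over all state paths while Viterbi keeps only the best one, the Viterbi log likelihood can never exceed the forward log likelihood.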

Lab3: Phoneme Recognition with Deep Neural Network

Train and test a phone recogniser based on digit speech material from the TIDIGITS database:

• using predefined Gaussian-emission HMM phonetic models, create time-aligned phonetic transcriptions of the TIDIGITS database
• define appropriate DNN models for phoneme recognition using Keras
• train and evaluate the DNN models on a frame-by-frame recognition score
• repeat the training, varying the model parameters and input features
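The frame-by-frame recognition score mentioned above can be illustrated with a small NumPy sketch. It assumes that the network outputs one row of class posteriors per frame and that the targets are integer state or phoneme indices taken from the forced alignment. The function name `frame_accuracy` is hypothetical and not part of Keras.

```python
import numpy as np

def frame_accuracy(posteriors, targets):
    """Fraction of frames whose most probable class matches the target.

    posteriors: (T, C) per-frame class probabilities (e.g. softmax output)
    targets:    (T,)   integer class index per frame
    """
    predicted = np.argmax(posteriors, axis=1)
    return float(np.mean(predicted == targets))
```

For example, with four frames of which three are classified correctly, the score is 0.75.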

Project: Automatic music genre classification using deep learning technologies

Music genre classification based on CNN and LSTM networks

