Voice ID Door Lock Web-App: Speaker-Identification and Sentence-Verification using voice MFCC features and GMM

Omar-Saad-ELGharbawy/Voice-Fingerprint-Identification

Voice Fingerprint Identification || DSP-Task-3

(Voice Door-Lock)

Voice Fingerprint Door-Lock is a Digital-Signal-Processing web app for Speaker-Identification and Sentence-Verification. It applies Machine-Learning to Audio-Features extracted from voice biometrics.

Voice Fingerprint Principles

Voice Fingerprint is a DSP application that depends on Audio Feature Extraction and Machine-Learning Model Training.

Feature Extraction

The Audio Features are extracted from the Audio Signal using the Fourier Transform: Mel-Frequency Cepstral Coefficients (MFCC) are computed and their Delta features are appended.

What is MFCC?

  • A set of features used in speech recognition and audio information retrieval.
  • Represent the spectral envelope of a sound by measuring the magnitude of the spectral components
  • Represent the short-term power spectrum of a sound by combining a number of adjacent frequency bands
  • Represent the spectral shape of a sound in the frequency domain
  • Calculation Steps
    1. Split the signal into short frames and compute the Fourier transform (power spectrum) of each frame.
    2. Apply the mel filterbank to the power spectra and sum the energy in each band.
    3. Take the log of all filterbank energies, then take the Discrete Cosine Transform (DCT).
    4. Keep DCT coefficients 2-13, discard the rest.
    5. Delta and Delta-Delta features are usually also appended, and liftering is often applied to the final coefficients.
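
The calculation steps above can be sketched in plain NumPy. This is a minimal illustration, not the project's own code (the project lists python_speech_features and librosa for this job). The function names, the 25 ms / 10 ms framing at 16 kHz, the 26 filters, and the Hamming window are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def frame_signal(signal, frame_len, frame_step):
    # Step 1a: slice the signal into overlapping, Hamming-windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // frame_step
    idx = np.arange(frame_len)[None, :] + frame_step * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centres are equally spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fbank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):
            fbank[i - 1, k] = (right - k) / (right - centre)
    return fbank

def dct2(x, n_out):
    # Unnormalised DCT-II along the last axis, keeping the first n_out terms.
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate=16000, frame_len=400, frame_step=160,
         n_fft=512, n_filters=26):
    frames = frame_signal(signal, frame_len, frame_step)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft            # step 1b
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T  # step 2
    log_energies = np.log(np.maximum(energies, 1e-10))                  # step 3a
    ceps = dct2(log_energies, 13)                                       # step 3b
    return ceps[:, 1:13]                                                # step 4: keep 2-13

def delta(features, width=2):
    # Step 5: regression-based delta over +/- width neighbouring frames.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    n_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    return sum(
        n * (padded[width + n: width + n + n_frames]
             - padded[width - n: width - n + n_frames])
        for n in range(1, width + 1)
    ) / denom

# Usage on a synthetic one-second tone (stand-in for a recorded voice).
sample_rate = 16000
tone = np.sin(2 * np.pi * 440.0 * np.arange(sample_rate) / sample_rate)
coeffs = mfcc(tone, sample_rate=sample_rate)
features = np.hstack([coeffs, delta(coeffs), delta(delta(coeffs))])
print(features.shape)
```

Liftering is left out to keep the sketch short. In practice a library implementation is usually preferable, since it handles pre-emphasis, windowing options, and edge cases.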

mfcc

You can read more about MFCC here

Model Training

Gaussian Mixture Model (GMM)

  • GMM is an unsupervised Clustering model
  • GMM is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.
  • GMM is used in voice identification to identify the speaker by analyzing the spectral characteristics of the voice.
  • GMM uses a set of Gaussian distributions to model the spectral characteristics of the voice.
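  • Each Gaussian distribution is characterized by its mean and variance (a covariance matrix for multi-dimensional features), together with a weight that sets its share of the mixture.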
  • GMM uses an Expectation Maximization (EM) algorithm to estimate the parameters of the Gaussian distributions.
  • The EM algorithm iteratively estimates the parameters of the Gaussian distributions by maximizing the likelihood of the observed data.
  • The trained GMM is then used to classify the speaker by scoring the spectral characteristics of the input voice against the estimated parameters of the Gaussian distributions.
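
As a rough sketch of the points above, one GMM can be fitted per speaker with scikit-learn and the input features scored against each model. The helper names, the four diagonal-covariance components, and the synthetic 12-dimensional "features" are assumptions made for illustration; the project's real training code and settings may differ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_by_speaker, n_components=4, seed=0):
    # Fit one GMM per speaker; EM runs inside GaussianMixture.fit().
    models = {}
    for name, feats in features_by_speaker.items():
        gmm = GaussianMixture(
            n_components=n_components,
            covariance_type="diag",
            max_iter=200,
            random_state=seed,
        )
        models[name] = gmm.fit(feats)
    return models

def identify(models, feats):
    # score() returns the average per-frame log-likelihood under each model.
    scores = {name: gmm.score(feats) for name, gmm in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic stand-ins for per-frame MFCC vectors of two hypothetical speakers.
rng = np.random.default_rng(0)
training = {
    "alice": rng.normal(0.0, 1.0, size=(300, 12)),
    "bob": rng.normal(3.0, 1.0, size=(300, 12)),
}
models = train_speaker_models(training)

# New frames drawn from the same distribution as "bob".
unknown = rng.normal(3.0, 1.0, size=(50, 12))
best, scores = identify(models, unknown)
print(best, scores)
```

A higher log-likelihood means the input frames are better explained by that speaker's model, which is why the highest score is picked here.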

gmm2

You can read more about GMM here

Project full Demo

video

Dynamic E-Poster Graphs

MFCC Spectrogram

  • The spectrogram represents the Mel-Frequency Cepstral Coefficients of the user audio.

MFCC

Gaussian Normal Distribution

  • Represents the normal distribution of the MFCC features of each team member alongside that of the input voice, showing which team member's fingerprint is closest to the input audio according to the GMM.

Normal

Scores Bar Chart

  • A bar chart of the scores from the GMM models, showing which team member's score the input is closest to and comparing the scores against the dissimilarity threshold.

scores

Project Structure

  • The frontend takes the user audio and sends it to the backend.
  • The backend extracts the audio features and sends them to the machine learning model.
  • The machine learning model compares the input audio features with the team audio features in the team verification step.
  • If the Voice Fingerprint is verified (the speaker is a registered team user), the machine learning model compares the input audio features with the user audio features in the sentence verification step.
  • The door is opened only if both the Voice Fingerprint (user in team) and the sentence (Open The Door) are verified.
  • The machine learning model then returns the result to the backend, and the backend returns the result to the frontend.
  • The frontend displays the result to the user, and the door is opened if the result is verified.
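
The two-stage check described above can be sketched as a small decision function. Everything here is hypothetical: the function name, the dictionary layout, the user names, and the threshold values are illustrative. It also assumes the scores are GMM log-likelihoods where a higher value means a closer match, which the project may define differently.

```python
def door_decision(speaker_scores, sentence_scores,
                  speaker_threshold, sentence_threshold):
    # Stage 1: team verification. Pick the best-matching registered user
    # and reject the attempt if even that match is below the threshold.
    best_speaker = max(speaker_scores, key=speaker_scores.get)
    if speaker_scores[best_speaker] < speaker_threshold:
        return {"speaker": None, "sentence_ok": False, "open_door": False}

    # Stage 2: sentence verification against that user's sentence model.
    sentence_score = sentence_scores.get(best_speaker, float("-inf"))
    sentence_ok = sentence_score >= sentence_threshold

    # The door opens only when both stages pass.
    return {
        "speaker": best_speaker,
        "sentence_ok": sentence_ok,
        "open_door": sentence_ok,
    }

# Registered user saying the right sentence.
accepted = door_decision(
    {"user_a": -20.0, "user_b": -35.0}, {"user_a": -18.0}, -25.0, -22.0
)
# Registered user saying something else.
wrong_sentence = door_decision(
    {"user_a": -20.0, "user_b": -35.0}, {"user_a": -30.0}, -25.0, -22.0
)
# Speaker who matches no registered user well enough.
stranger = door_decision(
    {"user_a": -40.0, "user_b": -45.0}, {"user_a": -18.0}, -25.0, -22.0
)
print(accepted, wrong_sentence, stranger)
```

Running the sentence check only after the speaker check passes mirrors the order of the steps in the list above.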

process

  • Frontend :
    • HTML
    • CSS
    • JavaScript
  • Backend :
    • Flask (Python)
  • Machine Learning Model Training
    • GMM Model (Python)
  • Used Libraries
    • python_speech_features
    • librosa
    • sklearn
    • NumPy
    • SciPy

Run The Project

  • Clone the project
  • Open a terminal in the project folder and run:
cd src
pip install -r requirements.txt
flask run --reload 

Team Members