Voice Assistant Speaker Verification


Overview

The widespread use of voice assistants such as Siri, Cortana, Google Assistant, and Alexa has presented new challenges for speaker verification. Speaker recognition technology has been used for years to identify individuals by their distinctive voice characteristics, but the voices of these virtual assistants are generated with Text-to-Speech (TTS) technology and therefore lack the natural variation present in human voices. This project explores the use of transfer learning to adapt existing speaker recognition models so that they can recognize synthetic voices produced by TTS.

Methodology

The project uses a pre-trained ECAPA-TDNN model from the SpeechBrain toolkit and adapts it through transfer learning to recognize synthetic voices produced by TTS technology. The model was originally trained on human voices from the VoxCeleb2 dataset. The study assesses the accuracy and reliability of speaker verification for both text-dependent and text-independent voice samples.
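As a rough illustration of the verification setup, the sketch below loads the pre-trained ECAPA-TDNN model through SpeechBrain's pretrained interface and scores two recordings by cosine similarity of their speaker embeddings. This is a minimal sketch assuming SpeechBrain is installed; the .wav file names are hypothetical placeholders, not files from this repository.

import torch
from speechbrain.pretrained import EncoderClassifier

# Load the ECAPA-TDNN speaker encoder pre-trained on VoxCeleb
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

def embed(path):
    # load_audio resamples to the model's expected rate and returns a 1-D waveform
    signal = encoder.load_audio(path)
    return encoder.encode_batch(signal.unsqueeze(0)).squeeze()

emb_a = embed("assistant_sample_a.wav")   # hypothetical TTS recording
emb_b = embed("assistant_sample_b.wav")   # hypothetical TTS recording
score = torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=0)
print(f"cosine similarity: {score.item():.3f}")

A higher cosine similarity indicates that the two recordings are more likely to come from the same (synthetic) voice.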

Text-dependent voice samples are recordings in which the speaker is prompted to say a specific phrase or set of phrases, while text-independent samples allow the speaker to say any phrase or sentence. Both types of samples are used in this project to test the accuracy of the model.
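The two conditions can be scored with SpeechBrain's higher-level verification interface, as in the sketch below; the pair definitions and file names are illustrative assumptions rather than the project's actual evaluation script.

from speechbrain.pretrained import SpeakerRecognition

verifier = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

pairs = {
    # same phrase, two takes from the same assistant (text-dependent)
    "text_dependent": ("alexa_phrase1_take1.wav", "alexa_phrase1_take2.wav"),
    # different phrases from the same assistant (text-independent)
    "text_independent": ("alexa_phrase1.wav", "alexa_phrase2.wav"),
}

for condition, (ref, test) in pairs.items():
    # verify_files returns a similarity score and a same-speaker decision
    score, same_speaker = verifier.verify_files(ref, test)
    print(f"{condition}: score={score.item():.3f}, accepted={bool(same_speaker)}")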

Results

The results indicate similarities among different versions/generations of Siri and Alexa for certain speaker verification tasks. The analysis also highlights the difficulty of distinguishing between the same and different texts spoken by a given assistant, which could pose a risk to speaker verification systems. Inter-pair comparison reveals variation among the assistants, with Cortana and Google Assistant showing some similarity to Alexa's voice. The findings suggest the need for further training and fine-tuning, as well as consideration of the ethical and privacy implications. Overall, this research underscores the potential of transfer learning and SpeechBrain for speaker verification with text-dependent and text-independent synthetic voices.

Project Screenshots

