Speaker verification on speech assistants using the ECAPA-TDNN model, with a focus on intra- and inter-voice assistant variations and on the potential of transfer learning for secure speaker verification

rishsans/Voice-Assistant-Speaker-Verification

Voice Assistant Speaker Verification

ECAPA-Based Speaker Verification of Virtual Assistants: A Transfer Learning Approach

Abstract

[Image: ss_v4]

  • Applies transfer learning with the ECAPA-TDNN model pre-trained on the VoxCeleb2 dataset.
  • Intra-voice assistant comparisons: accuracies of 83.33% (Siri on iOS) and 66.67% (Alexa) for text-independent samples, and 50% for text-dependent samples.
  • Inter-voice assistant comparisons (Alexa, Siri, Google Assistant, Cortana): 100% accuracy for text-independent samples and 80% for text-dependent samples.
  • Demonstrates the effectiveness of transfer learning and the ECAPA-TDNN model for secure speaker verification across speech assistant versions.
  • Offers insights for improving speaker verification in the context of speech assistants.

Introduction

  • Speaker verification relies on speech characteristics that distinguish one speaker from another, such as pitch, formants, spectral envelope, MFCCs, and prosody.
  • "Voice prints" represent a speaker's unique vocal qualities.
  • Two types of speaker verification methods: text-dependent and text-independent.
  • Transfer learning employs pre-trained models to improve performance when labeled data is scarce.
  • The ECAPA-TDNN model from the SpeechBrain toolkit is used in this study for transfer learning on virtual assistants.
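
To make the idea of comparing "voice prints" concrete, here is a minimal sketch of scoring two speaker embeddings with cosine similarity. Cosine similarity is an assumption on our part (it is a common choice for embedding-based verification); the README does not state which measure was used, and the short vectors below are toy values, not real ECAPA-TDNN embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for voice prints (hypothetical values).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

A score near 1 means the two embeddings point in nearly the same direction, which a verification system would read as "likely the same speaker"; a score near 0 suggests unrelated voices.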

Methodology

Dataset

  • A custom audio dataset was created, and a subset of it was selected for analysis.
  • The data is organized into:
    • Intra-pair Comparisons:
      • Siri Versions (iOS 9 vs iOS 10 vs iOS 11)
      • Alexa Versions (3rd gen vs 4th gen vs 5th gen)
    • Inter-pair Comparisons:
      • Alexa
      • Siri
      • Google Assistant
      • Cortana
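
The grouping above can be sketched as a list of comparison pairs. This is only an illustration: the README does not say how trials were actually constructed, so the sketch assumes every unordered pair within a group is compared once. The function names are hypothetical; the group labels are taken from the list above.

```python
from itertools import combinations

# Labels come from the dataset description; the pairing scheme is an assumption.
INTRA_GROUPS = {
    "Siri": ["iOS 9", "iOS 10", "iOS 11"],
    "Alexa": ["3rd gen", "4th gen", "5th gen"],
}
INTER_ASSISTANTS = ["Alexa", "Siri", "Google Assistant", "Cortana"]


def intra_pairs(groups):
    """All unordered version pairs within each assistant."""
    return [
        (assistant, a, b)
        for assistant, versions in groups.items()
        for a, b in combinations(versions, 2)
    ]


def inter_pairs(assistants):
    """All unordered pairs of different assistants."""
    return list(combinations(assistants, 2))
```

Under this assumption each assistant contributes three intra-pair comparisons, and the four assistants yield six inter-pair comparisons.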

SpeechBrain

  • Features the ECAPA-TDNN model, a state-of-the-art speaker recognition model that combines a TDNN design with multi-layer feature aggregation (MFA), Squeeze-and-Excitation (SE), and residual blocks.
  • Hyperparameters are specified in YAML.
  • Data loading uses a PyTorch dataset interface.
  • Batching includes extracting speech features such as spectrograms and MFCCs.
  • The Brain class simplifies the neural model training process.

Pre-trained Model: ECAPA-TDNN

  • SpeechBrain provides pre-trained models such as ECAPA-TDNN that can be used directly for inference. The pre-trained model was prepared as follows:
  1. Data preprocessing: Extract 80-dimensional filterbank features.
  2. Model initialization: 5 TDNN layers, an attention mechanism, and an MLP classifier.
  3. Hyperparameter setting: epochs, batch size, learning rate, etc.
  4. Training: Trained on the VoxCeleb2 dataset.
  5. Validation and Testing: Evaluate on a validation set.
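
As a rough sketch of how a pre-trained ECAPA-TDNN verifier can be loaded and queried through SpeechBrain's `SpeakerRecognition` interface: the checkpoint name `speechbrain/spkrec-ecapa-voxceleb`, the `savedir` path, and the helper names `load_verifier` and `compare_clips` are assumptions, since the README does not name the exact checkpoint or show its loading code.

```python
def load_verifier(savedir="pretrained_models/spkrec-ecapa-voxceleb"):
    """Load a pre-trained ECAPA-TDNN verifier (downloads the model on first use)."""
    try:
        # Module path in SpeechBrain 1.0 and later.
        from speechbrain.inference.speaker import SpeakerRecognition
    except ImportError:
        # Module path in older SpeechBrain releases.
        from speechbrain.pretrained import SpeakerRecognition
    return SpeakerRecognition.from_hparams(
        source="speechbrain/spkrec-ecapa-voxceleb",  # assumed checkpoint
        savedir=savedir,
    )


def compare_clips(verifier, path_a, path_b):
    """Return (similarity score, same-speaker decision) for two audio files."""
    score, prediction = verifier.verify_files(path_a, path_b)
    return float(score), bool(prediction)
```

A call such as `compare_clips(load_verifier(), "siri_ios9.wav", "siri_ios10.wav")` would then return a similarity score and a boolean decision; the file names here are placeholders, not files from the repository.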

Implementation

  • Normalize, denoise, and extract features from audio samples.
  • Adjust the ECAPA-TDNN model's initial layer for text-dependent (TDSV) and text-independent (TISV) speaker verification.
  • Use the model to verify speaker identities and obtain similarity scores.
  • Store scores and predictions in arrays.
  • Calculate accuracy, precision, recall, and F1 score for evaluation.
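
The last two steps can be sketched in plain Python: turn similarity scores into same-speaker predictions, then compute the four metrics against ground-truth labels. The threshold value and the sample scores below are illustrative placeholders, not values from this study, and the function names are hypothetical.

```python
def threshold_scores(scores, threshold=0.25):
    """Predict 'same speaker' when a score exceeds the threshold (placeholder value)."""
    return [score > threshold for score in scores]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for boolean labels and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical trial list: True means the pair really is the same speaker.
labels = [True, True, False, False]
scores = [0.8, 0.1, 0.6, 0.05]
metrics = binary_metrics(labels, threshold_scores(scores))
```

Metrics with an empty denominator are reported as 0.0 here; that is a design choice for the sketch, and a real evaluation might prefer to flag such cases explicitly.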

[Image: ss_v5]

Result

[Image: ss_v3]

Output Snippets

[Images: ss_v2, s3, ss_v1, s5, s6, s7]

Conclusion

  • Intra-pair TDSV analysis shows similarities among all versions, raising potential security concerns.
  • Inter-pair TDSV analysis found matches between Cortana & Google Assistant and Alexa.
  • TISV achieves higher accuracy than TDSV, which is attributed to the model's ability to distinguish between different texts.
  • For better performance, additional training on a broader dataset of synthetic voices is recommended.
  • The study emphasizes the potential of transfer learning and SpeechBrain for speaker verification, while acknowledging the challenges posed by synthetic voices.
