- Utilizes transfer learning with the ECAPA-TDNN model pre-trained on the VoxCeleb2 dataset.
- Intra-voice assistant comparisons: Achieved accuracies of 83.33% (iOS) and 66.67% (Alexa) for text-independent samples and 50% for text-dependent samples.
- Inter-voice assistant comparisons (Alexa, Siri, Google Assistant, Cortana): 100% accuracy for text-independent, 80% for text-dependent.
- Demonstrates the effectiveness of transfer learning and the ECAPA-TDNN model for secure speaker verification across speech assistant versions.
- Offers insights for enhancing speaker verification in the context of speech assistants.
- Speaker verification distinguishes speakers by speech characteristics such as pitch, formants, spectral envelope, MFCCs, and prosody.
- "Voice prints" represent a speaker's unique vocal qualities.
- Two types of speaker verification methods: text-dependent and text-independent.
- Transfer learning employs pre-trained models to improve performance when labeled data is scarce.
- The ECAPA-TDNN model from the SpeechBrain toolkit is used in this study for transfer learning on virtual assistants.
- A custom audio dataset was created, and a subset of it was selected for analysis.
- Organized into:
  - Intra-pair Comparisons:
    - Siri Versions (iOS 9 vs iOS 10 vs iOS 11)
    - Alexa Versions (3rd gen vs 4th gen vs 5th gen)
  - Inter-pair Comparisons:
    - Alexa
    - Siri
    - Cortana
- Features the ECAPA-TDNN model, a state-of-the-art speaker recognition model that combines a TDNN design with multi-layer feature aggregation (MFA), Squeeze-Excitation (SE), and residual blocks.
- Hyperparameters are detailed in a YAML format.
- Data Loading makes use of a PyTorch dataset interface.
- Batching includes extracting speech features like spectrograms and MFCCs.
- `Brain_class()` simplifies the neural model training process.
- SpeechBrain supports inference with pre-trained models such as ECAPA-TDNN.
- Data preprocessing: Extract 80-dimensional filterbank features.
- Model initialization: 5 TDNN layers, an attention mechanism, and an MLP classifier.
- Hyperparameter setting: epochs, batch size, learning rate, etc.
- Training: Trained on the VoxCeleb2 dataset.
- Validation and Testing: Evaluate on a validation set.
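
The comparison setup and pre-trained verification described above can be sketched roughly as follows. This is a minimal illustration, not the study's actual code: the helper names `trial_pairs`, `load_verifier`, and `verify_pair` are hypothetical, and the checkpoint name `speechbrain/spkrec-ecapa-voxceleb` and the `savedir` path are assumptions about which pre-trained ECAPA-TDNN model is loaded. The import path shown is the one used by SpeechBrain 1.0 and later; older releases expose the same class under `speechbrain.pretrained`.

```python
from itertools import combinations


def trial_pairs(voices):
    """Return every unordered pair of voices to compare (hypothetical helper)."""
    return list(combinations(voices, 2))


def load_verifier(savedir="pretrained_models/spkrec-ecapa-voxceleb"):
    """Load a pre-trained ECAPA-TDNN verifier through SpeechBrain.

    The import is done lazily so that trial_pairs() can be used
    without SpeechBrain installed.
    """
    from speechbrain.inference.speaker import SpeakerRecognition

    return SpeakerRecognition.from_hparams(
        source="speechbrain/spkrec-ecapa-voxceleb",  # assumed checkpoint
        savedir=savedir,  # assumed local cache directory
    )


def verify_pair(verifier, path_a, path_b):
    """Score two audio files; returns (similarity score, same-speaker decision)."""
    score, decision = verifier.verify_files(path_a, path_b)
    return float(score), bool(decision)
```

For example, `trial_pairs(["iOS 9", "iOS 10", "iOS 11"])` yields the three Siri version pairs; each pair's audio files could then be passed to `verify_pair` with a verifier obtained once from `load_verifier()`.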
![ss_v2](https://private-user-images.githubusercontent.com/98217912/267134396-2874e542-f198-48ec-8da6-4189317d386e.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjExOTg4NzMsIm5iZiI6MTcyMTE5ODU3MywicGF0aCI6Ii85ODIxNzkxMi8yNjcxMzQzOTYtMjg3NGU1NDItZjE5OC00OGVjLThkYTYtNDE4OTMxN2QzODZlLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDA2NDI1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ0M2IyZWEwMjc0ZDhiY2Q1ZjYxMDdjMWEzMWM4ODg3ZjhlZGI3YWJmMjMyZGIwNjFiZTdhZTU4MjJmMGE5M2QmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.zjhqaD52GCpUjp4osRuP7A-mTARvMk0WuKkLS4J45fY)
![s3](https://private-user-images.githubusercontent.com/98217912/261526517-e82071d5-91e8-47d2-9d5f-2bd3e8a399f9.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjExOTg4NzMsIm5iZiI6MTcyMTE5ODU3MywicGF0aCI6Ii85ODIxNzkxMi8yNjE1MjY1MTctZTgyMDcxZDUtOTFlOC00N2QyLTlkNWYtMmJkM2U4YTM5OWY5LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDA2NDI1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU0ZDc4ZWExMmE4ZmFlNjM4NjllNThlNzNmODIxNzk2MzJjODI5NDdiMmExMWEwYzMxNzJmNTQ3YjBlMGJiY2UmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.l1oA0p0kA2DykbTM8K3sew-btA5O8eBg6ag7v1ItbSs)
![ss_v1](https://private-user-images.githubusercontent.com/98217912/267134685-0e38457a-ebde-4509-b35e-3d76725ad930.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjExOTg4NzMsIm5iZiI6MTcyMTE5ODU3MywicGF0aCI6Ii85ODIxNzkxMi8yNjcxMzQ2ODUtMGUzODQ1N2EtZWJkZS00NTA5LWIzNWUtM2Q3NjcyNWFkOTMwLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDA2NDI1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWQ5ZmZiZDBjZTYxNzJlMTFlOGFlNmRjNDJlYWFiOGVhZGQxZGViOWRiZjdmZmM4MjU3YmNkNjMyOWE0NjExN2EmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.unRjtzSzlqwj3KVWdtLHyT9EXnZ7H8Llbavg6QKB7UQ)
![s5](https://private-user-images.githubusercontent.com/98217912/261526551-77f4871e-7b3a-4928-a9a3-f96f25655c5e.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjExOTg4NzMsIm5iZiI6MTcyMTE5ODU3MywicGF0aCI6Ii85ODIxNzkxMi8yNjE1MjY1NTEtNzdmNDg3MWUtN2IzYS00OTI4LWE5YTMtZjk2ZjI1NjU1YzVlLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDA2NDI1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWU0ZTc3NDMxNmVhZTVmNzViYzYwMDZlMzA4YTY0NDU0ZGMyODQwMjdmN2I3YjI5ZTU5MzI2NGM4ZTcyZDQ1MWEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.ceRkwBQh1eRw785Pqe1kM2wX7mzRvz0vIbUkn5kfHFY)
![s6](https://private-user-images.githubusercontent.com/98217912/261526567-07a3fbee-59b1-44a4-8c56-6fd27fa6cc49.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjExOTg4NzMsIm5iZiI6MTcyMTE5ODU3MywicGF0aCI6Ii85ODIxNzkxMi8yNjE1MjY1NjctMDdhM2ZiZWUtNTliMS00NGE0LThjNTYtNmZkMjdmYTZjYzQ5LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDA2NDI1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTA5OTkyNTJjOTQzZDU3NTY1NjVmZTA2YTA0YjVhNzZlNTEzOTlhZTIwM2Y2OGExMGQ3NzNkZmJjN2FkZTc3NWQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.V1bBZPlNyLkoLWhTS1-fxuhgwyyb8Ed_keIJu2Y2S7U)
![s7](https://private-user-images.githubusercontent.com/98217912/261526579-8ddcd976-c3eb-40af-a390-6e80e28fa8db.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjExOTg4NzMsIm5iZiI6MTcyMTE5ODU3MywicGF0aCI6Ii85ODIxNzkxMi8yNjE1MjY1NzktOGRkY2Q5NzYtYzNlYi00MGFmLWEzOTAtNmU4MGUyOGZhOGRiLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE3VDA2NDI1M1omWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTIwMDU5OTZhYjIwNWUwMGEyNjcwYjc1NmZkNGVlOGRjMzMxYjgyN2MwNDY0ZGM3OTk0YWE2YzNhMGM0YTlhNmUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.GneTvfyMRMuZHFKAVageyw7Y1fy-HljHAxRL9jSlVnU)
- Intra-pair text-dependent speaker verification (TDSV) analysis shows similarities among all versions, raising potential security concerns.
- Inter-pair TDSV analysis found matches between Cortana & Google Assistant and Alexa.
- Text-independent speaker verification (TISV) has higher accuracy than TDSV, which is attributed to the model's capability to distinguish between different texts.
- For better performance, additional training on a broader dataset of synthetic voices is recommended.
- The study emphasizes the potential of transfer learning and SpeechBrain for speaker verification, while acknowledging challenges with synthetic voices.
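
One way to picture the "match" decisions and accuracy figures discussed above is as thresholded cosine similarity between voice-print embeddings. The sketch below is an illustration under assumptions, not the study's evaluation code: the function names, the toy three-dimensional embeddings, and the 0.25 threshold are all hypothetical, and how the study labels a comparison as "correct" is not specified here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_match(emb_a, emb_b, threshold=0.25):
    """Declare a match when similarity reaches the (assumed) threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold


def accuracy(trials, threshold=0.25):
    """Percentage of trials whose decision agrees with the expected label.

    trials: list of (emb_a, emb_b, expected_match) tuples.
    """
    correct = sum(
        is_match(a, b, threshold) == expected for a, b, expected in trials
    )
    return 100.0 * correct / len(trials)


# Toy embeddings (made up for illustration only).
voice_a = [1.0, 0.0, 0.0]
voice_a_variant = [0.9, 0.1, 0.0]
voice_b = [0.0, 1.0, 0.0]

trials = [
    (voice_a, voice_a_variant, True),   # expected to match
    (voice_a, voice_b, False),          # expected to differ
    (voice_a_variant, voice_b, False),  # expected to differ
]
print(accuracy(trials))
```

Under this view, an unexpected match between two different assistants would count as an error, which is the kind of outcome the recommendation for additional training on synthetic voices aims to reduce.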