Commit

update supp reading
sanchit-gandhi committed Jul 12, 2023
1 parent 77db368 commit 228db95
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions chapters/en/chapter7/supplemental_reading.mdx
@@ -10,3 +10,10 @@ Speech-to-speech translation:
* [Leveraging unsupervised and weakly-supervised data to improve direct STST](https://arxiv.org/abs/2203.13339) by Google: proposes new approaches for leveraging unsupervised and weakly supervised data for training direct STST models and a small change to the Transformer architecture
* [Translatotron-2](https://google-research.github.io/lingvo-lab/translatotron2/) by Google: a system that is able to retain speaker characteristics in translated speech

Voice Assistant:
* [Accurate wakeword detection](https://www.amazon.science/publications/accurate-detection-of-wake-word-start-and-end-using-a-cnn) by Amazon: a low latency approach for wakeword detection for on-device applications
* [RNN-Transducer Architecture](https://arxiv.org/pdf/1811.06621.pdf) by Google: a modification to the CTC architecture for streaming on-device ASR

Meeting Transcriptions:
* [pyannote.audio Technical Report](https://huggingface.co/pyannote/speaker-diarization/blob/main/technical_report_2.1.pdf) by Hervé Bredin: this report describes the main principles behind the `pyannote.audio` speaker diarization pipeline
* [Whisper X](https://arxiv.org/pdf/2303.00747.pdf) by Max Bain et al.: a superior approach to computing word-level timestamps using the Whisper model
