Merge pull request #84 from sanchit-gandhi/u7-all-together
U7: putting it all together
sanchit-gandhi authored Jul 12, 2023
2 parents a076263 + 228db95 commit 38e75fa
Showing 16 changed files with 1,055 additions and 29 deletions.
32 changes: 14 additions & 18 deletions chapters/en/_toctree.yml
@@ -103,27 +103,23 @@
- local: chapter6/supplemental_reading
title: Supplemental reading and resources

#- title: Unit 7. Audio to audio
# sections:
# - local: chapter7/introduction
# title: What you'll learn and what you'll build
# - local: chapter7/tasks
# title: Examples of audio-to-audio tasks
# - local: chapter7/choosing_dataset
# title: Choosing a dataset
# - local: chapter7/preprocessing
# title: Loading and preprocessing data
# - local: chapter7/evaluation
# title: Evaluation metrics for audio-to-audio
# - local: chapter7/fine-tuning
# title: Fine-tuning the model
- title: Unit 7. Putting it all together
sections:
- local: chapter7/introduction
title: What you'll learn and what you'll build
- local: chapter7/speech-to-speech
title: Speech-to-speech translation
- local: chapter7/voice-assistant
title: Creating a voice assistant
- local: chapter7/transcribe-meeting
title: Transcribe a meeting
# - local: chapter7/quiz
# title: Quiz
# quiz: 7
# - local: chapter7/hands_on
# title: Hands-on exercise
# - local: chapter7/supplemental_reading
# title: Supplemental reading and resources
- local: chapter7/hands-on
title: Hands-on exercise
- local: chapter7/supplemental_reading
title: Supplemental reading and resources
#
#- title: Unit 8. Finish line
# sections:
47 changes: 47 additions & 0 deletions chapters/en/chapter7/hands-on.mdx
@@ -0,0 +1,47 @@
# Hands-on exercise

In this Unit, we consolidated the material covered in the previous six units of the course to build three integrated
audio applications. As you've experienced, building more involved audio tools is fully within reach by using the
foundational skills you've acquired in this course.

The hands-on exercise takes one of the applications covered in this Unit, and extends it with a few multilingual
tweaks 🌍 Your objective is to take the [cascaded speech-to-speech translation Gradio demo](https://huggingface.co/spaces/course-demos/speech-to-speech-translation)
from the first section in this Unit, and update it to translate to any **non-English** language. That is to say, the
demo should take speech in language X, and translate it to speech in language Y, where the target language Y is not
English. You should start by [duplicating](https://huggingface.co/spaces/course-demos/speech-to-speech-translation?duplicate=true)
the template under your Hugging Face namespace. There's no requirement to use a GPU accelerator device - the free CPU
tier works just fine 🤗 However, you should ensure that the visibility of your demo is set to **public**. This is required
so that your demo is accessible to us and can be checked for correctness.

Tips for updating the speech translation function to perform multilingual speech translation are provided in the
section on [speech-to-speech translation](speech-to-speech.mdx). By following these instructions, you should be able
to update the demo to translate from speech in language X to text in language Y, which is half of the task!
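
To make this first half concrete, here is a minimal sketch of what the updated `translate` function could look like,
assuming Whisper loaded through the 🤗 Transformers `pipeline` and Dutch (`"nl"`) as a hypothetical target language Y.
The trick covered in the speech-to-speech section is to use the `"transcribe"` task with a non-English language token,
since Whisper's `"translate"` task only targets English:

```python
import torch
from transformers import pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# multilingual Whisper checkpoint for the speech translation step
pipe = pipeline(
    "automatic-speech-recognition", model="openai/whisper-base", device=device
)


def translate(audio):
    # task="transcribe" with a non-English language token coerces Whisper into
    # generating text in that language, giving speech (X) -> text (Dutch)
    outputs = pipe(
        audio,
        max_new_tokens=256,
        generate_kwargs={"task": "transcribe", "language": "nl"},
    )
    return outputs["text"]
```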

To synthesise text in language Y to speech in language Y, you will need a TTS checkpoint that supports the target
language. For this, you can either use the SpeechT5 TTS checkpoint that you fine-tuned in the previous hands-on
exercise, or a pre-trained multilingual TTS checkpoint. There are two options for pre-trained checkpoints: the
checkpoint [sanchit-gandhi/speecht5_tts_vox_nl](https://huggingface.co/sanchit-gandhi/speecht5_tts_vox_nl),
which is a SpeechT5 checkpoint fine-tuned on the Dutch split of the [VoxPopuli](https://huggingface.co/datasets/facebook/voxpopuli)
dataset, or an MMS TTS checkpoint (see section on [pretrained models for TTS](../chapter6/pre-trained_models.mdx)).

<Tip>
In our experience experimenting with the Dutch language, using an MMS TTS checkpoint results in better performance than a
fine-tuned SpeechT5 one, but you might find that your fine-tuned TTS checkpoint is preferable in your language.
If you decide to use an MMS TTS checkpoint, you will need to update the <a href="https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/a03175878f522df7445290d5508bfb5c5178f787/requirements.txt#L2">requirements.txt</a>
file of your demo to install <code>transformers</code> from the PR branch:
<p><code>git+https://github.com/hollance/transformers.git@6900e8ba6532162a8613d2270ec2286c3f58f57b</code></p>
</Tip>
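
For illustration, here is a sketch of what an MMS-based `synthesise` function might look like, assuming Dutch as the
target language and the checkpoint `facebook/mms-tts-nld` (swap in the MMS checkpoint for your chosen language):

```python
import torch
from transformers import VitsModel, VitsTokenizer

# MMS TTS publishes one per-language checkpoint; this one is for Dutch
model = VitsModel.from_pretrained("facebook/mms-tts-nld")
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-nld")


def synthesise(text):
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(inputs["input_ids"])
    # outputs.waveform is a batch of mono waveforms at 16 kHz
    return outputs.waveform[0]
```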


Your demo should take as input an audio file, and return as output another audio file, matching the signature of the
[`speech_to_speech_translation`](https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/3946ba6705a6632a63de8672ac52a482ab74b3fc/app.py#L35)
function in the template demo. Therefore, we recommend that you leave the main function `speech_to_speech_translation`
as is, and only update the [`translate`](https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/a03175878f522df7445290d5508bfb5c5178f787/app.py#L24)
and [`synthesise`](https://huggingface.co/spaces/course-demos/speech-to-speech-translation/blob/a03175878f522df7445290d5508bfb5c5178f787/app.py#L29)
functions as required.
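
For reference, the composition in the template looks roughly like the sketch below (see the linked `app.py` for the
exact code): the main function keeps the signature of audio in, `(sampling_rate, audio_array)` out, and simply chains
the two functions you update:

```python
import numpy as np

# Gradio audio outputs expect 16-bit PCM, so rescale the float waveform
target_dtype = np.int16
max_range = np.iinfo(target_dtype).max


def speech_to_speech_translation(audio):
    translated_text = translate(audio)
    synthesised_speech = synthesise(translated_text)
    synthesised_speech = (synthesised_speech.numpy() * max_range).astype(np.int16)
    return 16000, synthesised_speech
```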

Once you have built your demo as a Gradio demo on the Hugging Face Hub, you can submit it for assessment. Head to the
Space [audio-course-u7-assessment](https://huggingface.co/spaces/huggingface-course/audio-course-u7-assessment) and
provide the repository id of your demo when prompted. This Space will check that your demo has been built correctly by
sending a sample audio file to your demo and checking that the returned audio file is indeed non-English. If your demo
works correctly, you'll get a green tick next to your name on the overall [progress space](https://huggingface.co/spaces/MariaK/Check-my-progress-Audio-Course).
16 changes: 16 additions & 0 deletions chapters/en/chapter7/introduction.mdx
@@ -0,0 +1,16 @@
# Unit 7. Putting it all together 🪢

Well done on making it to Unit 7 🥳 You're just a few steps away from completing the course and acquiring the final few
skills you need to navigate the field of Audio ML. In terms of understanding, you already know everything there is to know!
Together, we've comprehensively covered the main topics that constitute the audio domain and their accompanying theory
(audio data, audio classification, speech recognition and text-to-speech). What this Unit aims to deliver is a framework
for **putting it all together**: now that you know how each of these tasks works in isolation, we're going to explore how
you can combine them to build some real-world applications.

## What you'll learn and what you'll build

In this Unit, we'll cover the following three topics:

* [Speech-to-speech translation](speech-to-speech): translate speech from one language into speech in a different language
* [Creating a voice assistant](voice-assistant): build your own voice assistant that works in a similar way to Alexa or Siri
* [Transcribing meetings](transcribe-meeting): transcribe a meeting and label the transcript with who spoke when
