jupyter notebooks to fine tune whisper models on Vietnamese using Colab and/or Kaggle and/or AWS EC2

phineas-pta/fine-tune-whisper-vi

fine-tune whisper vi

jupyter notebooks to fine-tune whisper models on vietnamese using kaggle (should also work on colab, but not thoroughly tested)

using my collection of vietnamese speech datasets: https://huggingface.co/collections/doof-ferb/vietnamese-speech-dataset-65c6af8c15c9950537862fa6

N.B.1 importing any trainer or pipeline class from transformers crashes a kaggle TPU session (see huggingface/transformers#28609), so better to use GPU

N.B.2 the trainer class from transformers can automatically use multiple GPUs (e.g. kaggle free T4×2) without code changes; by default, however, trainer uses naive model parallelism, which cannot keep all GPUs busy at the same time, so better to use distributed data parallelism

N.B.3 use the default greedy search, because beam search triggers a spike in VRAM usage which may cause out-of-memory errors (original whisper uses num beams = 5, i.e. something like do_sample=True, num_beams=5)

N.B.4 if using kaggle and resuming training, remember to enable file persistence before launching
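
to illustrate N.B.2: with distributed data parallelism every GPU holds a full copy of the model and processes its own slice of each epoch, so both T4s work at the same time. a minimal pure-python sketch of that split follows; the function name `shard_indices` is hypothetical, and it only mimics the idea behind a distributed sampler (no shuffling), not the code used in the notebooks:

```python
def shard_indices(num_samples, world_size, rank):
    """Return the sample indices handled by one process (one GPU)."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    indices = list(range(num_samples))
    # pad by repeating samples from the start so every rank gets the same count
    pad = (-num_samples) % world_size
    indices += (indices * world_size)[:pad]
    # each rank takes every world_size-th index, starting at its own rank
    return indices[rank::world_size]


if __name__ == "__main__":
    # e.g. 7 audio clips split over 2 GPUs
    for rank in range(2):
        print(rank, shard_indices(7, world_size=2, rank=rank))
```

each rank ends up with the same number of samples (one clip is repeated here to even things out), which is what lets the processes run their steps in lockstep.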

scripts

evaluate accuracy (WER) with batched inference:

fine-tune whisper tiny with traditional approach:

fine-tune whisper large with PEFT-LoRA + int8:

(testing - not always working) fine-tune wav2vec v2 bert: w2v-bert-v2.ipynb

docker image to run on AWS EC2: Dockerfile, comes with standalone scripts

convert to openai-whisper, whisper.cpp, faster-whisper, ONNX, TensorRT: not yet

miscellaneous: convert to huggingface audio datasets format
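
for reference, the WER metric mentioned in the list above is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. a minimal pure-python sketch is given here; the function name `word_error_rate` is hypothetical, whitespace tokenization is an assumption, and the notebooks may well compute WER with a library instead:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # one missing word out of four reference words
    print(word_error_rate("xin chào các bạn", "xin chào bạn"))
```

note that WER can exceed 1.0 when the hypothesis contains many inserted words, and that text normalization (casing, punctuation) before scoring changes the result noticeably.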

resources