haydenshively/SoundStream


SoundStream - PyTorch

Implementation of SoundStream, an end-to-end neural audio codec

Figure 2 from the SoundStream paper

  • 🔊 Implements SoundStream model inference
  • 🎛️ Works with a 27M-parameter model pretrained on 10k hours of English speech (Multilingual LibriSpeech dataset)

Install

pip install soundstream

Usage

Note: The pretrained model is configured as specified in NaturalSpeech 2, so its channels and strides differ from those of the original SoundStream.
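
To make the effect of the strides concrete, here is a minimal sketch of how encoder strides translate into a latent frame rate. The helper names `downsampling_factor` and `frame_rate_hz` are hypothetical and not part of this package. The example values (strides 2, 4, 5, 8 at 24 kHz) are the ones reported in the original SoundStream paper; the strides of the pretrained model are not listed in this README, so substitute them yourself.

```python
import math
from typing import Sequence


def downsampling_factor(strides: Sequence[int]) -> int:
    """Total downsampling of a strided encoder: the product of its strides."""
    return math.prod(strides)


def frame_rate_hz(sample_rate_hz: int, strides: Sequence[int]) -> float:
    """Latent frames produced per second of input audio."""
    return sample_rate_hz / downsampling_factor(strides)


# Values from the original SoundStream paper (not the pretrained model here).
paper_strides = (2, 4, 5, 8)
print(downsampling_factor(paper_strides))   # 320 samples per latent frame
print(frame_rate_hz(24000, paper_strides))  # 75.0 frames per second
```

A different stride configuration changes how many input samples each latent frame covers, which is why sequences encoded by this model will generally not have the same length as those from the paper's configuration.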

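import torchaudio

from soundstream import from_pretrained, load


waveform = load('in.wav')  # read the input audio file
audio_codec = from_pretrained()  # downloads model from Hugging Face

quantized = audio_codec(waveform, mode='encode')  # compress to the quantized representation
recovered = audio_codec(quantized, mode='decode')  # reconstruct a waveform from it

# Write the first item of the decoded output at a 16 kHz sample rate
torchaudio.save('out.wav', recovered[0], 16000)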
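
As a rough sanity check on the round trip above, one option is to compare the input and the decoded output sample by sample. The sketch below computes a waveform signal-to-noise ratio in pure Python. `snr_db` is a hypothetical helper, not part of this package, and SNR is not a metric this README reports. Neural codecs are trained for perceptual quality, so a low waveform SNR does not necessarily mean the output sounds bad.

```python
import math
from typing import Sequence


def snr_db(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB of `estimate` relative to `reference`."""
    # Assumption: the decoded signal may differ slightly in length from the
    # input, so only the overlapping prefix is compared.
    n = min(len(reference), len(estimate))
    if n == 0:
        raise ValueError("both signals must be non-empty")
    signal_power = sum(x * x for x in reference[:n])
    noise_power = sum((x - y) ** 2 for x, y in zip(reference[:n], estimate[:n]))
    if noise_power == 0:
        return math.inf  # perfect reconstruction
    if signal_power == 0:
        return -math.inf  # silent reference, non-silent estimate
    return 10.0 * math.log10(signal_power / noise_power)
```

Assuming `waveform` and `recovered` are tensors holding the same mono clip at the same sample rate, something like `snr_db(waveform.flatten().tolist(), recovered.flatten().tolist())` would apply it to the example above.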

Citations

Code

Papers

@misc{zeghidour2021soundstream,
      title={SoundStream: An End-to-End Neural Audio Codec}, 
      author={Neil Zeghidour and Alejandro Luebs and Ahmed Omran and Jan Skoglund and Marco Tagliasacchi},
      year={2021},
      eprint={2107.03312},
      archivePrefix={arXiv},
      primaryClass={cs.SD}
}
@misc{kumar2019melgan,
      title={MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis}, 
      author={Kundan Kumar and Rithesh Kumar and Thibault de Boissiere and Lucas Gestin and Wei Zhen Teoh and Jose Sotelo and Alexandre de Brebisson and Yoshua Bengio and Aaron Courville},
      year={2019},
      eprint={1910.06711},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
@misc{tagliasacchi2020seanet,
      title={SEANet: A Multi-modal Speech Enhancement Network}, 
      author={Marco Tagliasacchi and Yunpeng Li and Karolis Misiunas and Dominik Roblek},
      year={2020},
      eprint={2009.02095},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
@misc{shen2023naturalspeech,
      title={NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers}, 
      author={Kai Shen and Zeqian Ju and Xu Tan and Yanqing Liu and Yichong Leng and Lei He and Tao Qin and Sheng Zhao and Jiang Bian},
      year={2023},
      eprint={2304.09116},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
