Hello @doyney, Thank you very much for your interest in this library. As to your questions:
1. Yes, it is, since you can define your own autoencoding architecture (as explained here and in this example). The only thing you need to ensure is that your encoder outputs the embeddings and associated covariances with the correct shape (i.e. compatible with the latent space dimension) and that your decoder outputs the reconstructed samples. You can hence choose the type of neural nets you want to work with and everything should work.
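For illustration, here is a minimal sketch of what such a custom architecture could look like, based on the `BaseEncoder`/`BaseDecoder` interface described in the pythae docs. The layer sizes, class names and input dimensions below are placeholders I chose for the example, so please double-check the details against the current documentation:

```python
import torch
import torch.nn as nn

from pythae.models import VAE, VAEConfig
from pythae.models.base.base_utils import ModelOutput
from pythae.models.nn import BaseEncoder, BaseDecoder


class MyEncoder(BaseEncoder):
    """Custom encoder: must return the embeddings and associated (log-)covariances."""

    def __init__(self, config: VAEConfig):
        BaseEncoder.__init__(self)
        in_features = int(torch.prod(torch.tensor(config.input_dim)))
        self.net = nn.Sequential(nn.Linear(in_features, 256), nn.ReLU())
        self.embedding_layer = nn.Linear(256, config.latent_dim)
        self.log_var_layer = nn.Linear(256, config.latent_dim)

    def forward(self, x: torch.Tensor) -> ModelOutput:
        h = self.net(x.reshape(x.shape[0], -1))
        # Both outputs must have shape (batch_size, latent_dim)
        return ModelOutput(
            embedding=self.embedding_layer(h),
            log_covariance=self.log_var_layer(h),
        )


class MyDecoder(BaseDecoder):
    """Custom decoder: must return the reconstructed samples."""

    def __init__(self, config: VAEConfig):
        BaseDecoder.__init__(self)
        out_features = int(torch.prod(torch.tensor(config.input_dim)))
        self.net = nn.Sequential(
            nn.Linear(config.latent_dim, 256), nn.ReLU(), nn.Linear(256, out_features)
        )
        self.input_dim = config.input_dim

    def forward(self, z: torch.Tensor) -> ModelOutput:
        # Reconstruction is reshaped back to the original input dimensions
        x_rec = self.net(z).reshape(z.shape[0], *self.input_dim)
        return ModelOutput(reconstruction=x_rec)


# Plug the custom nets into a pythae model (input_dim/latent_dim are placeholder values)
config = VAEConfig(input_dim=(1, 28, 28), latent_dim=16)
model = VAE(model_config=config, encoder=MyEncoder(config), decoder=MyDecoder(config))
```

As long as the encoder returns `embedding` and `log_covariance` of shape `(batch_size, latent_dim)` and the decoder returns `reconstruction` with the shape of the input, the rest of the training pipeline should work unchanged.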
2. From what I understand of the paper, the authors indeed propose to train a second VAE in the latent space of the first one to learn a compressed representation of the features. This is indeed pretty similar to the approach used in the 2-stage VAE sampler.
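As a rough sketch of how this could look with the sampler (the variable names `trained_vae`, `train_dataset` and `eval_dataset` are placeholders, and the exact configuration options should be checked against the pythae docs):

```python
from pythae.samplers import TwoStageVAESampler, TwoStageVAESamplerConfig

# `trained_vae` is a VAE already trained on your data (e.g. the model defined above)
sampler_config = TwoStageVAESamplerConfig()  # settings for the second-stage VAE
sampler = TwoStageVAESampler(model=trained_vae, sampler_config=sampler_config)

# Fit the second VAE on the latent codes produced by the first one
sampler.fit(train_data=train_dataset, eval_data=eval_dataset)

# Sample latent codes from the second stage and decode them with the first VAE
generated = sampler.sample(num_samples=10)
```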
Please nonetheless note that the
For models trained with
I hope this helps,
Best,
Clément
Hi,
I am working on a project to generate music in a specific style based on a MIDI file, following the approach outlined in Google's MidiME paper
(https://storage.googleapis.com/pub-tools-public-publication-data/pdf/c667ad30514350d65e9fa591f8b2263a8abcc9fd.pdf).
MidiME suggests training a VAE on the latent space of a larger pre-trained VAE to summarise the latent vectors and sample from vectors that are representative of a particular musical style.
I read about the TwoStageVAESampler in the pythae docs and it seems to offer a similar two-stage VAE training approach! Specifically, does it allow training a secondary VAE on the latent representations produced by a primary VAE?
On the face of it this library seems perfect, but I'd like to ask two clarification questions:
I'm still getting my head around my project, so hopefully I've been clear enough. I'd really appreciate any thoughts :)