
Midi 120: Unsupervised learning and fine-tuning #4


Open · WojciechMat wants to merge 57 commits into master from MIDI-120/unsupervised-training

Conversation

@WojciechMat (Contributor) commented on Oct 26, 2023

T5 denoising

As described in Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, the T5 model is pre-trained on a denoising objective with sentinel tokens as masks.

In this implementation, masking_probability * len(sequence) tokens are masked at random. Each mask is then replaced with a sentinel token with an increasing id, so every sentinel token within a given sequence is unique. If several <MASK> tokens appear one after another, they are replaced with a single sentinel token.

The target corresponding to this source sequence is the sequence of tokens that were masked in the source; the other (unmasked) tokens are in turn replaced with sentinel tokens.

An illustration of this masking method, applied to example text from the paper:
[image: T5-style span-corruption example from the T5 paper]
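
A minimal sketch of this masking scheme, assuming plain Python lists of tokens; the function name and the <sentinel_i> token format below are illustrative, not necessarily what this PR uses:

import random

def mask_sequence(tokens: list[str], masking_probability: float = 0.15) -> tuple[list[str], list[str]]:
    # Choose masking_probability * len(tokens) positions at random.
    n_masked = int(masking_probability * len(tokens))
    masked_positions = set(random.sample(range(len(tokens)), n_masked))

    source: list[str] = []
    target: list[str] = []
    sentinel_id = 0
    previous_was_masked = False

    for i, token in enumerate(tokens):
        if i in masked_positions:
            if not previous_was_masked:
                # Consecutive masked tokens share a single sentinel token;
                # each new masked span gets a sentinel with the next id.
                sentinel = f"<sentinel_{sentinel_id}>"
                source.append(sentinel)
                target.append(sentinel)
                sentinel_id += 1
            target.append(token)
            previous_was_masked = True
        else:
            source.append(token)
            previous_was_masked = False

    return source, target

The source keeps the unmasked tokens plus one sentinel per masked span, and the target lists each sentinel followed by the tokens it replaced.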

Model

The default model is a full-sized 44M-parameter T5, trained with a 3e-05 learning rate and a 0.15 note masking probability for 4 epochs on maestro-v1-sustain.
See its run here.
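
The training code itself is not shown in this thread. As a rough sketch only, assuming the Hugging Face Trainer (the PR may use a different loop) and with made-up values for the batch size and output directory, the hyperparameters above would map to something like:

from transformers import Trainer, TrainingArguments

# `model` and `train_dataset` stand in for the T5 model (built from the config shown
# further down in this thread) and the tokenized maestro-v1-sustain split.
training_args = TrainingArguments(
    output_dir="t5-midi-denoising",     # placeholder, not taken from the PR
    learning_rate=3e-5,
    num_train_epochs=4,
    per_device_train_batch_size=32,     # placeholder, not taken from the PR
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,        # sequences masked with probability 0.15
)
trainer.train()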

Dashboard

To download the model, run:

python -m dashboard.download_models

Then you can look at the model's predictions using:

PYTHONPATH=. streamlit run --server.port 4466 dashboard/denoise/main.py

@WojciechMat changed the base branch from master to MIDI-119/classic-tok on October 26, 2023 16:45
@WojciechMat marked this pull request as draft on October 26, 2023 17:55
@WojciechMat (Contributor, Author) commented on Oct 27, 2023

I ran an experiment on optimizing the encoders and the training, and here are the results (values are in seconds):
[image: table of timing results]
The differences are small, but in a quest for perfection I feel obligated to strive for the best possible solution.

@roszcz (Member) commented on Oct 27, 2023

> I ran an experiment on optimizing the encoders and the training, and here are the results (values are in seconds): [image] The differences are small, but in a quest for perfection I feel obligated to strive for the best possible solution.

I'm not sure what I'm looking at; can you describe what exactly you are comparing here? Also, please don't use images to share text information; try to take advantage of the text formatting options provided by GitHub :)

I'm all about perfection here, but make sure you're not getting hung up on early micro-optimizations: even if you can do something 5x faster, it's not going to be very productive if that thing is only 1% of the total computational cost :)

@WojciechMat (Contributor, Author) commented:

Getting somewhere 🚀

[two images]

The model

I picked a learning rate of 3e-05 and trained a full-sized model (44M parameters) for 4 epochs on maestro-v1-sustain.

The model has learnt to formulate notes as pitch-time-velocity sequences. We are actually able to listen to the model's predictions!
They sound very good, but I wonder what would happen if I used more masks (15% of the original sequence is masked right now).
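
To give an idea of what listening to a prediction involves, generation looks roughly like this; the `source_ids` tensor, the tokenizer, and the tokens_to_notes helper below are hypothetical stand-ins for the project's own utilities:

import torch

model.eval()
with torch.no_grad():
    # `source_ids` holds an encoded, masked pitch-time-velocity sequence (shape: [1, seq_len]).
    generated = model.generate(source_ids, max_new_tokens=256)

# Map ids back to pitch-time-velocity tokens, then build playable notes from them.
predicted_tokens = tokenizer.decode(generated[0])      # hypothetical tokenizer
predicted_notes = tokens_to_notes(predicted_tokens)    # hypothetical helper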

I will probably try to make the dashboard look more like Tomek's BERT inference dashboard to highlight notes predicted by the model.

Some of the changes:

PEP 526

I started using variable annotations from PEP 526 where I find them helpful.
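
For example (the variable names below are illustrative):

masking_probability: float = 0.15
masked_positions: list[int] = []
sentinel_token: str = "<sentinel_0>"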

eos_token_id

I was using the default 1 as the eos token id in the T5Config...
I use

config = T5Config(
    vocab_size=vocab_size(train_cfg),
    decoder_start_token_id=start_token_id,
    pad_token_id=pad_token_id,
    eos_token_id=pad_token_id,  # eos shares the pad token id
    use_cache=False,
    d_model=train_cfg.model.d_model,
    d_kv=train_cfg.model.d_kv,
    d_ff=train_cfg.model.d_ff,
    num_layers=train_cfg.model.num_layers,
    num_heads=train_cfg.model.num_heads,
)

now.
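
For context, a minimal sketch of how such a config is turned into a model (the surrounding code is not shown in this thread):

from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration(config)
# With eos_token_id set to the pad token id, generation stops once the pad token is produced.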

@WojciechMat marked this pull request as ready for review on November 10, 2023 11:08
@WojciechMat requested a review from roszcz on November 10, 2023 11:08
@WojciechMat changed the base branch from MIDI-119/classic-tok to master on December 26, 2023 16:30
@WojciechMat force-pushed the MIDI-120/unsupervised-training branch from 6abb305 to 5f4b377 on December 26, 2023 16:44
@WojciechMat changed the title from Midi 120/unsupervised training to Midi 120: Unsupervised learning and fine-tuning on Dec 26, 2023