Midi 120: Unsupervised learning and fine-tuning #4
Getting somewhere 🚀

The model
I picked a learning rate of 3e-05 (see the Model section below). The model has learnt to formulate notes with pitch-time-velocity sequences. We are actually able to listen to model predictions! I will probably try to make the dashboard look more like Tomek's BERT inference dashboard to highlight notes predicted by the model.

Some of the changes:

PEP 526
I started using variable annotations from PEP 526 where I find them helpful (a short illustration follows after the config block below).

eos_token_id
I was using the default `eos_token_id`; the config is now:

```python
config = T5Config(
vocab_size=vocab_size(train_cfg),
decoder_start_token_id=start_token_id,
pad_token_id=pad_token_id,
eos_token_id=pad_token_id,
use_cache=False,
d_model=train_cfg.model.d_model,
d_kv=train_cfg.model.d_kv,
d_ff=train_cfg.model.d_ff,
num_layers=train_cfg.model.num_layers,
num_heads=train_cfg.model.num_heads,
)
```
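As mentioned above, here is a minimal illustration of PEP 526 variable annotations; the names are made up for the example, not taken from this repository:

```python
from typing import Optional

# PEP 526 lets you annotate a variable right at its assignment...
masking_probability: float = 0.15
pad_token_id: int = 0
checkpoint_path: Optional[str] = None

# ...or declare its type without binding a value yet.
best_loss: float
```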
T5 denoising
As described in Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, the T5 model is pre-trained on a denoising objective with sentinel tokens as masks.
In this implementation, `masking_probability * len(sequence)` tokens are masked at random. Each mask is then replaced with a sentinel token of increasing id, so every sentinel token within a given sequence is unique. If several <MASK> tokens appear one after another, they are replaced with a single sentinel token. The target corresponding to this source sequence is the sequence of tokens that were masked in the source; the remaining tokens are represented by sentinel tokens (see the sketch after the illustration below).
Representation of this masking method on text from the above paper:

Model
The default model is a full-sized 44M-parameter T5 model, trained with a 3e-05 learning rate and a 0.15 note masking probability for 4 epochs on maestro-v1-sustain.
See its run here.
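For orientation, a hedged sketch of how such a model and optimizer could be set up with HuggingFace Transformers; the dimensions below are placeholders standing in for `train_cfg.model.*` (not the values behind the 44M-parameter model), and only the learning rate and the config fields quoted earlier come from this PR:

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Placeholder dimensions, NOT the PR's actual train_cfg values.
config = T5Config(
    vocab_size=400,
    d_model=512,
    d_kv=64,
    d_ff=2048,
    num_layers=6,
    num_heads=8,
    decoder_start_token_id=0,
    pad_token_id=0,
    eos_token_id=0,
    use_cache=False,
)
model = T5ForConditionalGeneration(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)  # learning rate from this PR

# Sanity-check the parameter count against the quoted 44M figure.
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```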
Dashboard
To download the model, run
Then, you can look at model predictions using