MIDI-101: GAN for MIDI data #3
base: master
Conversation
Currently we are standing at: [pianoroll image]
@SamuelJanas Can you add a short explanation to the PR description of what you're trying to achieve here? I mean in terms of the MIDI structures, model outputs, and data flow - bonus points for mermaid diagrams :-)
training/train_midi.py
```python
dstart = fake_data[0, :] * 48.916666666666515  # rescale by the max dstart from preprocessing
duration = fake_data[1, :] * (99.45833333333331 - 0.0010416666666515084) + 0.0010416666666515084  # map back to the [min, max] duration range
```
The values here are the min/max values calculated during normalization; they are printed during preprocessing. You can check them when running preprocess_maestro.py.
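For readers following along, here is a minimal sketch of what that min/max normalization round-trip could look like. The constant names are hypothetical; the actual values are printed by preprocess_maestro.py:

```python
# Hypothetical sketch of the min-max (de)normalization implied above;
# constant names are made up, values match the lines under review.
import torch

DSTART_MAX = 48.916666666666515        # max dstart seen in the dataset
DURATION_MIN = 0.0010416666666515084   # min duration seen in the dataset
DURATION_MAX = 99.45833333333331       # max duration seen in the dataset

def normalize(dstart: torch.Tensor, duration: torch.Tensor):
    # Map raw values into [0, 1] so the generator can target that range
    dstart_n = dstart / DSTART_MAX
    duration_n = (duration - DURATION_MIN) / (DURATION_MAX - DURATION_MIN)
    return dstart_n, duration_n

def denormalize(fake_data: torch.Tensor):
    # Invert the mapping on generator outputs, mirroring the reviewed code
    dstart = fake_data[0, :] * DSTART_MAX
    duration = fake_data[1, :] * (DURATION_MAX - DURATION_MIN) + DURATION_MIN
    return dstart, duration
```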
@SamuelJanas This piano roll looks very cool! Did you try listening to this?
I'm glad you like what you're seeing! For now, take a look at this wandb. There are many similar pianorolls generated by the model. It isn't training too well due to the gradient issues we're experiencing right now, but it's consistently generating something that doesn't look utterly random!
I wouldn't make large bets against this being random 😅 Maybe instead of adding mp3s to wandb, it would make more sense to have a separate streamlit dashboard that would allow us to inspect and review the musicality of the outputs generated by this model?
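For illustration, a minimal sketch of what such a Streamlit page could look like; the directory name, file format, and layout are all assumptions:

```python
# Hypothetical Streamlit review page for generated samples.
import streamlit as st
from pathlib import Path

st.title("MIDI GAN sample review")
sample_dir = Path("generated_samples")  # assumed output directory
files = sorted(sample_dir.glob("*.mp3"))
if not files:
    st.warning("No generated samples found.")
else:
    choice = st.selectbox("Pick a sample", files, format_func=lambda p: p.name)
    st.audio(choice.read_bytes(), format="audio/mp3")
```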
There are several places where issues could be coming from right now.
To explain the latest commit: I noticed that the values inside the generator after BatchNorm were getting larger and larger, up to approximately the point at which the model "stopped" learning. I wanted to address this by clipping the gradient. Overall it helped the generations somewhat; their durations now fall more into the 0-20 second range, instead of the > 1200 second outputs seen previously. I also stopped using BatchNorm after reading this article, but it's a matter I'll need to delve deeper into.
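For reference, a self-contained sketch of the clipping step described here, assuming a standard PyTorch update; the toy generator, optimizer, and loss below are stand-ins, not the project's actual code:

```python
import torch
import torch.nn as nn

# Toy setup; the real generator and loss live in training/train_midi.py.
generator = nn.Linear(16, 4)  # stand-in for the actual generator
g_optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4)

z = torch.randn(8, 16)               # latent batch
g_loss = generator(z).pow(2).mean()  # stand-in generator loss

g_optimizer.zero_grad()
g_loss.backward()
# Rescale gradients so their global norm never exceeds max_norm,
# taming the blow-ups described above.
torch.nn.utils.clip_grad_norm_(generator.parameters(), max_norm=1.0)
g_optimizer.step()
```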
I think this looks mostly random, and I sort of suspect that it's not going to get much better with the current architecture 🤔 One avenue to explore could be having additional projection layers in the generator, dedicated to different elements of the MIDI performance:

```python
class Generator(nn.Module):
    def forward(self, x):
        x = current_forward(x)  # the existing forward pass
        velocity_out = self.velocity_projection_layer(x)
        dstart_out = self.dstart_projection_layer(x)
        ...
        return velocity_out, dstart_out, ...
```

Just thinking out loud, let me know if this makes sense (or not)
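To make that concrete, here is one hypothetical way the sketch could be fleshed out, with a shared trunk and per-attribute heads; every layer size, name, and the choice of sigmoid outputs is an assumption, not project code:

```python
import torch
import torch.nn as nn

class MultiHeadGenerator(nn.Module):
    # Hypothetical expansion of the sketch above: a shared trunk followed by
    # one projection head per MIDI attribute.
    def __init__(self, latent_dim: int = 128, hidden_dim: int = 256, seq_len: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.LeakyReLU(0.2),
        )
        # Sigmoid keeps each head's output in the normalized [0, 1]
        # range used during preprocessing.
        self.velocity_projection_layer = nn.Sequential(nn.Linear(hidden_dim, seq_len), nn.Sigmoid())
        self.dstart_projection_layer = nn.Sequential(nn.Linear(hidden_dim, seq_len), nn.Sigmoid())
        self.duration_projection_layer = nn.Sequential(nn.Linear(hidden_dim, seq_len), nn.Sigmoid())

    def forward(self, x: torch.Tensor):
        x = self.trunk(x)  # shared representation of the latent vector
        return (
            self.velocity_projection_layer(x),
            self.dstart_projection_layer(x),
            self.duration_projection_layer(x),
        )

# Quick shape check
z = torch.randn(4, 128)
velocity, dstart, duration = MultiHeadGenerator()(z)
print(velocity.shape, dstart.shape, duration.shape)  # each: torch.Size([4, 128])
```

One possible upside of separate heads is that each attribute gets its own output scaling and can be paired with its own loss term, rather than forcing one set of weights to model velocity, dstart, and duration jointly.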