
MIDI-101: GAN for MIDI data #3

Draft · wants to merge 42 commits into master
Conversation

@SamuelJanas (Collaborator) commented Oct 2, 2023

@SamuelJanas (Collaborator, Author):

Here's where we currently stand:

  1. MIDI Data Training Setup: Finalized the training pipeline for MIDI data.
  2. Visualization Bug Fix (TO-DO): Identified a minor bug in the WandB visualization logging.
  3. Model Compatibility (TO-DO): Planning to make adjustments to ensure the model is fully compatible with the new data format.

Base automatically changed from MIDI-92/improve-model to master October 3, 2023 04:40
@roszcz (Member) commented Oct 4, 2023

@SamuelJanas Can you add in the PR description a short explanation of what you're trying to achieve here? I mean in terms of the MIDI structures, model outputs, and data flow - bonus points for mermaid diagrams :-)

@SamuelJanas (Collaborator, Author):

So far I've been trying to direct the generator towards plausible values. I started by normalizing the data to the [0, 1] range to discourage the model from producing negative values, which MIDI data doesn't really need, and accounted for that by adding a sigmoid activation at the end of the generator. For now the model, well... struggles.

The issue is somewhere in the generator, as the sigmoid outputs mostly 1s and 0s. I'm currently trying to fix that, but I'm not quite sure yet how to tackle it.

I wanted to go for the bonus points, but I'm not quite sure what exactly I should show.
Here's an overview of the model:

graph TD

    subgraph Generator
        noise[Noise] --> CT1[ConvTranspose1d + BN + LeakyReLU]
        CT1 --> CT2[ConvTranspose1d + BN + LeakyReLU]
        CT2 --> CT3[ConvTranspose1d + BN + LeakyReLU]
        CT3 --> CT4[ConvTranspose1d + BN + LeakyReLU]
        CT4 --> CT5[ConvTranspose1d + Sigmoid]
    end
    
    subgraph Training
        real[Real Data] --> Disc[Discriminator]
        noise --> Gen[Generator]
        fake[Generated Data from Gen] --> Disc
        Disc --> Loss[Compute Losses]
        Loss --> Backprop[Backpropagation]
        Backprop --> Opt[Optimizers]
    end
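For completeness, here's a minimal PyTorch sketch of the generator stack from the diagram. The channel counts, kernel sizes, and class name are made up for illustration and are not the exact values used in the repo:

from torch import nn

class SketchGenerator(nn.Module):
    """Rough mirror of the diagram: 4 x (ConvTranspose1d + BatchNorm + LeakyReLU), then ConvTranspose1d + Sigmoid."""

    def __init__(self, noise_channels=100, out_channels=3):
        super().__init__()

        def up_block(in_ch, out_ch):
            return nn.Sequential(
                nn.ConvTranspose1d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm1d(out_ch),
                nn.LeakyReLU(0.2),
            )

        self.net = nn.Sequential(
            up_block(noise_channels, 256),
            up_block(256, 128),
            up_block(128, 64),
            up_block(64, 32),
            # Final upsampling layer; the sigmoid keeps every output in [0, 1] to match the normalized data.
            nn.ConvTranspose1d(32, out_channels, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, noise):
        # noise: (batch, noise_channels, length) -> output: (batch, out_channels, length * 32)
        return self.net(noise)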

Comment on lines 42 to 43
dstart = fake_data[0, :] * 48.916666666666515
duration = fake_data[1, :] * (99.45833333333331 - 0.0010416666666515084) + 0.0010416666666515084
@SamuelJanas (Collaborator, Author):

The values here are the min/max values calculated during normalization; they are printed during preprocessing. You can check them by running preprocess_maestro.py.
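For context, a small sketch of the min/max (de)normalization these constants come from; the helper names are placeholders, the constants themselves are the ones printed by the preprocessing script:

import numpy as np

DSTART_MAX = 48.916666666666515          # dstart min appears to be 0, so only the max is needed
DURATION_MIN = 0.0010416666666515084
DURATION_MAX = 99.45833333333331

def normalize(values: np.ndarray, vmin: float, vmax: float) -> np.ndarray:
    # Map raw values into [0, 1] so they match the sigmoid output range of the generator.
    return (values - vmin) / (vmax - vmin)

def denormalize(values: np.ndarray, vmin: float, vmax: float) -> np.ndarray:
    # Inverse transform, i.e. what the two lines above do for dstart and duration.
    return values * (vmax - vmin) + vmin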

@SamuelJanas (Collaborator, Author) commented Oct 4, 2023

Update to the above comment:

The training looks a little surprising, to say the least, but it managed to create something that doesn't look all that bad.

Most notes are extremely short, but that is probably what happened to the data after normalization. I will be taking a closer look at what other possibilities I have. I might try different scaling and look for another activation function at the end of the generator.
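For example, a few candidate final activations (purely illustrative; none of these are in the commit yet):

from torch import nn

# Illustrative alternatives for the generator's final activation:
sigmoid_head = nn.Sigmoid()          # current choice; saturates towards 0/1 for large pre-activations
hardsigmoid_head = nn.Hardsigmoid()  # piecewise-linear variant, gentler near the saturation points
identity_head = nn.Identity()        # unbounded output, clamped/rescaled to [0, 1] in post-processing instead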

@roszcz (Member) commented Oct 4, 2023

@SamuelJanas This piano roll looks very cool! Did you try listening to this?

@SamuelJanas (Collaborator, Author):

@SamuelJanas This piano roll looks very cool! Did you try listening to this?

I'm glad you like what you're seeing!
I haven't listened to it yet, as I believe we can make it even better with better normalization. Afterwards I'll be creating evals for .mp3 and .mid generation from a trained model.

For now, take a look at this wandb run. There are many similar piano rolls generated by the model. It isn't training too well due to the gradient issues we're experiencing right now, but it's consistently generating something that doesn't look utterly random!

@roszcz (Member) commented Oct 4, 2023

I wouldn't make large bets against this being random 😅

Maybe instead of adding mp3 to wandb, it would make more sense to have a separate streamlit dashboard that would allow us to inspect and review the musicality of outputs generated by this model?

@SamuelJanas (Collaborator, Author):

You can now explore wonderful GAN music on this simple dashboard:

I hope you enjoy experimental jazz; these generations make it sound mainstream! 😅
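Roughly, the dashboard boils down to something like this (the generated/*.mid path and the pretty_midi-based rendering are just a sketch, not the exact code):

import glob

import numpy as np
import pretty_midi
import streamlit as st

st.title("GAN MIDI inspector")

# Hypothetical directory with .mid files produced by the trained generator.
midi_files = sorted(glob.glob("generated/*.mid"))
path = st.selectbox("Generated sample", midi_files)

if path:
    midi = pretty_midi.PrettyMIDI(path)

    # Piano roll preview: 128 pitches x time frames, flipped so high pitches are on top.
    piano_roll = midi.get_piano_roll(fs=50)
    st.image(piano_roll[::-1] / (piano_roll.max() + 1e-6), use_column_width=True)

    # Crude sine-wave synthesis so the sample can be auditioned in the browser.
    # Passing a raw array with sample_rate needs a fairly recent Streamlit; otherwise write a .wav first.
    audio = midi.synthesize(fs=22050)
    st.audio(audio.astype(np.float32), sample_rate=22050)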

@SamuelJanas (Collaborator, Author) commented Oct 4, 2023

There are several places where possible issues may be hiding right now:

  1. The data is skewed to the left even after the 'better normalization': there are more notes with a dstart value close to 0.05 (post-norm, around 0-2 pre-norm), which can't really be observed in our generations; the same goes for duration. A quick way to compare the distributions is sketched below.
  2. The generator and/or discriminator may not be as capable as we want them to be. If you look at the wandb run I've linked and notice the "cliff" in the loss functions, you can see that it's closely correlated with the generator's weights dropping significantly. This in turn translates to slower learning, and by slower I mean almost non-existent.
  3. ...
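Sketch for point 1 (the array names are placeholders for wherever the real and generated batches come from):

import matplotlib.pyplot as plt

def compare_dstart(real_dstart, fake_dstart, bins=50):
    # Overlay post-normalization dstart histograms from the training set and from generator outputs.
    plt.hist(real_dstart, bins=bins, alpha=0.5, density=True, label="real")
    plt.hist(fake_dstart, bins=bins, alpha=0.5, density=True, label="generated")
    plt.xlabel("dstart (normalized)")
    plt.ylabel("density")
    plt.legend()
    plt.show()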

@SamuelJanas (Collaborator, Author) commented Oct 5, 2023

To explain the latest commit: I noticed that values inside the generator after BatchNorm were getting larger and larger until approximately the point at which the model "stopped" learning. I wanted to address this by clipping the gradient. Overall it helped the generations somewhat; their durations now fall more into the 0-20 second range instead of the previously seen > 1200 seconds. A minimal sketch of the clipping step is below.

I also stopped using BatchNorm after reading this article, but it's a matter I'll need to delve deeper into.
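The clipping itself is just one extra call in the update step; the function and variable names here are placeholders, and max_norm=1.0 is an illustrative value:

import torch
from torch import nn

def generator_step(generator: nn.Module, g_optimizer: torch.optim.Optimizer, g_loss: torch.Tensor, max_norm: float = 1.0):
    # One generator update with gradient clipping.
    g_optimizer.zero_grad()
    g_loss.backward()
    # Rescale gradients whose global norm exceeds max_norm, to counter the growing values seen after BatchNorm.
    torch.nn.utils.clip_grad_norm_(generator.parameters(), max_norm=max_norm)
    g_optimizer.step()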

@roszcz (Member) commented Oct 5, 2023

I think this looks mostly random, and I sort of suspect that it's not going to get much better with the current architecture 🤔 One avenue to explore could be having additional projection layers in the generator, dedicated to different elements of the MIDI performance:

class Generator(nn.Module):
    def forward(self, x):
        x = current_forward(x)
        velocity_out = self.velocity_projection_layer(x)
        dstart_out = self.dstart_projection_layer(x)
        ...

        return velocity_out, dstart_out, ...

Just thinking out loud, let me know if this makes sense (or not)
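To make the idea a bit more concrete, something along these lines (the Conv1d heads and the hidden size are only assumptions, not a concrete implementation):

import torch
from torch import nn

class GeneratorWithHeads(nn.Module):
    """Shared backbone plus one projection head per MIDI attribute (shapes are made up for illustration)."""

    def __init__(self, backbone: nn.Module, hidden_channels: int):
        super().__init__()
        self.backbone = backbone
        # 1x1 convolutions as cheap per-attribute heads; sigmoid keeps each output in [0, 1].
        self.velocity_projection_layer = nn.Sequential(nn.Conv1d(hidden_channels, 1, kernel_size=1), nn.Sigmoid())
        self.dstart_projection_layer = nn.Sequential(nn.Conv1d(hidden_channels, 1, kernel_size=1), nn.Sigmoid())
        self.duration_projection_layer = nn.Sequential(nn.Conv1d(hidden_channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, noise: torch.Tensor):
        x = self.backbone(noise)  # (batch, hidden_channels, sequence_length)
        velocity_out = self.velocity_projection_layer(x)
        dstart_out = self.dstart_projection_layer(x)
        duration_out = self.duration_projection_layer(x)
        return velocity_out, dstart_out, duration_out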

@SamuelJanas (Collaborator, Author) commented Oct 6, 2023

I think this looks mostly random, and I sort of suspect that it's not going to get much better with the current architecture 🤔 One avenue to explore could be having additional projection layers in the generator, dedicated to different elements of the MIDI performance:

class Generator(nn.Module):
    def forward(self, x):
        x = current_forward(x)
        velocity_out = self.velocity_projection_layer(x)
        dstart_out = self.dstart_projection_layer(x)
        ...

        return velocity_out, dstart_out, ...

Just thinking out loud, let me know if this makes sense (or not)

It's a little hard to tell honestly; I'll have to experiment with that, but in theory it looks good. What I'm mostly curious about is why the drop at ~90 steps is happening:
[W&B chart from 6 Oct 2023 showing the drop at ~90 steps]
I was using the SGD optimizer, which worked well on ECG data, but I might want to switch back to Adam for this one. I'll upload results from the projection layers after some experimenting.
EDIT: wrong message

@SamuelJanas (Collaborator, Author):

It seems I didn't need to wonder for long: switching the optimizer back to Adam solves the issue.

Now, looking at the graphs, making the generator stronger seems like a reasonable idea.
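For reference, the swap is roughly this (the lr/betas are common DCGAN defaults, not necessarily the exact values used here):

import torch
from torch import nn

def make_optimizers(generator: nn.Module, discriminator: nn.Module):
    # Adam with DCGAN-style settings, replacing the previous SGD optimizers.
    g_optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    return g_optimizer, d_optimizer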
