
MIDI-101: GAN for MIDI data #3

Draft · wants to merge 42 commits into master
Conversation

@SamuelJanas (Collaborator) commented Oct 2, 2023

@SamuelJanas (Collaborator, Author):

Here's where we currently stand:

  1. MIDI Data Training Setup: Finalized the training pipeline for MIDI data.
  2. Visualization Bug Fix (TO-DO): Identified a minor bug in the WandB visualization logging.
  3. Model Compatibility (TO-DO): Planning to make adjustments to ensure the model is fully compatible with the new data format.

Base automatically changed from MIDI-92/improve-model to master October 3, 2023 04:40
@roszcz (Member) commented Oct 4, 2023

@SamuelJanas Can you add in the PR description a short explanation of what you're trying to achieve here? I mean in terms of the MIDI structures, model outputs, and data flow - bonus points for mermaid diagrams :-)

@SamuelJanas (Collaborator, Author):

So far I've been trying to direct the generator towards plausible values. I started by normalizing the data to the [0, 1] range to discourage the model from producing negative values, which MIDI data doesn't really need, and accounted for that by adding a sigmoid activation at the end of the generator. For now the model, well... struggles.

The issue is somewhere in the generator, as the sigmoid outputs mostly 1s and 0s. I'm currently trying to fix that, but I'm not quite sure yet how to tackle it.

I wanted to go for the bonus points, but I'm not quite sure what exactly I should show.
Here's an overview of the model:

graph TD

    subgraph Generator
        noise[Noise] --> CT1[ConvTranspose1d + BN + LeakyReLU]
        CT1 --> CT2[ConvTranspose1d + BN + LeakyReLU]
        CT2 --> CT3[ConvTranspose1d + BN + LeakyReLU]
        CT3 --> CT4[ConvTranspose1d + BN + LeakyReLU]
        CT4 --> CT5[ConvTranspose1d + Sigmoid]
    end
    
    subgraph Training
        real[Real Data] --> Disc[Discriminator]
        noise --> Gen[Generator]
        fake[Generated Data from Gen] --> Disc
        Disc --> Loss[Compute Losses]
        Loss --> Backprop[Backpropagation]
        Backprop --> Opt[Optimizers]
    end
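For completeness, here's a minimal PyTorch sketch of the generator stack from the diagram. The channel counts, kernel sizes, and class name are made up for illustration and are not the exact values used in the repo:

from torch import nn

class SketchGenerator(nn.Module):
    """Rough mirror of the diagram: 4 x (ConvTranspose1d + BatchNorm + LeakyReLU), then ConvTranspose1d + Sigmoid."""

    def __init__(self, noise_channels=100, out_channels=3):
        super().__init__()

        def up_block(in_ch, out_ch):
            return nn.Sequential(
                nn.ConvTranspose1d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm1d(out_ch),
                nn.LeakyReLU(0.2),
            )

        self.net = nn.Sequential(
            up_block(noise_channels, 256),
            up_block(256, 128),
            up_block(128, 64),
            up_block(64, 32),
            # Final upsampling layer; the sigmoid keeps every output in [0, 1] to match the normalized data.
            nn.ConvTranspose1d(32, out_channels, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, noise):
        # noise: (batch, noise_channels, length) -> output: (batch, out_channels, length * 32)
        return self.net(noise)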

Comment on lines 42 to 43
dstart = fake_data[0, :] * 48.916666666666515
duration = fake_data[1, :] * (99.45833333333331 - 0.0010416666666515084) + 0.0010416666666515084
@SamuelJanas (Collaborator, Author):

The values here are the min/max values calculated during normalization; they are printed during preprocessing. You can check them by running preprocess_maestro.py.
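For context, a small sketch of the min/max (de)normalization these constants come from; the helper names are placeholders, the constants themselves are the ones printed by the preprocessing script:

import numpy as np

DSTART_MAX = 48.916666666666515          # dstart min appears to be 0, so only the max is needed
DURATION_MIN = 0.0010416666666515084
DURATION_MAX = 99.45833333333331

def normalize(values: np.ndarray, vmin: float, vmax: float) -> np.ndarray:
    # Map raw values into [0, 1] so they match the sigmoid output range of the generator.
    return (values - vmin) / (vmax - vmin)

def denormalize(values: np.ndarray, vmin: float, vmax: float) -> np.ndarray:
    # Inverse transform, i.e. what the two lines above do for dstart and duration.
    return values * (vmax - vmin) + vmin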

@SamuelJanas (Collaborator, Author) commented Oct 4, 2023

Update to the above comment:

The training looks a little surprising, to say the least, but it managed to create something that doesn't look all that bad.

Most notes are extremely short, but that is probably what happened to the data after normalization. I will be taking a closer look at what other possibilities I have. I might try different scaling and look for another activation function at the end of the generator.
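For example, a few candidate final activations (purely illustrative; none of these are in the commit yet):

from torch import nn

# Illustrative alternatives for the generator's final activation:
sigmoid_head = nn.Sigmoid()          # current choice; saturates towards 0/1 for large pre-activations
hardsigmoid_head = nn.Hardsigmoid()  # piecewise-linear variant, gentler near the saturation points
identity_head = nn.Identity()        # unbounded output, clamped/rescaled to [0, 1] in post-processing instead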

@roszcz (Member) commented Oct 4, 2023

@SamuelJanas This piano roll looks very cool! Did you try listening to this?

@SamuelJanas (Collaborator, Author):

@SamuelJanas This piano roll looks very cool! Did you try listening to this?

I'm glad you like what you're seeing!
I haven't listened to it yet, as I believe we can make it even better with better normalization. Afterwards I'll be creating evals for .mp3 and .mid generation from a trained model.

For now, take a look at this wandb run. There are many similar piano rolls generated by the model. It isn't training too well due to the gradient issues we're experiencing right now, but it's consistently generating something that doesn't look utterly random!

@roszcz (Member) commented Oct 4, 2023

I wouldn't make large bets against this being random 😅

Maybe instead of adding mp3 to wandb, it would make more sense to have a separate streamlit dashboard that would allow us to inspect and review the musicality of outputs generated by this model?

@SamuelJanas (Collaborator, Author):

You can now explore wonderful GAN music on this simple dashboard:

I hope you enjoy experimental jazz; these generations make it sound mainstream! 😅
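Roughly, the dashboard boils down to something like this (the generated/*.mid path and the pretty_midi-based rendering are just a sketch, not the exact code):

import glob

import numpy as np
import pretty_midi
import streamlit as st

st.title("GAN MIDI inspector")

# Hypothetical directory with .mid files produced by the trained generator.
midi_files = sorted(glob.glob("generated/*.mid"))
path = st.selectbox("Generated sample", midi_files)

if path:
    midi = pretty_midi.PrettyMIDI(path)

    # Piano roll preview: 128 pitches x time frames, flipped so high pitches are on top.
    piano_roll = midi.get_piano_roll(fs=50)
    st.image(piano_roll[::-1] / (piano_roll.max() + 1e-6), use_column_width=True)

    # Crude sine-wave synthesis so the sample can be auditioned in the browser.
    # Passing a raw array with sample_rate needs a fairly recent Streamlit; otherwise write a .wav first.
    audio = midi.synthesize(fs=22050)
    st.audio(audio.astype(np.float32), sample_rate=22050)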

@SamuelJanas (Collaborator, Author) commented Oct 4, 2023

There are several places where possible issues may be hiding right now:

  1. The data is skewed to the left even after the 'better normalization': there are more notes with a dstart value close to 0.05 (post-norm, around 0-2 pre-norm), which can't really be observed in our generations; the same goes for duration. A quick way to compare the distributions is sketched below.
  2. The generator and/or discriminator may not be as capable as we want them to be. If you look at the wandb run I've linked and notice the "cliff" in the loss functions, you can see that it's closely correlated with the generator's weights dropping significantly. This in turn translates to slower learning, and by slower I mean almost non-existent.
  3. ...
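Sketch for point 1 (the array names are placeholders for wherever the real and generated batches come from):

import matplotlib.pyplot as plt

def compare_dstart(real_dstart, fake_dstart, bins=50):
    # Overlay post-normalization dstart histograms from the training set and from generator outputs.
    plt.hist(real_dstart, bins=bins, alpha=0.5, density=True, label="real")
    plt.hist(fake_dstart, bins=bins, alpha=0.5, density=True, label="generated")
    plt.xlabel("dstart (normalized)")
    plt.ylabel("density")
    plt.legend()
    plt.show()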

@SamuelJanas (Collaborator, Author) commented Oct 5, 2023

To explain the latest commit: I noticed that values inside the generator after BatchNorm were getting larger and larger until approximately the point at which the model "stopped" learning. I wanted to address this by clipping the gradient. Overall it helped the generations somewhat; their durations now fall more into the 0-20 second range instead of the previously seen > 1200 seconds. A minimal sketch of the clipping step is below.

I also stopped using BatchNorm after reading this article, but it's a matter I'll need to delve deeper into.
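The clipping itself is just one extra call in the update step; the function and variable names here are placeholders, and max_norm=1.0 is an illustrative value:

import torch
from torch import nn

def generator_step(generator: nn.Module, g_optimizer: torch.optim.Optimizer, g_loss: torch.Tensor, max_norm: float = 1.0):
    # One generator update with gradient clipping.
    g_optimizer.zero_grad()
    g_loss.backward()
    # Rescale gradients whose global norm exceeds max_norm, to counter the growing values seen after BatchNorm.
    torch.nn.utils.clip_grad_norm_(generator.parameters(), max_norm=max_norm)
    g_optimizer.step()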

@roszcz (Member) commented Oct 5, 2023

I think this looks mostly random, and I sort of suspect that it's not going to get much better with the current architecture 🤔 One avenue to explore could be having additional projection layers in the generator, dedicated to different elements of the MIDI performance:

class Generator(nn.Module):
    def forward(self, x):
        x = current_forward(x)
        velocity_out = self.velocity_projection_layer(x)
        dstart_out = self.dstart_projection_layer(x)
        ...

        return velocity_out, dstart_out, ...

Just thinking out loud, let me know if this makes sense (or not)
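To make the idea a bit more concrete, something along these lines (the Conv1d heads and the hidden size are only assumptions, not a concrete implementation):

import torch
from torch import nn

class GeneratorWithHeads(nn.Module):
    """Shared backbone plus one projection head per MIDI attribute (shapes are made up for illustration)."""

    def __init__(self, backbone: nn.Module, hidden_channels: int):
        super().__init__()
        self.backbone = backbone
        # 1x1 convolutions as cheap per-attribute heads; sigmoid keeps each output in [0, 1].
        self.velocity_projection_layer = nn.Sequential(nn.Conv1d(hidden_channels, 1, kernel_size=1), nn.Sigmoid())
        self.dstart_projection_layer = nn.Sequential(nn.Conv1d(hidden_channels, 1, kernel_size=1), nn.Sigmoid())
        self.duration_projection_layer = nn.Sequential(nn.Conv1d(hidden_channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, noise: torch.Tensor):
        x = self.backbone(noise)  # (batch, hidden_channels, sequence_length)
        velocity_out = self.velocity_projection_layer(x)
        dstart_out = self.dstart_projection_layer(x)
        duration_out = self.duration_projection_layer(x)
        return velocity_out, dstart_out, duration_out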

@SamuelJanas (Collaborator, Author) commented Oct 6, 2023

I think this looks mostly random, and I sort of suspect that it's not going to get much better with the current architecture 🤔 One avenue to explore could be having additional projection layers in the generator, dedicated to different elements of the MIDI performance:

class Generator(nn.Module):
    def forward(self, x):
        x = current_forward(x)
        velocity_out = self.velocity_projection_layer(x)
        dstart_out = self.dstart_projection_layer(x)
        ...

        return velocity_out, dstart_out, ...

Just thinking out loud, let me know if this makes sense (or not)

It's a little hard to tell honestly; I'll have to experiment with that, but in theory it looks good. What I'm mostly curious about is why the drop at ~90 steps is happening:
[W&B chart from 6 Oct 2023 showing the drop at ~90 steps]
I was using the SGD optimizer, which worked well on ECG data, but I might want to switch back to Adam for this one. I'll upload results from the projection layers after some experimenting.
EDIT: wrong message

@SamuelJanas (Collaborator, Author):

It seems I didn't need to wonder for long: switching the optimizer back to Adam solves the issue.

Now, looking at the graphs, making the generator stronger seems like a reasonable idea.
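For reference, the swap is roughly this (the lr/betas are common DCGAN defaults, not necessarily the exact values used here):

import torch
from torch import nn

def make_optimizers(generator: nn.Module, discriminator: nn.Module):
    # Adam with DCGAN-style settings, replacing the previous SGD optimizers.
    g_optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    return g_optimizer, d_optimizer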
