Can we write test_step function outside the FOR epoch loop? #1055
@SailSabnis, I noticed that your model's test loss is increasing, which indicates that it is starting to overfit: it learns the training set well but struggles to generalize to the test set. To address this, consider introducing nn.BatchNorm2d layers into your model. They can improve generalization and stabilize training by normalizing activations across each batch.
from torch import nn

class CNNV2(nn.Module):
    def __init__(self, input, output, hidden):
        super(CNNV2, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=input, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden, out_channels=hidden*2, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden*2),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=hidden*2, out_channels=hidden*4, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden*4),
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden*4, out_channels=hidden, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(hidden*7*7, output)  # 7*7 assumes 28x28 inputs downsampled by two MaxPool2d(2) layers
        )

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        return self.classifier(x)
# Instantiate the model (1 input channel, 10 classes, 64 hidden units)
modelv2 = CNNV2(input=1, output=10, hidden=64)
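As a quick sanity check (this sketch assumes 28x28 grayscale inputs such as FashionMNIST, which is what the hidden*7*7 flattened size implies), you can pass a dummy batch through the model and confirm the output shape:

import torch

# Hypothetical sanity check: a batch of 32 single-channel 28x28 images
dummy_batch = torch.randn(32, 1, 28, 28)
with torch.inference_mode():
    logits = modelv2(dummy_batch)
print(logits.shape)  # torch.Size([32, 10])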
As for your question: yes, you can call the test function outside the loop; it will then run only once, after the final training epoch. The purpose of running test_step right after train_step inside the loop, however, is to monitor the train and test losses and accuracies every epoch. By observing these metrics as training progresses, you can tell whether the model is overfitting, underfitting, or generalizing well.
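For illustration, here is a minimal sketch of that pattern. The train_step / test_step bodies below are generic versions written for this example (not necessarily identical to the ones in your notebook), and train_dataloader / test_dataloader are assumed to already exist:

import torch
from torch import nn

def train_step(model, dataloader, loss_fn, optimizer, device="cpu"):
    """One pass over the training set with gradient updates; returns average loss and accuracy."""
    model.train()
    train_loss, train_acc = 0.0, 0.0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        logits = model(X)
        loss = loss_fn(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
        train_acc += (logits.argmax(dim=1) == y).float().mean().item()
    return train_loss / len(dataloader), train_acc / len(dataloader)

def test_step(model, dataloader, loss_fn, device="cpu"):
    """One pass over the test set with no gradient updates; returns average loss and accuracy."""
    model.eval()
    test_loss, test_acc = 0.0, 0.0
    with torch.inference_mode():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            logits = model(X)
            test_loss += loss_fn(logits, y).item()
            test_acc += (logits.argmax(dim=1) == y).float().mean().item()
    return test_loss / len(dataloader), test_acc / len(dataloader)

# Assumed setup: train_dataloader and test_dataloader are built elsewhere
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(params=modelv2.parameters(), lr=0.001)

epochs = 20
for epoch in range(epochs):
    # Evaluating after every training epoch lets you compare the two curves and spot overfitting early
    train_loss, train_acc = train_step(modelv2, train_dataloader, loss_fn, optimizer)
    test_loss, test_acc = test_step(modelv2, test_dataloader, loss_fn)
    print(f"Epoch {epoch+1} | "
          f"train_loss: {train_loss:.4f}, train_acc: {train_acc:.4f} | "
          f"test_loss: {test_loss:.4f}, test_acc: {test_acc:.4f}")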
I'm trying to understand why it is necessary to put the test_step function inside the for-epoch loop along with the train_step function, since we are not optimising anything in test_step. I have been running my model with test_step outside the loop and not seeing any improvement in test accuracy, so I tried putting it inside the epoch loop and ran it for 20 epochs. The optimizer is Adam with a learning rate of 0.001. How can I reduce the test loss to under 0.01?
Full model architecture:
My confusion matrix is all over the place. Please help!