Fix shape mismatch error during backpropagation in MLP optimizer #96

achal-khanna · 2024-10-10T12:13:16Z

This submission addresses the issue tracked in #78.

Root Cause

In optimizers such as Adam and SGD, a single self.cache was shared by all layers, and its keys were just W and b. When different layers updated their parameters, they all read and wrote the same cache entries. Because layers generally have different parameter shapes, a cache entry written by one layer did not match the gradients of another, which produced the shape mismatch.

For instance, the cache should have unique keys like layer1-W, layer1-b, layer2-W, etc., but instead, all parameters were using the same keys, resulting in conflicts during backpropagation.
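The collision can be reproduced in a few lines. The sketch below is illustrative only: MomentumSGD, its update signature, and the flattened-list parameters are hypothetical stand-ins, not the project's actual optimizer classes, and a plain length check stands in for the array shape error the real code would raise.

```python
class MomentumSGD:
    """Toy optimizer whose cache is keyed only by parameter name (hypothetical)."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.cache = {}  # e.g. {"W": velocity, "b": velocity}

    def update(self, name, param, grad):
        # First call for a key creates a velocity buffer sized like param;
        # later calls with the same key reuse that buffer.
        velocity = self.cache.setdefault(name, [0.0] * len(param))
        if len(velocity) != len(grad):
            raise ValueError(
                f"shape mismatch for key {name!r}: "
                f"cache has {len(velocity)} entries, gradient has {len(grad)}"
            )
        for i, g in enumerate(grad):
            velocity[i] = self.momentum * velocity[i] - self.lr * g
            param[i] += velocity[i]
        return param


layer1_W = [0.0] * 6  # stands in for a flattened 3x2 weight matrix
layer2_W = [0.0] * 2  # stands in for a flattened 2x1 weight matrix

# One optimizer shared by both layers, both using the key "W".
shared = MomentumSGD()
shared.update("W", layer1_W, [1.0] * 6)  # creates cache["W"] with 6 entries
try:
    shared.update("W", layer2_W, [1.0] * 2)  # reuses layer 1's buffer
    collision = None
except ValueError as exc:
    collision = str(exc)

# With per-layer keys, each parameter gets its own cache entry.
namespaced = MomentumSGD()
namespaced.update("layer1-W", layer1_W, [1.0] * 6)
namespaced.update("layer2-W", layer2_W, [1.0] * 2)
```

The second call on the shared optimizer fails because the entry under W was sized for layer 1, while the per-layer keys keep the two buffers apart.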

Solution

The fix gives each layer its own cache. During initialization, each layer creates a deepcopy of the optimizer it is given and uses that copy for its own updates, so each layer manages its cache independently.
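A minimal sketch of the per-layer copy, assuming a layer receives the optimizer in its constructor. The MomentumSGD and Dense classes, the apply_gradients method, and the flattened-list parameters are hypothetical and only illustrate the idea; they are not the code in this PR.

```python
import copy


class MomentumSGD:
    """Toy optimizer with a name-keyed cache (hypothetical)."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.cache = {}

    def update(self, name, param, grad):
        velocity = self.cache.setdefault(name, [0.0] * len(param))
        if len(velocity) != len(grad):
            raise ValueError(f"shape mismatch for key {name!r}")
        for i, g in enumerate(grad):
            velocity[i] = self.momentum * velocity[i] - self.lr * g
            param[i] += velocity[i]


class Dense:
    """Toy layer that stores flattened parameters as plain lists."""

    def __init__(self, n_in, n_out, optimizer):
        self.W = [0.0] * (n_in * n_out)
        self.b = [0.0] * n_out
        # Copy the optimizer so this layer gets a private cache dict.
        self.optimizer = copy.deepcopy(optimizer)

    def apply_gradients(self, grad_W, grad_b):
        # The keys "W" and "b" are safe here because the cache is per layer.
        self.optimizer.update("W", self.W, grad_W)
        self.optimizer.update("b", self.b, grad_b)


template = MomentumSGD()
layer1 = Dense(3, 2, template)
layer2 = Dense(2, 1, template)

layer1.apply_gradients([1.0] * 6, [1.0] * 2)
layer2.apply_gradients([1.0] * 2, [1.0] * 1)  # no collision: separate caches
```

One consequence of this design, at least in the sketch: hyperparameters are copied along with the cache, so changing the learning rate on the original optimizer after the layers are built would not reach the per-layer copies.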

All Submissions

Is the code you are submitting your own work?
Have you followed the contributing guidelines?
Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Changes to Existing Models

Have you added an explanation of what your changes do and why you'd like us to include them?
Have you written new tests for your changes, as applicable?
Have you successfully run tests with your changes locally?

Fix shape mismatch error during backpropagation in MLP optimizer

eb6fefc
