neural nets optimizer shape mismatch during backward pass #78

Open · srs3 opened this issue Aug 8, 2022 · 0 comments
srs3 commented Aug 8, 2022

@ddbourgin I have an issue where the parameter updates cannot be performed because the gradient shapes conflict during backprop, specifically in the optimizer file.

Error reads:

C[param_name]["mean"] = d1 * mean + (1 - d1) * param_grad
ValueError: operands could not be broadcast together with shapes (100,10) (3072,100) 

Model architecture is as follows:

Input -> n_samples, 3072
FC1 -> 3072, 100
FC2 -> 100, 10
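
For reference, the broadcast failure can be reproduced with plain NumPy using the two weight shapes above (a minimal sketch, not part of the original traceback):

import numpy as np

d1 = 0.9
mean = np.zeros((100, 10))          # cached first moment, shaped like FC2's W
param_grad = np.zeros((3072, 100))  # incoming gradient, shaped like FC1's W

# ValueError: operands could not be broadcast together with shapes (100,10) (3072,100)
updated_mean = d1 * mean + (1 - d1) * param_grad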

The model code is as follows:

def _build_model(self):
    self.model = OrderedDict()
    self.model['fc1'] = FullyConnected(n_out=self.layers[0],
                                       act_fn=ReLU(),
                                       init=self.initializer,
                                       optimizer=self.optimizer)


    self.model['fc2'] = FullyConnected(n_out=self.layers[1],
                                       act_fn=Affine(slope=1, intercept=0),
                                       init=self.initializer,
                                       optimizer=self.optimizer)


    self.model['out'] = Softmax(dim=-1,
                                optimizer=self.optimizer)

@property
def parameters(self):
    return {k: v.parameters for k, v in self.model.items()}

@property
def hyperparameters(self):
    return {k: v.hyperparameters for k, v in self.model.items()}

@property
def derived_variables(self):
    return {k: v.derived_variables for k, v in self.model.items()}

@property
def gradients(self):
    return {k: v.gradients for k, v in self.model.items()}

def forward(self, x):
    out = x
    for k, v in self.model.items():
        out = v.forward(out)
    return out

def backward(self, y, y_pred):
    """Compute dLdy and then backprop through the layers in self.model"""
    dY_pred = self.loss.grad(y, y_pred)
    for k, v in reversed(list(self.model.items())):
        dY_pred = v.backward(dY_pred)
        self._dv['d' + k] = dY_pred
    return dY_pred

def update(self, cur_loss):
    """Perform gradient updates"""
    for k, v in reversed(list(self.model.items())):
        v.update(cur_loss)
    self.flush_gradients()

Hoping we can fix this and also create an example for people to follow. Thanks
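
One possible workaround, assuming (not confirmed here) that the collision comes from all three layers sharing a single optimizer instance whose cache is keyed only by parameter name ("W", "b"): give each layer its own optimizer copy so the cached moments for fc1's W and fc2's W stay separate. A sketch under that assumption:

from copy import deepcopy

def _build_model(self):
    self.model = OrderedDict()
    # deepcopy is an assumption/workaround: each layer gets an independent
    # optimizer, so per-parameter caches cannot mix shapes across layers
    self.model['fc1'] = FullyConnected(n_out=self.layers[0],
                                       act_fn=ReLU(),
                                       init=self.initializer,
                                       optimizer=deepcopy(self.optimizer))

    self.model['fc2'] = FullyConnected(n_out=self.layers[1],
                                       act_fn=Affine(slope=1, intercept=0),
                                       init=self.initializer,
                                       optimizer=deepcopy(self.optimizer))

    self.model['out'] = Softmax(dim=-1, optimizer=deepcopy(self.optimizer))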
