Feature/orthogonal initialization #110

Draft: wants to merge 2 commits into base: main

Collaborator

@miquelflorensa miquelflorensa commented Jan 20, 2025

Description

This PR adds orthogonal initialization for Linear, Convolutional, and LSTM layers.

For Linear and LSTM layers, orthogonality is imposed on the weight matrices of size input × output. For CNN layers, orthogonality is enforced between the kernels. Initializing the weight matrices with an orthogonal structure preserves the norm of the input across layers, which leads to more stable training.
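
The norm-preservation property can be checked with a small, dependency-free sketch. This is an illustration only, not the code in this PR: it swaps the Eigen SVD for modified Gram-Schmidt on a random Gaussian matrix, and the names `orthogonal_matrix`, `matvec`, and `l2_norm` are hypothetical helpers that do not exist in the project.

```python
import math
import random


def orthogonal_matrix(rows, cols, seed=0):
    """Return a rows x cols matrix (list of rows) with orthonormal columns.

    Hypothetical helper for illustration: it applies modified Gram-Schmidt
    to a random Gaussian matrix instead of the Eigen SVD used in the PR.
    Requires rows >= cols.
    """
    if rows < cols:
        raise ValueError("need rows >= cols for orthonormal columns")
    rng = random.Random(seed)
    # columns[j] is the j-th column, stored as a vector of length `rows`.
    columns = [[rng.gauss(0.0, 1.0) for _ in range(rows)] for _ in range(cols)]
    for j in range(cols):
        # Subtract the components along the columns already orthonormalized.
        for k in range(j):
            dot = sum(a * b for a, b in zip(columns[j], columns[k]))
            columns[j] = [a - dot * b for a, b in zip(columns[j], columns[k])]
        # Scale the remaining vector to unit length.
        norm = math.sqrt(sum(a * a for a in columns[j]))
        columns[j] = [a / norm for a in columns[j]]
    # Transpose back to a row-major layout.
    return [[columns[j][i] for j in range(cols)] for i in range(rows)]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def l2_norm(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in vector))


W = orthogonal_matrix(6, 4)
x = [1.0, -2.0, 0.5, 3.0]
# The two printed norms agree up to floating-point rounding.
print(l2_norm(x), l2_norm(matvec(W, x)))
```

The equality holds exactly only when the matrix has orthonormal columns, which requires at least as many rows as columns. For a layer that reduces the dimension, only the rows can be orthonormal, and the norm is preserved only for inputs lying in the row space.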

Changes Made

  • Added orthogonal init function in src/param_init.cpp.
  • Modified initialization of Linear, CNN and LSTM layers to accept orthogonal init in src/param_init.cpp.
  • Added #include <eigen3/Eigen/Dense> in include/param_init.h.

Checklist

  • I have followed the project's coding conventions and style guidelines.
  • I have updated the documentation, if applicable.
  • I have rebased my branch on the latest upstream code to incorporate any changes.
  • I have tested the changes locally.

Notes for Reviewers

I use the external library Eigen to perform SVD efficiently in C++. The library must therefore be installed, either by running:

sudo apt install libeigen3-dev

or by installing it manually from Eigen.

To use orthogonal initialization in Python, update the init_method parameter as shown below:

TAGI_CNN_BATCHNORM = Sequential(
    Conv2d(1, 32, 4, padding=1, in_width=28, in_height=28, bias=False, init_method="orthogonal"),
    ReLU(),
    BatchNorm2d(32),
    AvgPool2d(3, 2),
    Conv2d(32, 64, 5, bias=False, init_method="orthogonal"),
    ReLU(),
    BatchNorm2d(64),
    AvgPool2d(3, 2),
    Linear(64 * 4 * 4, 256, init_method="orthogonal"),
    ReLU(),
    Linear(256, 11, init_method="orthogonal"),
)

@miquelflorensa
Collaborator Author

@lhnguyen102 I added the orthogonal initialization by using an external C++ library. Should I somehow try to implement a simple SVD or should we add it as another external library?

@lhnguyen102
Owner

@miquelflorensa We tend to move away from external libraries, but in this case it would take time to implement SVD, and I would rather focus on the relevant work. Please make sure it works on both macOS and Ubuntu. I would appreciate it if you could update the installation instructions in the docs for both OSes.

@jamesgoulet
Collaborator

@miquelflorensa I tried installing your branch on my Mac and it does not compile.

Regarding the dependency on an external library, I did not realize that one would have to manually install the two libraries in order to run pip install . on the project. That seems limiting from an external user's point of view. Can you check the feasibility of using ChatGPT to create a C++ SVD function equivalent to the one you are using? I assume it will not be as efficient, but this SVD is only used in the initialization step. We can discuss it further over Zoom or in person when you are back.

@lhnguyen102 lhnguyen102 marked this pull request as draft April 15, 2025 16:04
3 participants