A companion repository for *Gradient Routing: Masking Gradients to Localize Computation in Neural Networks*.
- `factored_representations` is for shared functionality, although in practice, code for different subprojects is mostly siloed.
  - `masklib.py` and `model_expansion.py` implement Expand, Route, Ablate for any TransformerLens model. Has some tests.
- `projects` contains the code to reproduce the results in the paper.
  - `minigrid` - localizing behavioral tendencies in a gridworld reinforcement learning agent
  - `mnist` - splitting the representations of an MNIST autoencoder
  - `nanoGPT-factrep` - training a model with a steering scalar, and unlearning virology
  - `tinystories` - unlearning a subset of TinyStories
- `shared_configs` is for commonly-used configurations, e.g. model definitions and standard training config options.
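Conceptually, routing works by masking gradients during the backward pass while leaving the forward pass unchanged. Below is a minimal PyTorch sketch of that idea; it is illustrative only, not the `masklib.py` API, and the toy model, mask, and `route` helper are assumptions for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy model: a gradient mask over the 8 hidden units localizes updates
# for this batch to half of the hidden layer.
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

# Block gradient flow to the first 4 hidden units.
mask = torch.ones(8)
mask[:4] = 0.0

def route(activations: torch.Tensor) -> torch.Tensor:
    """Forward pass is the identity; backward pass is multiplied by `mask`."""
    return activations * mask + (activations * (1 - mask)).detach()

x = torch.randn(3, 4)
hidden = model[0](x)
out = model[1](route(hidden))
out.sum().backward()

# Rows of the first layer feeding the masked hidden units get zero gradient...
print(model[0].weight.grad[:4].abs().sum().item())  # 0.0
# ...while the unmasked rows are updated as usual.
print(model[0].weight.grad[4:].abs().sum().item() > 0)  # True
```

The `detach()` trick keeps the forward computation exact while routing gradients only through the masked branch, which is the core mechanism the library applies to TransformerLens models.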
- Install PDM
- Install the PDM project (i.e. install the dependencies):

  ```shell
  pdm install
  ```

- Install the recommended VSCode extensions
- Install the pre-commit git hooks:

  ```shell
  pdm run pre-commit install
  ```

You can then run Python scripts with `pdm run python <script.py>`, or by activating the virtual environment specified by `pdm info`, e.g.:

```shell
source /pdm-venvs/factored-representations-Dp430888-3.12/bin/activate
```
`.vscode/settings.json` is configured to automatically format and lint the code with Ruff (using the extension) on save.
Run the tests with:

```shell
pdm run pytest
```
```bibtex
@article{cloud2024gradient,
  title={Gradient Routing: Masking Gradients to Localize Computation in Neural Networks},
  author={Cloud, Alex and Goldman-Wetzler, Jacob and Wybitul, Evžen and Miller, Joseph and Turner, Alexander Matt},
  journal={arXiv preprint arXiv:2410.04332},
  url={https://arxiv.org/abs/2410.04332v1},
  year={2024},
}
```