
bash: lm-saes: command not found #83

Open

Tizzzzy opened this issue Feb 4, 2025 · 10 comments

Tizzzzy commented Feb 4, 2025

Hi, I am new to this repo, and I got this error when I followed the readme to train the SAE:

(llama_scope) [email protected]:/Language-Model-SAEs$ lm-saes train examples/configuration/train.toml
bash: lm-saes: command not found

I created a conda environment with Python 3.10, then followed the instructions: I ran pdm install, downloaded bun, and then ran lm-saes train examples/configuration/train.toml. That is where I got the error.

Can you please take a look?
Thank you

dest1n1s (Collaborator) commented Feb 4, 2025

I'm sorry, but the current README and examples are actually outdated. We'll update them as soon as we have enough capacity.

Currently we recommend using uv as the package manager (a drop-in replacement for pdm). Then you could try training an SAE on Pythia with the following script:

import torch

from lm_saes import (
    ActivationFactoryConfig,
    ActivationFactoryDatasetSource,
    ActivationFactoryTarget,
    InitializerConfig,
    SAEConfig,
    TrainerConfig,
    TrainSAESettings,
    WandbConfig,
    train_sae,
)

if __name__ == "__main__":
    settings = TrainSAESettings(
        sae=SAEConfig(
            hook_point_in="blocks.3.ln1.hook_normalized",
            hook_point_out="blocks.3.ln1.hook_normalized",
            d_model=768,
            expansion_factor=8,
            act_fn="topk",
            norm_activation="token-wise",
            sparsity_include_decoder_norm=True,
            top_k=50,
            dtype=torch.float32,
            device="cuda",
        ),
        initializer=InitializerConfig(
            init_search=True,
            state="training",
        ),
        trainer=TrainerConfig(
            lp=1,
            initial_k=768 / 2,
            lr=4e-4,
            lr_scheduler_name="constantwithwarmup",
            total_training_tokens=600_000_000,
            log_frequency=1000,
            eval_frequency=1000000,
            n_checkpoints=5,
            check_point_save_mode="linear",
            exp_result_path="results",
        ),
        wandb=WandbConfig(
            wandb_project="pythia-160m-test",
            exp_name="pythia-160m-test",
        ),
        activation_factory=ActivationFactoryConfig(
            sources=[
                ActivationFactoryDatasetSource(
                    name="openwebtext",
                )
            ],
            target=ActivationFactoryTarget.BATCHED_ACTIVATIONS_1D,
            hook_points=["blocks.3.ln1.hook_normalized"],
            batch_size=2048,
            buffer_size=None,
            ignore_token_ids=[],
        ),
        sae_name="pythia-160m-test-L3",
        sae_series="pythia-160m-test",
    )
    train_sae(settings)
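
For a rough sense of scale, here is some back-of-envelope arithmetic implied by the config above (just my own quick math, assuming one activation per token; it is not anything the trainer reports):

d_model = 768
expansion_factor = 8
d_sae = d_model * expansion_factor  # 6144 dictionary features in the SAE
total_training_tokens = 600_000_000
batch_size = 2048  # activations per training batch
n_steps = total_training_tokens // batch_size  # roughly 293,000 steps at one activation per token
print(f"d_sae={d_sae}, approx_steps={n_steps}")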

Let me know if this setup works!

Tizzzzy (Author) commented Feb 4, 2025

Hi,
Thank you for your reply. I have the following questions:

  1. Can you tell me what command I should use to build the environment using uv?
  2. Should I just copy the updated script to a .py file?
  3. Should I run the training using the command python <new_script>.py?
  4. Is that all I need to do?

Tizzzzy (Author) commented Feb 6, 2025

Hi,
Can you please provide clearer instructions? I have tried creating an environment using uv pip install --sync and uv pip install -r uv.lock, and neither of them works. So I just used the environment that I created with pdm to run the newly provided Python code, and I still got this error:

(llama_scope) [email protected]:/Language-Model-SAEs$ python examples/configuration/train.py
/opt/conda/envs/llama_scope/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py:275: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
  cpu = _conversion_method_template(device=torch.device("cpu"))
Traceback (most recent call last):
  File "/Language-Model-SAEs/examples/configuration/train.py", line 3, in <module>
    from lm_saes import (
ModuleNotFoundError: No module named 'lm_saes'

Can you please take a look and help me with it?

dest1n1s (Collaborator) commented Feb 6, 2025

Sorry for the late reply.

  1. Can you tell me what command I should use to build the environment using uv?

Once you have uv installed (following its installation instructions), you do not need any explicit command to build the environment. uv handles resolving and downloading the required packages when you actually run a script in the project. If you really want the packages downloaded explicitly ahead of time (this may be necessary if your GPU machines have no internet connection), you can run uv sync.

  2. Should I just copy the updated script to a .py file?

Yes.

  3. Should I run the training using the command python <new_script>.py?

You should run uv run <new_script>.py to activate the uv venv and run the script.

  4. Is that all I need to do?

It should work smoothly with the above steps. Please let me know if there are any further problems!

Tizzzzy (Author) commented Feb 6, 2025

Hi,
Thank you for your reply. I have now followed your new instructions. I created a new conda environment with Python 3.12.0, installed uv with pip install uv, copied the new Python script to train.py, and then ran the script using uv run ./examples/configuration/train.py. However, I got this error:

(llama-scope) [email protected]:/Language-Model-SAEs$ uv run ./examples/configuration/train.py
Traceback (most recent call last):
  File "/Language-Model-SAEs/./examples/configuration/train.py", line 64, in <module>
    train_sae(settings)
  File "/Language-Model-SAEs/src/lm_saes/runner.py", line 286, in train_sae
    activations_stream = activation_factory.process()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Language-Model-SAEs/src/lm_saes/activation/factory.py", line 281, in process
    streams = [processor(**kwargs) for processor in self.pre_aggregation_processors]
               ^^^^^^^^^^^^^^^^^^^
  File "/Language-Model-SAEs/src/lm_saes/activation/factory.py", line 98, in process_dataset
    assert datasets is not None, "`datasets` must be provided for dataset sources"
AssertionError: `datasets` must be provided for dataset sources

I tried using the dataset from the Hugging Face path Skylion007/openwebtext, but it still doesn't work. I think this is because I didn't download the dataset to my local repo. How and where can I download the dataset?
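
In case it matters, this is what I would try for getting a local copy with the Hugging Face datasets library (the save path below is just a placeholder I made up; I don't know whether this is the layout lm-saes expects):

from datasets import load_dataset

# Download openwebtext from the Hugging Face Hub and save a local copy.
# "data/openwebtext" is only an example path, not something the repo requires.
dataset = load_dataset("Skylion007/openwebtext", split="train")
dataset.save_to_disk("data/openwebtext")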

dest1n1s (Collaborator) commented Feb 6, 2025

Have you tried the script above in this issue? The example script in the repo has not been brought up to date yet.

Tizzzzy (Author) commented Feb 6, 2025

Hi,
Yes, the error comes from the new Python script you provided above (I copied it verbatim into examples/configuration/train.py).

Can you please take a look and help me fix the bug?

dest1n1s (Collaborator) commented Feb 7, 2025

Hi,
It seems there are some bugs in the current train runner: it doesn't handle datasets that are not pre-generated. I'll push a fix ASAP.

Tizzzzy (Author) commented Feb 7, 2025

Thank you! Please update asap

dest1n1s (Collaborator) commented Feb 8, 2025

Hello, this should be fixed by #85. Also, you can try generating activations and training the SAE as two separate steps, which can drastically improve training speed as long as you have enough disk space to hold all the activations. The examples are updated in #85 as well.
