Scale rescalings (#114)
* add function to plot corner

* notebook to plot cosmological chains

* add files to ignore

* fix scale to fit

* fix cosmopower interface to kms

* modify the tutorial accordingly

* minor change in path

* extend documentation to also include documentation for developers

* include new documentation

* add weighting option

* update notebook for star params inference

* typo

* change val scaling to ind_rescaling

---------

Co-authored-by: Laura Cabayol Garcia <[email protected]>
Co-authored-by: Laura Cabayol Garcia <[email protected]>
3 people authored Nov 27, 2024
1 parent d044878 commit e552f1f
Showing 20 changed files with 655 additions and 105 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -6,3 +6,4 @@ __pycache__
.github/workflows/.python-tests.yml.swo
.github/workflows/.python-tests.yml.swp
docs/build/
docs/site/
4 changes: 2 additions & 2 deletions README.md
@@ -92,8 +92,8 @@ NYX_PATH="/global/cfs/cdirs/desi/science/lya/y1-p1d/likelihood_files/nyx_files/"
- Before running LaCE, please precompute all the cosmological information needed using CAMB and save the IGM histories by running the following scripts. You do not need to do this if you are working at NERSC.

```
python scripts/save_nyx_emu_cosmo.py
python scripts/save_nyx_IGM.py
python scripts/developers/save_nyx_emu_cosmo.py
python scripts/developers/save_nyx_IGM.py
```

## Emulator parameters:
104 changes: 104 additions & 0 deletions docs/docs/developers/CreateNewEmulator.md
@@ -0,0 +1,104 @@
# CREATING NEW EMULATORS

The training of the emulators is done with the code `train.py`. This code is used to train custom emulators already defined with an emulator label. However, you might need to define a new emulator label with new hyperparameters. This tutorial will guide you through the process of creating a new emulator label.

The file `lace/emulator/constants.py` contains the definitions of the emulator labels, training sets, and the emulator parameters associated with each emulator label.

To create a new emulator label, you first need to add your new emulator label to the `EmulatorLabel` class in the `constants.py` file, for example:

```python
class EmulatorLabel(StrEnum):
    ...
    NEW_EMULATOR = "New_Emulator"
```

`"New_Emulator"` is the value that identifies the new emulator label in emulator calls, e.g. `NNEmulator(emulator_label="New_Emulator")`.

Then this label needs to be added to `GADGET_LABELS` or `NYX_LABELS` in the `constants.py` file, depending on the training set you used to train your emulator. For example, if this is a new Gadget emulator, you need to add it to `GADGET_LABELS`:

```python
GADGET_LABELS = {
    ...
    EmulatorLabel.NEW_EMULATOR,
}
```

The dictionary `EMULATOR_PARAMS` also needs to be updated with the new emulator parameters, i.e. all the arguments needed to initialize the emulator class. For example, the entry for the `Nyx_alphap_cov` label reads:

```python
"Nyx_alphap_cov": {
    "emu_params": [
        "Delta2_p",
        "n_p",
        "alpha_p",
        "mF",
        "sigT_Mpc",
        "gamma",
        "kF_Mpc",
    ],
    "emu_type": "polyfit",
    "kmax_Mpc": 4,
    "ndeg": 6,
    "nepochs": 600,
    "step_size": 500,
    "nhidden": 6,
    "max_neurons": 400,
    "lr0": 2.5e-4,
    "weight_decay": 8e-3,
    "batch_size": 100,
    "amsgrad": True,
    "z_max": 5,
    "include_central": False,
}
```

Finally, you need to add a description of the new emulator in the `EMULATOR_DESCRIPTIONS` dictionary:

```python
EMULATOR_DESCRIPTIONS = {
    ...
    EmulatorLabel.NEW_EMULATOR: "Description of the new emulator",
}
```
With this, you have added a new emulator label to the code! You should be able to train your new emulator with the command:

```bash
python scripts/train.py --config=path/to/config.yaml
```
or call the emulator directly with:

```python
emulator = NNEmulator(emulator_label="New_Emulator",
                      archive=archive)
```
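To catch a label that was added to `EmulatorLabel` but forgotten in one of the other structures, a quick consistency check can help. The sketch below uses simplified stand-ins for the structures in `constants.py` (the real ones contain many more entries), and `missing_registrations` is a hypothetical helper, not part of LaCE. It uses a `(str, Enum)` class, which behaves like `StrEnum` here: a member compares and hashes like its string value, which is why string keys such as `"Nyx_alphap_cov"` in `EMULATOR_PARAMS` can be looked up with enum members.

```python
from enum import Enum


class EmulatorLabel(str, Enum):
    """Stand-in for the real EmulatorLabel in lace/emulator/constants.py."""

    NEW_EMULATOR = "New_Emulator"


# Simplified stand-ins for the registries described above
GADGET_LABELS = {EmulatorLabel.NEW_EMULATOR}
NYX_LABELS = set()
EMULATOR_PARAMS = {
    # String key, as in the real dictionary
    "New_Emulator": {"emu_type": "polyfit", "kmax_Mpc": 4, "ndeg": 6},
}
EMULATOR_DESCRIPTIONS = {
    EmulatorLabel.NEW_EMULATOR: "Description of the new emulator",
}


def missing_registrations(label: str) -> list[str]:
    """Return the names of the registries that do not know about `label`."""
    label = EmulatorLabel(label)  # raises ValueError if the enum lacks it
    missing = []
    if label not in GADGET_LABELS | NYX_LABELS:
        missing.append("GADGET_LABELS/NYX_LABELS")
    if label not in EMULATOR_PARAMS:  # works with string keys
        missing.append("EMULATOR_PARAMS")
    if label not in EMULATOR_DESCRIPTIONS:
        missing.append("EMULATOR_DESCRIPTIONS")
    return missing
```

An empty list means the label is registered everywhere this sketch checks; a `ValueError` means the label was never added to the enum.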


## Loading the new emulator
Once you have defined a new emulator label, you might want to save the trained emulator models and load them without retraining. This can be done either by specifying the `model_path` argument when initializing the emulator:

```python
emulator = NNEmulator(emulator_label="New_Emulator",
                      model_path="path/to/model.pt",
                      train=False,
                      archive=archive)
```
or by using the `emulator_manager` function:

```python
emulator = emulator_manager(emulator_label="New_Emulator",
                            archive=archive)
```

In the first case, since you are specifying `model_path`, there is no naming convention for the model file. In the second case, however, the saved models must be stored as follows:
- The folder must be `data/NNmodels/`, relative to the root of the repository.
- For a specific emulator label, you need to create a new folder, e.g. `New_Emulator`.
- For the emulator using all training simulations, the model file is named `New_Emulator.pt`.
- For the emulator using the training set excluding a given simulation, the model file is named `New_Emulator_drop_sim_{simulation suite}_{simulation index}.pt`. For example, if you exclude the simulation with index 10 from the mpg training set, the model file is named `New_Emulator_drop_sim_mpg_10.pt`.
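To make the convention concrete, the hypothetical helper below (not part of LaCE, and not how `emulator_manager` is implemented) builds the path where a model file is expected. It assumes the per-label folder has the same name as the label, as in the example above.

```python
from pathlib import Path
from typing import Optional, Tuple


def expected_model_file(
    label: str,
    drop_sim: Optional[Tuple[str, int]] = None,
    root: Path = Path("."),
) -> Path:
    """Build the model path implied by the naming convention above.

    `drop_sim` is a (suite, index) pair, e.g. ("mpg", 10), for an emulator
    trained without that simulation; None means all training simulations.
    `root` is the repository root.
    """
    folder = root / "data" / "NNmodels" / label
    if drop_sim is None:
        return folder / f"{label}.pt"
    suite, index = drop_sim
    return folder / f"{label}_drop_sim_{suite}_{index}.pt"
```

For instance, `expected_model_file("New_Emulator", ("mpg", 10))` points to `data/NNmodels/New_Emulator/New_Emulator_drop_sim_mpg_10.pt`.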

The emulator manager will automatically find the correct model file for the given emulator label. To set this up, you need to add the new emulator label to the `folder` dictionary in the `emulator_manager.py` file.
```python
folder = {
    ...
    EmulatorLabel.NEW_EMULATOR: "NNmodels/New_Emulator/",
}
```
26 changes: 26 additions & 0 deletions docs/docs/developers/advancedTesting.md
@@ -0,0 +1,26 @@
# MAINTAINING THE AUTOMATED TESTING

LaCE uses automated testing to ensure code quality and prevent regressions. This guide, intended for developers maintaining the automated tests, explains how to maintain and extend the test suite.

## Running Tests
Automated tests are run using pytest. The test pipeline is defined in `.github/workflows/python-tests.yml`. To add another test, you have to:

1. In the `Run tests` section of `.github/workflows/python-tests.yml`, add the command that runs your test:
   ```yaml
   - name: Run tests
     run: |
       ...
       pytest tests/test_your_test.py
       pytest tests/test_your_other_test.py
   ```
2. Add the script containing your test to the `tests` folder.
3. Test function names must start with `test_` (e.g., `test_my_function`) so that pytest collects them. Tests can take fixtures as arguments.
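A minimal test file following these rules could look like the sketch below. The function under test (`normalize`) and the fixture (`toy_p1d`) are hypothetical placeholders; in a real test, the function would be imported from LaCE.

```python
# tests/test_your_test.py (hypothetical example)
import numpy as np
import pytest


def normalize(p1d):
    """Toy function under test: rescale an array so that its mean is 1."""
    p1d = np.asarray(p1d, dtype=float)
    return p1d / p1d.mean()


@pytest.fixture
def toy_p1d():
    # pytest calls this fixture and passes its return value to every test
    # that declares an argument named `toy_p1d`
    return np.array([1.0, 2.0, 3.0, 4.0])


def test_normalize_mean_is_one(toy_p1d):
    assert normalize(toy_p1d).mean() == pytest.approx(1.0)


def test_normalize_preserves_shape(toy_p1d):
    assert normalize(toy_p1d).shape == toy_p1d.shape
```

Running `pytest tests/test_your_test.py` would collect and run both `test_` functions.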

In the `.github/workflows/python-tests.yml` file, you can specify when the tests should run. Currently, they only run on pushes to the `main` branch, which includes merging a PR into `main`:
```yaml
on:
  push:
    branches: 'main'
```

When a PR is merged into the `main` branch, the tests run automatically on [GitHub Actions](https://github.com/igmhub/LaCE/actions).
25 changes: 25 additions & 0 deletions docs/docs/developers/documentation.md
@@ -0,0 +1,25 @@
# MAINTAINING THE DOCUMENTATION

LaCE uses `mkdocs` to build the documentation. The documentation is hosted at [LaCE documentation](https://igmhub.github.io/LaCE/). The documentation can be built locally using the following command:

```bash
mkdocs build
```
and then served using

```bash
mkdocs serve
```

The documentation is pushed to the `gh-pages` branch at each release: the `gh-pages` branch is updated automatically when a PR is merged into `main`.

In order to write documentation, you can use the following structure:

- `docs/docs/developers`: Documentation for developers
- `docs/docs/`: Documentation for users

You can add new pages by adding a new `.md` file to the `docs/docs/` folder. Remember to add the new page to the `mkdocs.yml` file so that it is included in the documentation; it will then appear automatically in the navigation menu.

To keep the structure clean, also link the new page from the corresponding `index.md` file.

20 changes: 20 additions & 0 deletions docs/docs/developers/index.md
@@ -0,0 +1,20 @@
# FOR DEVELOPERS

Welcome to the LaCE developer documentation! This section contains information for developers who want to contribute to LaCE or understand its internals better.

## Contents

- [Creating New Emulators](CreateNewEmulator.md): Learn how to create and add new emulator types to LaCE
- [Training Options](trainingOptions.md): Implemented solutions to improve the emulators' performance
- [Code Testing](advancedTesting.md): Information to maintain and extend the automated testing
- [Documentation](documentation.md): How to write and maintain documentation

## Getting Started

If you're new to developing for LaCE, we recommend:

1. Reading the installation instructions
2. Setting up your development environment
3. ...

For any questions, please open an issue on GitHub or reach out to the maintainers.
80 changes: 80 additions & 0 deletions docs/docs/developers/trainingOptions.md
@@ -0,0 +1,80 @@
# TRAINING OPTIONS

There are several features that can be used to customize the training of the emulators. This tutorial will guide you through the process of training emulators with different options.

- [Weighting with a covariance matrix](#weighting-with-a-covariance-matrix)
- [Weighting simulations depending on the scalings (mean flux, temperature)](#weighting-simulations-depending-of-the-scalings-mean-flux-temperature)

## Weighting with a covariance matrix
The emulator supports weighting the training simulations with a covariance matrix during the training of the neural network.

To train an emulator with a covariance matrix, you need to provide a covariance matrix for the training simulations. Currently, the emulator only supports a diagonal covariance matrix, which must be given in the __k__ binning of the training simulations.
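As a generic illustration of what weighting with a diagonal covariance of relative errors can mean, the sketch below divides each residual by its absolute error (relative error times target), i.e. inverse-variance weighting per __k__ bin. This is an assumption for illustration only, not necessarily how `nn_emulator.py` uses the covariance; `diag_weighted_mse` is a hypothetical name.

```python
import numpy as np


def diag_weighted_mse(pred, target, rel_err):
    """Mean squared error weighted by a diagonal covariance.

    pred, target: arrays of shape (n_samples, n_k) with P1D values.
    rel_err: array of shape (n_k,), or broadcastable to `pred`, with the
    relative error of each k bin. The absolute error is rel_err * target,
    and each residual is divided by it (inverse-variance weighting).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    sigma = np.asarray(rel_err, dtype=float) * target
    return np.mean(((pred - target) / sigma) ** 2)
```

In this form, bins with a large relative error contribute little to the loss, so the value assigned to scales that DESI does not measure directly controls how much those scales still matter during training.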

The function `_load_DESIY1_err` in the `nn_emulator.py` file loads the covariance matrix. The covariance must be stored as a JSON file containing, for each redshift __z__, the relative error in each __k__ bin.

The JSON file can be generated from the relative error file in `data/DESI_cov/rel_err_DESI_Y1.npy` with the following steps:

First we load the data from the relative error file:

```python
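import numpy as np

# PROJ_ROOT is assumed to be a pathlib.Path pointing to the repository root
cov = np.load(PROJ_ROOT / "data/DESI_cov/rel_error_DESIY1.npy", allow_pickle=True)
# The .npy file stores a dictionary; .item() extracts it
data = cov.item()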
```

Then we extract the arrays. This involves a hidden but important step: in the original relative error file, relative errors set to 100 correspond to scales not measured by DESI. The value of 100 is arbitrary and can be tuned for the training. Initial investigations indicated that replacing it with 5 works well, so this is the current setting, but this dummy value could be further refined.

```python
# Extract the arrays
z_values = data['z']
rel_error_Mpc = data['rel_error_Mpc']
rel_error_Mpc[rel_error_Mpc == 100] = 5

k_cov = data['k_Mpc']
```

Then we extract the __k__ values for the training simulations to ensure that the covariance matrix is given in the __k__ binning of the training simulations.

```python
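# `archive` is assumed to be a Nyx archive loaded beforehand
testing_data_central = archive.get_testing_data('nyx_central')
testing_data = archive.get_testing_data('nyx_0')
# Keep the k bins of the training simulations below 4 1/Mpc
k_Mpc_LH = testing_data[0]['k_Mpc'][testing_data[0]['k_Mpc'] < 4]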
```
And then we create the dictionary with the relative error as a function of __z__ for each __k__ bin:

```python
from scipy.interpolate import interp1d

dict_ = {}
for z, rel_error_row in zip(z_values, rel_error_Mpc):
    # Interpolate the DESI relative errors onto the training k bins
    f = interp1d(k_cov, rel_error_row, fill_value="extrapolate")
    rel_error_Mpc_interp = f(k_Mpc_LH)
    # Use the value of the fourth k bin for the first three (lowest-k) bins
    rel_error_Mpc_interp[0:3] = rel_error_Mpc_interp[3]
    # JSON keys must be strings
    dict_[f"{z}"] = rel_error_Mpc_interp.tolist()
```

And finally we save the dictionary to a json file:

```python
import json

# Save the dictionary of relative errors to a JSON file
with open(PROJ_ROOT / "data/DESI_cov/rerr_DESI_Y1.json", "w") as json_file:
    json.dump(dict_, json_file, indent=4)
```
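The steps above can be collected into one reusable function, followed by a read-back of the saved file as a sanity check. This is a sketch: `build_rel_err_dict` and `load_rel_err` are hypothetical names, the function works on a copy instead of modifying the loaded array in place, and `load_rel_err` is not the `_load_DESIY1_err` function used by LaCE.

```python
import json

import numpy as np
from scipy.interpolate import interp1d


def build_rel_err_dict(z_values, k_cov, rel_error_Mpc, k_target,
                       dummy=100, replacement=5, n_low_k=3):
    """Replace the dummy value, interpolate each redshift row onto the
    training k bins and copy the value of bin `n_low_k` into the first
    `n_low_k` bins. Returns {str(z): list of relative errors}."""
    rel_err = np.array(rel_error_Mpc, dtype=float)  # copy; input untouched
    rel_err[rel_err == dummy] = replacement
    out = {}
    for z, row in zip(z_values, rel_err):
        f = interp1d(k_cov, row, fill_value="extrapolate")
        interp = f(k_target)
        interp[:n_low_k] = interp[n_low_k]
        out[f"{z}"] = interp.tolist()
    return out


def load_rel_err(path, z):
    """Read the JSON file back and return the relative errors at redshift z."""
    with open(path) as json_file:
        data = json.load(json_file)
    return np.array(data[f"{z}"])
```

Passing `replacement` explicitly makes it easy to try other dummy values than 5 when refining the training.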

## Weighting simulations depending of the scalings (mean flux, temperature )

The `nn_emulator.py` file contains a function `_get_rescalings_weights` that weights the simulations depending on their scalings. This can be used to give more importance to snapshots with certain scalings, and the weights can depend on both the scaling value and the redshift. Initial investigations did not show an improvement in emulator performance when weighting the simulations, although this option might be worth investigating further.

The function `_get_rescalings_weights` can be customized by changing the line:

```python
weights_rescalings[np.where([(d['val_scaling'] not in [0,1] and d['z'] in [2.8, 3,3.2,3.4]) for d in self.training_data])] = 1
```
A weight of 1 leaves a snapshot's contribution to the training unchanged; to downweight certain snapshots, use a value lower than 1. In this particular case, changing it to a lower value, for example 0.5, would downweight the snapshots with a scaling value different from 0 and 1 (temperature scalings) at redshifts 2.8, 3.0, 3.2 and 3.4.

Initial investigations showed that even very low weights (e.g., 0.01) led to a performance similar to that of an emulator trained with equal weights.
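The selection in that line can be reproduced outside the class with the sketch below, which makes the downweighting value and the targeted redshifts explicit parameters. It is not the LaCE implementation: `rescaling_weights` and `weighted_mse` are hypothetical, the snapshot dictionaries only carry the two keys the original line reads, redshifts are compared with `np.isclose` rather than exact membership to tolerate floating-point values, and normalizing the loss by the sum of the weights is an assumption.

```python
import numpy as np


def rescaling_weights(training_data, downweight=0.5,
                      z_targets=(2.8, 3.0, 3.2, 3.4), keep_scalings=(0, 1)):
    """Return one weight per snapshot.

    Snapshots whose 'val_scaling' is not in `keep_scalings` and whose 'z'
    is (close to) one of `z_targets` get `downweight`; all others get 1.
    """
    weights = np.ones(len(training_data))
    for i, d in enumerate(training_data):
        at_target_z = np.any(np.isclose(d["z"], z_targets))
        if d["val_scaling"] not in keep_scalings and at_target_z:
            weights[i] = downweight
    return weights


def weighted_mse(pred, target, weights):
    """Per-snapshot squared error, averaged with the snapshot weights."""
    sq_err = np.mean((np.asarray(pred) - np.asarray(target)) ** 2, axis=1)
    return np.sum(weights * sq_err) / np.sum(weights)
```

With `downweight=1`, all weights equal 1 and `weighted_mse` reduces to the plain mean, which matches the statement that a weight of 1 has no effect.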
10 changes: 1 addition & 9 deletions docs/docs/index.md
@@ -1,4 +1,4 @@
# LaCE documentation!
# LaCE DOCUMENTATION

Welcome to the documentation for LaCE!
LaCE contains a set of emulators for the one-dimensional flux power spectrum of the Lyman-alpha forest. It has been used in the papers:
@@ -10,14 +10,6 @@ LaCE contains a set of emulators for the one-dimensional flux power spectrum of
Please cite at least https://arxiv.org/abs/2305.19064 if you use this emulator in your research.


## Table of Contents

- [Installation](installation.md)
- [Archive](archive.md)
- [Emulator Predictions](emulatorPredictions.md)
- [Emulators Training](emulatorTraining.md)
- [Compressed Parameters](compressedParameters.md)

## Prerequisites

Before proceeding, ensure that the following software is installed on your system:
2 changes: 1 addition & 1 deletion docs/docs/installation.md
@@ -1,4 +1,4 @@
# Installation
# INSTALLATION
(Last updated: Nov 19 2024)

LaCE contains a submodule to estimate compressed parameters from the power spectrum that uses cosmopower. The LaCE installation is slightly different depending on whether you want to use cosmopower or not.
2 changes: 1 addition & 1 deletion docs/docs/archive.md → docs/docs/users/archive.md
@@ -1,4 +1,4 @@
# Archive
# ARCHIVE

The LaCE emulators support two types of archives:
- Gadget archive: Contains the P1D of Gadget simulations described in [Pedersen+21](https://arxiv.org/abs/2011.15127).