Scale rescalings (#114)

* add function to plot corner
* notebook to plot cosmological chains
* add files to ignore
* fix scale to fit
* fix cosmopower interface to kms
* modify the tutorial accordingly
* minor change in path
* extend documentation to also include documentation for developers
* include new documentation
* add weighting option
* update notebook for star params inference
* typo
* change val scaling to ind_rescaling

---------

Co-authored-by: Laura Cabayol Garcia <[email protected]>
Co-authored-by: Laura Cabayol Garcia <[email protected]>
igmhub · Nov 27, 2024 · e552f1f
1 parent d044878
commit e552f1f
Showing 20 changed files with 655 additions and 105 deletions.
diff --git a/.gitignore b/.gitignore
@@ -6,3 +6,4 @@ __pycache__
 .github/workflows/.python-tests.yml.swo
 .github/workflows/.python-tests.yml.swp
 docs/build/
+docs/site/
diff --git a/README.md b/README.md
@@ -92,8 +92,8 @@ NYX_PATH="/global/cfs/cdirs/desi/science/lya/y1-p1d/likelihood_files/nyx_files/"
 - Before running LaCE, please precompute all cosmological information needed using CAMB and save IGM histories. This is done by running the following scripts. You do not need to do it if you are in NERSC.
 
 ```
-python scripts/save_nyx_emu_cosmo.py
-python scripts/save_nyx_IGM.py
+python scripts/developers/save_nyx_emu_cosmo.py
+python scripts/developers/save_nyx_IGM.py
 ```
 
 ## Emulator parameters:

diff --git a/docs/docs/developers/CreateNewEmulator.md b/docs/docs/developers/CreateNewEmulator.md
@@ -0,0 +1,104 @@
+# CREATING NEW EMULATORS
+
+The training of the emulators is done with the code `train.py`. This code is used to train custom emulators already defined with an emulator label. However, you might need to define a new emulator label with new hyperparameters. This tutorial will guide you through the process of creating a new emulator label.
+
+The file `lace/emulator/constants.py` contains the definitions of the emulator labels, training sets, and the emulator parameters associated with each emulator label.
+
+To create a new emulator label, you first need to add your new emulator label to the `EmulatorLabel` class in the `constants.py` file, for example:
+
+```python
+class EmulatorLabel(StrEnum):
+    ...
+    NEW_EMULATOR = "New_Emulator"
+```
+
+`"New_Emulator"` is the string that identifies the new emulator label in emulator calls, e.g. `NNEmulator(emulator_label="New_Emulator")`.
+
+Then this label needs to be added to `GADGET_LABELS` or `NYX_LABELS` in the `constants.py` file, depending on the training set you used to train your emulator. For example, if this is a new Gadget emulator, you need to add it to `GADGET_LABELS`:
+
+```python
+GADGET_LABELS = {
+    ...
+    EmulatorLabel.NEW_EMULATOR,
+}
+```
+
+The dictionary `EMULATOR_PARAMS` also needs to be updated with the new emulator parameters. Here, one needs to add all the arguments needed to initialize the emulator class. For example:
+
+```python
+    "Nyx_alphap_cov": {
+        "emu_params": [
+            "Delta2_p",
+            "n_p",
+            "alpha_p",
+            "mF",
+            "sigT_Mpc",
+            "gamma",
+            "kF_Mpc",
+        ],
+        "emu_type": "polyfit",
+        "kmax_Mpc": 4,
+        "ndeg": 6,
+        "nepochs": 600,
+        "step_size": 500,
+        "nhidden": 6,
+        "max_neurons": 400,
+        "lr0": 2.5e-4,
+        "weight_decay": 8e-3,
+        "batch_size": 100,
+        "amsgrad": True,
+        "z_max": 5,
+        "include_central": False,
+    }
+```
+
+Finally, you need to add a description of the new emulator in the `EMULATOR_DESCRIPTIONS` dictionary:
+
+```python
+EMULATOR_DESCRIPTIONS = {
+    ...
+    EmulatorLabel.NEW_EMULATOR: "Description of the new emulator",
+}
+```
+With this, you have added a new emulator label to the code! You should be able to train your new emulator with the command:
+
+```bash
+python scripts/train.py --config=path/to/config.yaml
+```
+or call the emulator directly with:
+
+```python
+emulator = NNEmulator(emulator_label="New_Emulator",
+                      archive=archive)
+```
+
+
+## Loading the new emulator
+Once you have defined a new emulator label, you might want to save the trained emulator models and load them without retraining. This can be done by specifying the `model_path` argument when initializing the emulator:
+
+```python
+emulator = NNEmulator(emulator_label="New_Emulator",
+                      model_path="path/to/model.pt",
+                      train=False,
+                      archive=archive)
+```
+Alternatively, use the `emulator_manager` function:
+
+```python
+emulator = emulator_manager(emulator_label="New_Emulator",
+                            archive=archive)
+```
+
+In the first case, since you are specifying the `model_path`, there is no naming convention for the model file. In the second case, however, the saved models must be stored as follows:
+- The folder must be `data/NNmodels/` from the root of the repository.
+- For a specific emulator label, you need to create a new folder, e.g. `New_Emulator`.
+- For the emulator using all training simulations, the model file is named `New_Emulator.pt`.
+- For the emulator using the training set excluding a given simulation, the model file is named `New_Emulator_drop_sim_{simulation suite}_{simulation index}.pt`. For example, if you exclude the 10th simulation from the mpg training set, the model file is named `New_Emulator_drop_sim_mpg_10.pt`.
+
+The emulator manager will automatically find the correct model file for the given emulator label. To set this up, you need to add the new emulator label to the `folder` dictionary in the `emulator_manager.py` file.
+```python
+folder = {
+    ...
+    EmulatorLabel.NEW_EMULATOR: "NNmodels/New_Emulator/",
+}
+```
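The naming convention described in this page can be sketched as a small helper. This is only an illustration under stated assumptions: `model_file_name` is a hypothetical function that is not part of LaCE, and the folder layout is taken from the description above rather than from the `emulator_manager` source.

```python
from pathlib import Path
from typing import Optional


def model_file_name(emulator_label: str,
                    drop_sim: Optional[str] = None) -> Path:
    """Hypothetical helper: expected model path relative to the repository root.

    drop_sim is "{simulation suite}_{simulation index}", e.g. "mpg_10",
    or None for the emulator trained on all simulations.
    """
    # Each emulator label gets its own folder under data/NNmodels/
    folder = Path("data") / "NNmodels" / emulator_label
    if drop_sim is None:
        return folder / f"{emulator_label}.pt"
    return folder / f"{emulator_label}_drop_sim_{drop_sim}.pt"


print(model_file_name("New_Emulator").as_posix())
print(model_file_name("New_Emulator", drop_sim="mpg_10").as_posix())
```

Checking a path built this way against the files on disk before calling `emulator_manager` can help catch naming mistakes early.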
diff --git a/docs/docs/developers/advancedTesting.md b/docs/docs/developers/advancedTesting.md
@@ -0,0 +1,26 @@
+# MAINTAINING THE AUTOMATED TESTING
+
+LaCE uses automated testing to ensure code quality and prevent regressions. This guide is intended for developers and explains how to maintain and extend the test suite.
+
+## Running Tests
+Automated tests are run using pytest. The test pipeline is defined in `.github/workflows/python-tests.yml`. To add another test, you have to:
+
+1. In the section `Run tests`, in `.github/workflows/python-tests.yml`, add the command to run your test.
+```yaml
+    - name: Run tests
+      run: |
+        ...
+        pytest tests/test_your_test.py
+        pytest tests/test_your_other_test.py
+```
+2. Add the script with your test in the `tests` folder.
+3. The testing function must start with `test_` (e.g., `test_my_function`). Tests can take fixtures as arguments.
+
+In the `.github/workflows/python-tests.yml` file, you can specify when the tests should run. For example, tests currently run only on pushes to the `main` branch, which includes merged PRs:
+```yaml
+    on:
+      push:
+        branches: 'main'
+```
+
+When a PR is merged into the `main` branch, the tests run automatically on [GitHub Actions](https://github.com/igmhub/LaCE/actions).
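A minimal sketch of a test file that pytest would collect is shown below. The file name `tests/test_your_test.py` follows the example in the workflow snippet; the function `rescale` is a hypothetical stand-in for the code under test and does not exist in LaCE.

```python
# tests/test_your_test.py (hypothetical file name)


def rescale(values, factor):
    """Hypothetical function under test: multiply every value by a factor."""
    return [v * factor for v in values]


def test_rescale_preserves_length():
    # pytest collects functions whose names start with `test_`
    assert len(rescale([1.0, 2.0, 3.0], 0.5)) == 3


def test_rescale_values():
    # A failing assert is reported by pytest as a failed test
    assert rescale([2.0, 4.0], 0.5) == [1.0, 2.0]
```

Running `pytest tests/test_your_test.py` locally before adding the command to the workflow is a quick way to confirm that the file is discovered.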
diff --git a/docs/docs/developers/documentation.md b/docs/docs/developers/documentation.md
@@ -0,0 +1,25 @@
+# MAINTAINING THE DOCUMENTATION
+
+LaCE uses `mkdocs` to build the documentation. The documentation is hosted at [LaCE documentation](https://igmhub.github.io/LaCE/). The documentation can be built locally using the following command:
+
+```bash
+mkdocs build
+``` 
+and then served using
+
+```bash
+mkdocs serve
+```
+
+The documentation is deployed to the `gh-pages` branch, which is updated automatically at each release, i.e. whenever a PR is merged into `main`.
+
+In order to write documentation, you can use the following structure:
+
+- `docs/docs/developers`: Documentation for developers
+- `docs/docs/`: Documentation for users
+
+You can add new pages by adding a new `.md` file to the `docs/docs/` folder. Remember to add the new page to the `mkdocs.yml` file; once it is listed there, it is included in the documentation and appears in the navigation menu.
+
+To have a cleaner structure, add the new page to the corresponding `index.md` file.
+
diff --git a/docs/docs/developers/index.md b/docs/docs/developers/index.md
@@ -0,0 +1,20 @@
+# FOR DEVELOPERS    
+
+Welcome to the LaCE developer documentation! This section contains information for developers who want to contribute to LaCE or understand its internals better.
+
+## Contents
+
+- [Creating New Emulators](CreateNewEmulator.md): Learn how to create and add new emulator types to LaCE
+- [Training Options](trainingOptions.md): Implemented solutions to improve emulator performance
+- [Code Testing](advancedTesting.md): Information to maintain and extend the automated testing
+- [Documentation](documentation.md): How to write and maintain documentation
+
+## Getting Started
+
+If you're new to developing for LaCE, we recommend:
+
+1. Reading the installation instructions
+2. Setting up your development environment
+3. ...
+
+For any questions, please open an issue on GitHub or reach out to the maintainers.
diff --git a/docs/docs/developers/trainingOptions.md b/docs/docs/developers/trainingOptions.md
@@ -0,0 +1,80 @@
+# TRAINING OPTIONS
+
+There are several features that can be used to customize the training of the emulators. This tutorial will guide you through the process of training emulators with different options.
+
+- [Weighting with a covariance matrix](#weighting-with-a-covariance-matrix)
+- [Weighting simulations depending of the scalings (mean flux, temperature )](#weighting-simulations-depending-of-the-scalings-mean-flux-temperature)
+
+## Weighting with a covariance matrix
+The emulator supports weighting the training simulations with a covariance matrix during the training of the neural network.
+
+To train an emulator with a covariance matrix, you need to provide a covariance matrix for the training simulations. Currently, the emulator only supports a diagonal covariance matrix. It is important that the covariance matrix is given in the __k__ binning of the training simulations.
+
+The function `_load_DESIY1_err` in the `nn_emulator.py` file loads a covariance matrix. The covariance must be a JSON file with the relative error as a function of __z__ for each __k__ bin.
+
+From the relative error file in `data/DESI_cov/rel_err_DESI_Y1.npy`, we can generate the JSON file with the following steps:
+
+First we load the data from the relative error file:
+
+```python
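+import numpy as np
+
+# PROJ_ROOT is the path to the repository root (assumed to be defined already)
+cov = np.load(PROJ_ROOT / "data/DESI_cov/rel_error_DESIY1.npy", allow_pickle=True)
+# Extract the data dictionary stored in the loaded object array
+data = cov.item()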
+```
+
+Then we extract the arrays. This involves an important step that is easy to miss. In the original relative error file, the scales not measured by DESI are assigned a relative error of 100. This value is an arbitrary placeholder and can be tuned for the training. Initial investigations indicated that a value of 5 works well, so it is currently set to 5, but this parameter could be further refined.
+
+```python
+# Extract the arrays
+z_values = data['z']
+rel_error_Mpc = data['rel_error_Mpc']
+rel_error_Mpc[rel_error_Mpc == 100] = 5
+
+k_cov = data['k_Mpc']
+```
+
+Then we extract the __k__ values for the training simulations to ensure that the covariance matrix is given in the __k__ binning of the training simulations.
+
+```python
+testing_data_central = archive.get_testing_data('nyx_central')
+testing_data = archive.get_testing_data('nyx_0')
+k_Mpc_LH = testing_data[0]['k_Mpc'][testing_data[0]['k_Mpc']<4]
+```
+And then we create the dictionary with the relative error as a function of __z__ for each __k__ bin:
+
+```python
+from scipy.interpolate import interp1d
+
+# Build a dictionary with z (as a string) as key and the relative error
+# interpolated onto the simulation k bins as value
+dict_ = {}
+for z, rel_error_row in zip(z_values, rel_error_Mpc):
+    f = interp1d(k_cov, rel_error_row, fill_value="extrapolate")
+    rel_error_Mpc_interp = f(k_Mpc_LH)
+    # Overwrite the first three k bins with the value of the fourth bin
+    rel_error_Mpc_interp[0:3] = rel_error_Mpc_interp[3]
+    dict_[f"{z}"] = rel_error_Mpc_interp.tolist()
+```
+
+And finally we save the dictionary to a json file:
+
+```python
+import json
+
+# Save the dictionary of relative errors to a JSON file
+with open(PROJ_ROOT / "data/DESI_cov/rerr_DESI_Y1.json", "w") as json_file:
+    json.dump(dict_, json_file, indent=4)
+```
+
+## Weighting simulations depending of the scalings (mean flux, temperature )
+
+The `nn_emulator.py` file contains a function `_get_rescalings_weights` that allows weighting the simulations depending on the scalings. This can be used to give more importance to the snapshots with certain scalings. It is possible to weight differently based on the scaling value and the redshift. Initial investigations did not show an improvement in the emulator performance when weighting the simulations. However, it might be worth investigating this option further.
+
+The function `_get_rescalings_weights` can be customized by changing the line:
+
+```python
+weights_rescalings[np.where([(d['val_scaling'] not in [0,1] and d['z'] in [2.8, 3,3.2,3.4]) for d in self.training_data])] = 1
+```
+A weight of 1 has no effect on the training. To downweight certain snapshots, use a value lower than 1. In this particular case, changing it to a lower value, for example 0.5, would downweight the snapshots with a scaling value not equal to 0 or 1 (temperature scalings) and a redshift in the list [2.8, 3, 3.2, 3.4].
+
+Initial investigations showed that even very low weights, for example 0.01, led to a performance similar to that of an emulator trained with equal weights.
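The selection logic of the line above can be illustrated with a dependency-free sketch. It is not the LaCE implementation: `rescaling_weights` and `toy_data` are hypothetical, the default weight of 1 for unselected snapshots is an assumption, and redshifts are compared with `math.isclose` instead of the exact `in` test used in the original line, to avoid floating-point mismatches.

```python
import math


def rescaling_weights(training_data, z_targets=(2.8, 3.0, 3.2, 3.4), weight=0.5):
    """Hypothetical helper: one weight per snapshot.

    Snapshots with a scaling value other than 0 or 1 and a redshift in
    z_targets get `weight`; all others get 1.0 (assumed default).
    """
    weights = []
    for d in training_data:
        rescaled = d["val_scaling"] not in (0, 1)
        # Tolerant comparison instead of exact float equality
        z_selected = any(math.isclose(d["z"], z, abs_tol=1e-6) for z in z_targets)
        weights.append(weight if (rescaled and z_selected) else 1.0)
    return weights


# Toy snapshots using the same keys as the line above
toy_data = [
    {"val_scaling": 1, "z": 3.0},    # unscaled snapshot
    {"val_scaling": 0.9, "z": 3.0},  # rescaled, selected redshift
    {"val_scaling": 1.1, "z": 4.0},  # rescaled, redshift not selected
]
print(rescaling_weights(toy_data))  # [1.0, 0.5, 1.0]
```

Only the second toy snapshot satisfies both conditions, so it is the only one that is downweighted.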
diff --git a/docs/docs/index.md b/docs/docs/index.md
@@ -1,4 +1,4 @@
-# LaCE documentation!
+# LaCE DOCUMENTATION
 
 Welcome to the documentation for LaCE!
 LaCE contains a set of emulators for the one-dimensional flux power spectrum of the Lyman-alpha forest. It has been used in the papers:
@@ -10,14 +10,6 @@ LaCE contains a set of emulators for the one-dimensional flux power spectrum of
 Please cite at least https://arxiv.org/abs/2305.19064 if you use this emulator in your research.
 
 
-## Table of Contents
-
-- [Installation](installation.md)
-- [Archive](archive.md)
-- [Emulator Predictions](emulatorPredictions.md)
-- [Emulators Training](emulatorTraining.md)
-- [Compressed Parameters](compressedParameters.md)
-
 ## Prerequisites
 
 Before proceeding, ensure that the following software is installed on your system:

diff --git a/docs/docs/installation.md b/docs/docs/installation.md
@@ -1,4 +1,4 @@
-# Installation
+# INSTALLATION
 (Last updated: Nov 19 2024)
 
 LaCE contains a submodule to estimate compressed parameters from the power spectrum that uses cosmopower. The LaCE installation is slightly different depending on whether you want to use cosmopower or not.

diff --git a/docs/docs/archive.md b/docs/docs/users/archive.md
@@ -1,4 +1,4 @@
-# Archive
+# ARCHIVE
 
 The LaCE emulators support two types of archives:
 - Gadget archive: Contains the P1D of Gadget simulations described in [Pedersen+21](https://arxiv.org/abs/2011.15127).