Right now there is no centralized way to perform distributed mass balance (MB) simulations using MBM.
Distributed simulations are currently possible with the following workflow:
```python
# Create geodata object
geoData = mbm.geodata.GeoData(df_grid_monthly)

# Compute and save gridded MB for a given year and glacier
path_glacier_dem = os.path.join(cfg.dataPath, path_xr_grids,
                                f"{glacier_name}_{year}.zarr")
geoData.gridded_MB_pred(df_grid_monthly,
                        loaded_model,
                        glacier_name,
                        year,
                        all_columns,
                        path_glacier_dem,
                        path_save_glw,
                        save_monthly_pred=True,
                        type_model='NN')
```

However, the part that generates the `df_grid_monthly` data, i.e. the distributed grids based on all glacier pixels that are fed to the NN, is not easily generalizable, and right now it is limited to a single glacier and year. Therefore, we should incorporate this either as a new class within MBM or as extra functionality of an existing class.
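To make the proposal concrete, below is a minimal sketch of what such a class could look like. All names here (`GriddedDataBuilder`, `tasks`) are assumptions for illustration, not existing MBM API; the idea is simply to encapsulate the glacier/year enumeration and output paths in one place instead of hand-written loops:

```python
import os

class GriddedDataBuilder:
    """Hypothetical interface sketch (names are not MBM API).

    Encapsulates the per-glacier, per-year grid generation so that
    callers no longer need to write the double loop themselves.
    """

    def __init__(self, glaciers, years, out_dir):
        self.glaciers = list(glaciers)
        self.years = list(years)
        self.out_dir = out_dir

    def tasks(self):
        """Yield one (glacier, year, output_path) tuple per combination."""
        for glacier_name in self.glaciers:
            for year in self.years:
                path = os.path.join(self.out_dir,
                                    f"{glacier_name}_grid_{year}.parquet")
                yield glacier_name, year, path
```

Each yielded task would then drive the grid-generation step shown above, and the task list is also a natural unit of work to distribute across workers.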
The current approach, based on the Switzerland notebook, involves a double for loop over all glaciers and years that applies the following function:
```python
dataset_grid_yearly = mbm.data_processing.Dataset(
    cfg=cfg,
    data=df_grid_y,
    region_name='CH',
    region_id=11,
    data_path=cfg.dataPath + path_PMB_GLAMOS_csv)

# Convert to monthly time resolution
dataset_grid_yearly.convert_to_monthly(
    meta_data_columns=cfg.metaData,
    vois_climate=vois_climate + ['pcsr'],
    vois_topographical=voi_topographical,
)

# Ensure 'pcsr' column exists before saving
if 'pcsr' not in dataset_grid_yearly.data.columns:
    raise ValueError(
        f"'pcsr' column not found in dataset for glacier '{glacier_name}' in year {year}"
    )

# Save the dataset for the specific year
save_path = os.path.join(
    folder_path, f"{glacier_name}_grid_{year}.parquet")
print(f'Saving gridded dataset to: {save_path}')
dataset_grid_yearly.data.to_parquet(save_path,
                                    engine="pyarrow",
                                    compression="snappy")
```

We should create a wrapper function that automatically runs this simulation for multiple years and glaciers, avoiding the double for loop in Python (which is very slow) and implementing some form of parallelization.
After this, we should also update the documentation accordingly and add a quick example and tutorial on how to run distributed simulations.