docs: add gida wiki

cuongth95 · cuongth95 · commit 530658738119 · 2024-12-20T12:39:10.000+01:00
diff --git a/ditec/docs/gida/installation.md b/ditec/docs/gida/installation.md
@@ -0,0 +1,21 @@
+# Manually download the data
+TODO: Update later
+
+# Install the Data Interface
+Currently, GiDA is available for Python >= 3.10. Please set up a virtual environment before installation.
+
+Because some libraries depend on your OS and CUDA version, you should install them separately as follows:
+
+1. Install [PyTorch >= 2.3](https://pytorch.org/get-started/locally/)
+2. Install [PyG >= 2.3](https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html)
+
+At this time, GiDA works best with PyTorch 2.3 and PyG 2.3.
+
+Afterwards, you can clone GiDA or install it via pip:
+
+```bash
+pip install git+https://github.com/DiTEC-project/DiTEC_WDN_dataset.git
+```
+
+Tada! The GiDA data interface has been installed!
+
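A quick way to confirm the requirements listed above is to inspect the active environment. The snippet below is a minimal sketch, not part of GiDA: `check_environment` is a hypothetical helper, the distribution names `torch` and `torch-geometric` are the ones published on PyPI, and `gigantic_dataset` is assumed to be the import name because the quickstart example imports from it.

```python
import sys
from importlib import metadata, util


def check_environment() -> dict:
    """Report whether the requirements listed above appear to be met."""
    # GiDA is documented as requiring Python >= 3.10.
    report = {"python_ok": sys.version_info >= (3, 10)}

    # Look up installed versions without importing the (heavy) packages.
    for dist in ("torch", "torch-geometric"):
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = None

    # Assumption: `gigantic_dataset` is the importable package name.
    report["gigantic_dataset"] = util.find_spec("gigantic_dataset") is not None
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

A value of `None` or `False` means the corresponding package was not found in the current virtual environment. Comparing the reported versions against the `>= 2.3` requirements is left to the reader.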
diff --git a/ditec/docs/gida/overview.md b/ditec/docs/gida/overview.md
@@ -0,0 +1,44 @@
+# GiDA - The Gigantic Dataset
+
+This work includes a collection of synthetic scenarios devised from 36 **Water Distribution Networks (WDNs)**. 
+
+For clarity, it helps to first become familiar with the following concepts:
+
+* **Scenario** denotes a sequence of snapshots.
+
+* **Snapshot** represents a measured steady state of a particular WDN and is often modelled as an undirected graph.
+
+* **Input parameters** include simulation inputs, such as demands, pipe diameters, and so on.
+
+* **Output parameters** include the simulation outcomes that researchers are interested in (e.g., pressure, flow rate, and head).
+
+Both kinds of parameters are described as nodal/edge features in the snapshot graph. Their values are diverse but temporally correlated with those of other snapshots in the **same** scenario.
+However, in GiDA, two scenarios are treated as completely different WDNs even when they originate from the same network.
+
+
+
+# Acknowledgement
+This work is funded by the project DiTEC: Digital Twin for Evolutionary Changes in Water Networks (NWO 19454).
+
+# Citing GiDA
+
+* For the up-to-date dataset and interface, please use this:
+```
+TODO: UPDATE LATER
+```
+
+* For the older dataset versions, please use this:
+```tex
+@article{tello2024largescale,
+    AUTHOR = {Tello, Andrés and Truong, Huy and Lazovik, Alexander and Degeler, Victoria},
+    TITLE = {Large-Scale Multipurpose Benchmark Datasets for Assessing Data-Driven Deep Learning Approaches for Water Distribution Networks},
+    JOURNAL = {Engineering Proceedings},
+    VOLUME = {69},
+    YEAR = {2024},
+    NUMBER = {1},
+    ARTICLE-NUMBER = {50},
+    URL = {https://www.mdpi.com/2673-4591/69/1/50},
+    ISSN = {2673-4591},
+    DOI = {10.3390/engproc2024069050}
+}
+```
diff --git a/ditec/docs/gida/parameters.md b/ditec/docs/gida/parameters.md
@@ -0,0 +1,34 @@
+# Parameters
+An input attribute is named `<component>_<attribute>`. Nodal components include `reservoir`, `junction`, and `tank`, while edge components include `pipe`, `headpump`, `powerpump`, etc.
+
+Tip: Open the `.zip` file to see the available attributes, which appear as file names (csv) or folder names (zarr).
+
+Simulation outputs, on the other hand, have no component prefix (e.g., velocity, pressure, ...). Each one concatenates the features of all components of the same type (node or link). Therefore, we might encounter a size mismatch when trying to stack input and output parameters. Consider this example:
+```python
+from gigantic_dataset.core.datasets import GidaV6
+
+# This should raise an error: the two entries have different sizes
+GidaV6(
+    zip_file_paths=[r"./Dataset/simgen_Anytown_20240524_1202_csvdir_20240527_1205.zip"],  # Anytown dataset
+    node_attrs=[
+        "junction_base_demand",  # load junction base demand (#junctions)
+        ("reservoir_base_head", "junction_elevation", "tank_elevation"),  # load node elevation (#reservoirs + #junctions + #tanks)
+    ],
+    num_records=100,  # take only 100 records
+)
+```
+Here, `junction_base_demand` covers only the junctions, whereas the tuple of elevation-related parameters covers all nodes, so their sizes are inconsistent. However, we sometimes want to define `node_attrs` in this way.\
+To solve this, GiDA offers the `*` prefix, which marks a parameter whose size is smaller than that of the others. Let's fix the above example:
+```python
+from gigantic_dataset.core.datasets import GidaV6
+
+GidaV6(
+    zip_file_paths=[r"./Dataset/simgen_Anytown_20240524_1202_csvdir_20240527_1205.zip"],  # Anytown dataset
+    node_attrs=[
+        "*junction_base_demand",  # load junction base demand (#junctions), marked with an asterisk
+        ("reservoir_base_head", "junction_elevation", "tank_elevation"),  # load node elevation (#reservoirs + #junctions + #tanks)
+    ],
+    num_records=100,  # take only 100 records
+)
+```
+In this way, GiDA pads the asterisk-marked (incomplete) parameters to match the size of the tuple or the non-asterisk parameters.
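The padding idea can be illustrated without GiDA at all. The snippet below is a minimal sketch under stated assumptions: the component counts are made up, `pad_junction_values` is a hypothetical helper, nodes are assumed to be ordered as (reservoirs, junctions, tanks) to mirror the tuple in the example, and missing entries are assumed to be filled with zeros. GiDA's actual padding value and node ordering may differ.

```python
# Hypothetical component counts, chosen only for illustration.
NUM_RESERVOIRS, NUM_JUNCTIONS, NUM_TANKS = 1, 4, 2


def pad_junction_values(junction_values, pad_value=0.0):
    """Expand junction-only values to one value per node.

    Assumes nodes are ordered as (reservoirs, junctions, tanks) and that
    non-junction slots are filled with `pad_value`.
    """
    if len(junction_values) != NUM_JUNCTIONS:
        raise ValueError("expected one value per junction")
    return (
        [pad_value] * NUM_RESERVOIRS
        + list(junction_values)
        + [pad_value] * NUM_TANKS
    )


# One value per junction (the asterisk-marked, smaller parameter).
base_demand = [0.5, 1.2, 0.8, 0.3]
# One value per node (reservoir head, junction elevations, tank elevations).
elevation = [60.0, 10.0, 12.0, 9.0, 11.0, 40.0, 42.0]

padded_demand = pad_junction_values(base_demand)

# Both columns now have one entry per node and can be stacked row-wise.
node_features = [list(row) for row in zip(padded_demand, elevation)]
print(node_features)
```

Without the padding step, the two columns would have 4 and 7 entries respectively and could not be combined into a single per-node feature matrix, which is the mismatch the first example runs into.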
diff --git a/ditec/docs/gida/quickstart.md b/ditec/docs/gida/quickstart.md
@@ -0,0 +1,36 @@
+
+# Download datasets
+TODO: UPDATE LATER.
+## GiDA-V1
+Please go to [GiDA-V1](https://zenodo.org/records/11353195), download the dataset, and place it into a folder, say `./Dataset`.
+
+
+# Tutorial
+First-time users should refer to the `datasets.py` script and review the `GidaV6.__init__` function. A minimal example is also provided at the end of the script.
+
+The data interface `GidaV6` takes node and edge attributes and outputs a set of records. Each record is a `Data` instance (see [here](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.data.Data.html#torch_geometric.data.Data) for more information). A `Data` instance contains a snapshot graph described by the (sparse) adjacency matrix A, nodal features X, and edge features E. If labels are available, Y holds the labels of either the nodes or the edges. When both nodes and edges have their own labels, Y holds the node labels, while E_Y holds the edge labels.
+
+Assume you want to load the train set of the Anytown network. A very simple interface can be declared as follows:
+```python
+from gigantic_dataset.core.datasets import GidaV6
+from torch_geometric.loader import DataLoader
+
+gida = GidaV6(
+    zip_file_paths=[
+        r"./Dataset/simgen_Anytown_20240524_1202_csvdir_20240527_1205.zip",  # Anytown dataset
+    ],
+    node_attrs=["demand"],  # load nodal demand
+    edge_attrs=["pipe_diameter", "pipe_length"],  # load some properties at edges
+    label_attrs=["pressure"],  # expect labels Y to be pressure
+    edge_label_attrs=["flowrate"],  # expect edge labels E_Y to be flow rate
+    split_set="train",  # take the train set only
+    num_records=100,  # take only 100 records
+    selected_snapshots=None,  # take all snapshots
+)
+
+# You can access a record directly
+print(gida[0])  # Data instance
+# Or via a data loader
+loader = DataLoader(gida, batch_size=1)
+print(next(iter(loader)))  # Batch instance
+```
diff --git a/ditec/docs/index.md b/ditec/docs/index.md
@@ -0,0 +1,21 @@
+# Tutorial for writers
+
+This is the default view and should not be overridden.
+
+To create a wiki for your project, say `Project A`, follow these steps:
+
+1. Clone the project from [here](https://github.com/DiTEC-project/DiTEC-project.github.io).
+
+2. Run `pip install mkdocs` to start working on the wiki with MkDocs.
+
+3. Create a new directory whose name matches your project name.
+
+4. Add Markdown files to the directory. Note that each Markdown file represents one page of the wiki.
+
+5. Check `mkdocs.yml` and lay out your wiki structure.
+
+6. For development, run `mkdocs serve` to show the view on localhost.
+
+7. For deployment to GitHub Pages, follow [this tutorial](https://www.mkdocs.org/user-guide/deploying-your-docs/).
+
+
diff --git a/ditec/mkdocs.yml b/ditec/mkdocs.yml
@@ -0,0 +1,17 @@
+site_name: DiTEC
+theme: readthedocs
+nav:
+  - GiDA - The Gigantic Dataset:
+    - Introduction:
+      - Overview: gida/overview.md
+      - Install GiDA: gida/installation.md
+      - Quickstart: gida/quickstart.md
+    - Tutorials: 
+      - Datasets: gida/datasets.md
+      - Parameter stacking: gida/parameters.md
+      - Simulation Configuration: gida/simconfig_tut.md
+      - Scenario generation: gida/scene_gen.md
+    - Advanced Topics:
+      - Hydraulic Parameter Optimization: gida/hpo.md
+      - Naive/Manual HPO: gida/naive_manual_hpo.md
+      - PSO: gida/pso.md
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1 @@
+mkdocs