Merge branch 'main' into normalization

Kevin2 · web-flow · commit 1be496ab4a18 · 2025-01-24T18:52:10.000+01:00
diff --git a/README.md b/README.md
diff --git a/README.rst b/README.rst
@@ -0,0 +1,85 @@
+=========================
+pyvisgen |ci| |codecov|
+=========================
+
+.. |ci| image:: https://github.com/radionets-project/pyvisgen/workflows/CI/badge.svg?branch=main
+    :target: https://github.com/radionets-project/pyvisgen/actions/workflows/ci.yml?branch=main
+    :alt: Test Status
+
+.. |codecov| image:: https://codecov.io/github/radionets-project/pyvisgen/badge.svg
+    :target: https://codecov.io/github/radionets-project/pyvisgen
+    :alt: Code coverage
+
+
+Python implementation of the VISGEN tool developed at `Haystack Observatory <https://www.haystack.mit.edu/astronomy/>`_.
+It uses the Radio Interferometer Measurement Equation (RIME) to simulate the measurement process of a radio interferometer.
+A gridder is also implemented to process the resulting visibilities and convert them to images suitable as input for
+the neural networks developed in the `radionets repository <https://github.com/radionets-project/radionets>`_.
+
+Installation
+============
+
+You can install the necessary packages in a conda environment of your choice by executing
+
+.. code::
+
+  $ pip install -e .
+
+
+Usage
+=====
+
+There are three possible modes at the moment: ``simulate`` (default), ``slurm``, and ``gridding``. ``simulate`` and ``slurm`` both
+use the RIME formalism to create visibility data. With the ``gridding`` mode, these visibilities are gridded and prepared
+as input images for training a neural network from the radionets framework. The necessary options and variables are set in a ``toml``
+file. An example file can be found in ``config/data_set.toml``.
+
+.. code::
+
+  $ pyvisgen_create_dataset --mode=simulate some_file.toml
+
+
+In the ``examples`` directory, you can find introductory Jupyter notebooks which can be used as an entry point.
+
+Input images
+============
+
+As input images for the RIME formalism, we use GAN-generated radio galaxies created by `Rustige et al. <https://doi.org/10.1093/rasti/rzad016>`_
+and `Kummer et al. <https://doi.org/10.18420/inf2022_38>`_. Below, you can see four example images consisting of FRI and FRII sources.
+
+.. image:: https://github.com/radionets-project/pyvisgen/assets/23259659/285e36f6-74e7-45f1-9976-896a38217880
+   :alt: sources
+
+Any image can be used as input for the formalism, as long as it is stored in the HDF5 format, generated with |h5py|_.
+
+.. |h5py| replace:: ``h5py``
+.. _h5py: https://www.h5py.org/
+
+RIME
+====
+
+Currently, we use the following expression for the simulation process:
+
+$$\\mathbf{V}_{\\mathrm{pq}}(l, m) = \\sum_{l, m} \\mathbf{E}_{\\mathrm{p}}(l, m) \\mathbf{K}_{\\mathrm{p}}(l, m) \\mathbf{B}(l, m) \\mathbf{K}^{H}_{\\mathrm{q}}(l, m) \\mathbf{E}^{H}_{\\mathrm{q}}(l, m)$$
+
+Here, $\\mathbf{B}(l, m)$ corresponds to the source distribution, $\\mathbf{K}(l, m) = \\exp(-2\\pi\\cdot i\\cdot (ul + vm))$ represents
+the phase delay, and $\\mathbf{E}(l, m) = \\mathrm{jinc}\\left(\\frac{2\\pi}{\\lambda}d\\cdot \\theta_{lm}\\right)$ the telescope properties,
+with $\\mathrm{jinc}(x) = \\frac{J_1(x)}{x}$ and $J_1(x)$ the Bessel function of the first kind of order one. An exemplary result can be found below.
+
+.. image:: https://github.com/radionets-project/pyvisgen/assets/23259659/858a5d4b-893a-4216-8d33-41d33981354c
+   :alt: visibilities
+
+Visualization of Jones matrices
+===============================
+
+In this section, you can see visualizations of the matrices $\\mathbf{E}(l, m)$  and $\\mathbf{K}(l, m)$.
+
+Visualization of the $\\mathbf{E}$ matrix
+-----------------------------------------
+.. image:: https://github.com/radionets-project/pyvisgen/assets/23259659/194a321b-77cd-423b-9d01-c18c0741d6c5
+   :alt: visualize_E
+
+Visualization of the $\\mathbf{K}$ matrix
+-----------------------------------------
+.. image:: https://github.com/radionets-project/pyvisgen/assets/23259659/501f487a-498b-4143-b54a-eb0e2f28e417
+   :alt: visualize_K
diff --git a/docs/changes/45.maintenance.rst b/docs/changes/45.maintenance.rst
@@ -0,0 +1,2 @@
+- Switch README to reStructuredText
+- Add Codecov badge
diff --git a/docs/changes/46.feature.rst b/docs/changes/46.feature.rst
@@ -0,0 +1 @@
+- ``pyvisgen.layouts.get_array_layout`` now also accepts custom layouts stored in a ``pd.DataFrame``
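A minimal sketch of how a custom layout could be prepared for this feature, assuming the DataFrame uses the same columns as the `tests/data/test_layout.txt` file added in this commit. The station values are copied from that file; the variable names and the commented-out call at the end are illustrative only and were not run against the library.

```python
import io

import pandas as pd

# Three stations in the same whitespace-separated format as
# tests/data/test_layout.txt from this commit.
LAYOUT_TXT = """\
station_name X Y Z dish_dia el_low el_high SEFD altitude
t000 -4000 2000 0 25 15.0 85.0 110.0 1000.0
t001 8000 -2000 0 25 15.0 85.0 110.0 1000.0
t002 1000 1000 0 25 15.0 85.0 110.0 1000.0
"""

# Same parsing call the diff uses for the bundled layout files.
layout = pd.read_csv(io.StringIO(LAYOUT_TXT), sep=r"\s+")

# get_array_layout drops the name column before converting to a tensor;
# this mirrors that step with plain NumPy.
values = layout.iloc[:, 1:].to_numpy()
print(values.shape)

# Assumed usage, based on the new signature in pyvisgen/layouts/layouts.py:
# from pyvisgen.layouts.layouts import get_array_layout
# stations = get_array_layout(layout)
```

Note that, according to the diff, the offset to absolute positions is only applied for the bundled `vla` layout. A DataFrame is used as passed in.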
diff --git a/docs/changes/48.feature.rst b/docs/changes/48.feature.rst
@@ -0,0 +1 @@
+- Add optional automatic batch size scaling in ``vis_loop``
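The idea behind the automatic scaling can be sketched in plain Python: start with the full number of baselines as the batch size and halve it whenever the loop runs out of memory. This is not toma's implementation. `run_with_shrinking_batch`, `split_indices`, and `fake_batch_loop` are hypothetical helpers, and `MemoryError` stands in for the out-of-memory error that toma is expected to catch.

```python
def split_indices(n, batch_size):
    """Split range(n) into consecutive chunks of at most batch_size items."""
    return [range(i, min(i + batch_size, n)) for i in range(0, n, batch_size)]


def run_with_shrinking_batch(func, initial_batch_size, *args):
    """Call func(batch_size, *args), halving batch_size after each MemoryError."""
    batch_size = initial_batch_size
    while batch_size >= 1:
        try:
            return func(batch_size, *args)
        except MemoryError:
            # Retry the whole computation with half the batch size.
            batch_size //= 2
    raise MemoryError("Computation does not fit in memory even with batch size 1")


def fake_batch_loop(batch_size, n_baselines, memory_limit):
    """Stand-in for the simulation loop: fails if a batch exceeds memory_limit."""
    if batch_size > memory_limit:
        raise MemoryError
    return batch_size, len(split_indices(n_baselines, batch_size))


# Start with all 1000 baselines in one batch, as batch_size="auto" does.
result = run_with_shrinking_batch(fake_batch_loop, 1000, 1000, 300)
print(result)  # (250, 4)
```

The batch size comes first in the argument list because the diff passes `_batch_loop` to `toma.explicit.batch` with `batch_size` as its first parameter.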
diff --git a/environment.yml b/environment.yml
@@ -16,3 +16,5 @@ dependencies:
   - pytest
   - pytest-cov
   - pytest-runner
+  - pip:
+    - toma
diff --git a/pyproject.toml b/pyproject.toml
@@ -28,25 +28,26 @@ classifiers = [
 requires-python = ">=3.10"
 
 dependencies = [
-  "numpy",
+  "astroplan",
   "astropy<=6.1.0",
-  "torch",
-  "matplotlib",
+  "click",
+  "h5py",
   "ipython",
-  "scipy",
+  "jupyter",
+  "matplotlib",
+  "natsort",
+  "numexpr",
+  "numpy",
   "pandas",
-  "toml",
+  "pre-commit",
   "pytest",
   "pytest-cov",
-  "jupyter",
-  "astroplan",
+  "scipy",
+  "toma",
+  "toml",
+  "torch",
   "torch",
   "tqdm",
-  "numexpr",
-  "click",
-  "h5py",
-  "natsort",
-  "pre-commit",
 ]
 
 [project.scripts]
diff --git a/pyvisgen/layouts/layouts.py b/pyvisgen/layouts/layouts.py
@@ -24,34 +24,46 @@ def __getitem__(self, i):
         return Stations(*[getattr(self, f.name)[i] for f in fields(self)])
 
 
-def get_array_layout(array_name, writer=False):
+def get_array_layout(array_layout: str | Path | pd.DataFrame, writer: bool = False):
     """Reads telescope layout txt file and converts it into a dataclass.
+    Also allows a DataFrame to be passed that is then converted into a dataclass
+    object.
     Available arrays:
     - EHT
 
     Parameters
     ----------
-    array_name : str
-        Name of telescope array
+    array_layout : str or pathlib.Path or pd.DataFrame
+        Name of telescope array or pd.DataFrame containing
+        the array layout.
+    writer : bool, optional
+        If ``True``, return ``array`` DataFrame instead of
+        ``Stations`` dataclass object.
 
     Returns
     -------
     dataclass objects
-        Station infos combinde in dataclass
+        Station infos combined in dataclass
     """
-    f = array_name + ".txt"
-    array = pd.read_csv(file_dir / f, sep=r"\s+")
-    if array_name == "vla":
-        loc = EarthLocation.of_site("VLA")
-        array["X"] += loc.value[0]
-        array["Y"] += loc.value[1]
-        array["Z"] += loc.value[2]
-
-    if array_name == "test_layout":
-        loc = EarthLocation.of_address("dortmund")
-        array["X"] += loc.value[0]
-        array["Y"] += loc.value[1]
-        array["Z"] += loc.value[2]
+    if isinstance(array_layout, str):
+        f = array_layout + ".txt"
+        array = pd.read_csv(file_dir / f, sep=r"\s+")
+
+        if array_layout == "vla":
+            # Change relative positions to absolute positions
+            # for the VLA layout
+            loc = EarthLocation.of_site("VLA")
+            array["X"] += loc.value[0]
+            array["Y"] += loc.value[1]
+            array["Z"] += loc.value[2]
+
+    elif isinstance(array_layout, pd.DataFrame):
+        array = array_layout
+    else:
+        raise TypeError(
+            "Expected array_layout to be of type str, "
+            "pathlib.Path, or pandas.DataFrame!"
+        )
 
     # drop name col and convert to tensor
     tensor = torch.from_numpy(array.iloc[:, 1:].values)
diff --git a/pyvisgen/simulation/visibility.py b/pyvisgen/simulation/visibility.py
@@ -1,7 +1,8 @@
 from dataclasses import dataclass, fields
 
 import torch
-from tqdm import tqdm
+import toma
+from tqdm.autonotebook import tqdm
 
 import pyvisgen.simulation.scan as scan
 
@@ -44,13 +45,19 @@ def vis_loop(
     num_threads=10,
     noisy=True,
     mode="full",
-    batch_size=100,
+    batch_size="auto",
     show_progress=False,
     normalize=True,
 ):
     torch.set_num_threads(num_threads)
     torch._dynamo.config.suppress_errors = True
 
+    if not (
+        isinstance(batch_size, int)
+        or (isinstance(batch_size, str) and batch_size == "auto")
+    ):
+        raise ValueError("Expected batch_size to be 'auto' or of type int")
+
     SI = torch.flip(SI, dims=[1])
 
     # define unpolarized sky distribution
@@ -106,10 +113,78 @@ def vis_loop(
     else:
         raise ValueError("Unsupported mode!")
 
-    batches = torch.arange(bas[:].shape[1]).split(batch_size)
+    if batch_size == "auto":
+        batch_size = bas[:].shape[1]
+
+    visibilities = toma.explicit.batch(
+        _batch_loop,
+        batch_size,
+        visibilities,
+        vis_num,
+        obs,
+        B,
+        bas,
+        lm,
+        rd,
+        noisy,
+        show_progress,
+    )
+
+    return visibilities
 
-    if show_progress:
-        batches = tqdm(batches)
+
+def _batch_loop(
+    batch_size: int,
+    visibilities,
+    vis_num: int,
+    obs,
+    B: torch.Tensor,
+    bas,
+    lm: torch.Tensor,
+    rd: torch.Tensor,
+    noisy: bool | float,
+    show_progress: bool,
+):
+    """Main simulation loop of pyvisgen. Computes visibilities
+    batchwise.
+
+    Parameters
+    ----------
+    batch_size : int
+        Batch size for loop over Baselines dataclass object.
+    visibilities : Visibilities
+        Visibilities dataclass object.
+    vis_num : int
+        Number of visibilities.
+    obs : Observation
+        Observation class object.
+    B : torch.Tensor
+        Stokes matrix containing Stokes visibilities.
+    bas : Baselines
+        Baselines dataclass object.
+    lm : torch.Tensor
+        lm grid.
+    rd : torch.Tensor
+        rd grid.
+    noisy : float or bool
+        Simulate noise as SEFD with given value. If set to False,
+        no noise is simulated.
+    show_progress : bool
+        If True, show a progress bar tracking the loop.
+
+    Returns
+    -------
+    visibilities : Visibilities
+        Visibilities dataclass object.
+    """
+    batches = torch.arange(bas[:].shape[1]).split(batch_size)
+    batches = tqdm(
+        batches,
+        position=0,
+        disable=not show_progress,
+        desc="Computing visibilities",
+        postfix=f"Batch size: {batch_size}",
+    )
 
     for p in batches:
         bas_p = bas[:][:, p]
@@ -157,6 +232,7 @@ def vis_loop(
 
         visibilities.add(vis)
         del int_values
+
     return visibilities
 
 
diff --git a/tests/data/test_layout.txt b/tests/data/test_layout.txt
@@ -0,0 +1,11 @@
+station_name X Y Z dish_dia el_low el_high SEFD altitude
+t000 -4000 2000 0 25 15.0 85.0 110.0 1000.0
+t001 8000 -2000 0 25 15.0 85.0 110.0 1000.0
+t002 1000 1000 0 25 15.0 85.0 110.0 1000.0
+t003 -4000 6000 0 25 15.0 85.0 110.0 1000.0
+t004 -3000 -3000 0 25 15.0 85.0 110.0 1000.0
+t005 6000 -5000 0 25 15.0 85.0 110.0 1000.0
+t006 8000 2000 0 25 15.0 85.0 110.0 1000.0
+t007 2000 -4000 0 25 15.0 85.0 110.0 1000.0
+t008 -6000 3000 0 25 15.0 85.0 110.0 1000.0
+t009 -2000 8000 0 25 15.0 85.0 110.0 1000.0
diff --git a/tests/test_layouts.py b/tests/test_layouts.py
diff --git a/tests/test_simulation.py b/tests/test_simulation.py
