documentation review #1118

Merged
merged 3 commits into from
Jan 28, 2025
36 changes: 18 additions & 18 deletions docs/hpc.rst
Embarrassingly Parallel Problem
------------------------------------

``Sorcha``’s design lends itself perfectly to parallelization – when it simulates a large number of Solar System objects, each one is considered in turn independently of all other objects. If you have access to a large number of computing cores, you can run ``Sorcha`` much more quickly by dividing up the labor: giving a small part of your model population to each core.

This involves two subtasks: breaking up your model population into an appropriate number of input files with unique names and organizing a large number of cores to simultaneously run ``Sorcha`` on their own individually-named input files. Both of these tasks are easy in theory, but tricky enough in practice that we provide some guidance below.
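The first subtask can be sketched in a few lines of Python. This is an illustrative example only, not part of ``Sorcha`` itself; the helper name and the ``chunk_<i>.csv`` naming scheme are invented for the sketch:

```python
# Illustrative sketch only: how one might split a model population file
# into uniquely named per-core input files. The helper name and the
# chunk_<i>.csv naming scheme are invented for this example; they are
# not part of Sorcha.
import csv
from pathlib import Path

def split_population(input_csv, n_parts, out_dir):
    """Split the data rows of input_csv across n_parts files, each keeping the header."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(input_csv, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    per_part = -(-len(rows) // n_parts)  # ceiling division
    paths = []
    for i in range(n_parts):
        part = rows[i * per_part:(i + 1) * per_part]
        path = out_dir / f"chunk_{i}.csv"
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)  # every core gets a self-contained file
            writer.writerows(part)
        paths.append(path)
    return paths
```

Because each resulting file keeps the original header, every core receives a self-contained input it can run independently.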


Slurm
---------

Slurm Workload Manager is a resource management utility commonly used by computing clusters. We provide starter code for running large parallel batches using Slurm, though the general guidance we provide is applicable to any system. The documentation for Slurm is available `here <https://slurm.schedmd.com/>`_. Please note that your HPC (High Performance Computing) facility’s Slurm setup may differ from those on which ``Sorcha`` was tested, and it is always a good idea to read any facility-specific documentation or speak to the HPC maintainers before you begin to run jobs.

Quickstart
--------------

We provide as a starting point our example scripts for running on HPC facilities using Slurm. Some modifications will be required to make them work for your facility.

Below is a very simple Slurm script example designed to run the :ref:`demo files <quickstart>` three times on three cores in parallel. Here, one core has been assigned to each ``Sorcha`` run, with each core assigned 1 GB of memory.

.. literalinclude:: ./example_files/sorcha.sh
:language: text

Please note that the time taken to run and the memory required will vary enormously based on the size of your input files, your input population, and the chunk size assigned in the ``Sorcha`` configuration file; we therefore recommend test runs before you commit to very large runs. The chunk size is an especially important parameter: too small and ``Sorcha`` will take a very long time to run; too large and the memory footprint may become prohibitive. We have found that chunk sizes of 1,000 to 10,000 work best.
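To make the chunk-size trade-off concrete, here is a small illustrative Python sketch (not ``Sorcha``'s internal code): chunked processing keeps at most one chunk of the input in memory at a time, so larger chunks mean a larger memory footprint, while smaller chunks mean more per-chunk overhead.

```python
# Illustrative sketch (not Sorcha's internal code): chunked processing
# keeps at most one chunk of the input in memory at a time, so larger
# chunks mean a larger memory footprint but less per-chunk overhead.
def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size items from an iterable."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final partial chunk
        yield chunk
```

For example, 25 input rows with a chunk size of 10 yield chunks of 10, 10, and 5 rows; only one of those is ever held at once.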

Below is a more complex example of a Slurm script. Here, ``multi_sorcha.sh`` calls ``multi_sorcha.py``, which splits an input file into a number of ‘chunks’ and runs ``Sorcha`` in parallel on a user-specified number of cores.

``multi_sorcha.sh``:

.. literalinclude:: ./example_files/multi_sorcha.sh
:language: text

``multi_sorcha.py``:

.. literalinclude:: ./example_files/multi_sorcha.py
:language: python

.. note::
We provide these here for you to copy, paste, and edit as needed. You might have to make some slight modifications to both the Slurm script and ``multi_sorcha.py``, for example if you're using ``Sorcha`` without calling the stats file.

``multi_sorcha.sh`` requests many parallel Slurm jobs of ``multi_sorcha.py``, feeding each a different ``--instance`` parameter. After changing ‘my_orbits.csv’, ‘my_colors.csv’, ‘my_pointings.db’, ‘my_config.ini’, and the various Slurm parameters to match the above, you could generate 10 jobs, each with 4 cores running 25 orbits each, as follows::

sbatch --array=0-9 multi_sorcha.sh 25 4

You can run ``multi_sorcha.py`` on the command line as well::

python multi_sorcha.py --config sorcha_config_demo.ini --input_orbits mba_sample_1000_orbit.csv --input_physical mba_sample_1000_physical.csv --pointings baseline_v2.0_1yr.db --path ./ --chunksize 1000 --norbits 250 --cores 4 --instance 0 --stats mbastats --cleanup --copy_inputs

This will generate a single output file. It should work fine on a laptop, and be a bit (but not quite 4x) faster than the single-core equivalent due to overheads.

.. note::
This ratio improves as input file sizes grow. Make sure to experiment with different numbers of cores to find what’s fastest given your setup and file sizes.
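The ``--instance``/``--norbits``/``--cores`` bookkeeping described above can be sketched as follows. This is a hypothetical illustration of the arithmetic, based on the example commands; ``multi_sorcha.py``'s actual accounting may differ:

```python
# Hypothetical illustration of the job-array bookkeeping described above:
# instance i (from sbatch --array) handles a contiguous block of
# norbits * cores orbits. multi_sorcha.py's actual accounting may differ.
def instance_slice(n_total, norbits, cores, instance):
    """Return the (start, stop) orbit-row range for one job-array instance."""
    per_job = norbits * cores
    start = instance * per_job
    stop = min(start + per_job, n_total)
    return start, stop
```

Under this reading, the command-line example above (1,000 orbits, ``--norbits 250 --cores 4 --instance 0``) covers all 1,000 rows in one instance, while ``sbatch --array=0-9 multi_sorcha.sh 25 4`` gives each of the 10 jobs a 100-orbit block.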
Sorcha’s Helpful Utilities
---------------------------------

``Sorcha`` comes with a tool designed to combine the results of multiple runs and the input files used into tables on a SQL database. This can make exploring your results easier. To see how to use this tool, on the command line, run::

sorcha outputs create-sqlite --help

``Sorcha`` also has a tool designed to search for and check the logs of a large number of runs. This tool can verify that all runs completed successfully, and output (to either the terminal or a .csv file) the names of any runs that did not complete, along with the relevant error message, if applicable. To see how to use this tool, on the command line run::

sorcha outputs check-logs --help
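In outline, a log check of this kind scans each run's log for a completion marker and reports the runs that lack one. The sketch below is a hypothetical illustration: the ``*.log`` naming and the marker string are assumptions for this example, not ``Sorcha``'s actual log format.

```python
# Hypothetical sketch of what a log check does: scan a directory of run
# logs and report those missing a completion marker. The "*.log" naming
# and the marker string are assumptions for this example, not Sorcha's
# actual log format.
from pathlib import Path

def find_incomplete_runs(log_dir, marker="Sorcha process is completed"):
    """Return the names of log files that lack the completion marker."""
    incomplete = []
    for log in sorted(Path(log_dir).glob("*.log")):
        if marker not in log.read_text(errors="replace"):
            incomplete.append(log.name)
    return incomplete
```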


Best Practices/Tips and Tricks
-------------------------------------

1. We strongly recommend that HPC users download the auxiliary files needed by ASSIST+REBOUND into a known, named directory, and use the ``--ar`` command line flag in their :code:`sorcha run` call to point ``Sorcha`` to those files. You can download the auxiliary files using::

sorcha bootstrap --cache <directory>

And then run ``Sorcha`` via::

sorcha run … --ar /path/to/folder/

This is because ``Sorcha`` will otherwise attempt to download the files into the local cache, which may be on the HPC nodes rather than in your user directory, potentially triggering multiple slow downloads.



.. tip::
You can use the :code:`sorcha init` command to copy ``Sorcha``'s :ref:`example configuration files <example_configs>` into a directory of your choice.

17 changes: 7 additions & 10 deletions docs/index.rst
For a more detailed description of ``Sorcha`` and how it works, please see `Merritt et al. (submitted) <https://www.dropbox.com/scl/fi/secetw7n0a936iynzxmau/sorcha_paper_2025_Jan_submission_version.pdf?rlkey=pbhchiattrw5bna8sfo6ljvto&dl=0>`_ and `Holman et al. (submitted) <https://www.dropbox.com/scl/fi/lz1lmua2s0yf9t9a2gpmm/sorcha_ephemeris_generation_paper.pdf?rlkey=blm9u4zbk0ci1i4lc5yqz8dbs&dl=0>`_.

.. warning::
This documentation site and the software package it describes are currently under review. The code in the repository has been validated (see the :ref:`various validation notebooks we provide <demonotebooks>`).
We will release ``Sorcha`` v1.0 on PyPI and conda-forge when the papers describing how it works are accepted. We ask that,
if you're external to the ``Sorcha`` team, you please wait to use ``Sorcha`` in your science papers until v1.0 is released.


What is Sorcha?
------------------------------------------

``Sorcha`` (pronounced "sur-kha"; derived from the Old Irish word for 'light' or 'brightness') is an open-source Solar System survey simulator written in Python.
``Sorcha`` estimates the brightness of simulated Solar System small bodies and determines which ones the survey could detect in
each of the survey's observations based on user-set criteria. ``Sorcha`` has been designed with the `Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) <https://rubinobservatory.org>`_
in mind. The software has a modular design, and our code can be adapted to be used with any survey.

.. toctree::
:hidden:
2 changes: 2 additions & 0 deletions docs/notebooks.rst
.. _demonotebooks:

Demo Notebooks
========================================================================================
