The SDSS Parameter Explorer is a custom data visualization application developed for SDSS-V's Data Release 19, built using FastAPI, solara, bokeh, and vaex. It is designed specifically to interface with custom SDSS datafiles, providing filtered access, statistics, and visualization for the parameter data products from SDSS-V.
Explorer ships with two components:

- `sdss_explorer.dashboard`: the main dashboard app, available online.
- `sdss_explorer.server`: a FastAPI backend for serving custom dataset renders.
To install it tracking `main`, run:

```shell
pip install git+https://www.github.com/sdss/explorer.git@main
```
or, to have a local copy on your machine, clone and install it as an editable package via pip:

```shell
git clone https://www.github.com/sdss/explorer.git ./sdss-explorer
cd sdss-explorer
pip install -e .
```
The same instructions apply within a conda environment.
We recommend using uv to install this project directly for development:

```shell
git clone https://www.github.com/sdss/explorer.git ./sdss-explorer
cd sdss-explorer
uv sync
```
Otherwise, install it like any other package:

```shell
git clone https://www.github.com/sdss/explorer.git ./sdss-explorer
cd sdss-explorer
python -m venv .venv  # ensure you're using Python 3.10
source .venv/bin/activate
pip install -r requirements.txt
pip install -e .
```
Explorer uses stacked, custom HDF5 renders of the astra summary files for each data release. You can download the HDF5 files from here. These are proprietary SDSS data files and should not be shared outside the collaboration.
It additionally uses a custom parquet render of the mappings used in semaphore, available in the same directory.
New datafiles for the dashboard must be generated each time the source SDSS summary catalog files change, as well as for each new Data Release. The files should be located here with the following structure:

- `(release)`: a directory for each data release
  - `columnsAll(Star|Visit)-(astra_version).json`: JSON files providing a list of all columns for each catalog file used in the dashboard. One file per star/visit catalog.
  - `explorerAll(Star|Visit)-(astra_version).hdf5`: HDF5 files of the summary catalogs aggregated into a single file. One file per star/visit catalog.
- `dr19_dminfo.json`: a JSON file of the datamodel column descriptions of the catalog summary files, used for populating the dashboard column glossary.
- `mappings.parquet`: a compiled datafile of all the SDSS targeting cartons and programs.
- `explorer`: a directory used as scratch space for users downloading subsets via the dashboard.
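The per-release naming convention can be sketched as a small path helper. This is illustrative only (the function name, the extension mapping, and the example version string are assumptions based on the structure described above, not the dashboard's actual path logic):

```python
from pathlib import Path

# Map each file kind to its extension, following the structure above:
# "explorer" renders are HDF5 files, "columns" lists are JSON files.
EXTENSIONS = {"explorer": "hdf5", "columns": "json"}

def build_datafile_path(datapath: str, release: str, kind: str,
                        datatype: str, vastra: str) -> Path:
    """Build a path like <datapath>/<release>/<kind>All<datatype>-<vastra>.<ext>."""
    ext = EXTENSIONS[kind]
    return Path(datapath) / release / f"{kind}All{datatype}-{vastra}.{ext}"
```

For example, `build_datafile_path("/data", "dr19", "explorer", "Star", "0.6.0")` yields `/data/dr19/explorerAllStar-0.6.0.hdf5` (the version string here is a placeholder).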
New `columnsXXX.json` and `explorerXXX.hdf5` files are generated following the instructions at https://github.com/sdss/explorer-filegen. Also see the docs at Explorer Dev DataFiles.
`dr19_dminfo.json` contains, for each datamodel column name, the following fields: `name`, `description`, `type`, and `unit`. The original version of this file was `ipl3_partial.json`. It can be produced by running `scripts/gen_datamodel_ref.py`.
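As a sketch, a glossary lookup over this file could look like the following. The list-of-entries layout and the example entry are assumptions; only the four field names come from the description above:

```python
import json

def load_glossary(path: str) -> dict:
    """Load dr19_dminfo.json into a {column_name: entry} lookup.

    Assumes the file is a list of entry dicts, each carrying the four
    documented fields: name, description, type, and unit.
    """
    with open(path) as f:
        entries = json.load(f)
    # Key entries by column name for fast glossary lookups in the UI.
    return {entry["name"]: entry for entry in entries}
```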
To run, the environment variables must be exported to the shell environment. The base ones are:

- `EXPLORER_DATAPATH`: path to data files (proprietary SDSS data, found on the SAS).
  - In the deployment context, a folder is mounted onto the VM.
  - Files are expected to be placed as: `./[release]/[explorer|columns]All[datatype]-[vastra].[hdf5|parquet]`
- `VASTRA`: specific astra reduction versions to read.
- `VAEX_HOME`: path to store cache and log files during runtime. Defaults on startup to `$HOME/.vaex`.
- `VALIS_API_URL`: URL for valis. This is required for login authentication (to be implemented).
Additionally, using the download server requires:

- `EXPLORER_SCRATCH`: path to a scratch space.
- `API_URL`: API URL for the download server. Defaults to localhost, so you might not need to set this.
- `EXPLORER_NPROCESSES`: max concurrent processes for custom summary file renders.
There is additionally:

- `EXPLORER_NWORKERS`: how many gunicorn/uvicorn workers to use. Defaults to 1, so you don't connect to different workers each time.
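A minimal sketch of how these settings might be collected at startup. The defaults shown for `API_URL` and `EXPLORER_NPROCESSES` are assumptions based on the descriptions above; only the `VAEX_HOME` (`$HOME/.vaex`) and `EXPLORER_NWORKERS` (1) defaults are documented:

```python
import os

def read_settings(env=os.environ) -> dict:
    """Collect Explorer environment settings with documented (or assumed)
    defaults. The two required keys raise KeyError if missing."""
    return {
        # Base settings
        "datapath": env["EXPLORER_DATAPATH"],  # required: proprietary data files
        "vastra": env["VASTRA"],               # required: astra reduction version(s)
        "vaex_home": env.get("VAEX_HOME", os.path.expanduser("~/.vaex")),
        "valis_api_url": env.get("VALIS_API_URL"),
        # Download-server settings
        "scratch": env.get("EXPLORER_SCRATCH"),
        "api_url": env.get("API_URL", "http://localhost:8050"),  # assumed default
        "nprocesses": int(env.get("EXPLORER_NPROCESSES", "1")),  # assumed default
        # Worker count; documented default is 1
        "nworkers": int(env.get("EXPLORER_NWORKERS", "1")),
    }
```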
The Explorer can utilize a hybrid memory and disk cache. To set this up, use the following environment variables at runtime:

```shell
VAEX_CACHE="memory,disk"
VAEX_CACHE_DISK_SIZE_LIMIT="10GB"  # this can be higher/lower
VAEX_CACHE_MEMORY_SIZE_LIMIT="1GB" # this can also be higher/lower
```

These are automatically set when using the bundled shell scripts and docker.
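Since vaex picks these settings up from the environment, they can also be set in Python before vaex is first imported; a sketch using the same example values as above:

```python
import os

# Hybrid cache configuration. Set these before the first vaex import,
# since vaex reads its VAEX_* settings from the environment.
# setdefault() leaves any values already exported in the shell untouched.
os.environ.setdefault("VAEX_CACHE", "memory,disk")
os.environ.setdefault("VAEX_CACHE_DISK_SIZE_LIMIT", "10GB")   # tune as needed
os.environ.setdefault("VAEX_CACHE_MEMORY_SIZE_LIMIT", "1GB")  # tune as needed
```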
To run the dashboard, use:

```shell
solara run sdss_explorer.dashboard
```

Then, run the FastAPI backend with:

```shell
uvicorn --reload sdss_explorer.server:app --port=8050
```

This starts just the app in development auto-refresh mode on two uvicorn instances. To run in production mode, add `--production` to the solara command, and remove the `--reload` flag from the uvicorn call.
This repo includes a basic production Docker image. It deploys only the dashboard's "download server", a lightweight FastAPI server that handles downloading data subsets from the UI dashboard.
To build, run:

```shell
docker build -t explorer -f Dockerfile .
```

To start a container, run:

```shell
docker run -p 8050:8050 \
  -v $EXPLORER_SCRATCH:/root/scratch \
  -v $EXPLORER_DATAPATH:/root/data \
  -e EXPLORER_DATAPATH=/root/data \
  -e EXPLORER_SCRATCH=/root/scratch \
  explorer
```
Additionally, add `-e EXPLORER_MOUNT_DASHBOARD` to mount the dashboard in the same container.
This repo comes bundled with shell scripts to run the application via solara. They don't do anything different from running it manually. To ensure they work, make the scripts executable:

```shell
chmod +x run.sh run_production.sh
```
To run using the provided shell scripts, use one of:

```shell
./run.sh            # runs in development mode
```

or

```shell
./run_production.sh # runs in production mode; no auto-refresh
```
This application is currently embedded in the SDSS valis API for deployment. You can test and develop the app within this deployment context through a poetry install:

```shell
git clone https://github.com/sdss/valis.git
cd valis
poetry install -E solara
```

Load the relevant created virtual environment (generally in the poetry cache unless stated otherwise) and deploy with:

```shell
uvicorn valis.wsgi:app --reload
```

The local web server is exposed at http://localhost:8000, with the solara app at http://localhost:8000/valis/solara/dashboard.
This project is Copyright (c) 2024, Riley Thai. All rights reserved.
We love contributions! `explorer` is open source, built on open source, and we'd love to have you hang out in our community.
Imposter syndrome disclaimer: We want your help. No, really.
There may be a little voice inside your head that is telling you that you're not ready to be an open source contributor; that your skills aren't nearly good enough to contribute. What could you possibly offer a project like this one?
We assure you - the little voice in your head is wrong. If you can write code at all, you can contribute code to open source. Contributing to open source projects is a fantastic way to advance one's coding skills. Writing perfect code isn't the measure of a good developer (that would disqualify all of us!); it's trying to create something, making mistakes, and learning from those mistakes. That's how we all improve, and we are happy to help others learn.
Being an open source contributor doesn't just mean writing code, either. You can help out by writing documentation, tests, or even giving feedback about the project (and yes - that includes giving feedback about the contribution process). Some of these contributions may be the most valuable to the project as a whole, because you're coming to the project with fresh eyes, so you can see the errors and assumptions that seasoned contributors have glossed over.
Note: This disclaimer was originally written by Adrienne Lowe for a PyCon talk, and was adapted by explorer
based on its use in the README file for the MetPy project.