SSEP‐3 ‐ Data Containers

Shane Maloney edited this page Apr 24, 2025 · 6 revisions

SSEP 3
Title Data Containers
Author(s) First Last
Contact Email [email protected]
Date-created YYYY-MM-DD
Date-updated YYYY-MM-DD
Type Standard
Discussion link to discussion if available
Status < Discussion | Accepted | Rejected >

Introduction

Spectral X-ray data used in solar and heliospheric science come in a wide variety of shapes, formats, and dimensions, depending on the instrument and observational mode. A crucial part of the Sunkit-SPEX architecture is the implementation of standardized, extensible data containers that provide a consistent API for working with spectral data and associated metadata.

As heliophysics moves beyond Earth-centric observations, assumptions about observer location, instrument distance from the source, and projection geometry must be explicitly encoded in data structures. Instruments now produce event lists, 1D spectra (e.g., energy), 2D (e.g., time-energy), 3D (e.g., space-space-energy), and even 4D datasets (space-space-energy-time). Future missions may add additional dimensions such as polarisation.

Moreover, many analyses require transforming or collapsing higher-dimensional spectra into lower dimensions (e.g., summing over time in a 2D spectrum to obtain a time-integrated 1D spectrum), and this must be reflected consistently across both the data and the instrument response.
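As a rough illustration of such a collapse, the sketch below sums a small time-energy count array over its time axis with plain NumPy. The array values, the `exposure` array, and all variable names are made up for this example and are not part of any Sunkit-SPEX API; the point is only that the associated metadata (here the exposure) has to be collapsed along with the data.

```python
import numpy as np

# Hypothetical 2D count spectrum: 4 time bins x 3 energy bins.
counts = np.array(
    [
        [10, 20, 30],
        [12, 18, 33],
        [9, 22, 29],
        [11, 19, 31],
    ],
    dtype=float,
)
# Assumed exposure per time bin, in seconds.
exposure = np.array([1.0, 1.0, 2.0, 2.0])

# Collapse the time axis: counts add, and so does the exposure.
counts_1d = counts.sum(axis=0)
total_exposure = exposure.sum()

# Time-integrated count rate per energy bin.
rate_1d = counts_1d / total_exposure
```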

Note

This SSEP addresses spectral data containers. Event lists will be addressed in a separate proposal.

In addition to the spectral measurements, any scientific analysis must also incorporate the spectral response of the instrument. These response functions often come in separate files and diverse formats, and their structure may not directly mirror the data itself. However, any transformation applied to the spectral data (e.g., binning or slicing) must also be applied coherently to the response. For instance, summing adjacent energy bins requires rebinning the corresponding columns in the response matrix—often a non-trivial operation.
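The energy-bin example can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the proposed implementation: `rebin_channels` is a hypothetical helper, the response is assumed to be a dense matrix of shape (photon bins, count channels), and its entries are assumed to be in units where channels simply add (for example counts per photon, not per keV).

```python
import numpy as np


def rebin_channels(counts, srm, factor):
    """Sum every `factor` adjacent count channels in the data and the response.

    counts : array of shape (n_channels,)
    srm    : array of shape (n_photon_bins, n_channels); assumed to be in
             units where adjacent channels can be added directly.
    """
    n_channels = counts.shape[-1]
    if srm.shape[-1] != n_channels:
        raise ValueError("counts and srm must share the channel axis")
    if n_channels % factor != 0:
        raise ValueError("factor must divide the number of channels")
    new_counts = counts.reshape(-1, factor).sum(axis=-1)
    # Group the response columns exactly as the data channels were grouped.
    new_srm = srm.reshape(srm.shape[0], -1, factor).sum(axis=-1)
    return new_counts, new_srm


rng = np.random.default_rng(0)
srm = rng.random((5, 6))      # 5 photon bins, 6 count channels
photons = rng.random(5)       # hypothetical photon spectrum
counts = photons @ srm        # forward-folded counts

new_counts, new_srm = rebin_channels(counts, srm, factor=2)

# Folding through the rebinned response reproduces the rebinned data.
assert np.allclose(photons @ new_srm, new_counts)
```

If the matrix were stored per unit energy instead, the columns would need to be weighted by bin width before summing, which is part of why the operation is non-trivial in general.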

A core requirement is that the spectral data and the instrument response share a compatible and coordinated API to ensure joint operations are applied consistently. Both objects must also preserve and expose relevant metadata, including coordinate frames, exposure information, and instrumental context.

This SSEP outlines the high-level design and required features for spectral data containers in Sunkit-SPEX.

Discussion

Inspiration from Existing Data Containers

Sunkit-SPEX can draw inspiration and leverage design patterns from several existing Python packages:

sunpy

sunpy provides two data containers that can be drawn on, namely Map and TimeSeries. Because instruments can no longer be assumed to sit at a fixed, roughly Earth-based location, Map's coordinate frame and observer location properties offer a possible approach to incorporating this information.

specutils

Spectrum1D, SpectrumCollection and SpectrumList are designed to hold exactly the kind of spectral data needed here, but come from a slightly different domain. The differences and some of the potential issues are discussed in the now admittedly very old "XraySpectrum1D class with loaders" PR. Essentially, the difficulty lies in mapping operations and information from the spectra to the response.

NDCube 2.0

NDCube provides a number of high-level objects for representing multidimensional data, with a particular focus on WCS and GWCS, namely NDCube, NDCubeSequence and NDCollection. The similarity between the specutils containers and the NDCube containers is no accident: specutils is based on the NDCube objects. Further, sunpy Map is in the process of being refactored to eventually be an NDCube. Recent progress on sliceable metadata makes NDCube

One Container, Two Container, 3

We propose the definition of at least two primary container classes:

  1. Spectrum
  • A flexible, multi-dimensional container for spectral measurements.
  • Based on or inspired by NDCube.
  • Must support slicing, indexing, and collapsing operations (e.g., summing over time).
  • Encapsulates metadata such as observation time, instrument position, and exposure.
  • Should support transformations like rebinning, stacking, or unit conversion.
  2. SpectrometerResponse
  • A container that mirrors the Spectrum API and represents the instrument’s response.
  • Supports consistent operations with the Spectrum container (e.g., rebinning, masking).
  • Supports different response representations (e.g., RMF+ARF or SRM) when applicable.
  • May support chained responses (e.g., detector + optics).
  • Should support lazy loading and chunking for large file sizes.
  • Should support modelling, i.e., parameters of the response can be fitted from data.

Unified API Principles

  • Identical indexing/slicing semantics for data and response.
  • Coordinate-aware metadata and observer location tracking.
  • Compatible with Astropy Units and Quantities, WCS and GWCS.
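One way to picture the "identical indexing/slicing semantics" principle is a thin wrapper that forwards each operation to both arrays. The sketch below is purely illustrative: `PairedSpectrum`, `select_channels`, and `sum_time` are hypothetical names, plain NumPy arrays stand in for the proposed containers, and the response is assumed to be time-independent.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(eq=False)
class PairedSpectrum:
    """Hypothetical pairing of data and response (not a Sunkit-SPEX class)."""

    counts: np.ndarray  # shape (n_time, n_channels)
    srm: np.ndarray     # shape (n_photon_bins, n_channels)

    def __post_init__(self):
        if self.counts.shape[-1] != self.srm.shape[-1]:
            raise ValueError("counts and srm must share the channel axis")

    def select_channels(self, channels: slice) -> "PairedSpectrum":
        """Apply one channel slice to both the data and the response."""
        return PairedSpectrum(self.counts[..., channels], self.srm[..., channels])

    def sum_time(self) -> "PairedSpectrum":
        """Collapse the time axis of the data.

        The response is assumed not to vary with time, so it is passed through.
        """
        return PairedSpectrum(self.counts.sum(axis=0, keepdims=True), self.srm)


pair = PairedSpectrum(counts=np.ones((4, 6)), srm=np.ones((3, 6)))
sub = pair.select_channels(slice(1, 4)).sum_time()
```

Because every operation returns a new pair, the data and the response cannot drift out of step on the shared channel axis; a real design would also need to carry units, WCS, and metadata through the same calls.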

What is data and what is metadata

Following NDCube we can define metadata as

Proposal

Spectrum

from ndcube import NDCube


class Spectrum(NDCube):
    r"""
    Spectrum container for data with one spectral axis.

    Note that "1D" in this case refers to the fact that there is only one
    spectral axis.  `Spectrum` can contain "vector 1D spectra" by having the
    ``flux`` have a shape with dimension greater than 1.
    
    Notes
    -----
    A stripped down version of `Spectrum1D` from `specutils`.

    Parameters
    ----------
    data : `~astropy.units.Quantity`
        The data for this spectrum. This can be a simple `~astropy.units.Quantity`,
        or an existing `~Spectrum1D` or `~ndcube.NDCube` object.
    uncertainty : `~astropy.nddata.NDUncertainty`
        Contains uncertainty information along with propagation rules for
        spectrum arithmetic. Can take a unit, but if none is given, will use
        the unit defined in the flux.
    spectral_axis : `~astropy.units.Quantity` or `~specutils.SpectralAxis`
        Dispersion information with the same shape as the dimension specified by
        ``spectral_dimension``, or that shape plus one if specifying bin edges.
    spectral_dimension : `int`, default 0
        The dimension of the data which represents the spectral information.
        Defaults to the first dimension (index 0).
    mask : `~numpy.ndarray`-like
        Array where values in the flux to be masked are those that
        ``astype(bool)`` converts to True. (For example, integer arrays are not
        masked where they are 0, and masked for any other value.)
    meta : dict
        Arbitrary container for any user-specific information to be carried
        around with the spectrum container object.

    ------------------------------------------

    exposure_time : `~astropy.units.Quantity`
        The effective or actual exposure time used to normalise the data.
    area : `~astropy.units.Quantity`
    """

Response

from ndcube import NDCube


class Response(NDCube):
    r"""
    Response container for spectrometer response.


    Parameters
    ----------
    redistribution_martix

    transmission

    effective_area
    """

Decision
