[GSOC]-Metadata for atomic data #442

Open: wants to merge 5 commits into base: master

@AkashJana18 AkashJana18 commented Mar 29, 2025

📝 Description

Type: 🚀 feature
Implements metadata support for Carsus atomic data outputs as specified in the first project objective:

  1. Metadata Table

    • New MetadataHandler class stores:
      • Physical units (validated via Astropy, e.g., "angstrom", "Hz")
      • Journal article references (DOIs, manual entry)
      • Git commit hashes (automated when run in a repository)
  2. Output Formats

    • HDF5: Metadata stored in /metadata group (units, references, git info)
    • Pandas DataFrame: Metadata accessible via read_hdf_with_metadata()
  3. Automation (Bonus)

    • Git commit hash auto-detection
    • Unit validation (rejects invalid units like "not_a_unit")
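For reviewers, the git commit auto-detection listed above could be sketched roughly as follows. This is only an illustration using the standard library: `detect_git_commit` is a hypothetical name and is not necessarily how this PR implements it. It assumes the `git` executable is on the PATH and falls back to `None` when the code is not run inside a repository.

```python
import subprocess
from pathlib import Path
from typing import Optional, Union


def detect_git_commit(repo_dir: Union[str, Path] = ".") -> Optional[str]:
    """Return the current commit hash of repo_dir, or None if unavailable.

    Hypothetical helper for illustration; not the function added in this PR.
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=str(repo_dir),
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, the directory does not exist,
        # or repo_dir is not inside a git repository with a commit.
        return None
    return result.stdout.strip() or None
```

Returning `None` instead of raising keeps metadata collection optional, so saving still works when the data is generated outside a repository.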

📜 Example Usage

from carsus.io import save_to_hdf, MetadataHandler

# Initialize with manual metadata
handler = MetadataHandler(data_source="NIST")
handler.add_units("wavelength", "angstrom")  # Physical unit
handler.add_reference(
    doi="10.1051/0004-6361/201526937",      # Journal article DOI
    description="NIST Atomic Spectra Database"
)

# Save to HDF5 (or use with Pandas); atomic_data is an existing pandas DataFrame
save_to_hdf(
    df=atomic_data,
    path="output.h5",
    metadata_handler=handler
)

🚦 Testing

Unit Tests: pytest carsus/tests/test_metadata.py (100% coverage)

Manual Verification:

  • Confirmed HDF5 metadata structure with h5ls
  • Validated unit enforcement
  • Verified reference persistence

☑️ Checklist

  • Requested reviewers: @andreasflörs @andrewfullard
  • I updated the documentation according to my changes
  • Added comprehensive docstrings

Note: If you are not allowed to perform any of these actions, ping (@) a contributor.

Contributor

github-actions bot commented Mar 29, 2025

*beep* *bop*
Hi human,
I ran ruff on the latest commit (2fb87ab).
Here are the outputs produced.
Results can also be downloaded as artifacts here.
Summarised output:

2	F401	unused-import
1	E902	io-error

Complete output(might be large):

carsus/io/__init__.py:7:36: F401 `carsus.io.chianti_.ChiantiIonReader` imported but unused; consider removing, adding to `__all__`, or using a redundant alias
carsus/io/hdf.py:9:30: F401 [*] `astropy.units` imported but unused
carsus/metadata/atomic_data_with_metadata.h5:1:1: E902 stream did not contain valid UTF-8
Found 3 errors.
[*] 1 fixable with the `--fix` option.

@wkerzendorf wkerzendorf requested a review from Copilot April 1, 2025 14:01
@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements metadata support for Carsus atomic data outputs by introducing a MetadataHandler class, updating HDF5 I/O functions to include metadata, and adding corresponding tests and an example notebook.

  • Introduces metadata handling for physical units, references, and git information.
  • Updates HDF5 save/read functions to store and retrieve metadata alongside atomic data.
  • Adds tests and an example notebook to demonstrate and verify the new metadata functionality.

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| carsus/tests/test_metadata.py | Unit tests for metadata functionality including units, references, and git info. |
| carsus/metadata/metadata_example.ipynb | Notebook example demonstrating the usage of the metadata handler and HDF5 I/O functions. |
| carsus/metadata/metadata.py | New implementation of MetadataHandler with support for adding units and references. |
| carsus/io/hdf.py | Updated functions to save and read HDF5 files with additional metadata. |
| carsus/io/__init__.py | Updated module exports to include the new HDF5 I/O functions. |
Files not reviewed (1)
  • docs/metadata.rst: Language not supported
Comments suppressed due to low confidence (1)

carsus/metadata/metadata.py:111

  • Providing a url in add_reference will override the automatically generated URL from a provided doi, which may not be the intended behavior. Consider clarifying or documenting the precedence of URL assignment to avoid unexpected overrides.
if url is not None:
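
One way to make the precedence explicit is sketched below. `resolve_reference_url` is a hypothetical helper, not code from this PR. It assumes the intended rule is "an explicit url wins, otherwise derive the URL from the DOI", and it uses the standard `https://doi.org/` resolver prefix.

```python
from typing import Optional


def resolve_reference_url(
    doi: Optional[str] = None, url: Optional[str] = None
) -> Optional[str]:
    """Pick the URL stored for a reference (hypothetical helper).

    Precedence assumed here: an explicit url wins; otherwise the URL is
    derived from the DOI; otherwise there is no URL.
    """
    if url is not None:
        return url
    if doi is not None:
        return f"https://doi.org/{doi}"
    return None
```

Stating this rule in the `add_reference` docstring would address the concern without changing behavior.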

def _write_metadata(self, hdf: h5py.File, group: str) -> None:
    """Write metadata to HDF5 group."""
    if group in hdf:
        del hdf[group]
Member

why are you deleting this?
