[GSOC] Add a metadata table to an existing Carsus output #433

Open: manas-dhyani wants to merge 5 commits into base: master

manas-dhyani
Contributor

@manas-dhyani manas-dhyani commented Mar 7, 2025

📝 Description
Type: 🚀 feature

This PR attaches metadata to the Carsus DataFrames (levels and lines). The metadata includes:

  • DOI: citation for the data source
  • Reference: source publication
  • Units: mapping of the relevant physical units

Changes

  • Modified _get_levels_lines to attach metadata (doi, reference, and units) to levels and lines.
  • Used attrs.update(metadata) to store metadata in Pandas DataFrames.
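
As a rough illustration of the `attrs.update(metadata)` step listed above, here is a minimal sketch. The helper name `attach_metadata`, the column name, and every metadata value are placeholders assumed for this example; they are not the values or code used in this PR.

```python
import pandas as pd


def attach_metadata(df: pd.DataFrame, metadata: dict) -> pd.DataFrame:
    """Merge `metadata` into df.attrs in place and return df (hypothetical helper)."""
    df.attrs.update(metadata)
    return df


# Placeholder values for illustration only; not the values used in this PR.
example_metadata = {
    "doi": "10.xxxx/placeholder",
    "reference": "Placeholder et al. (year)",
    "units": {"energy": "eV"},  # assumed column-to-unit mapping
}

# Toy stand-in for the levels DataFrame.
levels = pd.DataFrame({"energy": [0.0, 4.6]})
attach_metadata(levels, example_metadata)

print(levels.attrs["doi"])
print(levels.attrs["units"])
```

Note that `DataFrame.attrs` is a plain dictionary on the DataFrame object, so `update` adds or overwrites keys without touching the table data itself.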

Verification

  • Printed levels.attrs and lines.attrs to verify metadata assignment.

📌 Resources

🔗 Verner et al. 1996 DOI

🚦 Testing

How did you test these changes?

  • Testing pipeline
  • Other method (describe)
  • My changes can't be tested (explain why)

☑️ Checklist

  • I requested two reviewers for this pull request
  • I updated the documentation according to my changes
  • I built the documentation by applying the build_docs label

Note: If you are not allowed to perform any of these actions, ping (@) a contributor.

Adding metadata to existing Carsus output

from carsus.io.cmfgen import CMFGENReader, CMFGENEnergyLevelsParser, CMFGENOscillatorStrengthsParser

levels_path = "al2_osc_split.dat"  
lines_path = "al2_osc_split.dat"    

al2_lvl_parser = CMFGENEnergyLevelsParser(levels_path)
al2_osc_parser = CMFGENOscillatorStrengthsParser(lines_path)

al2_lvl_data = {
    (13, 2): {
        "levels": al2_lvl_parser.base,  # Energy levels
        "lines": al2_osc_parser.base    # Spectral lines (oscillator strengths)
    }
}

reader = CMFGENReader(al2_lvl_data)
# Print the metadata attached to levels and lines
print(reader.levels.attrs)
print(reader.lines.attrs)

# Print the DataFrames themselves
print(reader.levels)
print(reader.lines)

Screenshot 2025-03-09 at 12 46 07 PM

Saving DataFrames with Metadata to HDF5

import pandas as pd

# Assume reader.levels and reader.lines contain metadata in .attrs
hdf_filename = "carsus_output.h5"

with pd.HDFStore(hdf_filename, "w") as store:
    store.put("levels", reader.levels)
    store.put("lines", reader.lines)

    # Attach metadata to HDF5 store
    store.get_storer("levels").attrs.metadata = reader.levels.attrs
    store.get_storer("lines").attrs.metadata = reader.lines.attrs

print(f"Saved Carsus output to {hdf_filename}")

Loading HDF5 for metadata

import pandas as pd

hdf_filename = "carsus_output.h5"

with pd.HDFStore(hdf_filename, "r") as store:
    levels = store["levels"]
    lines = store["lines"]

    levels_metadata = store.get_storer("levels").attrs.metadata
    lines_metadata = store.get_storer("lines").attrs.metadata

print("Loaded Levels Metadata:", levels_metadata)
print("Loaded Lines Metadata:", lines_metadata)

Screenshot 2025-03-10 at 2 29 05 PM
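
The save and load snippets above assign the `attrs` dictionary directly to an HDF5 node attribute; PyTables stores such Python objects pickled, which tools outside Python may not be able to decode. One possible alternative, sketched here under that assumption, is to encode the metadata as a JSON string first. The helper names `encode_metadata` and `decode_metadata` and the sample values are hypothetical and not part of this PR.

```python
import json


def encode_metadata(metadata: dict) -> str:
    """Serialize metadata to a JSON string (hypothetical helper)."""
    return json.dumps(metadata, sort_keys=True)


def decode_metadata(encoded: str) -> dict:
    """Inverse of encode_metadata (hypothetical helper)."""
    return json.loads(encoded)


# Placeholder metadata for illustration only.
example = {"doi": "10.xxxx/placeholder", "units": {"energy": "eV"}}
encoded = encode_metadata(example)
restored = decode_metadata(encoded)

# In the save step one could then write, for example:
#     store.get_storer("levels").attrs.metadata = encode_metadata(reader.levels.attrs)
```

This only works when every metadata value is JSON-serializable (strings, numbers, lists, nested dictionaries), which holds for the DOI, reference, and units fields described in this PR.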

Automation

def save_with_metadata(filename, df_dict):
    """
    Saves multiple Pandas DataFrames with metadata to an HDF5 file.

    Parameters:
        filename (str): Name of the HDF5 file.
        df_dict (dict): Dictionary of {name: dataframe} pairs, where each dataframe has .attrs metadata.
    """
    with pd.HDFStore(filename, "w") as store:
        for name, df in df_dict.items():
            store.put(name, df)
            store.get_storer(name).attrs.metadata = df.attrs

def load_with_metadata(filename):
    """
    Loads DataFrames and their metadata from an HDF5 file.

    Parameters:
        filename (str): Name of the HDF5 file.

    Returns:
        dict: Dictionary of {name: (dataframe, metadata)} pairs.
    """
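    data = {}
    with pd.HDFStore(filename, "r") as store:
        for key in store.keys():
            name = key.lstrip("/")  # store keys come back as "/levels", "/lines", ...
            df = store[key]
            # Fall back to an empty dict if the node was saved without metadata
            metadata = getattr(store.get_storer(key).attrs, "metadata", {})
            data[name] = (df, metadata)
    return data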

# Example usage:
save_with_metadata("carsus_output.h5", {"levels": reader.levels, "lines": reader.lines})

loaded_data = load_with_metadata("carsus_output.h5")
print("Loaded Metadata for Levels:", loaded_data["levels"][1])  # Check metadata
print("Loaded Metadata for Lines:", loaded_data["lines"][1])  # Check metadata

Screenshot 2025-03-10 at 2 29 05 PM

…rames

-Modify _get_levels_lines to attach metadata (DOI, reference, and units) to levels and lines.

-Verified metadata is correctly assigned to attrs.
Contributor

github-actions bot commented Mar 7, 2025

*beep* *bop*
Hi human,
I ran ruff on the latest commit (6d008e1).
Here are the outputs produced.
Results can also be downloaded as artifacts here.
Summarised output:

48	F405	[ ] undefined-local-with-import-star-usage
5	E741	[ ] ambiguous-variable-name
1	F401	[*] unused-import
1	F403	[ ] undefined-local-with-import-star
1	F541	[*] f-string-missing-placeholders

Complete output (might be large):

carsus/io/cmfgen/base.py:9:19: F401 [*] `scipy.interpolate` imported but unused
carsus/io/cmfgen/base.py:15:1: F403 `from .util import *` used; unable to detect undefined names
carsus/io/cmfgen/base.py:34:18: F405 `parse_header` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:35:23: F405 `find_row` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:55:26: F405 `find_row` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:106:18: F405 `parse_header` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:107:23: F405 `find_row` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:158:35: F405 `to_float` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:159:35: F405 `to_float` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:179:18: F405 `parse_header` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:180:23: F405 `find_row` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:191:18: F405 `find_row` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:197:26: F405 `find_row` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:202:44: F405 `to_float` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:218:39: F405 `to_float` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:282:67: F405 `to_float` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:289:41: F405 `to_float` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:300:18: F405 `parse_header` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:301:14: F405 `open_cmfgen_file` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:354:18: F405 `parse_header` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:364:23: F405 `find_row` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:373:20: E741 Ambiguous variable name: `l`
carsus/io/cmfgen/base.py:401:12: E741 Ambiguous variable name: `l`
carsus/io/cmfgen/base.py:448:12: E741 Ambiguous variable name: `l`
carsus/io/cmfgen/base.py:566:25: F405 `CMFGEN_ATOM_DICT` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:610:25: E741 Ambiguous variable name: `l`
carsus/io/cmfgen/base.py:681:21: F405 `HC_IN_EV_ANGSTROM` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:681:62: F405 `RYD_TO_EV` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:684:42: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:685:35: F405 `get_null_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:688:21: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:689:21: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:690:21: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:701:21: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:702:21: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:714:39: F405 `get_null_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:717:39: F405 `get_seaton_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:721:44: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:731:43: F405 `get_hydrogenic_n_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:741:21: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:742:21: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:754:35: F405 `get_hydrogenic_nl_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:761:44: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:770:35: F405 `get_opproject_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:774:44: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:783:35: F405 `get_hummer_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:787:44: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:796:35: F405 `get_vy95_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:800:44: F405 `CrossSectionType` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:810:39: F405 `get_leibowitz_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:816:29: F541 [*] f-string without any placeholders
carsus/io/cmfgen/base.py:818:39: F405 `get_null_phixs_table` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:929:42: F405 `HC_IN_EV_ANGSTROM` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:932:46: F405 `RYD_TO_EV` may be undefined, or defined from star imports
carsus/io/cmfgen/base.py:937:29: E741 Ambiguous variable name: `l`
carsus/io/cmfgen/base.py:940:35: F405 `RYD_TO_EV` may be undefined, or defined from star imports
Found 56 errors.
[*] 2 fixable with the `--fix` option.

@manas-dhyani manas-dhyani marked this pull request as draft March 7, 2025 09:47
@manas-dhyani manas-dhyani marked this pull request as ready for review March 10, 2025 13:50
@manas-dhyani
Contributor Author

@afloers @andrewfullard could you please review this.


# Generate metadata dynamically
Contributor

I don't think this is dynamic if the values are hardcoded, though it does demonstrate your method works.

Contributor Author

I used this as an example because the first GSoC objective was to attach a DOI reference and physical units to one of the outputs. However, I agree that a more dynamic approach would be beneficial, and I plan to add parameters to the relevant functions so that metadata can be passed in as a dictionary.
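
The dictionary-based approach described in this reply could look roughly like the following sketch. The helper `resolve_metadata`, the `REQUIRED_KEYS` tuple, and the validation rules are assumptions made for illustration; the PR itself does not define them.

```python
from typing import Optional

# Keys named in the PR description; treating them as required is an assumption.
REQUIRED_KEYS = ("doi", "reference", "units")


def resolve_metadata(metadata: Optional[dict] = None) -> dict:
    """Return a validated copy of user-supplied metadata (hypothetical helper)."""
    metadata = dict(metadata or {})
    missing = [key for key in REQUIRED_KEYS if key not in metadata]
    if missing:
        raise ValueError(f"metadata is missing keys: {missing}")
    if not isinstance(metadata["units"], dict):
        raise TypeError("metadata['units'] must map column names to unit strings")
    return metadata


# A caller would pass its own values instead of relying on hardcoded ones, e.g.
#     levels.attrs.update(resolve_metadata(user_metadata))
```

Validating up front keeps incomplete metadata from being written silently, at the cost of making all three fields mandatory; a real implementation might prefer to allow partial metadata.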

@manas-dhyani
Contributor Author

Hi @andrewfullard, I’ve submitted my application on the GSoC official page—thank you so much for your guidance throughout the process!
I had also submitted my proposal for review through the Google Form shared on Gitter last Saturday, but I didn’t receive any feedback. I understand the team may have been busy with the upcoming deadline, or perhaps my message in Gitter may have been missed in the ongoing conversation.
If time permits, could you kindly take a look? I would really appreciate any feedback you might have.
Thanks again!
