
[GSoC] Objective 1: Add metadata integration notebook and script #441


Open · wants to merge 1 commit into base: master

@adel3li commented Mar 27, 2025

✏️ Description

Type: 🚀 feature

This PR contributes a demonstration of Objective 1 from the GSoC 2025 project: Metadata Integration for Atomic Data in Carsus.

  • Simulates a Carsus-style levels DataFrame
  • Adds a metadata table with:
    • DOI reference
    • Units (eV)
    • Timestamp
    • Git commit hash
    • Notes
  • Adds a citation table for A_ij (Einstein A coefficients) and Υ_ij (effective collision strengths)
  • Exports all tables to HDF5 using pandas.HDFStore
  • Includes both a notebook and a Python script version
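The three tables listed above could be assembled roughly as follows. This is a minimal sketch, not the PR's code: the helper `build_tables`, the column names, the `(atomic_number, ion_number, level_number)` index (assumed to resemble Carsus/TARDIS levels tables), and the DOI string are all hypothetical, and the DOI is a placeholder rather than a real reference.

```python
from __future__ import annotations

from datetime import datetime, timezone

import pandas as pd


def build_tables(git_commit: str, doi: str) -> dict[str, pd.DataFrame]:
    """Return a toy levels table plus metadata and citation tables (hypothetical layout)."""
    # Illustrative hydrogen-like levels (n = 1, 2, 3); not an authoritative dataset.
    levels = pd.DataFrame(
        {
            "atomic_number": [1, 1, 1],
            "ion_number": [0, 0, 0],
            "level_number": [0, 1, 2],
            "energy": [0.0, 10.2, 12.09],  # eV
            "g": [2, 8, 18],
        }
    ).set_index(["atomic_number", "ion_number", "level_number"])

    # One row per metadata field, mirroring the fields listed in the PR description.
    metadata = pd.DataFrame(
        {
            "value": [
                doi,
                "eV",
                datetime.now(timezone.utc).isoformat(timespec="seconds"),
                git_commit,
                "Toy example; not real atomic data.",
            ]
        },
        index=pd.Index(["doi", "energy_unit", "timestamp", "git_commit", "notes"], name="field"),
    )

    # Which source each derived quantity comes from (placeholder DOIs).
    citations = pd.DataFrame(
        {
            "quantity": ["A_ij", "Upsilon_ij"],
            "description": ["Einstein A coefficient", "Effective collision strength"],
            "doi": [doi, doi],
        }
    ).set_index("quantity")

    return {"levels": levels, "levels_metadata": metadata, "citations": citations}


# "10.0000/placeholder" is a placeholder, not a real DOI.
tables = build_tables(git_commit="unknown", doi="10.0000/placeholder")
print(tables["levels_metadata"])
```

Keeping the metadata as a key/value table keeps it in the same HDF5 file as the data and lets fields be added without a schema change. The trade-off is that every value is stored as a string.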

🔖 Submitted as part of the GSoC application for the Carsus/TARDIS project.
I’m happy to iterate based on mentor feedback.


📎 Resources


✅ Testing

  • Manual test: Executed both the notebook and script successfully
  • Verified HDF5 outputs and structure
  • Pipeline tests not applicable at this stage
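The manual "verified HDF5 outputs and structure" step could eventually become an automated round-trip check. The following is a sketch under assumptions. The helper `write_and_verify` and the key names are hypothetical. `pandas.HDFStore` also needs the optional PyTables (`tables`) package to be installed.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

import pandas as pd


def write_and_verify(tables: dict[str, pd.DataFrame], path: Path) -> list[str]:
    """Write each table under its own key, reopen the file, and compare contents."""
    with pd.HDFStore(path, mode="w") as store:
        for key, df in tables.items():
            store.put(key, df)  # default "fixed" format

    with pd.HDFStore(path, mode="r") as store:
        keys = sorted(store.keys())  # keys are returned with a leading "/"
        for key, df in tables.items():
            # dtype checks relaxed: string columns may come back as object dtype
            # depending on the pandas version; the values are still compared.
            pd.testing.assert_frame_equal(
                store[key], df, check_dtype=False, check_index_type=False
            )
    return keys


# Hypothetical minimal tables for the check.
tables = {
    "levels": pd.DataFrame({"energy": [0.0, 10.2]}, index=pd.Index([0, 1], name="level_number")),
    "levels_metadata": pd.DataFrame({"value": ["eV"]}, index=pd.Index(["energy_unit"], name="field")),
}

with tempfile.TemporaryDirectory() as tmp:
    print(write_and_verify(tables, Path(tmp) / "metadata_demo.h5"))
```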

✅ Checklist

  • I requested two reviewers for this pull request
  • I updated the documentation (README.md) with context
  • I am not authorized to build documentation, but can revise if needed

Please let me know if you'd like me to adapt this code structure to Carsus conventions or expand to other data tables.


Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Contributor

*beep* *bop*
Hi human,
I ran ruff on the latest commit (c5787cb).
Here are the outputs produced.
Results can also be downloaded as artifacts here.
Summarised output:

90	    	[ ] syntax-error
5	E402	[ ] module-import-not-at-top-of-file
4	F401	[*] unused-import
3	F821	[ ] undefined-name
2	E722	[ ] bare-except

Complete output (might be large):

README.md:3:6: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:3:26: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:3:38: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:3:42: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:3:48: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:3:58: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:3:61: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:3:72: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:3:87: SyntaxError: Compound statements are not allowed on the same line as simple statements
README.md:3:95: SyntaxError: Expected 'in', found name
README.md:3:102: SyntaxError: Expected ':', found name
README.md:3:115: SyntaxError: Expected an expression
README.md:4:1: SyntaxError: Expected an expression
README.md:4:10: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:4:19: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:4:22: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:4:29: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:4:36: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:4:41: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:4:49: SyntaxError: Expected an identifier
README.md:4:51: SyntaxError: Expected an expression
README.md:8:3: SyntaxError: Expected an expression
README.md:8:5: SyntaxError: Got unexpected token `
README.md:8:50: SyntaxError: Got unexpected token `
README.md:8:51: SyntaxError: Expected an expression
README.md:8:55: SyntaxError: Expected an expression
README.md:9:1: SyntaxError: Unexpected indentation
README.md:9:5: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:9:13: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:9:22: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:9:30: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:9:34: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:9:37: SyntaxError: Expected an expression
README.md:10:14: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:10:21: SyntaxError: Got unexpected token `
README.md:10:28: SyntaxError: Got unexpected token `
README.md:11:12: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:11:21: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:11:31: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:11:54: SyntaxError: Expected ',', found 'and'
README.md:11:62: SyntaxError: Expected ',', found name
README.md:11:69: SyntaxError: Expected ',', found name
README.md:12:14: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:12:23: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:13:10: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:13:21: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:13:26: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:13:31: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:15:1: SyntaxError: Expected a statement
README.md:15:3: SyntaxError: Expected an expression
README.md:15:5: SyntaxError: Got unexpected token `
README.md:15:35: SyntaxError: Got unexpected token `
README.md:15:36: SyntaxError: Expected an expression
README.md:15:40: SyntaxError: Expected an expression
README.md:16:1: SyntaxError: Unexpected indentation
README.md:16:5: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:16:12: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:16:20: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:16:23: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:16:27: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:16:36: SyntaxError: Compound statements are not allowed on the same line as simple statements
README.md:16:76: SyntaxError: Expected ':', found name
README.md:16:85: SyntaxError: Expected an identifier
README.md:20:1: SyntaxError: Expected a statement
README.md:20:6: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:21:6: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:21:11: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:22:27: SyntaxError: Expected ',', found ':'
README.md:22:28: SyntaxError: Expected ',', found '//'
README.md:26:6: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:32: SyntaxError: Expected a statement
README.md:26:40: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:43: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:47: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:59: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:67: SyntaxError: Compound statements are not allowed on the same line as simple statements
README.md:26:78: SyntaxError: Expected 'in', found name
README.md:26:85: SyntaxError: Expected ':', found name
README.md:26:88: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:93: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:98: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:104: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:108: SyntaxError: Simple statements must be separated by newlines or semicolons
README.md:26:116: SyntaxError: Expected an expression
README.md:27:1: SyntaxError: Expected an expression
README.md:27:12: SyntaxError: Compound statements are not allowed on the same line as simple statements
README.md:27:23: SyntaxError: Expected 'in', found name
README.md:27:28: SyntaxError: Got unexpected token –
README.md:27:45: SyntaxError: Expected an expression
README.md:27:46: SyntaxError: Expected an identifier
examples/generate_metadata.py:16:1: F821 Undefined name `get_ipython`
examples/generate_metadata.py:17:1: F821 Undefined name `get_ipython`
examples/generate_metadata.py:23:1: E402 Module level import not at top of file
examples/generate_metadata.py:24:1: E402 Module level import not at top of file
examples/generate_metadata.py:25:1: E402 Module level import not at top of file
examples/generate_metadata.py:26:1: E402 Module level import not at top of file
examples/generate_metadata.py:26:8: F401 [*] `os` imported but unused
examples/generate_metadata.py:27:1: E402 Module level import not at top of file
examples/generate_metadata.py:27:21: F401 [*] `pathlib.Path` imported but unused
examples/generate_metadata.py:55:5: E722 Do not use bare `except`
examples/generate_metadata.py:113:5: F821 Undefined name `display`
notebooks/gsoc2025_metadata_objective1.ipynb:cell 3:4:8: F401 [*] `os` imported but unused
notebooks/gsoc2025_metadata_objective1.ipynb:cell 3:5:21: F401 [*] `pathlib.Path` imported but unused
notebooks/gsoc2025_metadata_objective1.ipynb:cell 5:5:5: E722 Do not use bare `except`
Found 104 errors.
[*] 4 fixable with the `--fix` option.
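The script findings look consistent with a notebook exported via `jupyter nbconvert --to script`. That export turns `!pip install ...` cells into `get_ipython().system(...)` calls (F821), and those calls end up above the later imports (E402). A lint-clean layout might look like the sketch below. It is hypothetical, and only `utc_timestamp` and `main` are illustrative names. Per the output above, the four F401 findings can also be removed with `ruff check --fix`.

```python
"""Hypothetical lint-clean skeleton for examples/generate_metadata.py."""
# E402: every import comes first, before any executable statement.
# F401: `os` and `pathlib.Path` are simply not imported, since nothing used them.
# F821: no get_ipython() or display(); dependencies are installed before the
#       script runs rather than from inside it.
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """Return the current UTC time as an ISO 8601 string for the metadata table."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def main() -> None:
    # Placeholder: the real script would build and export the tables here.
    print("metadata generated at", utc_timestamp())


if __name__ == "__main__":
    main()
```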

@adel3li (Author) commented Mar 27, 2025

Hi @afloers @andrewfullard 👋

Following up on your email, @afloers — I'm submitting this PR as part of the first objective for the GSoC 2025 project: Metadata Integration for Atomic Data in Carsus.

This contribution includes:

  • ✅ A Jupyter notebook demonstrating metadata integration (DOI, physical units, git commit, timestamp, notes) and citation tables
  • ✅ Export to HDF5 using pandas.HDFStore
  • ✅ A script version for automation (examples/generate_metadata.py)
  • ✅ A README with a quick project summary
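To make the script version easier to automate, it could take its output path and notes from the command line. This sketch uses the standard-library `argparse`. The flags `--output` and `--notes` and the default file name are hypothetical and not part of the PR.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse options for a hypothetical metadata-export script."""
    parser = argparse.ArgumentParser(
        description="Export a levels table with metadata and citations to HDF5."
    )
    parser.add_argument(
        "--output",
        type=Path,
        default=Path("levels_with_metadata.h5"),  # hypothetical default file name
        help="HDF5 file to write",
    )
    parser.add_argument(
        "--notes", default="", help="free-text note stored in the metadata table"
    )
    # argv=None falls back to sys.argv[1:]; an explicit list is handy in tests.
    return parser.parse_args(argv)


args = parse_args(["--output", "out.h5", "--notes", "demo run"])
print(args.output, args.notes)
```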

I'd be grateful for your feedback on structure, completeness, or anything you'd like to see improved or extended.
Also happy to pick up further tasks or issues to help shape the next steps of the project.

Thanks again for the guidance and support!

— Adel Ali

@andrewfullard (Contributor)

This looks sensible to me. Please make an application.

@wkerzendorf requested a review from Copilot on April 1, 2025, 14:01

Copilot AI left a comment

Pull Request Overview

This PR demonstrates the first objective of the GSoC 2025 project by simulating a Carsus-like atomic levels DataFrame, attaching detailed metadata, and exporting the results to an HDF5 file both via a Jupyter notebook and a standalone Python script.

  • Adds an examples/generate_metadata.py script that builds a metadata and citation table for atomic data
  • Updates the README.md to document the new metadata integration feature

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| examples/generate_metadata.py | Script demonstrating metadata integration and HDF5 export |
| README.md | Documentation update introducing the metadata integration |

Comment on lines +16 to +17
get_ipython().system('pip install git+https://github.com/tardis-sn/carsus.git')
get_ipython().system('pip install gitpython uncertainties')

Copilot AI Apr 1, 2025

Using get_ipython() to execute system commands may lead to errors when running this script outside of a notebook environment. Consider using subprocess.run or subprocess.check_call for reliability in a standalone script.

Suggested change
get_ipython().system('pip install git+https://github.com/tardis-sn/carsus.git')
get_ipython().system('pip install gitpython uncertainties')
subprocess.run(['pip', 'install', 'git+https://github.com/tardis-sn/carsus.git'], check=True)
subprocess.run(['pip', 'install', 'gitpython', 'uncertainties'], check=True)

Copilot is powered by AI, so mistakes are possible. Review output carefully before use.
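One variant of the suggestion above targets the interpreter that is actually running the script. Calling `python -m pip` through `sys.executable` avoids picking up a different `pip` from `PATH`. This is a sketch: the helper names are hypothetical, and listing these packages as documented prerequisites may be simpler than installing them at runtime. The package list is taken from the original cell.

```python
from __future__ import annotations

import shlex
import subprocess
import sys

# Packages from the original notebook cell.
REQUIREMENTS = [
    "git+https://github.com/tardis-sn/carsus.git",
    "gitpython",
    "uncertainties",
]


def pip_install_command(packages: list[str]) -> list[str]:
    """Build a `python -m pip install ...` command for the current interpreter."""
    return [sys.executable, "-m", "pip", "install", *packages]


def install(packages: list[str]) -> None:
    """Run the install; check=True raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(packages), check=True)


# Show the command instead of running it; call install(REQUIREMENTS) to actually install.
print(shlex.join(pip_install_command(REQUIREMENTS)))
```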

Comment on lines +16 to +17
get_ipython().system('pip install git+https://github.com/tardis-sn/carsus.git')
get_ipython().system('pip install gitpython uncertainties')

Copilot AI Apr 1, 2025

Using get_ipython() to run installation commands may fail in non-notebook environments. It is recommended to replace these with subprocess-based calls for a more robust script execution.

Suggested change
get_ipython().system('pip install git+https://github.com/tardis-sn/carsus.git')
get_ipython().system('pip install gitpython uncertainties')
subprocess.run(['pip', 'install', 'git+https://github.com/tardis-sn/carsus.git'], check=True)
subprocess.run(['pip', 'install', 'gitpython', 'uncertainties'], check=True)


def get_git_commit():
    try:
        return subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()
    except:

Copilot AI Apr 1, 2025

Avoid using a bare except clause; catch specific exceptions, e.g. subprocess.CalledProcessError, to improve error handling and debugging.

Suggested change
    except:
    except subprocess.CalledProcessError:

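Catching only `subprocess.CalledProcessError`, as suggested above, still lets the function crash when `git` itself is not installed, because `subprocess` then raises `FileNotFoundError`. A slightly broader sketch handles both cases. The `fallback` parameter and its `"unknown"` default are assumptions, not taken from the PR.

```python
import subprocess


def get_git_commit(fallback: str = "unknown") -> str:
    """Return the current HEAD commit hash, or `fallback` if it cannot be determined."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"],
            stderr=subprocess.DEVNULL,  # silence git's own error message
            text=True,  # decode stdout to str
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        # CalledProcessError: git ran but failed (e.g. not inside a repository).
        # FileNotFoundError: the git executable is not on PATH.
        return fallback


print(get_git_commit())
```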

print("Available datasets:")
print(store.keys())
print("\nMetadata preview:")
display(store["levels_metadata"])

Copilot AI Apr 1, 2025

The use of display() is specific to interactive environments like notebooks; if this script is intended for standalone execution, consider using print() or an alternative method to output metadata.

Suggested change
display(store["levels_metadata"])
print(store["levels_metadata"])

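If the same code is meant to run both in the notebook and as a script, one option is to use IPython's `display` when it can be imported and to fall back to `print` otherwise. This is a sketch. The small `metadata` frame stands in for `store["levels_metadata"]` and is hypothetical.

```python
import pandas as pd

try:
    # Rich rendering inside Jupyter/IPython.
    from IPython.display import display
except ImportError:
    # Plain Python without IPython installed: print the text representation.
    display = print

# Stand-in for store["levels_metadata"] (hypothetical values).
metadata = pd.DataFrame(
    {"value": ["eV", "unknown"]},
    index=pd.Index(["energy_unit", "git_commit"], name="field"),
)
display(metadata)
```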

@adel3li (Author) commented Apr 6, 2025

Hi @andrewfullard 👋

Just wanted to follow up and share my final GSoC proposal, now officially submitted!

📄 Proposal:
(It includes a summary of this PR and a link to a more detailed version inside)

This opportunity is deeply important to me. I had to step away from college to support my family and worked my way back through self-learning and experience. Contributing to Carsus through GSoC would be a huge milestone in that journey.

Thank you again for your guidance and support 🙏
— Adel Ali

@adel3li (Author) commented Apr 9, 2025

Dear @andrewfullard,

Quick note: I submitted my GSoC application but mistakenly uploaded the wrong proposal file under the Carsus project slot. The selected project is correct, and this PR is part of it — but the file doesn’t match. I’ve contacted GSoC support and shared the correct proposal with you via email.

This project truly means a lot to me, and I hope the actual work done here can still be considered.

— Adel Ali


2 participants