Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added the ability for XGI to load data collections #540

Merged
merged 14 commits into from
Jun 24, 2024
Merged

Conversation

nwlandry
Copy link
Collaborator

Refactored the following to handle collections of data:

  • load_xgi_data
  • download_xgi_data
  • write_json
  • read_json

Also did the following:

  • Moved the contents of write_json to to_hypergraph_dict
  • Renamed dict_to_hypergraph to from_hypergraph_dict

Copy link

codecov bot commented May 20, 2024

Codecov Report

Attention: Patch coverage is 97.14286% with 3 lines in your changes missing coverage. Please review.

Project coverage is 92.41%. Comparing base (9355c38) to head (65606b3).
Report is 48 commits behind head on main.

Files with missing lines Patch % Lines
xgi/readwrite/xgi_data.py 93.02% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #540      +/-   ##
==========================================
+ Coverage   92.35%   92.41%   +0.05%     
==========================================
  Files          60       60              
  Lines        4500     4548      +48     
==========================================
+ Hits         4156     4203      +47     
- Misses        344      345       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maximelucas
Copy link
Collaborator

Idea sounds nice! Can you elaborate a bit, what do you mean by collection and what's your intended workflow with this?

@nwlandry
Copy link
Collaborator Author

So my original intent was to figure out how to store collections similar to this one. I opened Issue 20 in XGI-DATA for this issue. The choices in my mind were (1) make a huge file containing all the hypergraphs or (2) continue to have a file for each hypergraph, but in addition have a collection file specifying metadata of the collection as well as where to find each dataset in the collection. There might be a more elegant way, but this is the best I've come up with so far. Note that with this PR, you can call xgi.load_xgi_data("hyperbard") and load the hyperbard collection.

@maximelucas
Copy link
Collaborator

Just to check, in the end you chose option (2) right? And this is implemented as a single record in Zenodo for the collection containing many .json files (one for each dataset).

xgi.load_xgi_data("hyperbard") loads a dict of datasets, so xgi.load_xgi_data("hyperbard")["coriolanus"] should load a single dataset if I understand correctly.

This is very nice and good for now thanks Nich.

As I mentioned in our call, I'm wondering if don't want to be able to access them directly from something like xgi.load_xgi_data("hyperbard-coriolanus") to be able to iterate over all datasets in XGI-data. They could also appear in the list we get with xgi.load_xgi_data() this way? This could be a future PR too.

Btw, related to #548 I don't think the loading function is in the online docs.

@nwlandry
Copy link
Collaborator Author

Thanks for the review! I completely agree, but this will take a bit longer to figure out how to implement. Let me know if you'd rather I wait to implement accessing/viewing individual datasets to merge or if I should open an issue on this. What do you mean when you say that the loading function isn't in the online docs?

@maximelucas
Copy link
Collaborator

Let me know if you'd rather I wait to implement accessing/viewing individual datasets to merge or if I should open an issue on this.

Yes I think we can just open an issue about this!

What do you mean when you say that the loading function isn't in the online docs?

I mean that xgi.load_xgi_data("<dataset_name>") is not documented anywhere (except from its own docstring).
As far as I can tell it only appears here, but without all its arguments: https://xgi.readthedocs.io/en/stable/xgi-data.html.

@nwlandry
Copy link
Collaborator Author

Thanks!! I added two issues from your comments.

@nwlandry nwlandry merged commit 6e5c25b into main Jun 24, 2024
24 checks passed
@nwlandry nwlandry deleted the extend-xgi-data branch June 24, 2024 15:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants