Skip to content

Cleanup public interface #589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Cleanup public interface #589

wants to merge 17 commits into from

Conversation

cthoyt
Copy link
Member

@cthoyt cthoyt commented May 23, 2025

This PR does the following:

  1. Implements parse_tsv() and parse_csv() as aliases for parse_sssom_table that don't try and guess the right separator, which makes it easier to understand exactly what the function is for
  2. Guts and rewrites private separator inference and file opening utilities
  3. Exposes functions for parsing SSSOM at the top level, i.e. so you can do sssom.parse_tsv() instead of needing to know it's hiding in sssom.writers.parse_tsv
  4. Get rid of the sssom.parse function which just wraps pd.read_csv(), and is only used by a test
  5. Quality of life improvement: enables all write functions to take in a file path (in addition to being able to taking a TextIO)
  6. Update README to show off high level I/O functionality in a "getting started" section
  7. Improve docs and typing

Demo

import sssom

# other SSSOM files can be found on https://mapping-commons.github.io
url = "https://raw.githubusercontent.com/mapping-commons/mh_mapping_initiative/master/mappings/mp_hp_eye_impc.sssom.tsv"

# TSV can be parsed into a mapping set dataframe object,
# which includes a pandas DataFrame, a curies.Converter,
# and metadata
msdf = sssom.parse_tsv(url)

# SSSOM comes with several "write" functions
sssom.write_tsv(msdf, "test.tsv")
sssom.write_json(msdf, "test.json")
sssom.write_owl(msdf, "test.owl")
sssom.write_rdf(msdf, "test.ttl")

@cthoyt cthoyt marked this pull request as ready for review May 23, 2025 09:20
@cthoyt cthoyt requested a review from matentzn May 23, 2025 09:36
@cthoyt
Copy link
Member Author

cthoyt commented May 23, 2025

@matentzn @joeflack4 I'd appreciate a review!

@cthoyt cthoyt requested a review from joeflack4 May 23, 2025 09:36
Copy link
Collaborator

@joeflack4 joeflack4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cthoyt Thanks for adding these aliases! Other misc changes and docs look good, too.

Just one more thing on my wishlist: method/func names and locations

with open(yml_filepath, "w") as y:
yaml.safe_dump(meta, y)


def write_tsv(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method/Func names and locations

@cthoyt @matentzn One of the main ideas I had in #517 was to improve UX by mirroring pandas.

pandas.read_csv() --> sssom.read_csv()

df.to_csv(sep='\t') --> msdf.to_csv() # sep defaults to or can only be tabs

I'd also be willing to name or alias these as *_tsv() instead of *_csv().

What do you guys think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm mildly opposed to any notion that CSV is an okay format, we should only ever use TSV

whether it's read_* or parse_* make no difference to me. However, in this PR I matched the existing nomenclature

wrt adding in the to_* methods on the MSDF - that should be in a separate PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re CSV I personally agree with you but mapping-commons/sssom#444

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that pandas put csv in these methods, but they do support other separators. In any case, I'll approve!

Future method(s) would be appreciated! For now we can leave #517 open for that purpose.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in 5d936d7 I made a more explicit parse_csv and parse_tsv that don't infer the file type (and also fixed several bugs in the file type inference code ;)).

After thinking about it more, I wouldn't want to have a pandas-like interface, since they use CSV by default and I think it's a bad idea for this to be default functionality in SSSOM-py (it'll cause everyone headaches)

joeflack4
joeflack4 previously approved these changes May 24, 2025
@cthoyt cthoyt enabled auto-merge (squash) May 24, 2025 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants