Skip to content

Custom integrations #140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 19, 2025
Merged

Custom integrations #140

merged 11 commits into from
Jun 19, 2025

Conversation

amyfromandi
Copy link
Collaborator

Standardized the ingest_map function to handle metadata file integrations for .csv, .xls, and .tsv. There are options to merge data into the polygons, lines, or points table. Additionally, the macrostrat.public schema migration code was updated to add read-only permissions to the rockd-reader user.

@amyfromandi amyfromandi requested a review from davenquinn May 20, 2025 18:33
@amyfromandi
Copy link
Collaborator Author

Finished staging setup for japan_full_map. View map here: https://dev2.macrostrat.org/maps/ingestion/3032/

@amyfromandi amyfromandi force-pushed the custom_integrations branch 2 times, most recently from 8158287 to 54e2052 Compare May 22, 2025 21:15
@amyfromandi amyfromandi removed the request for review from davenquinn June 18, 2025 15:44
@amyfromandi
Copy link
Collaborator Author

New metadata map ingestion pipeline is an option that is built within the standard single map ingestion process. New parameters added:

--legend-file TEXT metadata URL to merge into the sources polygons/
lines/points table. [default: None]                                    
--legend-key TEXT  primary key to left join the metadata into the 
sources polygons/lines/points table. [default: None]                                    
--legend-table TEXT  Options: polygons, lines, or points. specifies the
 table in which the legend metadata is merged into. It defaults to 
sources polygons. [default: polygons]

Example command to ingest map and specified metadata file:

macrostrat maps staging japan_full /Users/afromandi/Macrostrat/Maps/Japan/Japan\ Full --name "Japan Full Map" Users/afromandi/Macrostrat/Maps/Japan/Japan\ Full/legend.tsv symbol polygons

image

Tests:

  • .tsv metadata ingested with the Japan full map
  • .csv metadata to test
  • .xls/.xlsx metadata to test

@davenquinn This is architecturally up for discussion. Let me know your thoughts

@amyfromandi amyfromandi requested a review from davenquinn June 18, 2025 16:22
Copy link
Member

@davenquinn davenquinn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this!

A few small changes that would be ideal:

  • break up the database migration
  • look at whether we need to actually change the GitHub action to run CI on this branch

Otherwise, the other comments are more for general architectural considerations that presumably can be addressed in a future update after this is merged.

legend_table: str = Option(
"polygons",
help="Options: polygons, lines, or points. specifies the table in which the legend metadata is merged into. It defaults to sources polygons",
),
Copy link
Member

@davenquinn davenquinn Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you have legend info for multiple data types? we don't have to solve that right now but it would require either

  • expanding this to take options for each data type (e.g., --polygons-legend, --polygons-legend-key)
  • creating a parallel approach to merge a CSV after the table is created. That way we could first upload the base table and then merge in whatever other info we wanted, bit by bit.

I kind of like the latter approach – it feels much more flexible. Perhaps we want to make an issue to track that for a future update. (I also feel like you suggested this last month and I said no, so sorry if so!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have a legend_table parameter to specify polygons, lines, or points.

legend_file: str = None,
legend_key: str = None,
legend_table: str = "polygons",

Based on this string, the merged df from preprocess_dataframe will insert into the associated table. Is this what you're mentioning?

# applies legend merge only to the whatever the legend_table is specified as
if legend_path and legend_table == feature_suffix:
df = preprocess_dataframe(df, legend_path=legend_path, join_col=join_col)

Also the CSV's/tsv and all metadata is merged after the table is created in the db

)

print(
f"\nFinished staging setup for {slug}. View map here: https://dev2.macrostrat.org/maps/ingestion/{source_id}/ \n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this URL would not be hard-coded/would point to the instance that is actually being used. But that's minor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could pull it out as a variable and add a # TODO comment to add actual configurability

merged_df = df.merge(legend_df, on=join_col, how="left")
return merged_df


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what a fancy looking function 😁

@amyfromandi amyfromandi merged commit 2061744 into main Jun 19, 2025
1 of 4 checks passed
Copy link
Collaborator Author

@amyfromandi amyfromandi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super helpful feedback. I added some of the nice-to-haves as TODO's in the code

legend_table: str = Option(
"polygons",
help="Options: polygons, lines, or points. specifies the table in which the legend metadata is merged into. It defaults to sources polygons",
),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do have a legend_table parameter to specify polygons, lines, or points.

legend_file: str = None,
legend_key: str = None,
legend_table: str = "polygons",

Based on this string, the merged df from preprocess_dataframe will insert into the associated table. Is this what you're mentioning?

# applies legend merge only to the whatever the legend_table is specified as
if legend_path and legend_table == feature_suffix:
df = preprocess_dataframe(df, legend_path=legend_path, join_col=join_col)

Also the CSV's/tsv and all metadata is merged after the table is created in the db

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants