Custom integrations #140
Finished staging setup for japan_full_map. View map here: https://dev2.macrostrat.org/maps/ingestion/3032/
8158287 to 54e2052
The new metadata map ingestion pipeline is an option built into the standard single-map ingestion process. New parameters added: legend_file, legend_key, and legend_table.
Example command to ingest a map with a specified metadata file:
Tests:
@davenquinn This is architecturally up for discussion. Let me know your thoughts.
I like this!
A few small changes that would be ideal:
- break up the database migration
- look at whether we need to actually change the GitHub action to run CI on this branch
Otherwise, the other comments are more for general architectural considerations that presumably can be addressed in a future update after this is merged.
legend_table: str = Option(
    "polygons",
    help="Options: polygons, lines, or points. Specifies the table into which the legend metadata is merged. Defaults to polygons.",
),
What if you have legend info for multiple data types? We don't have to solve that right now, but it would require either:
- expanding this to take options for each data type (e.g., --polygons-legend, --polygons-legend-key), or
- creating a parallel approach to merge a CSV after the table is created. That way we could first upload the base table and then merge in whatever other info we wanted, bit by bit.
I kind of like the latter approach – it feels much more flexible. Perhaps we want to make an issue to track that for a future update. (I also feel like you suggested this last month and I said no, so sorry if so!)
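A minimal sketch of that second, "merge a CSV after the table is created" idea. All table and column names here are illustrative, not Macrostrat's actual schema, and an in-memory SQLite database stands in for the real one:

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# 1. Upload the base table first.
base = pd.DataFrame({"symbol": ["Qa", "Tb"], "geom_id": [1, 2]})
base.to_sql("japan_full_map_polygons", conn, index=False)

# 2. Later, merge extra legend info from a CSV, bit by bit.
legend = pd.DataFrame({"symbol": ["Qa", "Tb"], "unit_name": ["Alluvium", "Basalt"]})
merged = pd.read_sql("SELECT * FROM japan_full_map_polygons", conn).merge(
    legend, on="symbol", how="left"
)
merged.to_sql("japan_full_map_polygons", conn, index=False, if_exists="replace")
```

Because each CSV merge is a separate read-merge-replace pass, additional metadata files can be applied incrementally without re-running the base ingestion.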
I do have a legend_table parameter to specify polygons, lines, or points:

legend_file: str = None,
legend_key: str = None,
legend_table: str = "polygons",

Based on this string, the merged df from preprocess_dataframe is inserted into the associated table. Is this what you're mentioning?

# apply the legend merge only to the table specified by legend_table
if legend_path and legend_table == feature_suffix:
    df = preprocess_dataframe(df, legend_path=legend_path, join_col=join_col)

Also, the CSVs/TSVs and all other metadata are merged after the table is created in the DB.
print(
    f"\nFinished staging setup for {slug}. View map here: https://dev2.macrostrat.org/maps/ingestion/{source_id}/ \n"
)
Ideally this URL would not be hard-coded and would point to the instance that is actually being used. But that's minor.
You could pull it out as a variable and add a # TODO comment to add actual configurability.
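One way that could look; the variable and function names here are hypothetical, not from the PR:

```python
import os

# TODO: add actual configurability (e.g., a CLI option or settings file);
# the env var fallback below is just an illustrative stopgap.
INGESTION_BASE_URL = os.environ.get(
    "MACROSTRAT_INGESTION_URL", "https://dev2.macrostrat.org"
)


def staging_message(slug: str, source_id: int) -> str:
    """Build the post-staging message with a configurable base URL."""
    return (
        f"\nFinished staging setup for {slug}. "
        f"View map here: {INGESTION_BASE_URL}/maps/ingestion/{source_id}/\n"
    )
```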
merged_df = df.merge(legend_df, on=join_col, how="left")
return merged_df
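For context, a self-contained sketch of what this left merge does; the column names and sample values are made up:

```python
import pandas as pd

# Feature table and legend metadata, joined on a shared key column.
df = pd.DataFrame({"symbol": ["Qa", "Tb"], "geom_id": [1, 2]})
legend_df = pd.DataFrame({"symbol": ["Qa", "Tb"], "name": ["Alluvium", "Basalt"]})

# how="left" keeps every feature row, even ones with no matching legend entry.
merged_df = df.merge(legend_df, on="symbol", how="left")
```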
what a fancy looking function 😁
map-integration/macrostrat/map_integration/custom_integrations/japan_full_map.py
Super helpful feedback. I added some of the nice-to-haves as TODOs in the code.
Standardized the ingest_map function to handle metadata-file integrations for .csv, .xls, and .tsv. There are options to merge data into the polygons, lines, or points table. Additionally, the macrostrat.public schema migration code was updated to grant read-only permissions to the rockd-reader user.
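The extension-based dispatch described above might be sketched like this; the function name is hypothetical, not the PR's actual code:

```python
from pathlib import Path

import pandas as pd


def read_metadata(path: str) -> pd.DataFrame:
    """Load a metadata file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".tsv":
        return pd.read_csv(path, sep="\t")
    if suffix in (".xls", ".xlsx"):
        return pd.read_excel(path)
    raise ValueError(f"Unsupported metadata file type: {suffix}")
```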