Custom integrations #140

amyfromandi · 2025-05-20T18:33:25Z

Standardized the ingest_map function to integrate metadata files in .csv, .xls, and .tsv formats, with options to merge the data into the polygons, lines, or points table. Additionally, the macrostrat.public schema migration code was updated to grant read-only permissions to the rockd-reader user.
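A minimal sketch of how extension-based dispatch for the metadata file could look, assuming pandas. `read_legend_file` is a hypothetical helper name for illustration, not the function in this PR. Reading .xls/.xlsx additionally requires an Excel engine such as xlrd (.xls) or openpyxl (.xlsx).

```python
from pathlib import Path

import pandas as pd


def read_legend_file(legend_path: str) -> pd.DataFrame:
    """Load a legend/metadata table based on its file extension.

    Hypothetical helper; the PR's actual loader may differ.
    """
    suffix = Path(legend_path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(legend_path)
    if suffix == ".tsv":
        # A TSV is parsed like a CSV with a tab delimiter
        return pd.read_csv(legend_path, sep="\t")
    if suffix in (".xls", ".xlsx"):
        # Needs an Excel engine installed (xlrd for .xls, openpyxl for .xlsx)
        return pd.read_excel(legend_path)
    raise ValueError(f"Unsupported legend file type: {suffix!r}")
```

Dispatching on the suffix keeps the rest of the pipeline format-agnostic: everything downstream only sees a DataFrame.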

amyfromandi · 2025-05-20T18:41:51Z

Finished staging setup for japan_full_map. View map here: https://dev2.macrostrat.org/maps/ingestion/3032/

amyfromandi · 2025-06-18T16:22:01Z

The new metadata map ingestion pipeline is an option built into the standard single-map ingestion process. New parameters:

--legend-file TEXT   Metadata URL to merge into the sources polygons/lines/points table. [default: None]
--legend-key TEXT    Primary key used to left-join the metadata into the sources polygons/lines/points table. [default: None]
--legend-table TEXT  Options: polygons, lines, or points. Specifies the table into which the legend metadata is merged. [default: polygons]
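A sketch of a pre-flight check for how these three options relate, under the assumption that --legend-key is required whenever --legend-file is given. `LegendTable` and `validate_legend_options` are hypothetical names. Typer can also accept an Enum subclass as an option's type to restrict the allowed choices, which would move the table check into the CLI layer.

```python
from enum import Enum
from typing import Optional


class LegendTable(str, Enum):
    """Tables a legend file can be merged into (mirrors --legend-table)."""

    polygons = "polygons"
    lines = "lines"
    points = "points"


def validate_legend_options(
    legend_file: Optional[str],
    legend_key: Optional[str],
    legend_table: str = "polygons",
) -> Optional[LegendTable]:
    """Return the target table, or None when no legend file was given.

    Hypothetical check; assumes a legend file always needs a join key.
    """
    if legend_file is None:
        return None
    if legend_key is None:
        raise ValueError("--legend-key is required when --legend-file is set")
    try:
        return LegendTable(legend_table)
    except ValueError:
        choices = ", ".join(t.value for t in LegendTable)
        raise ValueError(f"--legend-table must be one of: {choices}") from None
```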

Example command to ingest map and specified metadata file:

macrostrat maps staging japan_full /Users/afromandi/Macrostrat/Maps/Japan/Japan\ Full --name "Japan Full Map" --legend-file /Users/afromandi/Macrostrat/Maps/Japan/Japan\ Full/legend.tsv --legend-key symbol --legend-table polygons

Tests:

.tsv metadata: ingested with the Japan full map
.csv metadata: still to test
.xls/.xlsx metadata: still to test

@davenquinn The architecture is up for discussion. Let me know your thoughts.


davenquinn

I like this!

A few small changes that would be ideal:

Break up the database migration.
Check whether we actually need to change the GitHub Action to run CI on this branch.

The other comments are more general architectural considerations that can presumably be addressed in a future update after this is merged.

.github/workflows/run-makefile.yml

cli/macrostrat/cli/database/migrations/legacy_baseline/00-baseline.sql

map-integration/macrostrat/map_integration/__init__.py

davenquinn · 2025-06-18T20:17:24Z

map-integration/macrostrat/map_integration/__init__.py

+    legend_table: str = Option(
+        "polygons",
+        help="Options: polygons, lines, or points. specifies the table in which the legend metadata is merged into. It defaults to sources polygons",
+    ),


What if you have legend info for multiple data types? We don't have to solve that right now, but it would require either:

expanding this to take options for each data type (e.g., --polygons-legend, --polygons-legend-key), or

creating a parallel approach to merge a CSV after the table is created. That way we could first upload the base table and then merge in whatever other info we wanted, bit by bit.

I kind of like the latter approach – it feels much more flexible. Perhaps we want to make an issue to track that for a future update. (I also feel like you suggested this last month and I said no, so sorry if so!)
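A sketch of the second ("merge after the table is created") approach, using the standard-library sqlite3 module as a stand-in for the PostgreSQL staging tables. `merge_legend_columns` is a hypothetical helper; table and column names are interpolated into SQL, so it assumes trusted input.

```python
import sqlite3


def merge_legend_columns(conn, table, key, legend_rows):
    """Add legend columns to an existing table and fill them by matching on `key`.

    Behaves like a left join: rows with no legend match keep NULL in the new columns.
    `legend_rows` is a list of dicts that all share the same keys.
    """
    if not legend_rows:
        return
    new_cols = [c for c in legend_rows[0] if c != key]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    for col in new_cols:
        if col not in existing:
            conn.execute(f'ALTER TABLE {table} ADD COLUMN "{col}" TEXT')
    assignments = ", ".join(f'"{c}" = ?' for c in new_cols)
    for row in legend_rows:
        conn.execute(
            f'UPDATE {table} SET {assignments} WHERE "{key}" = ?',
            [row[c] for c in new_cols] + [row[key]],
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE polygons (id INTEGER, symbol TEXT)")
    conn.executemany("INSERT INTO polygons VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
    merge_legend_columns(
        conn, "polygons", "symbol",
        [{"symbol": "A", "lith": "granite"}, {"symbol": "B", "lith": "shale"}],
    )
    print(conn.execute("SELECT id, symbol, lith FROM polygons ORDER BY id").fetchall())
    # [(1, 'A', 'granite'), (2, 'B', 'shale'), (3, 'C', None)]
```

Because each call only adds and fills columns, it can be run repeatedly against the same base table, one metadata file at a time.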


davenquinn · 2025-06-18T20:19:01Z

map-integration/macrostrat/map_integration/__init__.py

+            )
+
+        print(
+            f"\nFinished staging setup for {slug}. View map here: https://dev2.macrostrat.org/maps/ingestion/{source_id}/ \n"


Ideally this URL would not be hard-coded and would point to the instance that is actually being used, but that's minor.

You could pull it out as a variable and add a # TODO comment to add actual configurability.
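A sketch of that suggestion: the base URL lives in one variable with a TODO, and an environment variable can override it. `MACROSTRAT_INGESTION_BASE_URL`, `ingestion_url`, and `finished_message` are hypothetical names, not existing Macrostrat settings.

```python
import os

# TODO: replace this fallback with real configuration for the active instance.
# MACROSTRAT_INGESTION_BASE_URL is a hypothetical environment variable.
DEFAULT_INGESTION_BASE_URL = "https://dev2.macrostrat.org/maps/ingestion"


def ingestion_url(source_id: int) -> str:
    """Build the map ingestion URL, preferring the env override if set."""
    base = os.environ.get("MACROSTRAT_INGESTION_BASE_URL", DEFAULT_INGESTION_BASE_URL)
    return f"{base.rstrip('/')}/{source_id}/"


def finished_message(slug: str, source_id: int) -> str:
    return f"\nFinished staging setup for {slug}. View map here: {ingestion_url(source_id)} \n"
```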

davenquinn · 2025-06-18T20:21:14Z

map-integration/macrostrat/map_integration/commands/ingest.py

+    merged_df = df.merge(legend_df, on=join_col, how="left")
+    return merged_df
+
+


what a fancy looking function 😁
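One possible hardening of the left merge shown in the diff, sketched with pandas: `validate="many_to_one"` makes pandas raise `MergeError` if the legend has duplicate keys (which would otherwise silently duplicate feature rows), and `indicator=True` lets the caller count features with no legend match. `merge_legend` is a hypothetical name; this is a variant of, not a copy of, the PR's function.

```python
import pandas as pd


def merge_legend(df: pd.DataFrame, legend_df: pd.DataFrame, join_col: str) -> pd.DataFrame:
    """Left-join legend metadata onto feature rows, with sanity checks."""
    merged = df.merge(
        legend_df,
        on=join_col,
        how="left",
        validate="many_to_one",  # each legend key must be unique
        indicator=True,  # adds a "_merge" column: left_only / both
    )
    unmatched = int((merged["_merge"] == "left_only").sum())
    if unmatched:
        print(f"{unmatched} feature(s) had no matching legend entry on {join_col!r}")
    return merged.drop(columns="_merge")
```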

map-integration/macrostrat/map_integration/commands/ingest.py

map-integration/macrostrat/map_integration/custom_integrations/japan_full_map.py

map-integration/macrostrat/map_integration/__init__.py

map-staging/macrostrat/map_staging/japan-scraper.py

amyfromandi

Super helpful feedback. I added some of the nice-to-haves as TODOs in the code.


amyfromandi · 2025-06-20T15:01:55Z

map-integration/macrostrat/map_integration/__init__.py

+    legend_table: str = Option(
+        "polygons",
+        help="Options: polygons, lines, or points. specifies the table in which the legend metadata is merged into. It defaults to sources polygons",
+    ),


I do have a legend_table parameter to specify polygons, lines, or points.

legend_file: str = None, legend_key: str = None, legend_table: str = "polygons",

Based on this string, the merged DataFrame from preprocess_dataframe is inserted into the associated table. Is this what you meant?

# applies the legend merge only to whichever table legend_table specifies
if legend_path and legend_table == feature_suffix:
    df = preprocess_dataframe(df, legend_path=legend_path, join_col=join_col)

Also, the CSV/TSV files and all other metadata are merged after the table is created in the database.
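A sketch of the gating described above, assuming the features for each type are held in a dict of DataFrames keyed by suffix. `apply_legend_to_features` and the dict layout are hypothetical; only the `legend_table == feature_suffix` check comes from the snippet. It also shows the limitation raised in review: one run merges one legend into one table.

```python
from typing import Dict, Optional

import pandas as pd


def apply_legend_to_features(
    frames: Dict[str, pd.DataFrame],
    legend_table: str,
    legend_df: Optional[pd.DataFrame],
    join_col: Optional[str],
) -> Dict[str, pd.DataFrame]:
    """Merge the legend into only the frame whose suffix matches legend_table."""
    out = {}
    for feature_suffix, df in frames.items():
        if legend_df is not None and legend_table == feature_suffix:
            df = df.merge(legend_df, on=join_col, how="left")
        out[feature_suffix] = df  # other feature types pass through unchanged
    return out
```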

map-integration/macrostrat/map_integration/custom_integrations/japan_full_map.py

amyfromandi requested a review from davenquinn May 20, 2025 18:33

amyfromandi and others added 2 commits May 22, 2025 12:12

Reapply cleaned custom integration files after rebase

cd8193b

Format code and sort imports

54e2052

amyfromandi force-pushed the custom_integrations branch 2 times, most recently from 8158287 to 54e2052, May 22, 2025 21:15

Format code and sort imports

034a21b

amyfromandi removed the request for review from davenquinn June 18, 2025 15:44

amyfromandi requested a review from davenquinn June 18, 2025 16:22

amyfromandi added 8 commits June 18, 2025 12:59

Merge branch 'main' of github.com:UW-Macrostrat/macrostrat into custom_integrations

57b3bc8

minor edits to the metadata pipeline

5c436dc

Merge branch 'custom_integrations' of github.com:UW-Macrostrat/macrostrat into custom_integrations

c004784

testing CI workflow

a3d24db

testing the macrostrat test suite CI

3c20b92

testing the macrostrat test suite CI

11ceb4b

testing CI

6f905d6

test suite CI

2061744

davenquinn requested changes Jun 18, 2025


amyfromandi merged commit 2061744 into main Jun 19, 2025
1 of 4 checks passed

amyfromandi commented Jun 20, 2025


		merged_df = df.merge(legend_df, on=join_col, how="left")
		return merged_df
