Add entity geocoding tool #8003

Open
ksuderman wants to merge 5 commits into galaxyproject:main from ksuderman:geocode-entities

@ksuderman

Summary

  • Adds entity geocoding tool to convert NER output to GeoJSON format
  • Uses Nominatim geocoding service for location resolution
  • Outputs interactive GeoJSON datasets for map visualization in Galaxy
  • Works with JSON output from spaCy, Stanza, and CoreNLP NER tools
  • Includes entity aggregation and coordinate mapping capabilities
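A minimal sketch of the conversion the summary describes, under stated assumptions: the input entity shape (`text`, `label`), the `coords` lookup table standing in for the geocoder, and the property names on each feature are all hypothetical, since the PR does not show the script itself. The one fixed point is that GeoJSON (RFC 7946) stores coordinates as longitude first, then latitude.

```python
import json
from collections import Counter

# Hypothetical input shape: the PR does not show the actual NER JSON schema.
entities = [
    {"text": "Paris", "label": "GPE"},
    {"text": "Paris", "label": "GPE"},
    {"text": "Mount Everest", "label": "LOC"},
    {"text": "Alice", "label": "PERSON"},
]

# Stand-in for a geocoder: name -> (latitude, longitude). Values are approximate.
coords = {
    "Paris": (48.8566, 2.3522),
    "Mount Everest": (27.9881, 86.9250),
}

LOCATION_LABELS = {"GPE", "LOC", "FAC", "ORG"}


def to_feature_collection(entities, coords, labels=LOCATION_LABELS):
    """Aggregate repeated mentions and emit one GeoJSON Point per resolved entity."""
    counts = Counter(
        (e["text"], e["label"]) for e in entities if e["label"] in labels
    )
    features = []
    for (text, label), mentions in counts.items():
        if text not in coords:
            continue  # unresolved names are skipped in this sketch
        lat, lon = coords[text]
        features.append({
            "type": "Feature",
            # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": text, "entity_type": label, "mentions": mentions},
        })
    return {"type": "FeatureCollection", "features": features}


collection = to_feature_collection(entities, coords)
print(json.dumps(collection, indent=2))
```

Aggregating before geocoding keeps the number of lookups proportional to the number of distinct names rather than the number of mentions, which matters when the geocoder is rate limited.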

Test plan

  • Tool passes planemo lint validation
  • Test data included with spaCy NER JSON input
  • README documentation provided
  • .shed.yml configured for IUC submission
  • GeoJSON output format properly configured

🤖 Generated with Claude Code

ksuderman and others added 2 commits May 19, 2026 19:16
- Extracts location entities from NLP-annotated JSON (spaCy/Stanza)
- Geocodes GPE, LOC, FAC, and ORG entities using Nominatim
- Dual output: GeoJSON for interactive maps + tabular summary
- Galaxy OpenLayers integration for map visualization
- Configurable Nominatim server (public or self-hosted)
- Entity type selection and deduplication
- Pure Python with urllib.request (no geopy dependency)
- Rate-limited public API or unlimited self-hosted options
- Comprehensive tests and documentation
- Enables spatial analysis of text corpora

Tool: geocode_entities (v1.0.0+galaxy1)
Categories: Text Manipulation, Natural Language Processing
Citation: OpenStreetMap contributors
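For readers unfamiliar with Nominatim, the following is a rough sketch of what a `urllib.request`-based lookup with client-side rate limiting could look like. It is not the code from this PR: the function names, the `RateLimiter` class, and the `User-Agent` string are illustrative assumptions. The `/search` endpoint with `q`, `format`, and `limit` parameters is part of the Nominatim API, and the public server's usage policy asks for at most one request per second and an identifying `User-Agent`.

```python
import json
import time
import urllib.parse
import urllib.request

PUBLIC_NOMINATIM = "https://nominatim.openstreetmap.org"


def build_search_url(name, base_url=PUBLIC_NOMINATIM):
    """Build a Nominatim /search URL asking for at most one JSON result."""
    query = urllib.parse.urlencode({"q": name, "format": "json", "limit": 1})
    return f"{base_url.rstrip('/')}/search?{query}"


class RateLimiter:
    """Enforce a minimum interval between calls (0 disables waiting)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def geocode(name, limiter, base_url=PUBLIC_NOMINATIM,
            user_agent="geocode-entities-sketch"):  # hypothetical identifier
    """Return (lat, lon) for the first hit, or None. Performs a network request."""
    limiter.wait()
    request = urllib.request.Request(
        build_search_url(name, base_url), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        results = json.load(response)
    if not results:
        return None
    # Nominatim returns latitude and longitude as strings.
    return float(results[0]["lat"]), float(results[0]["lon"])


print(build_search_url("New York"))
# https://nominatim.openstreetmap.org/search?q=New+York&format=json&limit=1
```

With this shape, a public-server run would use `RateLimiter(1.0)` and a self-hosted run could pass `RateLimiter(0)` together with its own `base_url`.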
- Geocodes named entities from NER JSON to GeoJSON format
- Uses Nominatim geocoding service for location resolution
- Outputs interactive GeoJSON for map visualization in Galaxy
- Works with JSON output from spaCy, Stanza, and CoreNLP NER
- Includes entity aggregation and coordinate mapping

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Member

@RZ9082 RZ9082 left a comment

This PR adds two different directories with identical content! Let's start by removing one of them.

ksuderman added 2 commits May 20, 2026 12:30
Keep only tools/geocode/ to resolve duplicate tools issue
- Update profile from 21.05 to 24.1
- Remove macros.xml and inline version
- Fix homepage_url and remote_repository_url to point to IUC repository
- Add Galaxy copyright notice to Python script
- Add ftype attributes and has_n_rows to test assertions
@ksuderman
Author

Addressed review issues - removed duplicate tools/geocode_entities/ directory and applied standard IUC fixes

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
<tool id="geocode_entities" name="Geocode Named Entities" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="24.1">
<macros>
<token name="@TOOL_VERSION@">1.0.0</token>
<token name="@VERSION_SUFFIX@">1</token>
Member

Suggested change
- <token name="@VERSION_SUFFIX@">1</token>
+ <token name="@VERSION_SUFFIX@">0</token>

<requirement type="package" version="3.12">python</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
python '$__tool_directory__/geocode_entities.py'
Member

Can you please add a required_files entry (https://docs.galaxyproject.org/en/latest/dev/schema.html#tool-required-files) for all those files? This will help in Pulsar settings.

<param name="nominatim|source" value="public"/>
<output name="geojson_output" ftype="geojson">
<assert_contents>
<has_text text="FeatureCollection"/>
Member

Please use the JSON asserts.
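One likely motivation for this request, sketched in Python rather than in Galaxy test XML: a substring match on `FeatureCollection` also passes for output that is not valid JSON at all, whereas a check on the parsed document does not. The helper name and sample strings below are illustrative assumptions and are not part of the tool or of Galaxy's test framework.

```python
import json


def is_feature_collection(text):
    """True only if text parses as JSON and looks like a GeoJSON FeatureCollection."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and data.get("type") == "FeatureCollection"
        and isinstance(data.get("features"), list)
    )


valid = '{"type": "FeatureCollection", "features": []}'
truncated = '{"type": "FeatureCollection", "features": ['

for sample in (valid, truncated):
    # First value: substring check; second value: parsed, structural check.
    print("FeatureCollection" in sample, is_feature_collection(sample))
# True True
# True False
```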

Comment thread tools/geocode/README.md

## Version History

- **1.0.0+galaxy1**: Initial release with GeoJSON output and OpenLayers integration
Member

This was never released; not sure if it's useful to keep the history here.

3 participants