-
Notifications
You must be signed in to change notification settings - Fork 234
Add simply connected interior validation and test suite #1472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add simply connected interior validation and test suite #1472
Conversation
dfa491f to
7398dfe
Compare
michaelkirk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello! Thank you for the pull request. This would be a great feature to incorporate, but I've got some questions to work through.
| let edge = RefCell::borrow(edge); | ||
| edge.edge_intersections() | ||
| .iter() | ||
| .filter_map(|ei| coord_to_bits(&ei.coordinate())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: f64::to_bits
Is this in pursuit of getting the floating points to fit into a HashSet? There's a reason you can't put floats in a HashSet, and it's not clear to me that this usage is valid.
Can you talk more about this? Maybe it's OK, but I've just never used floats this way.
The only problematic examples I can think of off the top of my head are:
-
NaN coords which have many representations, but we don't make any guarantees for NaN coords anyway, so that's probably fine
-
-0.0 vs. +0.0 - topologically those are the same number, but they'd appear as distinct entries, which seems like it could confound this algorithm.
I'm not sure if there are any others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on this comment, the algo is now using ordered points via total_cmp -- gave a performance boost too.
| RingRole::Exterior, | ||
| RingRole::Interior(0) | ||
| )] | ||
| let errors = polygon.validation_errors(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the assert_validation_errors macro. If you're having trouble I can help, but I'm not sure if the issue is you tried and it didn't work or if this is just what the AI spit out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just what the AI spit out.
I have to say I spent a long time deliberating whether or not to make this PR. I've never written a line of rust before, and I really didn't want to pollute the codebase with AI slop. Nor did I want to waste your time with overly verbose / poor code.
I find the solution readable, and the tests make sense to me (and are validated using shapely), so I hope I'm not too far off track!
geo/CHANGES.md
Outdated
| ## 0.32.0 - 2025-12-05 | ||
|
|
||
| - Add simply connected interior validation for polygons. Polygons with holes that touch at vertices in ways that disconnect the interior (e.g., two holes sharing 2+ vertices, or cycles of holes each sharing a vertex) are now detected as invalid via `Validation::is_valid()`. This aligns with OGC Simple Features and matches PostGIS behavior. | ||
| - Performance: Polygon validation now uses `PreparedGeometry` to cache R-tree structures, improving validation speed by 26-45% for polygons with many holes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PreparedGeometry is just used for this newly added validation right? So there's no world in which people validating before would see a speedup, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added a let prepared_exterior = PreparedGeometry::from(&polygon_exterior); for the exterior-interior relate calls that has a nice improvement in the benchmarks for polygons with lots of holes.
These tests document the missing validation check for simply connected polygon interiors, as noted in polygon.rs:13-14. The OGC Simple Feature Specification requires that "the interior of every Surface is a connected point set." Test cases: - Two L-shaped holes sharing 2 vertices (disconnects interior) - Checkerboard level 0: 13 holes sharing vertices at grid points - Checkerboard level 1: Nested checkerboard with 26 holes - Valid cases: holes sharing 1 vertex, non-touching holes The checkerboard pattern demonstrates that a simple "two holes sharing 2+ vertices" check is insufficient - four holes meeting at single vertices can also disconnect the interior. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add validation to detect when a polygon's interior is not simply connected, which violates the OGC Simple Feature Specification requirement that "the interior of every Surface is a connected point set." The validation detects two types of disconnection: - Multi-touch: Two rings sharing 2+ vertices at different coordinates - Cycle detection: Rings forming a cycle through distinct single-touch points Key implementation details: - Build a touch graph where nodes are rings and edges connect touching rings - Detect vertex-vertex touches (same coordinate on both rings) - Detect vertex-on-edge touches (vertex lies on another ring's edge) - Use DFS to find cycles through distinct coordinates The error message now includes the problematic coordinates to aid debugging, e.g., "polygon interior is not simply connected at coordinate(s): (2, 2), (3, 3)" Tests cover: - Two L-shaped holes sharing two vertices (invalid) - Checkerboard patterns with shared vertices (invalid) - Four holes forming a cycle of single touches (invalid) - Three holes meeting at one vertex (valid - interior still connected) - Holes with vertex-on-edge touches to exterior (invalid) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Remove the 1.2MB validate.geojson file and complex GeoJSON parsing code. Use existing fixtures from geo_test_fixtures instead: - east_baton_rouge polygon - nl_zones MultiPolygon - nl_plots_wgs84 MultiPolygon This keeps the benchmark coverage while eliminating the custom fixture and 50+ lines of parsing code. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace the custom translate_ring() function with the existing Translate::translate() method from geo. This avoids duplicating functionality that already exists in the crate. Added geo as a dependency to geo-test-fixtures to access the Translate trait. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace the to_bits() hashing approach with total_cmp-based sorting for grouping coordinates by equality. This approach: - Uses GeoFloat::total_cmp for consistent ordering (handles -0.0 == +0.0) - Groups consecutive equal coordinates after sorting - Sorts by edge_idx within groups for efficient deduplication - Maintains O(n log n) complexity with better cache locality Benchmarks show 5-15% improvement over the previous hashing approach. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Replace manual error checking with the assert_validation_errors! macro for InteriorNotSimplyConnected tests. Use exact coordinates for the expected touch points instead of pattern matching. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Consolidate the performance note into the main validation bullet point to clarify that PreparedGeometry is used for interior/exterior containment checks, not just the new simply-connected check. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
935d0ad to
15efe63
Compare
This test documents that iOverlay currently produces polygons with disconnected interiors when boolean operations result in holes sharing 2+ vertices. The new simply-connected interior validation detects this. When iOverlay fixes issue georust#27, it should split the result into multiple valid polygons instead of producing one invalid polygon. Related: - iOverlay issue: iShape-Rust/iOverlay#27 - geo issue: georust#1470 - Supersedes PR georust#1471 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
CHANGES.mdif knowledge of this change could be valuable to users.Add validation to detect when a polygon's interior is not simply connected, which violates the OGC Simple Feature Specification requirement that "the interior of every Surface is a connected point set."
What This Detects
The validation detects two types of interior disconnection:
Algorithm
Uses the
GeometryGraphedge intersection data to find where rings touch:(x, y, edge_idx)usingtotal_cmpfor consistent ordering, then group by coordinateA "touch" requires at least one ring to have the point as an actual vertex (not a mid-segment crossing).
Key Insight
Multiple rings meeting at a single coordinate does NOT disconnect the interior. For example, three triangular holes meeting at one point still have a connected interior around their perimeter. Disconnection only occurs when touches happen at distinct coordinates.
Error Reporting
The error message includes the problematic coordinates to aid debugging:
Performance
Uses
PreparedGeometryto cache R-tree structures for interior/exterior containment checks. The sorting-based coordinate grouping provides better cache locality than hash-based approaches.