performance for multipolygon contains much worse than for geos #1184

gauteh · 2024-05-18T20:37:47Z

Hi,

I'm trying to convert some code for roaring-landmask (https://github.com/gauteh/roaring-landmask) from geos to geo (gauteh/roaring-landmask#27). However, checking if a point is contained by a large multipolygon (with all countries from GSHHG) is far slower with geo than with geos. With geos I used PreparedGeometry, does something similar exist?

geos:

test shapes::tests::benches::test_contains_in_ocean ... bench:       1,586 ns/iter (+/- 45)
test shapes::tests::benches::test_contains_on_land  ... bench:       1,499 ns/iter (+/- 169)

geo:

test shapes::tests::benches::test_contains_in_ocean ... bench:  10,713,374 ns/iter (+/- 356,454)
test shapes::tests::benches::test_contains_on_land  ... bench:   6,709,172 ns/iter (+/- 255,050)

The text was updated successfully, but these errors were encountered:

urschrei · 2024-05-18T20:52:21Z

No, but you can decompose the multipolygon into polygons and add them to an rstar spatial index: https://docs.rs/rstar/latest/rstar/struct.RTree.html. The rtree is serializable, re-exported by geo (so it should "just work" for bulk loading the multipolygon Vec), and you can associate extra data with each polygon by using https://docs.rs/rstar/latest/rstar/primitives/struct.GeomWithData.html

gauteh · 2024-05-18T21:01:59Z

Thanks, for the tip. Do you know if that is how geos is doing it?

urschrei · 2024-05-18T21:20:53Z

A geos prepared geometry is an rtree behind the scenes, yes. So things like point in multipolygon query the tree transparently.

gauteh · 2024-05-19T07:23:38Z

I'm trying to add all my Polygons to an RTree, and then check if a point is contained by any of the polygons. But I'm getting this error:

   Compiling roaring-landmask v0.7.3 (/home/gauteh/met/roaring-landmask)
error[E0599]: the method `nearest_neighbor` exists for struct `RTree<Polygon>`, but its trait bounds were not satisfied
  --> src/shapes.rs:74:28
   |
74 |         let po = self.geom.nearest_neighbor(&p);
   |                            ^^^^^^^^^^^^^^^^ method cannot be called on `RTree<Polygon>` due to unsatisfied trait bounds
   |
  ::: /home/gauteh/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geo-types-0.7.13/src/geometry/polygon.rs:71:1
   |
71 | pub struct Polygon<T: CoordNum = f64> {
   | ------------------------------------- doesn't satisfy `geo::Polygon: PointDistance`
   |
   = note: the following trait bounds were not satisfied:
           `geo::Polygon: PointDistance`

gauteh/roaring-landmask@b8e3a21

How can I search through the Polygons and check if the point is contained?

gauteh · 2024-05-19T07:36:59Z

I think Polygons should implement contains_point, but not PointDistance. Only contains_point and envelope-stuff is necessary for this check? Maybe PointDistance needs to be split.

gauteh · 2024-05-19T07:59:14Z

I wrapped the polygons and implemented contains (gauteh/roaring-landmask@60bc592)

The performance is better, but still roughly 1000x worse than geos :/

test shapes::tests::benches::test_contains_in_ocean ... bench:   1,403,390 ns/iter (+/- 175,985)
test shapes::tests::benches::test_contains_on_land  ... bench:   1,395,731 ns/iter (+/- 111,216)

urschrei · 2024-05-19T09:15:13Z

Can you post a link to your benchmark? I just want to make sure I'm understanding everything here.

gauteh · 2024-05-19T09:24:37Z

Hi, the benchmark is here: https://github.com/gauteh/roaring-landmask/blob/685ed6f43e944a604545c452273ded4ab901ce42/src/shapes.rs#L201

You run it with: cargo bench shapes --features nightly, requires python at least 3.10. main branch is with geos, the linked PR is with geo.

Thanks

kylebarron · 2024-05-20T20:33:39Z

FWIW I believe a GEOS prepared geometry indexes every coordinate pair (Line in geo terminology). This is vastly different than indexing each polygon, especially for large polygons like country/land boundaries. I think what you want is #803.

I think that geo will automatically build in-effect a prepared geometry before doing the intersection, but there's currently no API to allow the user to persist this. So if you're doing repeated intersections I'd expect bad performance compared to GEOS

gauteh · 2024-05-21T07:06:36Z

Thanks, that seems like something that needs to be done on geo level. It appears to have got some work, but still require some effort?

gauteh · 2024-06-06T08:41:05Z

Would it be enough to persist/cache this intermediate step? Any way to do a quick test if that makes any difference?

urschrei · 2024-06-06T11:02:59Z

You can enable serialise / deserialise support for trees by enabling the serde feature on the rstar dependency. I haven't done it myself, but it should be pretty straightforward using something like https://docs.rs/bincode/latest/bincode/

gauteh · 2024-06-06T11:48:19Z

It's not the rstar, but what is mentioned here: #1184 (comment) . Rstar didn't help enough.

michaelkirk · 2024-06-06T18:08:12Z

Any way to do a quick test if that makes any difference?

As discussed in #803, https://github.com/georust/geo/compare/mkirk/prepared-geom?expand=1 could be a good starting point.

gauteh · 2024-08-16T08:38:49Z

Now testing out prepared geometries through #1198, is it better if I create an RTree with prepared polygons, or should I just create one prepared multipolygon? I am testing many points against all the polygons.

gauteh · 2024-08-18T06:54:38Z

Using a RTree of prepared geometries (gauteh/roaring-landmask@e413778), slightly worse:

test shapes::tests::benches::test_contains_in_ocean ... bench:  41,531,670.10 ns/iter (+/- 4,594,429.18)
test shapes::tests::benches::test_contains_on_land  ... bench:  42,088,323.30 ns/iter (+/- 3,958,897.37)

airbreather · 2024-10-03T07:42:48Z

I've worked on NTS, the C# port of JTS (whereas GEOS is the C++ port of JTS). Since it looks like part of this discussion focuses SPECIFICALLY on the "contains point" case, while it's true that JTS / NTS / GEOS use an R-tree for this, there's some nuance that I don't see being discussed here (it's somewhat addressed here, but it's the details of that index that make a huge difference). If I've scanned too quickly and missed it, please ignore me...

For single-point tests specifically, it's NOT a typical 2-dimensional R-tree holding all the line segments. Rather, it uses a 1-dimensional "interval R-tree" of those segments.

Mathematically, calculating "where is (x, y) located relative to this (multi/)polygon?" is done by taking a ray that starts at (x, y), following it to infinity in any direction, and counting how many times it crosses a polygon ring segment. Ignoring the "exactly on the boundary" case for this discussion (not because it's unimportant generally, but because it doesn't affect indexing), if the result is even (including zero), then the point is outside. Otherwise, it's inside.

The math doesn't care which direction we choose, but there are four interesting ones: positive / negative, parallel to the x / y axis. GEOS happens to pick positive X, so I'll talk about that specifically, but it can work with any of the four.

Once you commit to choosing choosing that direction, you can index all of the line segments of the (multi/)polygon using a variant of the R-tree that only looks at their min/max Y values. With that index, a query for point (x, y) can cheaply discard any segments that can't possibly influence the result because they're either fully above or fully below the query point. What's more, you can very cheaply rule out a lot of the remaining segments as well by just checking if they're fully on the other side of the ray from your query point, leaving a very small subset of the segments where you have to run the more expensive test, almost all of which actually do need to influence the result in some way, almost all of the time.

I hope this helps.

gauteh mentioned this issue May 18, 2024

Georust: Use pure rust libraries for shapes and geometries gauteh/roaring-landmask#27

Open

urschrei mentioned this issue May 18, 2024

Note availability of rstar spatial index early in docs #1185

Open

gauteh mentioned this issue May 21, 2024

Prepared Geometries #803

Closed

gauteh mentioned this issue Jul 27, 2024

Georust: use one big prepared geometry gauteh/roaring-landmask#28

Open

This was referenced Aug 18, 2024

GeometryGraph and PreparedGeometry multi-threaded (Send) #1198

Open

Georust: Use pure rust libraries for shapes and geometries (RTree) gauteh/roaring-landmask#29

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

performance for multipolygon contains much worse than for geos #1184

performance for multipolygon contains much worse than for geos #1184

gauteh commented May 18, 2024

urschrei commented May 18, 2024

gauteh commented May 18, 2024

urschrei commented May 18, 2024

gauteh commented May 19, 2024

gauteh commented May 19, 2024

gauteh commented May 19, 2024

urschrei commented May 19, 2024

gauteh commented May 19, 2024

kylebarron commented May 20, 2024

gauteh commented May 21, 2024

gauteh commented Jun 6, 2024

urschrei commented Jun 6, 2024 •

edited

Loading

gauteh commented Jun 6, 2024

michaelkirk commented Jun 6, 2024

gauteh commented Aug 16, 2024

gauteh commented Aug 18, 2024 •

edited

Loading

airbreather commented Oct 3, 2024 •

edited

Loading

performance for multipolygon contains much worse than for geos #1184

performance for multipolygon contains much worse than for geos #1184

Comments

gauteh commented May 18, 2024

urschrei commented May 18, 2024

gauteh commented May 18, 2024

urschrei commented May 18, 2024

gauteh commented May 19, 2024

gauteh commented May 19, 2024

gauteh commented May 19, 2024

urschrei commented May 19, 2024

gauteh commented May 19, 2024

kylebarron commented May 20, 2024

gauteh commented May 21, 2024

gauteh commented Jun 6, 2024

urschrei commented Jun 6, 2024 • edited Loading

gauteh commented Jun 6, 2024

michaelkirk commented Jun 6, 2024

gauteh commented Aug 16, 2024

gauteh commented Aug 18, 2024 • edited Loading

airbreather commented Oct 3, 2024 • edited Loading

urschrei commented Jun 6, 2024 •

edited

Loading

gauteh commented Aug 18, 2024 •

edited

Loading

airbreather commented Oct 3, 2024 •

edited

Loading