Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cascaded / Unary union #1246

Open
wants to merge 33 commits into
base: main
Choose a base branch
from
Open

Cascaded / Unary union #1246

wants to merge 33 commits into from

Conversation

urschrei
Copy link
Member

@urschrei urschrei commented Oct 31, 2024

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

This is a draft since it currently has unknown perf characteristics, and there may be better ways of defining the trait implementers

@lnicola
Copy link
Member

lnicola commented Oct 31, 2024

Did you ask for some polygons? They're not WKT (I can help with the conversion to if needed), but here's a bunch of them I just happened to have on hand: https://data.europa.eu/data/datasets/d18f6f95-1d96-4b86-a4ba-291a416e45bb?locale=en.

They're a mix of polygons and multipolygons, but I'm not sure why that would be a problem.

@urschrei
Copy link
Member Author

urschrei commented Oct 31, 2024

Some preliminary perf data:

a 73.8 mb WKT file, containing 39490 polygons (extract from large-scale agricultural data, thanks @lnicola!) can be unioned into a MultiPolygon in around 11 seconds in release mode. The naïve approach:

let unioned = data
    .iter()
    .fold(MultiPolygon::<f64>::new(vec![]), |acc, poly| {
        acc.union(poly)
    });

hadn't terminated after 18 minutes, so this is at least a 100x improvement (?).

Next I need to verify that the algorithm produces valid output (no reason to think it doesn't)

cc @dabreegster please take this for a spin with more real-world data when you get a chance!

@urschrei urschrei changed the title Cascaded union Cascaded / Unary union Oct 31, 2024
@urschrei
Copy link
Member Author

urschrei commented Nov 1, 2024

Since we can union MultiPolygons, I'd like to make this available for them too, but I don't think it's possible to generically express that, because you need concrete types for the init, fold, and reduce function definitions and they vary slightly between Polygon and MultiPolygon.

I'd also be happy to write a second impl for MultiPolygon, but rustc also complains about that for some reason, even though the bounds are different:

impl<T, Container> UnaryUnion for Container
where
    T: BoolOpsNum,
    Container: IntoIterator<Item = MultiPolygon<T>> + Clone,
    MultiPolygon<T>: RTreeObject,
{ ... }

@dabreegster
Copy link
Contributor

I'm seeing great results from INSPIRE parcels. https://www.dropbox.com/scl/fi/8w9ufilq2c7wq1vp4fei0/London_Borough_of_Southwark.zip?rlkey=ejcug1i5fnkf35d2xjwcogz9m&dl=0

Using mapshaper v1.geojson -dissolve -explode -o v2.geojson format=geojson geojson-type=FeatureCollection, it takes 1.3s to dissolve 40k features into 7.5k. Using the new API in this PR (via https://github.com/acteng/will-it-fit/tree/use_unary_union), it takes 5.1s to dissolve to 1.1k. This implementation handles inputs that already overlap better than mapshaper.

Mapshaper output, darker area is features overlapping originally that weren't dissolved:
image
Rust output handles these:
image

There's more, larger INSPIRE inputs. I'm on battery power right now, so not going to push it :) But a larger run in Birmingham on 60k features ran in about 30s (I didn't time it exactly) and produced 17k outputs. I have even larger OS data I could try, but can't share the inputs.
https://github.com/acteng/will-it-fit/blob/use_unary_union/data_prep/inspire_one_area.sh

It's particularly impressive/useful that the base boolean ops work on both projected and un-projected inputs!

No feedback on this API; it seems fine to me. I have a more niche use case of limiting the maximum area of unioned polygons, so I'll probably adapt what's here to do that. I don't think baking in support for that level of customization makes sense.

@michaelkirk
Copy link
Member

michaelkirk commented Nov 1, 2024

un-projected inputs

As I understand it, the math is all fundamentally euclidean, so you're playing with fire if you're doing these calculations without first projecting to the euclidean plane.

Fire can be kind of fun though.

@michaelkirk
Copy link
Member

speaking of a multithreading flag - this feature might be a good candidate for rayon. (probably not in this PR, but it'd be interesting to see)

@kylebarron
Copy link
Member

this feature might be a good candidate for rayon

It might be good to have a separate discussion thread about rayon API design. In particular, I'd love it if any rayon integration could use a rayon thread pool provided by the caller, so that higher level apps that already use rayon don't have work split across multiple rayon pools.

@michaelkirk
Copy link
Member

michaelkirk commented Nov 1, 2024

Since we can union MultiPolygons, I'd like to make this available for them too,

Could you implement it in terms of impl BooleanOps<Scalar = Self::Scalar> (both MultiPolygon and Polygon implement that)

Something like:

- Container: IntoIterator<Item = Polygon<T>> + Clone,
+ Container: IntoIterator<Item = impl BooleanOps<Scalar = T>> + Clone,

This should work for both Iterator<Item=Polygon> and Iterator<Item=MultiPolygon>, but if you wanted a single iterator, you'd need to do something like Iterator<Item=Box<dyn BooleanOps>>

@urschrei
Copy link
Member Author

urschrei commented Nov 2, 2024

Since we can union MultiPolygons, I'd like to make this available for them too,

Could you implement it in terms of impl BooleanOps<Scalar = Self::Scalar> (both MultiPolygon and Polygon implement that)

Something like:

- Container: IntoIterator<Item = Polygon<T>> + Clone,
+ Container: IntoIterator<Item = impl BooleanOps<Scalar = T>> + Clone,

This should work for both Iterator<Item=Polygon> and Iterator<Item=MultiPolygon>, but if you wanted a single iterator, you'd need to do something like Iterator<Item=Box<dyn BooleanOps>>

You can't use impl in trait bounds, so that won't work as written (and it has to be an RTreeObject but that can probably be tagged on.

I think we will actually need separate impls even if I can figure out the above because I'm struggling to imagine how we'd specialise the fold function definition inside unary_union:

it's currently

let fold = |mut accum: MultiPolygon<T>, poly: &Polygon<T>| -> MultiPolygon<T> {
            accum = accum.union(poly);
            accum
        };

but for MultiPolygon it would have to be

let fold = |mut accum: MultiPolygon<T>, mpoly: &MultiPolygon<T>| -> MultiPolygon<T> {
            accum = accum.union(mpoly);
            accum
        };

@urschrei
Copy link
Member Author

urschrei commented Nov 2, 2024

I wonder if this would work if Polygon and MultiPolygon were enum members and BooleanOps was impl'd on the enum (followed by UnaryUnion).

@urschrei
Copy link
Member Author

urschrei commented Nov 2, 2024

No point in tying myself in knots playing type Tetris: MultiPolygon doesn't impl RTreeObject anyway (we should probably do that) so it'll have to be a manual op for now (it's not hugely complex: you flat_map the MultiPolygon containers).

I have a more niche use case of limiting the maximum area of unioned polygons, so I'll probably adapt what's here to do that. I don't think baking in support for that level of customization makes sense.

I think this might be tricky to implement either on your side or in geo as the fold operation definitionally has a scalar output: there's no way to skip geometries or end early – those geometries then wouldn't appear in the output at all, which isn't what you want.

@urschrei urschrei marked this pull request as ready for review November 2, 2024 16:30
@urschrei
Copy link
Member Author

urschrei commented Nov 2, 2024

For follow-on work, a preliminary parallel version was proposed in georust/rstar#80 (comment):

fn parallel_bottom_up_fold_reduce<T, S, I, F, R>(tree: &RTree<T>, init: I, fold: F, reduce: R) -> S
where
    T: RTreeObject,
    RTreeNode<T>: Send + Sync,
    S: Send,
    I: Fn() -> S + Send + Sync,
    F: Fn(S, &T) -> S + Send + Sync,
    R: Fn(S, S) -> S + Send + Sync,
{
    fn inner<T, S, I, F, R>(parent: &ParentNode<T>, init: &I, fold: &F, reduce: &R) -> S
    where
        T: RTreeObject,
        RTreeNode<T>: Send + Sync,
        S: Send,
        I: Fn() -> S + Send + Sync,
        F: Fn(S, &T) -> S + Send + Sync,
        R: Fn(S, S) -> S + Send + Sync,
    {
        parent
            .children()
            .into_par_iter()
            .fold(init, |accum, child| match child {
                RTreeNode::Leaf(value) => fold(accum, value),
                RTreeNode::Parent(parent) => {
                    let value = inner(parent, init, fold, reduce);

                    reduce(accum, value)
                }
            })
            .reduce(init, reduce)
    }

    inner(tree.root(), &init, &fold, &reduce)
}

@michaelkirk
Copy link
Member

type Tetris

The type tetris isn't too bad. But concerningly the test doesn't pass for multipolygon.

I'm not sure if this represents a bug in BooleanOps or in UnaryUnion (or something else): 39a2f62

I'm OK without merging support for MultiPolygons at present, but wanted to highlight this behavior in case it's meaningful to you.

@urschrei
Copy link
Member Author

urschrei commented Nov 2, 2024

type Tetris

The type tetris isn't too bad. But concerningly the test doesn't pass for multipolygon.

I'm not sure if this represents a bug in BooleanOps or in UnaryUnion (or something else): 39a2f62

I'm OK without merging support for MultiPolygons at present, but wanted to highlight this behavior in case it's meaningful to you.

Oh nice job, I burned a couple of hours on these today. The test result should be the same AFAIK. What's the actual output geometry??

@adamreichold
Copy link
Member

Some preliminary perf data:

One thing I wonder about the performance is whether the [CachedEnvelope] combinator can help with the R tree construction due the relatively expensive envelope computation?

@urschrei
Copy link
Member Author

urschrei commented Nov 3, 2024

type Tetris

The type tetris isn't too bad. But concerningly the test doesn't pass for multipolygon.
I'm not sure if this represents a bug in BooleanOps or in UnaryUnion (or something else): 39a2f62
I'm OK without merging support for MultiPolygons at present, but wanted to highlight this behavior in case it's meaningful to you.

Oh nice job, I burned a couple of hours on these today. The test result should be the same AFAIK. What's the actual output geometry??

The correct output should be:

Polygon { exterior:
    LineString([
        Coord { x: 200.38308698623, y: 288.3387931645567 },
        Coord { x: 204.07584924189865, y: 288.2701221168692 },
        Coord { x: 210.99999999939024, y: 291.99999999857494 },
        Coord { x: 209.99999999939024, y: 289.99999999857494 },
        Coord { x: 212.24082540659725, y: 285.47846008694717 },
        Coord { x: 205.630617245422, y: 287.73853614038774 },
        Coord { x: 204.00000000684082, y: 287.0000000060255 },
        Coord { x: 200.38308698623, y: 288.3387931645567 }]), interiors: [] }]

But what we get is:

[Polygon { exterior: 
    LineString([
        Coord { x: 200.38308698977124, y: 288.3387931571061 },
        Coord { x: 204.07584924543988, y: 288.2701221094186 },
        Coord { x: 205.63061724896323, y: 287.73853613293716 },
        Coord { x: 204.00000001038205, y: 287.0000000060255 },
        Coord { x: 200.38308698977124, y: 288.3387931571061 }]), interiors: [] },
Polygon { exterior: LineString([
    Coord { x: 204.0758492379893, y: 288.2701221168692 },
    Coord { x: 210.9999999954809, y: 291.99999999857494 },
    Coord { x: 209.9999999954809, y: 289.99999999857494 },
    Coord { x: 212.24082541013848, y: 285.47846008694717 },
    Coord { x: 212.24082539523732, y: 285.47846009439775 },
    Coord { x: 212.2408254026879, y: 285.47846008694717 },
    Coord { x: 205.63061724896323, y: 287.73853613293716 },
    Coord { x: 205.6306172564138, y: 287.73853614038774 },
    Coord { x: 204.07584926034104, y: 288.2701221094186 },
    Coord { x: 204.07584924543988, y: 288.2701221094186 },
    Coord { x: 204.0758492379893, y: 288.2701221168692 }]), interiors: [] }
];

That's not any kind of correct. Hm.

@urschrei
Copy link
Member Author

urschrei commented Nov 3, 2024

type Tetris

The type tetris isn't too bad. But concerningly the test doesn't pass for multipolygon.
I'm not sure if this represents a bug in BooleanOps or in UnaryUnion (or something else): 39a2f62
I'm OK without merging support for MultiPolygons at present, but wanted to highlight this behavior in case it's meaningful to you.

Oh nice job, I burned a couple of hours on these today. The test result should be the same AFAIK. What's the actual output geometry??

The correct output should be:

Polygon { exterior:
    LineString([
        Coord { x: 200.38308698623, y: 288.3387931645567 },
        Coord { x: 204.07584924189865, y: 288.2701221168692 },
        Coord { x: 210.99999999939024, y: 291.99999999857494 },
        Coord { x: 209.99999999939024, y: 289.99999999857494 },
        Coord { x: 212.24082540659725, y: 285.47846008694717 },
        Coord { x: 205.630617245422, y: 287.73853614038774 },
        Coord { x: 204.00000000684082, y: 287.0000000060255 },
        Coord { x: 200.38308698623, y: 288.3387931645567 }]), interiors: [] }]

But what we get is:

[Polygon { exterior: 
    LineString([
        Coord { x: 200.38308698977124, y: 288.3387931571061 },
        Coord { x: 204.07584924543988, y: 288.2701221094186 },
        Coord { x: 205.63061724896323, y: 287.73853613293716 },
        Coord { x: 204.00000001038205, y: 287.0000000060255 },
        Coord { x: 200.38308698977124, y: 288.3387931571061 }]), interiors: [] },
Polygon { exterior: LineString([
    Coord { x: 204.0758492379893, y: 288.2701221168692 },
    Coord { x: 210.9999999954809, y: 291.99999999857494 },
    Coord { x: 209.9999999954809, y: 289.99999999857494 },
    Coord { x: 212.24082541013848, y: 285.47846008694717 },
    Coord { x: 212.24082539523732, y: 285.47846009439775 },
    Coord { x: 212.2408254026879, y: 285.47846008694717 },
    Coord { x: 205.63061724896323, y: 287.73853613293716 },
    Coord { x: 205.6306172564138, y: 287.73853614038774 },
    Coord { x: 204.07584926034104, y: 288.2701221094186 },
    Coord { x: 204.07584924543988, y: 288.2701221094186 },
    Coord { x: 204.0758492379893, y: 288.2701221168692 }]), interiors: [] }
];

That's not any kind of correct. Hm.

I think this is a bug in the MultiPolygon union algorithm (not clear if it stems from i_overlay):

Using the test polygons, you should get the same result if you do:

let empty = MultiPolygon::<f64>::new(vec![]);
// multi_poly_1 is [poly1, poly2]
let empty_union = empty.union(&multi_poly_1);

// and
let mp1 = MultiPolygon::new(vec![poly1.clone()]);
let mp2 = MultiPolygon::new(vec![poly2.clone()]);
let mpu = &mp1.union(&mp2);

…but you don't. That's a bug unless I'm missing a detail about how MultiPolygon union is supposed to work.

@michaelkirk
Copy link
Member

michaelkirk commented Nov 3, 2024

I'm not sure if it's relevant, but it might be interesting that MultiPolygon(vec![poly1, poly2]) is not a valid multipolygon

Screenshot 2024-11-02 at 21 20 58
MULTIPOLYGON(
  ((204 287,206.69670020700084 288.2213844497616,200.38308697914755 288.338793163584,204 287)),
  ((210 290,204.07584923592933 288.2701221108328,212.24082541367974 285.47846008552216,210 290))
)

It'd be nice to be able to handle this case - but it might be a hint towards resolving it.

@michaelkirk
Copy link
Member

michaelkirk commented Nov 3, 2024

I was able to get the tests passing by making the multipolygon valid: 5900563

Note the usage of assert_relative_eq, some of the output points are slightly shifted (1e-10), but I think that's expected with the grid snapping in i_overlay.

@urschrei urschrei mentioned this pull request Nov 3, 2024
2 tasks
@urschrei
Copy link
Member Author

urschrei commented Nov 3, 2024

This will pass when #1254 merges.

I was able to get the tests passing by making the multipolygon valid: 5900563

Note the usage of assert_relative_eq, some of the output points are slightly shifted (1e-10), but I think that's expected with the grid snapping in i_overlay.

Urgh I should have checked validity before anything else. Sorry.

@urschrei
Copy link
Member Author

urschrei commented Nov 7, 2024

(I've snuck in some minor edits to geo top-level docs too. Sorry, sorry)

Copy link
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get this when building:

   |
12 |     primitives::CachedEnvelope, primitives::ObjectRef, ParentNode, RTree, RTreeNode, RTreeObject,
   |                                 ^^^^^^^^^^^^^^^^^^^^^ no `ObjectRef` in `primitives`

Do you need to bump the minimum version of rstar?

@urschrei
Copy link
Member Author

urschrei commented Nov 7, 2024

Sorry, done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants