Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement geo-traits for writing to WKT & perf improvement #124

Draft
wants to merge 7 commits into
base: kyle/geo-traits
Choose a base branch
from

Conversation

kylebarron
Copy link
Member

@kylebarron kylebarron commented Oct 26, 2024

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.
  • I ran cargo fmt

Changes:

This slightly changes the implementation for writing to WKT, partially based on @b4l's PR here geoarrow/geoarrow-rs#788.

While porting the code to use traits, I realized that there was an awful lot of string concatenation being performed. So for example, when writing a Polygon to WKT, every single coordinate would be allocated to a separate string, and then the coords of each ring would be concatenated and then all rings would be concatenated.

wkt/src/types/polygon.rs

Lines 42 to 52 in e1d838d

let strings = self
.0
.iter()
.map(|l| {
l.0.iter()
.map(|c| format!("{} {}", c.x, c.y))
.collect::<Vec<_>>()
.join(",")
})
.collect::<Vec<_>>()
.join("),(");

This instead writes coordinates directly to the output writer. So perhaps the performance differences in the discussion here #118 were not primarily due to ryu.

This PR also implements writing for geo-traits objects, and delegates the std::fmt::Display implementation to the underlying geo-traits impl.

Benchmark against main:

> cargo bench --bench write
Gnuplot not found, using plotters backend
to_string small wkt     time:   [719.98 ns 723.19 ns 726.51 ns]
                        change: [-33.175% -32.566% -32.039%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  5 (5.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  7 (7.00%) high severe

to_string big wkt       time:   [2.3337 ms 2.3425 ms 2.3524 ms]
                        change: [-44.006% -43.647% -43.297%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

geo: serialize small wkt string
                        time:   [818.73 ns 826.57 ns 836.32 ns]
                        change: [-44.024% -37.443% -31.192%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

geo: serialize big wkt string
                        time:   [2.4689 ms 2.5142 ms 2.5899 ms]
                        change: [-45.157% -42.370% -39.852%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe

geo: write small wkt    time:   [815.24 ns 821.34 ns 831.63 ns]
                        change: [-30.751% -28.806% -27.455%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

geo: write big wkt      time:   [2.4814 ms 2.4877 ms 2.4941 ms]
                        change: [-49.589% -45.273% -42.171%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  14 (14.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

I also added another bench for writing the geo::Geometry object directly instead of first converting it to a wkt::Wkt. It's not 100% apples to oranges because I can't use std::io::sink() for a std::fmt::Write trait input. But it's still faster 🤷‍♂️

geo: write small wkt    time:   [809.53 ns 812.46 ns 815.71 ns]
                        change: [-1.9040% -1.1064% -0.5136%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

geo: write big wkt      time:   [2.5784 ms 2.6684 ms 2.8063 ms]
                        change: [+3.6982% +7.2652% +13.562%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

geo: write small wkt using trait
                        time:   [689.63 ns 692.96 ns 696.94 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

geo: write big wkt using trait
                        time:   [2.3611 ms 2.3872 ms 2.4283 ms]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

@kylebarron
Copy link
Member Author

All existing tests pass; just a question of whether geo-traits should be excluded from default features because it depends on geo-types, which is excluded.

Copy link
Member

@michaelkirk michaelkirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, can you address the todo! before marking a PR as ready for review? I don't want to accidentally miss one and then merge code that willfully explodes on the happy path. It seems like a good way to get users to not trust you.

@kylebarron
Copy link
Member Author

Well I consider the todo (at least for this pr) "I don't know the right way to handle this, I'm open to advice"

Is there a better way to signify this?

@michaelkirk
Copy link
Member

Draft PRs are a great way to get line by line feedback and signify "please don't merge this" as is.

I'll also often leave comments on my own PR with specific questions and relevant context at the spot I'm seeking help.

@urschrei
Copy link
Member

Another alternative / complementary approach is to deny todo! in Clippy so a merge becomes gated on no todos

@kylebarron
Copy link
Member Author

Draft PRs are a great way to get line by line feedback and signify "please don't merge this" as is.

Ok! I've found different expectations in different communities on github of whether draft means "don't look at this yet" or "I'm soliciting feedback on things, but it's not yet ready for merge"

@kylebarron kylebarron marked this pull request as draft October 29, 2024 18:51
@kylebarron
Copy link
Member Author

Here the remaining questions are:

  • Should geo-traits not be a required dependency, because it depends on geo-types?
  • How should we handle input geometries with unknown dimension? Should we default 3D to XYZ and document it? Should we have a user parameter for which dimension to interpret unknown 3D geometries as?

I should be able to fix the rect and triangle issues separately

@kylebarron
Copy link
Member Author

Other questions:

  • Should we change the existing geo_types_to_wkt behavior? That clones geo-types objects into wkt objects, and then writes those. Could we deprecate that entire module?
  • This is a PR into the other branch. Any opinions on whether to review and merge that PR first?

let max_coord = rect.max();

// Note: Even if the rect has more than 2 dimensions, we omit the other dimensions when
// converting to a Polygon.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this for Rect's?

Instead can it look more like the triangle impl which avoids this loss of dims and heap allocation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Triangle and Line store all coordinates, just with the simplifying case that they only have a specific number of points. Rect doesn't do that, or else Rect would store 4 coordinates, not 2. We need to do some sort of conversion from the min and max coordinates to a continuous ring.

It's obvious how to do this for 2D input, but less clear how to do this for 3D or 4D. I'd suggest that we should error for 3D/4D, as the user can choose how to transform a 3D or 4D Rect.

Or perhaps we could even change the RectTrait so that it doesn't have the same dimensionality as the others. Because Rect means "axis aligned bounding box" and when you have 3 or 4 axes, it's less clear how that maps to simple-features.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh geeze, yeah that's a mess.

Given that we're only handling 2d, can we at least avoid the vec allocation in that case?

As for erroring on 3-d/4-d, that seems reasonable, assuming you mean a runtime error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can avoid the vec allocation, just makes the code slightly more complicated.

Also, what should we do with the error? Currently it returns std::fmt::Error and there's no error type in this crate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to propose something.

}
}

pub fn point_to_wkt<T: CoordNum + WktNum + fmt::Display, G: PointTrait<T = T>, W: Write>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming nit for all the foo_to_wkt methods: Idiomatically, I'd expect point_to_wkt to do something like From::from(point) -> Wkt.

But this doesn't convert to Wkt. Internally it conceptually converts to something like a String of wkt, but we don't return that either.

A closer analogy might be the Format::write method or write macro, implying something like:

write_wkt_point(&mut writer, &my_point)?;
write_wkt_polygon(&mut writer, &my_polygon)?;

In fact, I think we can exclude wkt from the method names since it's redundant with the crate.

write_point(&mut writer, &my_point)?;
write_polygon(&mut writer, &my_polygon)?;

If calling users want to be explicit about the fact that it's wkt at the callsite, they could:

wkt::write_point(&mut writer, &my_point)?;
wkt::write_polygon(&mut writer, &my_point)?;

Note: this would entail re-exporting all these methods at the top-level of the crate.

Additionally, taking advantage of georust/geo#1157 (comment) maybe there should be a plain old:

fn write(&mut writer, any_geometry: impl GeometryTrait) {
   write_geometry(writer, any_geometry)
}

So the caller can do:

wkt::write(&mut writer, &my_point);
wkt::write(&mut writer, &my_polygon);

I think that'd work, right? It'd be slightly less performant if you already know the type, but probably negligible for most, and nicer to write.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a good point. I had the same realization when cleaning up my wkb crate: kylebarron/wkb#4

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shortened names but didn't shorten write_geometry to write. We should be able to just pub use write_geometry as write?

@@ -31,22 +30,6 @@ where
pub m: Option<T>,
}

impl<T> fmt::Display for Coord<T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it seems unlikely anyone is using it, can we leave it as it was and deprecate it first, for sake of avoiding a breaking change?

Copy link
Member Author

@kylebarron kylebarron Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

is it possible to deprecate a trait impl? Seems not.

https://users.rust-lang.org/t/deprecate-an-implementation/90350

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darn. Sorry for the noise!

Copy link
Member Author

@kylebarron kylebarron Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we could just leave this trait impl in even though it's no longer used 🤷‍♂️

@@ -1,5 +1,7 @@
use crate::{Wkt, WktNum};

pub mod geo_trait_impl;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the name of this module. Especially if it's becoming the de-facto writer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case you're copying this naming convention from the geojson crate pull request, I went with this name knowing that it wouldn't be exposed publicly. I didn't put a lot of thought into it so feel free to stray from it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this really depends on georust/geo#1241 and/or requiring a geo-types dependency here. With this PR all writing goes through the trait impl, but the traits currently depend on geo-types.

In that case, we could make geo_trait_impl private and re-export the functions through wkt::to_wkt::*

Copy link
Member Author

@kylebarron kylebarron Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants