Switch to ryu for float to string #120

kylebarron · 2024-10-08T02:49:42Z

I agree to follow the project's code of conduct.
I added an entry to CHANGES.md if knowledge of this change could be valuable to users.
I ran cargo fmt

I'm not sure that this is using ryu for all geometry types, because I had to update all the point/linestring tests because ryu was printing out 1.0 instead of 1. But I didn't have to change the polygon tests:

wkt/src/types/polygon.rs

Line 157 in e1d838d

"POLYGON((0 0,20 40,40 0,0 0),(5 5,20 30,30 5,5 5))",

Closes #118

lnicola · 2024-10-08T06:56:11Z

I'm not convinced this is a good idea. From the rustc issue:

As I understand it, the fact that short numbers perform worse is an inherent aspect of the algorithm. Dtoa goes left to right and avoids generating too many noise digits, so the performance is better on short numbers as reflected in the table. Ryū generates an exact representation and then goes right to left removing noise digits, so the performance is better on the longest numbers.

The Ryū benchmarks all focus on a random distribution of numbers, and those have on average a lot of digits.

dtoa might be a better choice, then, given that real-world coordinates should have fewer fractional digits?

Also: https://xkcd.com/2170/.

michaelkirk · 2024-10-08T17:42:34Z

Since this is seemingly a straight forward performance proposal, we should be able to measure it. Can you include any benchmarks using plausible real world data that shows improvement?

Related: #121

Note: There were no measurable perf differences for me with the existing write benchmark. It's likely that the existing data we've included in the benchmarks is not representative of real world data, so I'd be happy to include different data in the benchmarks if it's useful to show the improvements here.

That said, if we have to contrive really crazy data to get a measurable perf improvement — e.g. if it's only an improvement for people providing gigabytes of molecular level GPS precision (as an extreme example), and a regression for everybody else, then obviously we shouldn't do it.

michaelkirk · 2024-10-08T17:47:07Z

src/geo_types_to_wkt.rs

@@ -42,11 +43,12 @@ where
 ///
 /// let point: Point<f64> = point!(x: 1., y: 2.);
 ///
-/// assert_eq!(point.wkt_string(), "POINT(1 2)");
+/// assert_eq!(point.wkt_string(), "POINT(1.0 2.0)");


I don't know how to feel about this change in behavior. On the one hand it makes our serialization less efficient which is bad, OTOH it conveys to the reader that this was serialized from floating point, so I guess that's useful.

In the interest of not leaving ambiguous comments, I'll say I prefer the old version, which serialized efficiently, but I'd be willing to let that small demerit go if there are substantial other wins.

That said, if there's a way to use faster float serialization while still maintaining the integer display for .0, I think that'd be best.

ryu doesn't have any options as far as I can tell. So if we switched to ryu I don't think we'd be able to serialized floats without the .0

We can still check if the string ends with .0 and strip that off.

lnicola · 2024-10-08T17:59:57Z

src/types/coord.rs

+                let mut buffer = ryu::Buffer::new();
+                let y = buffer.format(self.y);
+
+                write!(f, "{} {}", x, y)?;


Nit: I suspect a sequence of f.write_str calls would be a little faster here.

Possibly, it might muddy the waters on perf improvements though. I'd prefer to keep everything except ryu the same in this PR.

lnicola · 2024-10-08T18:00:39Z

There were no measurable perf differences for me with the existing write benchmark.

We could set up Criterion, I guess. And it might be worth comparing to dtoa, which has the same API.

michaelkirk · 2024-10-08T18:13:24Z

There were no measurable perf differences for me with the existing write benchmark.

We could set up Criterion, I guess. And it might be worth comparing to dtoa, which has the same API.

What do you mean? We're already using criterion AFAICT.

lnicola · 2024-10-08T18:22:53Z

We're already using criterion AFAICT

Not my finest day 🤦.

Switch to ryu

80f0812

kylebarron requested a review from michaelkirk October 8, 2024 02:51

michaelkirk reviewed Oct 8, 2024

View reviewed changes

michaelkirk mentioned this pull request Oct 8, 2024

Implement reading WKT using wkt crate geoarrow/geoarrow-rs#790

Open

lnicola reviewed Oct 8, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch to ryu for float to string #120

Switch to ryu for float to string #120

kylebarron commented Oct 8, 2024 •

edited

Loading

lnicola commented Oct 8, 2024

michaelkirk commented Oct 8, 2024

michaelkirk Oct 8, 2024

michaelkirk Oct 8, 2024

kylebarron Oct 8, 2024

lnicola Oct 8, 2024

lnicola Oct 8, 2024

michaelkirk Oct 8, 2024

lnicola commented Oct 8, 2024

michaelkirk commented Oct 8, 2024

lnicola commented Oct 8, 2024

Switch to ryu for float to string #120

Are you sure you want to change the base?

Switch to ryu for float to string #120

Conversation

kylebarron commented Oct 8, 2024 • edited Loading

lnicola commented Oct 8, 2024

michaelkirk commented Oct 8, 2024

michaelkirk Oct 8, 2024

Choose a reason for hiding this comment

michaelkirk Oct 8, 2024

Choose a reason for hiding this comment

kylebarron Oct 8, 2024

Choose a reason for hiding this comment

lnicola Oct 8, 2024

Choose a reason for hiding this comment

lnicola Oct 8, 2024

Choose a reason for hiding this comment

michaelkirk Oct 8, 2024

Choose a reason for hiding this comment

lnicola commented Oct 8, 2024

michaelkirk commented Oct 8, 2024

lnicola commented Oct 8, 2024

kylebarron commented Oct 8, 2024 •

edited

Loading