FeatureIterator is intended to be an efficient way to iterate through individual Features in a FeatureCollection without ever having to parse the entire FeatureCollection into memory at once.
To do so, it seeks through the input stream until it finds the features array, and then starts parsing the individual features one by one.
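Currently, however, we seek to the start of the `features` array by looking for any occurrence of `[`. This fails for any document in which a `[` appears before the `features` array, for example in a `bbox` member that precedes it, and for more complicated documents in which arrays are nested inside other members.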
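As a rough illustration of the failure mode, here is a std-only sketch. It is not the crate's actual code: `naive_seek`, `naive_remainder`, and the `BBOX_FIRST` document are hypothetical names made up for this example, and the seek is simplified to a plain string search.

```rust
/// Naive seek: return the byte offset just past the first `[` in the input.
/// This imitates the "look for any `[`" strategy; it is an illustration only.
fn naive_seek(input: &str) -> Option<usize> {
    input.find('[').map(|i| i + 1)
}

/// Hypothetical FeatureCollection in which `bbox` precedes `features`.
const BBOX_FIRST: &str = r#"{"type":"FeatureCollection","bbox":[0.0,0.0,1.0,1.0],"features":[{"type":"Feature","geometry":null,"properties":null}]}"#;

/// The slice that a feature-by-feature parser would be handed after the seek.
fn naive_remainder(input: &str) -> Option<&str> {
    naive_seek(input).map(|i| &input[i..])
}
```

With `BBOX_FIRST`, `naive_remainder` returns a slice that begins inside the `bbox` array, so a parser expecting a Feature object would be given a number instead.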
206: stream reading features r=frewsxcv a=michaelkirk
- [x] I agree to follow the project's [code of conduct](https://github.com/georust/geo/blob/master/CODE_OF_CONDUCT.md).
- [x] I added an entry to `CHANGES.md` if knowledge of this change could be valuable to users.
---
This greatly reduces the peak memory usage needed to parse a GeoJSON FeatureCollection by utilizing the FeatureIterator's trick of seeking to the `"features": [` array without actually parsing the FeatureCollection. Note there are [some robustness issues](#207) with this approach, some of which can be improved, but overall I think it's worth the trade-off.
There is a small (0-1%) CPU regression in our deserialization bench due to this change, but the peak memory usage of the `stream_reader_writer` examples decreased by about 90%, from 2.22 MiB to 237 KiB.
Before:
<img width="1624" alt="Screen Shot 2022-09-02 at 2 42 04 PM" src="https://user-images.githubusercontent.com/217057/188239657-ade76677-f3e2-4750-b71d-bed0effbe215.png">
After:
<img width="1624" alt="Screen Shot 2022-09-02 at 2 40 29 PM" src="https://user-images.githubusercontent.com/217057/188239645-5274e0ff-71b2-4da6-8ac9-1f72d288c2bb.png">
Notice that overall memory allocations stayed the same (see #88 (comment)), but the peak usage is much lower since we never hold the entire data structure in memory at once.
Co-authored-by: Michael Kirk <[email protected]>
We should update FeatureIterator to be more robust in how it parses a FeatureCollection.
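One possible direction, sketched with the standard library only: scan the document while tracking nesting depth and string boundaries, and accept only a `"features"` key at the top level that is followed by `:` and `[`. This is an assumption about what "more robust" could look like, not a proposal for the crate's actual implementation. `seek_to_features` and `rest_after_features` are hypothetical names, the sketch works on an in-memory `&str` rather than a stream, and it does not handle a key written with escape sequences.

```rust
/// Return the byte offset just past the `[` that opens the top-level
/// `"features"` array, or `None` if no such member is found.
fn seek_to_features(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    let mut depth = 0usize; // current nesting depth of `{}` and `[]`
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'{' | b'[' => depth += 1,
            b'}' | b']' => depth = depth.saturating_sub(1),
            b'"' => {
                // Scan to the closing quote, skipping escaped characters,
                // so brackets inside strings are never counted.
                let start = i + 1;
                let mut j = start;
                while j < bytes.len() && bytes[j] != b'"' {
                    if bytes[j] == b'\\' {
                        j += 1;
                    }
                    j += 1;
                }
                if j >= bytes.len() {
                    return None; // unterminated string
                }
                let is_key = depth == 1 && &input[start..j] == "features";
                i = j; // `i` now points at the closing quote
                if is_key {
                    // Expect: optional whitespace, ':', optional whitespace, '['.
                    let mut k = i + 1;
                    while k < bytes.len() && bytes[k].is_ascii_whitespace() {
                        k += 1;
                    }
                    if k < bytes.len() && bytes[k] == b':' {
                        k += 1;
                        while k < bytes.len() && bytes[k].is_ascii_whitespace() {
                            k += 1;
                        }
                        if k < bytes.len() && bytes[k] == b'[' {
                            return Some(k + 1);
                        }
                    }
                }
            }
            _ => {}
        }
        i += 1;
    }
    None
}

/// The slice a feature-by-feature parser would start from.
fn rest_after_features(input: &str) -> Option<&str> {
    seek_to_features(input).map(|i| &input[i..])
}
```

Under these assumptions, a `bbox` array, a nested `"features"` key inside another member, or a `[` inside a string value is skipped, and the returned slice starts at the first Feature object.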