Performance improvements #112

Closed
wants to merge 6 commits

Conversation

Tristramg
Collaborator

Relates to #111
This doesn’t really work: the clock time is around the same. Copying the whole data is probably way too expensive.

@m0n0chr0m3

Hi Tristram,

Thanks for your continued efforts to maintain this crate!

Your approach does appear to work here: the change properly parallelizes gtfs_reader::read_objs. It takes the runtime from ±55s to <40s on my machine. Profiling confirms that it does what it's supposed to:
[Screenshot 2021-12-29 at 14:40:09]
During the time frame selected in the screenshot, gtfs_reader::read_objs is executing and is utilizing all cores.

Consider now this screenshot, where the next part of the runtime is selected:
[Screenshot 2021-12-29 at 14:48:56]
During this time frame, execution is finishing up gtfs_reader::read_objs. (Not visible in the screenshot, but) here most of the time is spent in the csv::*_record::trim methods and in calls to free. It might be possible to speed things up here, but I don't see any low-hanging fruit.

Finally, consider this screenshot of the last part of execution:
[Screenshot 2021-12-29 at 14:49:11]
During this time frame, execution has progressed past the point where the 'raw' structure is constructed (and thus past the point where gtfs_reader::read_objs executes at all). Essentially, the latter half of RawGtfs::new(gtfs).and_then(Gtfs::try_from) is now executing, so speeding up gtfs_reader::read_objs won't help here at all. Perhaps you know a way to improve Gtfs::try_from?
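To make the split concrete without a profiler, here is a minimal timing sketch of the two stages quoted above, assuming the crate's RawGtfs::new / Gtfs::try_from entry points and a placeholder feed path:

    use std::time::Instant;
    use gtfs_structures::{Gtfs, RawGtfs};

    fn main() {
        let start = Instant::now();
        // Stage 1: CSV parsing; gtfs_reader::read_objs runs in here
        let raw = RawGtfs::new("path/to/gtfs.zip").expect("could not parse the feed");
        println!("parsing: {:?}", start.elapsed());

        let start = Instant::now();
        // Stage 2: building the model from the raw records
        let _gtfs = Gtfs::try_from(raw).expect("could not build the model");
        println!("conversion: {:?}", start.elapsed());
    }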

@Tristramg
Collaborator Author

Out of curiosity, what are you using for your profiling/visualizing?

Thank you for your feedback and your analysis; it is quite right. I made more serious measurements on a server instead of my laptop, to avoid interference from other apps. I get consistent performance improvements, as you noticed.

I still managed to get 3-4% better performance by rewriting parse_time (it's a bit ugly, and I got sidetracked a lot on that). I will now focus on try_from. It seems that the problem comes from building the trips. Maybe using binary trees instead of hash maps can help.

Reading duration in ms, 10 runs each:

IDFM (69 MB)
* better parsing 9632 9633 9664 9627 9687 9608 9645 9855 9689 9714
* parallel 9958 9922 10001 9930 9985 9998 9993 10023 9999 9977
* original 13166 13262 13209 13083 13136 13079 13166 13170 13134 13124

NL (225 MB)
* better parsing 18949 18771 18770 18838 18746 18900 18789 18782 18688 18812
* parallel 19589 19636 19574 19526 19670 19526 19558 19573 19717 19595
* original 24450 24651 24493 24972 24562 24536 24433 24436 24573 24599

PID (33 MB)
* better parsing 3293 3268 3278 3284 3308 3287 3258 3267 3293 3274
* parallel 3371 3378 3386 3372 3367 3361 3347 3360 3377 3386
* original 4213 4226 4195 4186 4203 4190 4214 4182 4215 4201
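For reference, the "better parsing" rows above refer to a hand-rolled time parser; a minimal sketch of the idea follows (the signature and error handling are illustrative, not the crate's actual implementation):

    /// Parse a GTFS "HH:MM:SS" time into seconds since midnight.
    /// GTFS allows hours greater than 24 for trips running past midnight.
    fn parse_time(s: &str) -> Option<u32> {
        let mut parts = s.trim().splitn(3, ':');
        let hours: u32 = parts.next()?.parse().ok()?;
        let minutes: u32 = parts.next()?.parse().ok()?;
        let seconds: u32 = parts.next()?.parse().ok()?;
        if minutes > 59 || seconds > 59 {
            return None;
        }
        Some(hours * 3600 + minutes * 60 + seconds)
    }

    // e.g. parse_time("26:15:00") == Some(94_500)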

@Tristramg
Collaborator Author

For the record, the following things didn’t work out:

  • using BTreeMap instead of HashMap
  • using AHashMap
  • using group_by to iterate over the stops, to avoid a new lookup for every Trip (sketched below)
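The group_by idea roughly looks like the sketch below, with hypothetical StopTime/Trip types and itertools; in this PR it did not end up being faster:

    use itertools::Itertools;
    use std::collections::HashMap;

    struct StopTime { trip_id: String, stop_sequence: u32 }
    struct Trip { id: String, stop_times: Vec<StopTime> }

    fn create_trips(mut stop_times: Vec<StopTime>) -> HashMap<String, Trip> {
        // Sort once by trip id so each trip's stop times are contiguous,
        // then build every Trip from one group instead of doing a map
        // lookup per stop time.
        stop_times.sort_by(|a, b| a.trip_id.cmp(&b.trip_id));
        let groups = stop_times.into_iter().group_by(|st| st.trip_id.clone());
        let mut trips = HashMap::new();
        for (trip_id, group) in &groups {
            let mut sts: Vec<StopTime> = group.collect();
            sts.sort_by_key(|st| st.stop_sequence);
            trips.insert(trip_id.clone(), Trip { id: trip_id, stop_times: sts });
        }
        trips
    }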

@m0n0chr0m3

m0n0chr0m3 commented Dec 30, 2021

Out of curiosity, what are you using for your profiling/visualizing?

Out of habit, I'm using the Instruments.app profiler that comes with Xcode on macOS. In essence, a GUI for DTrace.

Looking at it a bit more (with higher-frequency profiling instead of the default time interval), I think 'trimming' in general is a huge cost in the current version. Switching .trim(csv::Trim::Fields) to .trim(csv::Trim::None) in the body of read_objs does indeed bring a significant performance increase... albeit of course at the cost of different behavior.
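For context, the toggle under discussion is the csv crate's ReaderBuilder trim setting; a minimal sketch with a placeholder path:

    fn count_records(path: &str) -> Result<usize, csv::Error> {
        let mut reader = csv::ReaderBuilder::new()
            // csv::Trim::Fields trims every field; csv::Trim::None skips trimming
            .trim(csv::Trim::None)
            .from_path(path)?;
        let mut count = 0;
        for record in reader.records() {
            let _record = record?;
            count += 1;
        }
        Ok(count)
    }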

Still, since it makes that much of a difference in the runtime, I checked how the csv crate implements trimming, and it turns out there is a real opportunity for improvement there: currently, trimming a ByteRecord always allocates a new ByteRecord, even if there is nothing to trim (and trimming probably wouldn't require reallocating either). A PR to fix this is stuck on what appears to be a spurious CI failure in the csv crate's repo.

@Tristramg
Collaborator Author

Nice digging!
Indeed, I tried with csv = { git="https://github.com/do4gr/rust-csv.git", branch="non_allocating_trim" } and I gained about 20% on the GTFS from the Netherlands (from 18.8 down to 15.5 seconds).

I left a message on the PR, maybe that will get things going.
I tried to progress on create_trips but I can’t find a way to make it quicker. It’s quite frustrating 😞

@m0n0chr0m3

As an alternative approach (borderline large enough to be a PR instead of a comment):

If you replace the following lines in read_objs:

    // Read and collect all records into memory up front...
    let recs = reader
        .records()
        .map(|rec| {
            rec.map_err(|e| Error::CSVError {
                file_name: file_name.to_owned(),
                source: e,
                line_in_error: None,
            })
        })
        .collect::<Result<Vec<_>, _>>()?;
    // ...then deserialize them in parallel with rayon
    recs.par_iter()
        .map(|rec| {
            rec.deserialize(Some(&headers)).map_err(|e| Error::CSVError {
                file_name: file_name.to_owned(),
                source: e,
                line_in_error: Some(crate::error::LineError {
                    headers: headers.into_iter().map(|s| s.to_owned()).collect(),
                    values: rec.into_iter().map(|s| s.to_owned()).collect(),
                }),
            })
        })
        .collect()

by the following:

    // Pre-allocate a StringRecord
    let mut rec = csv::StringRecord::new();
    let mut objs = Vec::new();

    // Read each record into the pre-allocated StringRecord one at a time
    while reader.read_record(&mut rec)
        .map_err(|e| Error::CSVError {
            file_name: file_name.to_owned(),
            source: e,
            line_in_error: None,
        })?
    {
        let obj = rec.deserialize(Some(&headers)).map_err(|e| Error::CSVError {
            file_name: file_name.to_owned(),
            source: e,
            line_in_error: Some(crate::error::LineError {
                headers: headers.into_iter().map(|s| s.to_owned()).collect(),
                values: rec.into_iter().map(|s| s.to_owned()).collect(),
            }),
        })?;
        objs.push(obj);
    }
    Ok(objs)

then you

  • are no longer using parallelism, thus
  • no longer need the dependency on rayon, but
  • are about as fast as the parallel version (on my machine), but
  • by using a pre-allocated csv::StringRecord, make it slightly harder to add parallelism again at a later point.

I also feel the frustration on the create_trips side of things. Lots going on there. And your current design seems solid.

This allows a small performance improvement (~1%) and should require less memory, as most Ids are commonly less than 22 chars long.
@Tristramg
Collaborator Author

Nice, however in my case I observed that the performance is indeed better than the original, but not as good as with rayon. So I might prefer to stick with rayon, even if pulling in a new dependency for such a small change is probably exaggerated.

Speaking of new dependencies, I toyed with smol_str and smartstring, as Ids are generally quite small. There is a very small improvement (~1%). Memory usage should also be down, but I haven’t measured it yet.
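A hedged sketch of that small-string idea, assuming smol_str with its serde feature enabled; the Stop struct here is illustrative, not the crate's actual type:

    use serde::Deserialize;
    use smol_str::SmolStr;

    #[derive(Debug, Deserialize)]
    struct Stop {
        // Short ids (up to 23 bytes in SmolStr) are stored inline,
        // avoiding one heap allocation per record.
        stop_id: SmolStr,
        stop_name: String,
    }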

@Tristramg
Collaborator Author

@antoine-de would you like to have a look at it?

@Tristramg changed the title from Draft: Parallel parse to Parallel parse on Jan 2, 2022
@Tristramg changed the title from Parallel parse to Performance improvements on Jan 2, 2022
@m0n0chr0m3

the performance is indeed better than the original, but not as good as with rayon.

Okay. Do you have the infrastructure set up to compare memory footprint? I would expect that the parallel version (which at some point has all StringRecords in memory at the same time) has a significantly larger memory footprint than the sequential version that only has one StringRecord at a time. Of course, that memory use isn't going to be orders of magnitude larger than the memory use of the RawGtfs structures that are built from the records, so perhaps it doesn't matter in practice.

Optimizing for small strings might be a good idea too. Won't that change be visible in the public API, though?

@m0n0chr0m3 mentioned this pull request on Jan 2, 2022
@Tristramg
Collaborator Author

Tristramg commented Jan 3, 2022

You are right. I got a bit lost in the rabbit hole, trying to track down every minimal performance improvement.

I started from a clean sheet on #114, using your suggestion and being a bit less aggressive, without any new dependency or allocation. It’s already better.

I added an option to skip trimming entirely when you are confident in your data.
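At the csv level, such an option can be as simple as the sketch below; the trim_fields parameter name is just illustrative, not necessarily what the PR uses:

    fn build_reader<R: std::io::Read>(source: R, trim_fields: bool) -> csv::Reader<R> {
        // Trimming costs extra work per record, so let callers who trust
        // their data opt out of it.
        let trim = if trim_fields { csv::Trim::Fields } else { csv::Trim::None };
        csv::ReaderBuilder::new().trim(trim).from_reader(source)
    }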

Completely different topic: would you like to be added as a maintainer? There are only two of us working on it, and lately without being paid for it, so an extra pair of eyes can be helpful.

@Tristramg
Collaborator Author

Closing, as this was partially solved in the other PR.

@Tristramg closed this on Feb 18, 2022