Performance improvements #112
Conversation
It is the CSV library that does the trimming
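For context, trimming in the `csv` crate is configured on the reader builder, which is where that cost comes from; a minimal sketch (the file name is illustrative):

```rust
use csv::{ReaderBuilder, Trim};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Trim::All makes the csv crate strip surrounding whitespace from
    // headers and fields as it parses; Trim::None skips that work.
    let mut reader = ReaderBuilder::new()
        .trim(Trim::All)
        .from_path("stops.txt")?; // illustrative file name
    for record in reader.records() {
        println!("{:?}", record?);
    }
    Ok(())
}
```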
Out of curiosity, what are you using for your profiling/visualizing?

Thank you for your feedback and your analysis, that is quite right. I made more serious measurements on a server instead of my laptop to avoid interference from other apps. I get consistent performance improvements. As you noticed, I still managed to get 3-4% better performance by rewriting …
For the record, the following attempts didn’t work out: …
Out of habit, I'm using the Instruments.app profiler that comes with Xcode on macOS; in essence a GUI for DTrace. Looking at it a bit more (with higher-frequency sampling instead of the default time interval), I think trimming in general is a huge cost in the current version. Switching … Still, since it makes that much of a difference in the runtime, I checked how the …
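As an aside, to get named frames in a trace like this, the release binary needs debug symbols; a minimal Cargo.toml sketch (keeps optimizations, only adds symbol info):

```toml
# Keep debug info in release builds so profilers like Instruments can
# map samples back to function names; optimization level is unchanged.
[profile.release]
debug = true
```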
Nice digging! I left a message on the PR, maybe that will get things going.
As an alternative approach (borderline large enough to be a PR instead of a comment): if you replace the following lines in …

```rust
let recs = reader
    .records()
    .map(|rec| {
        rec.map_err(|e| Error::CSVError {
            file_name: file_name.to_owned(),
            source: e,
            line_in_error: None,
        })
    })
    .collect::<Result<Vec<_>, _>>()?;
recs.par_iter()
    .map(|rec| {
        rec.deserialize(Some(&headers)).map_err(|e| Error::CSVError {
            file_name: file_name.to_owned(),
            source: e,
            line_in_error: Some(crate::error::LineError {
                headers: headers.into_iter().map(|s| s.to_owned()).collect(),
                values: rec.into_iter().map(|s| s.to_owned()).collect(),
            }),
        })
    })
    .collect()
```

by the following:

```rust
// Pre-allocate a single StringRecord that is reused for every row.
let mut rec = csv::StringRecord::new();
let mut objs = Vec::new();
// Read each record into the pre-allocated StringRecord, one at a time.
while reader
    .read_record(&mut rec)
    .map_err(|e| Error::CSVError {
        file_name: file_name.to_owned(),
        source: e,
        line_in_error: None,
    })?
{
    let obj = rec.deserialize(Some(&headers)).map_err(|e| Error::CSVError {
        file_name: file_name.to_owned(),
        source: e,
        line_in_error: Some(crate::error::LineError {
            headers: headers.into_iter().map(|s| s.to_owned()).collect(),
            values: rec.into_iter().map(|s| s.to_owned()).collect(),
        }),
    })?;
    objs.push(obj);
}
Ok(objs)
```

then you …
I also feel the frustration on the …
This allows a small performance improvement (~1%) and should require less memory, as most id values are commonly fewer than 22 characters long.
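For illustration, the 22-byte threshold matches small-string types such as `smol_str::SmolStr`, which stores short strings inline instead of on the heap; a minimal sketch (the `Stop` type is hypothetical):

```rust
use smol_str::SmolStr;

// Hypothetical record type: ids are short in practice, so storing them
// inline avoids one heap allocation per record.
pub struct Stop {
    pub id: SmolStr,  // inlined when <= 22 bytes, heap-allocated otherwise
    pub name: String, // names can be long; a plain String is fine here
}

fn main() {
    let stop = Stop {
        id: SmolStr::new("stop_0042"), // fits inline: no allocation
        name: "Gare de Lyon".to_owned(),
    };
    assert_eq!(stop.id.as_str(), "stop_0042");
}
```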
Nice! However, in my case I observed that the performance is indeed better than the original, but not as good as with rayon. So I might prefer to stick with rayon, even if pulling in a new dependency for such a small change is probably exaggerated. Speaking of new dependencies, I toyed with …
@antoine-de would you like to have a look at it?
Okay. Do you have the infrastructure set up to compare memory footprints? I would expect that the parallel version (which at some point has all the records in memory) would have a larger footprint. Optimizing for small strings might be a good idea too. Won't that change be visible in the public API, though?
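On the API point: field types of public structs are part of a crate's public API, so swapping `String` for a small-string type is visible downstream; a hypothetical before/after sketch:

```rust
use smol_str::SmolStr;

// If `id` was `pub id: String`, downstream code that moved it out as a
// String stops compiling once the field type changes.
pub struct Stop {
    pub id: SmolStr, // hypothetical: was `pub id: String`
}

fn downstream(stop: Stop) -> String {
    // Callers now have to convert explicitly.
    stop.id.to_string()
}

fn main() {
    let s = downstream(Stop { id: SmolStr::new("stop_0042") });
    assert_eq!(s, "stop_0042");
}
```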
You are right. I got a bit lost in the rabbit hole, trying to chase any minimal performance improvement. I started from a clean sheet, using your suggestion and being a bit less aggressive, on #114, without any new dependency or allocation. It’s already better. I added an option to skip trimming entirely when you are confident in your data. Completely different topic: would you like to be added as a maintainer? We are only two working on it, and lately without being paid for it, so an extra set of eyes can be helpful.
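A hypothetical sketch of what such a trimming option could look like; this is only the shape of the idea, not the actual API added in #114:

```rust
use csv::{ReaderBuilder, Trim};
use std::fs::File;

// Hypothetical options struct: callers who trust their data can skip
// the per-field trimming work entirely.
pub struct ReadOptions {
    pub trim: bool,
}

fn build_reader(path: &str, opts: &ReadOptions) -> Result<csv::Reader<File>, csv::Error> {
    let trim = if opts.trim { Trim::All } else { Trim::None };
    ReaderBuilder::new().trim(trim).from_path(path)
}

fn main() -> Result<(), csv::Error> {
    // Skip trimming when the data is known to be clean.
    let _reader = build_reader("stops.txt", &ReadOptions { trim: false })?;
    Ok(())
}
```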
Closing as it was partially solved in the other PR |
Relates to #111
This doesn’t really work: the wall-clock time is around the same. Copying the whole data is probably way too expensive.