Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel formatting to increase speed #6095

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

MarcusGrass
Copy link
Contributor

@MarcusGrass MarcusGrass commented Feb 26, 2024

Multithreaded parallelism

Closes #6091

This is non-trivial because when internals can run threaded, eagerly printing cannot happen, messages have
to be passed back through buffers. Even that would be non-trivial if there wasn't color-term writing happening
making things more complicated.

The change became pretty big, even though the only substantive change
is in print.rs and bin/main.rs, because the printer has to be propagated
deep into the library, about half of the changes are adding a function argument, and reformatting
caused by those lines becoming too long. I'm sorry about that.

Changes

Output facilities

In practice there are four printing facilities used:

  1. Regular stdout, pretty easy, replace println with printing into a buffer.
  2. Regular stderr, same as above in all ways that matter.
  3. Term stdout, this happens in the diff-printing part of the code.
  4. Term stderr, this is done by rustc_error and the most complex to integrate.

Additionally, these four facilities can't be separated, since they have to preserve order between each other.

Rename StdoutEmitter

Confusing naming, it doesn't output into stdout, but into the
provided buffer.

Pros

This change brings a substantial speedup, especially for large projects, the speedup
becoming less and less but not none or negative for smaller projects.

If instant measures as the average reaction time of 250ms, this brings
down tokio from the not-instant ~260ms, to the instant 86ms.

Giving space for projects almost 3x tokio's size to have an "instant" formatting experience.

Results locally on tokio with diferent parallelism:

1: Around 260ms, within noise of the master branch.
2: Around 160ms, 39% less time spent than the master branch.
4: Around 110ms, 58% less time spent than the master branch.
32: Around 85ms, 67% less time spent than the master branch.

The dramatic effect with just 2 cores means that this change doesn't just have an effect if you've got a threadripper, I don't have the data, but I would imagine that most users have 2 or more cores and would benefit.

Drawbacks

There are some drawbacks to consider

Late printing memory increase

All messages that would have gone into stdout or stderr and been flushed eagerly now go
into an in-mem buffer. This causes a memory-overhead which scales with project size
(number of files). For a large project (went from 1700ms to about 500ms with this change in cargo fmt --all -time), this is a difference of 117M on the master version, and 170M on this version (~+45%).

Rustc termcolor vendoring

Another problem is the way that rustc vendors termcolor. It vendors some types but no printing facilities,
and the vendored code isn't interoperable with the published termcolor library, which means that
either the printing facilities need to be reinvented (no), or some compatibility-code needs to be introduced (yes).
Hopefully there's some better solution to this, but it's liveable at the moment see the rustc_compat* functions in print.rs.

Accidentally messing with outputs will cause an output jumble

This change would make any direct interaction with stdout and stderr in the
formatting codepath a bug. Printing deep inside a library is usually inadvisable, but now that would be upgraded to a bug.

Some testing

Apart from the usual tests, I wanted to make sure that the rustc_errors printing is the same, pretty easy, just remove a semicolon:
image
And diffing:
image
Although there are tests for those already.

Lastly I did a test modifying a bunch of different files in a project and then ran cargo fmt --all --check,
the order within-file is preserved, but the order in which diffs are printed is "random".

It's intended, and the reason for that is because output is printed by file in completion-order. It could await the join handles in order, and get the same order as the singe-threaded implementation, but then if there are more files than available parallelism, full thread utilization won't happen, which in practice measurably impacts performance on large projects.

It's possible to do a half-measure and wait in handle-order for the remaining files after all tasks have been scheduled, this would preserve ordering in projects with fewer files than available parallelism, but that doesn't seem very impactful to me, since available parallelism likely differs a lot.

@MarcusGrass MarcusGrass changed the title Concurrent Parallel formatting to increase speed Feb 26, 2024
@ytmimi
Copy link
Contributor

ytmimi commented Feb 26, 2024

Thanks for the PR. Going to put this on hold until the team has had a chance to discuss this change #6091 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement concurrent formatting
3 participants