-
Notifications
You must be signed in to change notification settings - Fork 708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory allocations in HttpFileSystembasedFindPackageByIdResource #6300
base: dev
Are you sure you want to change the base?
Conversation
src/NuGet.Core/NuGet.Protocol/RemoteRepositories/HttpFileSystemBasedFindPackageByIdResource.cs
Outdated
Show resolved
Hide resolved
src/NuGet.Core/NuGet.Protocol/RemoteRepositories/HttpFileSystemBasedFindPackageByIdResource.cs
Outdated
Show resolved
Hide resolved
src/NuGet.Core/NuGet.Protocol/RemoteRepositories/HttpFileSystemBasedFindPackageByIdResource.cs
Outdated
Show resolved
Hide resolved
I like this PR, it was a good find that the PackageInfo didn't really need to be in this cache. |
string idInLowerCase = id.ToLowerInvariant(); | ||
var baseUri = _baseUris[retry % _baseUris.Count].OriginalString; | ||
|
||
var builder = StringBuilderPool.Shared.Rent(256); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While talking to Jeff, he pointed out that the string isn't dynamic in any way (no if (something) builder.Add
), so we can use string.Concat
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still probably preferred over potentially removing items from the stringbuilder pool, but I think netfx will end up allocating the array on the heap to hold the individual strings.
src/NuGet.Core/NuGet.Protocol/RemoteRepositories/HttpFileSystemBasedFindPackageByIdResource.cs
Outdated
Show resolved
Hide resolved
|
||
return packageInfos.Keys; | ||
return packageVersions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since _packageVersionsCache
now contains HashSet
rather than SortedDictionary
, this means the public API GetAllVersionsAsync
may no longer be returning a sorted list of versions. So, changing this risks breaking behaviour.
However, FindPackageById
is only used by restore, within NuGet.Client. PM UI, and anything that cares about unlisted package status, does not use this API. Restore handles this list not being sorted, and all tests appear to be passing, so I think the biggest risk are apps using our NuGet.Protocol package.
private readonly IReadOnlyList<Uri> _baseUris; | ||
private string _chosenBaseUri; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this, I would have liked to make it round-robin load balance across all the base URIs that the server provided in their service index. But that would be a change in behavior which might risk breaking customers. So, we re-use the base uri that the version list was obtained from, just like the old code.
I'd love to hear about what measurements you ended up taking and how big of an improvement this is |
}; | ||
var baseUri = _chosenBaseUri ?? _baseUris[0].OriginalString; | ||
|
||
string contentUri = string.Concat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't string.concat supposed to be faster than interpolation or format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was playing around on sharplab and they look pretty similar (at least on netfx) and although different on core, I'm not really sure that one is better than the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that interpolation would be more readable, but this seems like a hotpath and a reasonably long string, so my understanding is Concat is more performant.
For the longest time, I understood concat or +
for strings to be sloppy and bad performance. But a few years ago I learned that was somehow the opposite of reality!
Therefore, if my understanding is still up to date, I'd say leave it as is or compromise for readability with:
baseUri + idInLowerCase + "/" + normalizedVersionString + "/" + idInLowerCase ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with whatever is decided here, but I do think the perf of interpolated strings is better than you guys think:
A simple benchmark comparing String.Concat and interpolation actually indicates interpolation performance is about the same on net472 and actually better than String.Concat in net9.
net472
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
Concat | 150.5 us | 1.56 us | 1.46 us | 54.9316 | - | - | 282 KB |
ConcatIncludeLoopCounter | 233.9 us | 3.23 us | 3.03 us | 63.2324 | - | - | 325 KB |
Interpolated | 147.3 us | 0.87 us | 0.81 us | 54.9316 | - | - | 282 KB |
InterpolatedIncludeLoopCounter | 162.0 us | 1.14 us | 1.01 us | 56.3965 | - | - | 289 KB |
net9.0
Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|
Concat | 43.86 us | 0.642 us | 0.536 us | 28.2593 | - | - | 289 KB |
ConcatIncludeLoopCounter | 73.05 us | 1.286 us | 1.140 us | 44.1895 | - | - | 451 KB |
Interpolated | 38.67 us | 0.354 us | 0.331 us | 19.1040 | - | - | 195 KB |
InterpolatedIncludeLoopCounter | 42.38 us | 0.433 us | 0.384 us | 19.1040 | - | - | 195 KB |
From: #6265 (comment)
But keep in mind in the comment, from the other PR, I had the caveat that I amit I pretty much don't know what I'm doing. Here's a PerfView diff of the dev branch and this PR branch heap alloc view: Finally, I thought I'd try to use BenchmarkDotNet's MemoryAnalyzer. Here's a patch to the NuGet.Client repo with my benchmark code: However, when I run it, I get these results:
So, this benchmark results make it look like my change has more allocations and is slower than the original, which frankly doesn't make any sense to me whatsoever. What started as reviewing another PR turned into trying to implement further improvements, and while code changes took probably less than an hour, I've now been working probably 2 full days on collecting perf measurements. I was behind my assigned work for this sprint before I started any of this, so I just can't justify any more effort. If this PR is "acceptable" as-is, or with minor changes, I can do that. But I don't have the capacity to investigate any further why these perf measurements are so out of whack. |
I'm very happy with this PR (for what that's worth). I definitely wasn't trying to steal a bunch of anyone on the nuget's team time (which is why I made the original PR myself). I went ahead and did some measurements based on this branch, and will share below. My measurements were based on the following steps: "git clean -xdff" in roslyn repo, delete the global nuget cache, open VS, start profiler, and open roslyn.sln, wait for the status bar to indicate everything is done, stop profiler. I took two profiles each with and without this change and averaged the allocations/times. Filtering under FindPackagesByIdAsync, this change showed an allocation improve from 155 MB to 25 MB and CPU samples improve from 21000 ms to 3850 ms. In reply to: 2705152207 |
{ | ||
var doc = await stream.AsJObjectAsync(token); | ||
var json = await JsonSerializer.DeserializeAsync<FlatContainerVersionList>(stream, cancellationToken: token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
While falling asleep last night, I realized that my benchmark was reusing a single Fixing the benchmark, I get different results:
I'm still very surprised the changed code is slower, wall clock time. But at least we're now correctly measuring the memory allocation improvements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, and I look forward to seeing how perf improves as a result of this PR!
Not sure if I feel reassured or worried about test coverage given that no tests needed to be changed here. I did suggest 1 string related test change.
}; | ||
var baseUri = _chosenBaseUri ?? _baseUris[0].OriginalString; | ||
|
||
string contentUri = string.Concat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that interpolation would be more readable, but this seems like a hotpath and a reasonably long string, so my understanding is Concat is more performant.
For the longest time, I understood concat or +
for strings to be sloppy and bad performance. But a few years ago I learned that was somehow the opposite of reality!
Therefore, if my understanding is still up to date, I'd say leave it as is or compromise for readability with:
baseUri + idInLowerCase + "/" + normalizedVersionString + "/" + idInLowerCase ...
src/NuGet.Core/NuGet.Protocol/RemoteRepositories/HttpFileSystemBasedFindPackageByIdResource.cs
Show resolved
Hide resolved
src/NuGet.Core/NuGet.Protocol/RemoteRepositories/HttpFileSystemBasedFindPackageByIdResource.cs
Outdated
Show resolved
Hide resolved
src/NuGet.Core/NuGet.Protocol/RemoteRepositories/HttpFileSystemBasedFindPackageByIdResource.cs
Outdated
Show resolved
Hide resolved
{ | ||
var doc = await stream.AsJObjectAsync(token); | ||
var json = await JsonSerializer.DeserializeAsync<FlatContainerVersionList>(stream, cancellationToken: token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
src/NuGet.Core/NuGet.Protocol/RemoteRepositories/HttpFileSystemBasedFindPackageByIdResource.cs
Outdated
Show resolved
Hide resolved
f450122
to
2fd7771
Compare
versionString = null; | ||
return false; | ||
} | ||
versionString = version.ToNormalizedString().ToLowerInvariant(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we can't remove the ToLowerInvariant()
, for the reason described here: #6300 (comment). Perhaps I need to improve the comment at the top of the NETSTANDARD block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure I'm being super dense. I definitely believe that the ToLowerInvariant can be needed, I just don't understand why the !NETSTANDARD block wouldn't need it then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure we're on the same page, I was suggesting that the NETSTANDARD case also use the value from the HashSet, not from the input version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless there's an API I'm not aware of, the only way to get the value out of the hashset on netstandard is to iterate the hashset as if it was a list and stop when the value is equal. HashSet<T>
on netstandard doesn't have a TryGetValue
API, like the comment above the code says.
But yes, if there is an easy/cheap way to get the value out of the hashset, then I'd have it work the same way as netframework and netcoreapp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, it was very stuck in my mind that there was an indexer on HashSet. Thanks for the clarification!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug
Fixes: NuGet/Home#14095
Description
Following @ToddGrun's PR: #6265, in particular this comment thread: #6265 (comment)
I noticed that the code is creating one
PackageInfo
andPackageIdentity
andContentUri
for every package version the server knows about. However, in a restore, typically only a single package version is needed. So, by creating thosePackageIdentity
and url strings only when needed, we can remove a whole lot of additional allocations over what the other PR did.In addition to removing
PackageInfo
and associated types, I moved the JSON parsing to System.Text.Json instead of Newtonsoft.Json (which used JObject, no less). So, that provides us with just most of Todd's PR memory improvements as well.PR Checklist
Added testscode was refactored, no behavior changesLink to an issue or pull request to update docs if this PR changes settings, environment variables, new feature, etc.N/A