Enable cache optimization #13481
Comments
I'm not following the bigger context but CPU cache misses "only" re-fetch values from RAM to CPU without allocating more, while my referenced comment focused on aggressive memory allocations, which is not necessarily related.
I'm unfamiliar with this specific module, but this statement is generally off:
My point is that while efficient CPU caches are always desired, they shouldn't devalue other critical aspects like efficient data structures and memory (de)allocations. They are not mutually exclusive!
You are wrong; they are actually very, very related.
Whether this is technically right or wrong depends on the size of the data structure, so I can't say in general. But from what I saw, the data structure had a length of 8 and a HashSet was used, which made no sense to me, to be honest. For a larger size a non-contiguous structure is better; at the end of the day it's all about the size.
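To make the size argument concrete, here is a minimal sketch (hypothetical code, not the actual fetcher.rs): membership checks on ~8 elements via a contiguous slice versus a HashSet. Both give the same answers; the Vec variant just walks one or two cache lines, while the HashSet hashes the key and probes buckets.

```rust
use std::collections::HashSet;

// Hypothetical illustration, not code from this repo: membership checks
// over ~8 u64s. The slice variant scans contiguous memory; the HashSet
// variant hashes the key and chases buckets.
fn contains_vec(items: &[u64], needle: u64) -> bool {
    items.iter().any(|&x| x == needle)
}

fn contains_set(items: &HashSet<u64>, needle: u64) -> bool {
    items.contains(&needle)
}

fn main() {
    let vec: Vec<u64> = (0..8).collect();
    let set: HashSet<u64> = vec.iter().copied().collect();
    // Both structures agree on every query; only the memory layout differs.
    for i in 0..16 {
        assert_eq!(contains_vec(&vec, i), contains_set(&set, i));
    }
    println!("vec and set agree on all 16 queries");
}
```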
They are very, very, very mutually exclusive.
Nah, let's look at a simple example of …

This is not mutually exclusive to efficient memory (de)allocations, as one can do efficient pre-allocations with … This is not mutually exclusive to efficient data structures, as when …

Your argument is that engineers and programs only need to focus on the CPU cache and don't need to care about other things like data structures, memory allocators, or the actual problem and data size a module is solving. I cannot find any context where this makes sense, unfortunately. A more productive approach would be to propose clear design and code changes that address the actual problem in #12842. "Enable cache optimization" is as unclear as "enable CPU optimization" or "require fewer computer operations". And with any performance work, profiling data and benchmark numbers are always more helpful than pure theory, so the issue would be more convincing if you had them.
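As a concrete illustration of "not mutually exclusive" (a hypothetical sketch, not code from this repo): a single Vec::with_capacity reservation gives both one allocation and contiguous, cache-friendly data at the same time.

```rust
// Hypothetical sketch: cache-friendly layout and efficient allocation
// together. One up-front reservation, then contiguous data; no reallocation
// happens while filling because the capacity bound is known.
fn collect_even(input: &[u64]) -> Vec<u64> {
    let mut out = Vec::with_capacity(input.len()); // single allocation
    out.extend(input.iter().copied().filter(|x| x % 2 == 0));
    out
}

fn main() {
    let evens = collect_even(&[1, 2, 3, 4, 5, 6]);
    assert_eq!(evens, vec![2, 4, 6]); // contiguous, pre-sized, no reallocs
    println!("{:?}", evens);
}
```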
I never said that. Caring for the cache is the same (in a way) as caring for data structures and memory allocators; they are interdependent, and you can draw a straight line between them: an efficient data structure for a small length -> cache efficiency and minimal memory allocation.
I said this previously, using an example I saw in the code:
I agree, a benchmark is always better than theory. My theory comes just from observing the code and changing some parts of it; I would love to do that fully, but I don't really have the time.
My simple … My referenced screenshot showed allocations at 100 MB/s with an average lifetime of only 25.77 μs. This is unhealthy, and likely related to (pre)allocating or cloning more than needed, regardless of whether the underlying data are contiguous in memory. My whole point in this thread can be summarized as:
I'm writing this many words because I like your energy (in both Reth and our …).
@malik672 I think it's worth understanding Hai's broad point: don't do crazy optimizations, always profile.
But I agreed to this... :) :)
If we have an estimate, it's good enough. In the code it was 8; at worst it should be +5, which is still fine for a Vec. I would try to do a PR with a benchmark; I'm not sure I have the time, but I would try.
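A rough sketch of what such a benchmark could look like (hypothetical; a real PR benchmark would use a harness like criterion rather than std::time::Instant, to control for noise and compiler optimizations):

```rust
use std::collections::HashSet;
use std::time::Instant;

// Hypothetical micro-benchmark sketch: count memberships over n queries
// against the small (~8-element) collection discussed in this thread.
fn count_hits_vec(items: &[u64], n: u64) -> u64 {
    (0..n).filter(|&i| items.contains(&(i % 16))).count() as u64
}

fn count_hits_set(items: &HashSet<u64>, n: u64) -> u64 {
    (0..n).filter(|&i| items.contains(&(i % 16))).count() as u64
}

fn main() {
    let vec: Vec<u64> = (0..8).collect(); // the small size discussed above
    let set: HashSet<u64> = vec.iter().copied().collect();
    let n = 1_000_000;

    let t = Instant::now();
    let vec_hits = count_hits_vec(&vec, n);
    let vec_time = t.elapsed();

    let t = Instant::now();
    let set_hits = count_hits_set(&set, n);
    let set_time = t.elapsed();

    // Both must count the same memberships; only the timings differ.
    assert_eq!(vec_hits, set_hits);
    println!("vec: {:?} ({} hits), set: {:?} ({} hits)",
             vec_time, vec_hits, set_time, set_hits);
}
```

The timings from a one-off loop like this are only indicative; the point is the shape of the comparison, not the numbers.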
Re on point 2 about HashSet vs Vec:
One important correction: The
Seems like latency is the main KPI here (correct me if I'm wrong). I processed events from the last 7 days from this Grafana panel.
Before changing the code I think we need to take a more systematic approach here.
Describe the feature
ref: #12842 (comment)
ref: #12842
The main reason this consumes so much storage (or allocation, I should say) is that it's way too cache-unoptimized. The whole of fetcher.rs seems not to be taking advantage of the cache; instead there are more reads to main memory. A common example: a HashSet with a maximum size of 8 should be replaced with a Vec. Of course this might not seem logical for indexing, but for a small data structure it's all about being contiguous in memory. All in all, the reason for such allocation and latency is just that we are not using the cache efficiently.
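A minimal sketch of the proposed replacement (hypothetical names like SmallPeerSet, not the actual fetcher.rs types): a bounded, contiguous Vec keeping set semantics via a linear scan, which is cheap when the element count stays around 8.

```rust
// Hypothetical sketch of the proposed change; the struct and field names
// are illustrative, not the actual fetcher.rs API.
struct SmallPeerSet {
    peers: Vec<u64>, // contiguous: the whole set fits in a cache line or two
}

impl SmallPeerSet {
    fn new() -> Self {
        Self { peers: Vec::with_capacity(8) } // one up-front allocation
    }

    // Linear scan preserves set semantics; O(n) is cheap for n ~ 8.
    // Returns true if the id was newly inserted, like HashSet::insert.
    fn insert(&mut self, id: u64) -> bool {
        if self.peers.contains(&id) {
            false
        } else {
            self.peers.push(id);
            true
        }
    }

    fn contains(&self, id: u64) -> bool {
        self.peers.contains(&id)
    }
}

fn main() {
    let mut s = SmallPeerSet::new();
    assert!(s.insert(1));
    assert!(!s.insert(1)); // duplicates rejected, like a HashSet
    assert!(s.contains(1));
    println!("small set behaves like a HashSet for n ~ 8");
}
```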
@hai-rise @mattsse
Additional context
I don't think any amount of code removal or algorithmic reduction will make this more efficient, apart from using the cache efficiently.