-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[#17375] Add metrics around MemTracker::GcTcmalloc #28777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[#17375] Add metrics around MemTracker::GcTcmalloc #28777
Conversation
@bmatican (@spolitov @timothy-e) 👋🏻 I've raised a quick PR based on the comment here #17375 (comment) Can I get your quick review please? |
src/yb/util/mem_tracker.cc
Outdated
"MemTracker GC TCMalloc Bytes Per Call", | ||
yb::MetricUnit::kBytes, | ||
"Histogram of bytes released per call to MemTracker::GcTcmallocIfNeeded()", | ||
60000000 /* 60MB as max value */, 2 /* 2 digits precision */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☝🏻 just a hypothetical number which is large enough to accommodate most of real world use cases, would love to hear your feedback if this needs to be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use 60000000LU
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch thanks ✅ 6108969
✅ Deploy Preview for infallible-bardeen-164bc9 ready!Built without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify project configuration. |
ybctid_reader_provider_(pg_session_), | ||
fk_reference_cache_(ybctid_reader_provider_, buffering_settings_, tablespace_map_), | ||
explicit_row_lock_buffer_(ybctid_reader_provider_, tablespace_map_) { | ||
MemTracker::InitializeGcMetrics(metric_entity_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is primarily useful for the tserver, but could also be valuable for PG,
If I understand it correctly, we'd like to initialise metrics here as well for PG
@rthallamko3 Another PR for improving metrics - can I get a reviewer for this PR please? |
src/yb/util/mem_tracker.cc
Outdated
int64_t final_overhead = GetTCMallocPageHeapFreeBytes(); | ||
int64_t bytes_released = initial_overhead - final_overhead; | ||
if (gc_tcmalloc_bytes_released_metric_ && bytes_released > 0) { | ||
gc_tcmalloc_bytes_released_metric_->IncrementBy(bytes_released); | ||
} | ||
if (gc_tcmalloc_bytes_per_call_metric_ && bytes_released > 0) { | ||
gc_tcmalloc_bytes_per_call_metric_->Increment(bytes_released); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes_released
could theoretically be negative if memory increased during GC (unlikely but possible due to concurrent allocations). The > 0 check prevents recording negative values, which is good.
I think we should add a comment to indicate that the metric doesn't perfectly record number of byte is released(as well as the metric description)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(unlikely but possible due to concurrent allocations)
Good point. I can naively come up with following scenarios:
Time T0: initial_overhead = 100MB (measured before GC)
Time T1: GC starts, releases 50MB from page heap
Time T2: Other threads allocate 80MB
Time T3: final_overhead = 130MB (measured after GC)
(Results) bytes_released = 100MB - 130MB = -30MB
I believe the current approach still give us some pictures around GC than having no metrics at all.
Added code comments & improve metrics description ✅ 04ca6f7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work overall! I added a few comments for consideration. Really appreciate your contribution and the effort you put into this!
src/yb/util/mem_tracker.cc
Outdated
if (gc_tcmalloc_bytes_per_call_metric_) { | ||
gc_tcmalloc_bytes_per_call_metric_->Increment(0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it might make more sense to change the metric semantics to represent “bytes released per GC” instead of “bytes released per call”?
Since we record 0 even when no GC happens, if there are many such calls, it could skew the histogram (especially the average and percentiles) toward 0, making it harder to interpret the real GC behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might make more sense to change the metric semantics to represent “bytes released per GC” instead of “bytes released per call”?
Stepping back a little bit, I think that's right - we're interested in GC behaviour and not particularly in GcTcmallocIfNeeded()
calls anyway. Thanks for the feedback!
Stopped recording 0 to avoid skewing the aggregation result ✅ 9f6c439
e0c9036
to
bb449dd
Compare
@yusong-yan Thanks for having a look and dropping all the great feedback! I've made an improvement and replied back in each comment threads. Let me know if there is anything else I need to fix. |
…_gc_tcmalloc_bytes_per_gc_sum
I discussed with @yusong-yan in Slack, and we decided to drop Confirmed from my dev environment as well: $ curl --silent http://localhost:9000/prometheus-metrics | grep mem_tracker_gc_tcmalloc_ | grep -v "^#"
|
Resolves #17375
Changes Made:
Adds three metrics to track MemTracker::GcTcmallocIfNeeded() performance, addressing the "5% total time" impact on scan queries mentioned in #17375.
2. Total Memory Released(Edited) this was dropped as
mem_tracker_gc_tcmalloc_bytes_per_gc_sum
, which actually reports the value withmem_tracker_gc_tcmalloc_bytes_per_gc_sum
. See #28777 (comment) for more details.Original proposal for `mem_tracker_gc_tcmalloc_bytes_released`
Test Verified:
Verified in the following set up:
/prometheud-metrics
endpointbin/ysqlsh
doing INSERT/SELECT queries against large datasetHere is a screenshot: