-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[native] Update prometheus metrics asynchronously #24716
base: master
Are you sure you want to change the base?
Conversation
73f3402
to
b0c75a3
Compare
@aditi-pandit @majetideepak Can you please help review? Thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaystarshot Can you share some data on the evaluation of this change? What are the benefits and what are the cons?
presto-native-execution/presto_cpp/main/runtime-metrics/tests/PrometheusReporterTest.cpp
Outdated
Show resolved
Hide resolved
@@ -20,7 +20,7 @@ namespace facebook::presto::prometheus { | |||
class PrometheusReporterTest : public testing::Test { | |||
public: | |||
void SetUp() override { | |||
reporter = std::make_shared<PrometheusStatsReporter>(testLabels); | |||
reporter = std::make_shared<PrometheusStatsReporter>(testLabels, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not test with multiple threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that for gauge metric its the last value which stays so the testing has to be in a single thread for it to be deterministic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaystarshot : Could we have a single test without gauge metrics but with the multi-threading. Without that there is no signal that this code is working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
explicit PrometheusStatsReporter( | ||
const std::map<std::string, std::string>& labels); | ||
const std::map<std::string, std::string>& labels, int numThreads = 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why default to 2? Should we make this configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can make it configurabel
The StatsReporter already runs on a separate thread currently? |
@majetideepak The updates to in-memory metric entities currently can run within the same thread based on where its called from. #23338 This is designed to prevent any query-level slowdown, especially when handling high cardinality metrics. We previously encountered an issue where updating histograms caused latency degradation due to extremely high cardinality. |
19cf1fe
to
99979d6
Compare
@jaystarshot #23338 got resolved with #23357
Did this PR solve the above issue from your experiments? This PR is only moving the work from one thread to another. I am also curious if the updates can now happen out of order. Will this lead to a less precise history? |
@majetideepak this issue wasn’t in a busy system it was just slowing the main thread. Also having a limited threadpool for metric updation means that even when we are busy we can control degradation to only metrics updation (limited threads). |
Also i haven't tested if it fixed that issue since that is legacy but current way of populating metric is a ticking timebomb is someone creates a high cardinality metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @jaystarshot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jaystarshot. Had one minor question.
@@ -241,4 +246,8 @@ std::string PrometheusStatsReporter::fetchMetrics() { | |||
return serializer.Serialize(impl_->registry->Collect()); | |||
} | |||
|
|||
void PrometheusStatsReporter::waitForCompletion() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be called in the destructor as well ? Would ensure we didn't lose metrics at shutdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope we don't need in destructor. Prometheus is pull based and unless the metrics are pulled by the backend before closing that would be useless.
return resultOpt.value(); | ||
} | ||
// default | ||
return 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit : Abstract this with a constant variable.
60afd31
to
f60aa0f
Compare
f60aa0f
to
7022794
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jaystarshot
Description
Make metric updation async to rule out any slowness in the main thread. This is to rule out any slowness due to extensive metric use as we had seen such issue for histograms beo
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.