[ENH] Add block-level metrics #4801
Add Block-Level Metrics and Refactor Tracing/Instrumentation
This PR introduces block-level Prometheus metrics to the block store layer, specifically tracking cold block get requests, commit and flush latencies, and the number of blocks flushed. It also streamlines the tracing infrastructure for block operations, introduces a utility to access the current trace ID, and removes some extraneous tracing spans.
This summary was automatically generated by @propel-code-bot
Please tag your PR title with one of: [ENH | BUG | DOC | TST | BLD | PERF | TYP | CLN | CHORE]. See https://docs.trychroma.com/contributing#contributing-code-and-ideas
rust/storage/src/s3.rs
Outdated
impl S3StorageMetrics {
    pub fn new(meter: Meter) -> Self {
        Self {
            total_num_get_requests: meter
nit: num_get_requests()?
rust/storage/src/s3.rs
Outdated
    pub fn new(meter: Meter) -> Self {
        Self {
            total_num_get_requests: meter
                .u64_counter("s3_storage_total_num_get_requests")
nit: s3_num_get_requests?
rust/storage/src/s3.rs
Outdated
@@ -125,6 +145,7 @@ impl S3Storage {
        match res {
            Ok(res) => {
                let byte_stream = res.body;
                self.metrics.total_num_get_requests.add(1, &[]);
Why not in get_with_e_tag?
Thought I'd put it closer to where the S3 call happens in case someone else calls get_stream_and_e_tag some time in the future. Right now get_with_e_tag calls get_stream_and_e_tag, where this code snippet is.
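To illustrate the placement argument in this reply, here is a minimal std-only sketch, not the PR's actual code. `Storage`, `StorageMetrics`, and `fake_backend_get` are hypothetical stand-ins, and an `AtomicU64` takes the place of the OpenTelemetry counter. Only the two method names come from the thread. Incrementing in the innermost method means every caller is counted, including ones added later.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the metrics struct; a real implementation
/// would hold an OpenTelemetry counter rather than an atomic.
#[derive(Default)]
pub struct StorageMetrics {
    num_get_requests: AtomicU64,
}

/// Hypothetical stand-in for the storage type under review.
#[derive(Default)]
pub struct Storage {
    metrics: StorageMetrics,
}

/// Pretend backend call (assumption): fails on an empty key.
fn fake_backend_get(key: &str) -> Result<(Vec<u8>, String), String> {
    if key.is_empty() {
        Err("empty key".to_string())
    } else {
        Ok((key.as_bytes().to_vec(), format!("etag-{}", key.len())))
    }
}

impl Storage {
    /// Innermost method: the only place that talks to the backend,
    /// so the counter lives here and is bumped on success only.
    pub fn get_stream_and_e_tag(&self, key: &str) -> Result<(Vec<u8>, String), String> {
        match fake_backend_get(key) {
            Ok(res) => {
                self.metrics.num_get_requests.fetch_add(1, Ordering::Relaxed);
                Ok(res)
            }
            Err(e) => Err(e),
        }
    }

    /// Wrapper: counted automatically because it delegates to the inner method.
    pub fn get_with_e_tag(&self, key: &str) -> Result<(Vec<u8>, String), String> {
        self.get_stream_and_e_tag(key)
    }

    /// Read the current counter value.
    pub fn num_get_requests(&self) -> u64 {
        self.metrics.num_get_requests.load(Ordering::Relaxed)
    }
}
```

Had the increment lived in get_with_e_tag instead, a direct call to get_stream_and_e_tag would bypass the counter, which is the situation the reply is guarding against.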
b9263ee to 216fc54
rust/storage/src/s3.rs
Outdated
        let trace_id = get_current_trace_id().to_string();
        let attribute = [KeyValue::new("trace_id", trace_id)];
        self.metrics.num_get_requests.record(1, &attribute);
Could we track and emit these metrics at the block manager level?
b842838 to a3e8545
Description of changes
Added block-level metrics. The current version of this PR adds the following metrics, which can be viewed in Prometheus:
block_num_get_requests
block_commit_latency
block_num_blocks_flushed
block_flush_latency
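As a rough illustration of how the four metrics above might be fed, here is a std-only sketch. `BlockMetrics`, `timed`, and the `record_*` helpers are hypothetical names, and plain fields stand in for the real counter and histogram instruments. It assumes a latency is measured as wall-clock time around the operation and that a flush reports how many blocks it wrote.

```rust
use std::time::{Duration, Instant};

/// Hypothetical in-memory stand-in for the four metrics listed above.
#[derive(Default, Debug)]
pub struct BlockMetrics {
    pub num_get_requests: u64,
    pub num_blocks_flushed: u64,
    pub commit_latencies: Vec<Duration>,
    pub flush_latencies: Vec<Duration>,
}

/// Run `op` and return its result together with the elapsed wall-clock time.
pub fn timed<T>(op: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = op();
    (out, start.elapsed())
}

impl BlockMetrics {
    /// Count one cold block get.
    pub fn record_cold_get(&mut self) {
        self.num_get_requests += 1;
    }

    /// Time a commit and store its latency.
    pub fn record_commit<T>(&mut self, commit: impl FnOnce() -> T) -> T {
        let (out, elapsed) = timed(commit);
        self.commit_latencies.push(elapsed);
        out
    }

    /// Time a flush; `flush` returns the number of blocks it wrote.
    pub fn record_flush(&mut self, flush: impl FnOnce() -> u64) -> u64 {
        let (n, elapsed) = timed(flush);
        self.flush_latencies.push(elapsed);
        self.num_blocks_flushed += n;
        n
    }
}
```

Wrapping the operation in a closure keeps the timing code in one place, so the commit and flush paths cannot drift apart in how they measure latency.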
Improvements & Bug fixes
New functionality
Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?