Measure average batch byte size and event count #18000
base: main
Conversation
…mber of matches to compute the average events and bytes per batch
This pull request does not have a backport label. Could you fix it @andsel? 🙏
…l in events divided by number of batches
…mup all the batch event's byte estimation. Exposed metric 'pipelines.<pipeline id>.batch.byte_size.average.lifetime' containing the average byte size of each batch
…teMemory can work with
…d batch subtree metrics
…ient can collect batch metrics related to byte size and event count; this commit spreads the setting and parameter around but doesn't yet implement the feature
…iltered events and not the existing 'events.in'
Pull Request Overview

This PR implements batch metrics collection in Logstash to measure average batch byte size and event count. It introduces a new setting `pipeline.batch.metrics` with three modes: `none` (disabled), `minimal` (1% sampling), and `full` (every batch).

- Added new batch metrics collection infrastructure with configurable sampling
- Introduced memory estimation capabilities for events and data structures
- Exposed batch statistics through the `_node/stats` API
Reviewed Changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
qa/integration/specs/monitoring_api_spec.rb | Added integration tests for batch metrics with 'full' and 'none' modes |
logstash-core/src/test/java/.../MockNamespacedMetric.java | Created mock implementation for testing metric collection |
logstash-core/src/test/java/.../JrubyMemoryReadClientExtTest.java | Added unit tests for batch metrics collection in memory read client |
logstash-core/src/main/java/.../MetricKeys.java | Added batch-related metric key constants |
logstash-core/src/main/java/.../QueueFactoryExt.java | Added BatchMetricType enum and queue creation logic |
logstash-core/src/main/java/.../QueueReadClientBase.java | Implemented batch metrics collection and memory estimation |
logstash-core/src/main/java/.../Event.java | Added memory estimation method for events |
logstash-core/src/main/java/.../ConvertedMap.java | Implemented memory estimation for data structures |
config/logstash.yml | Added pipeline.batch.metrics configuration option |
```java
    return Integer.BYTES;
}
if (o instanceof RubyBoolean) {
    return Byte.SIZE / Byte.SIZE;
```
Same as above, `Byte.SIZE / Byte.SIZE` should be simplified to `1`.

Suggested change:

```diff
-    return Byte.SIZE / Byte.SIZE;
+    return 1;
```
random int (100) produces values in the range [0..99], which means that a "less than 2" check matches both 1 and 0, which is 2% and not 1%. Co-authored-by: Copilot <[email protected]>
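For illustration, a small self-contained sketch (hypothetical class and method names, not the PR's actual code) showing why a `nextInt(100) == 0` style check samples roughly 1% of batches while a `< 2` check samples roughly 2%:

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch (not the PR's code): compare two sampling predicates over
// values drawn from nextInt(100), which returns ints in the range [0..99].
public final class BatchSamplingSketch {

    // Matches exactly one of the 100 possible values -> ~1% of batches sampled.
    static boolean sampleOnePercent(int r) {
        return r == 0;
    }

    // Matches 0 and 1, i.e. two of the 100 possible values -> ~2% of batches sampled.
    static boolean sampleLessThanTwo(int r) {
        return r < 2;
    }

    public static void main(String[] args) {
        final long total = 1_000_000;
        long onePercent = 0;
        long lessThanTwo = 0;
        for (long i = 0; i < total; i++) {
            int r = ThreadLocalRandom.current().nextInt(100);
            if (sampleOnePercent(r)) onePercent++;
            if (sampleLessThanTwo(r)) lessThanTwo++;
        }
        System.out.printf("== 0 sampled %.2f%% of batches%n", 100.0 * onePercent / total);
        System.out.printf("< 2  sampled %.2f%% of batches%n", 100.0 * lessThanTwo / total);
    }
}
```

Running it prints sampling rates close to 1% and 2% respectively.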
💚 Build Succeeded
cc @andsel
Release notes
Implements average batch event count and byte size metrics. The collection of these metrics can be disabled, enabled for every batch, or done on a sample of the total batches.
What does this PR do?

- New metrics `pipelines.<pipeline id>.batch.count`, to count the number of batches used to compute the average events and bytes per batch, and `pipelines.<pipeline id>.batch.total_bytes`, to sum up the byte estimation of every batch's events. Exposed metric `pipelines.<pipeline id>.batch.byte_size.average.lifetime` containing the average byte size of each batch.
- New setting `pipeline.batch.metrics`, which can have 3 values: `none` for disabled (in this case no `batch` metric is exposed in the `_node/stats` API), `minimal`, which counts batches and estimates the size only for 1% of the total, while `full` does it for every batch. This setting leverages the existing Logstash settings infrastructure, so that a value defined at pipeline level (in `pipelines.yml`) has precedence over the global one (defined in `logstash.yml`).

Why is it important/What is the impact to the user?

Exposing metrics for average batch byte size and event count lets users discover the average structure of their batches, understand whether the batches are actually being filled, and eventually work out how to set `pipeline.batch.size` and `pipeline.batch.delay` so that this goal is reached.
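For reference, a minimal sketch of how this could look in `logstash.yml`; only `pipeline.batch.metrics` and its three values come from this PR, while the batch size/delay lines are the usual defaults, shown purely for context:

```yaml
# logstash.yml: illustrative fragment, not part of this PR's diff
pipeline.batch.size: 125          # events collected per batch (default)
pipeline.batch.delay: 50          # ms to wait before flushing an incomplete batch (default)
pipeline.batch.metrics: minimal   # new setting: none | minimal (~1% of batches) | full (every batch)
```

A value set for a single pipeline in `pipelines.yml` would take precedence over this global one, as described above.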
Checklist

- [ ] I have made corresponding changes to the documentation (handled by Document feature flag and byte size and event count average metrics #17976)

Author's Checklist

- [ ] Set `pipeline.batch.metrics` in `logstash.yml` to `none`.
.How to test this PR locally
Edit `pipeline.batch.metrics` in `logstash.yml`, setting each of the three values `none`, `minimal`, and `full` in turn. Launch Logstash and verify the metrics with:

```sh
curl http://localhost:9600/_node/stats | jq .pipelines.main.batch
```
Example pipeline:
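The example pipeline itself is not reproduced here; any simple config that keeps events flowing should do. A hedged sketch using the stock `generator` input and `stdout` output (the message text and tag are arbitrary):

```
input {
  generator {
    count => 0                      # 0 = emit events indefinitely
    message => "batch metrics test"
  }
}

filter {
  mutate { add_tag => ["batch-metrics-test"] }
}

output {
  stdout { codec => dots }          # keep output cheap so batches keep flowing
}
```

With events flowing, the curl command above should show the `batch` statistics whenever the setting is not `none`.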
Related issues
- `pipeline.batch.metrics` to work both at global level #17896