Conversation

andsel
Contributor

@andsel andsel commented Aug 18, 2025

Release notes

Implements average batch event count and byte size metrics. Collection of these metrics can be disabled, enabled for every batch, or performed on a sample of the batches.

What does this PR do?

  • Instantiated metric pipelines.<pipeline id>.batch.count to count the number of batches, which is needed to compute the average event count and byte size per batch.
  • Instantiated metric pipelines.<pipeline id>.batch.total_bytes to sum the estimated byte size of the events in every batch. Exposed metric pipelines.<pipeline id>.batch.byte_size.average.lifetime containing the average byte size of a batch.
  • Created new setting pipeline.batch.metrics, which accepts 3 values: none disables collection, and in this case no batch metric is exposed in the _node/stats API; minimal counts batches and estimates the size for only 1% of them; full does so for every batch. This setting leverages the existing Logstash settings infrastructure, so a value defined at pipeline level (in pipelines.yml) takes precedence over the global one (in logstash.yml).
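
As a rough illustration of how the two counters above could combine into the lifetime average, here is a minimal sketch. The class and method names (BatchAverageSketch, recordBatch, averageByteSize) are hypothetical and not taken from the PR; it also assumes both counters are updated for the same set of batches and that the average is an integer division, neither of which is confirmed by the description.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: lifetime average byte size per batch derived from two counters. */
final class BatchAverageSketch {
    // Stands in for pipelines.<pipeline id>.batch.count (assumption).
    private final LongAdder batchCount = new LongAdder();
    // Stands in for pipelines.<pipeline id>.batch.total_bytes (assumption).
    private final LongAdder totalBytes = new LongAdder();

    /** Records one batch together with its estimated size in bytes. */
    void recordBatch(long estimatedBytes) {
        batchCount.increment();
        totalBytes.add(estimatedBytes);
    }

    /** Returns total bytes divided by batch count, or 0 when no batch was recorded. */
    long averageByteSize() {
        long count = batchCount.sum();
        return count == 0 ? 0L : totalBytes.sum() / count;
    }

    public static void main(String[] args) {
        BatchAverageSketch metrics = new BatchAverageSketch();
        metrics.recordBatch(1000);
        metrics.recordBatch(3000);
        System.out.println(metrics.averageByteSize()); // prints 2000
    }
}
```

LongAdder is used here only because counters of this kind are typically updated from several worker threads; any thread-safe counter would serve the same purpose in the sketch.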

Why is it important/What is the impact to the user?

Exposing metrics for average batch byte size and event count lets users see the typical shape of their batches, check whether batches are being filled, and decide how to set pipeline.batch.size and pipeline.batch.delay to reach that goal.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation (handled by "Document feature flag and byte size and event count average metrics" #17976)
  • I have made corresponding change to the default configuration files (and/or docker env variables)
  • I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • Update the default value of pipeline.batch.metrics in logstash.yml to none.

How to test this PR locally

Edit pipeline.batch.metrics in logstash.yml, trying each of the three values in turn: none, minimal, and full.
Launch Logstash and verify the metrics with:

curl -s http://localhost:9600/_node/stats | jq '.pipelines.main.batch'

Example pipeline:

input {
  java_generator {
    # 1KB
    message => '{"clientip": "192.168.1.10", "ident": "-", "auth": "johndoe", "timestamp": "01/Jul/2025:15:22:10 +0000", "verb": "GET", "request": "/search?q=cloud+logging+apache&lang=en&limit=50&page=2&sort=desc&filter=active&country=us&user_id=123456&session_id=abcdef1234567890abcdef1234567890abcdef&tracking_id=track-0987654321abcdef0987654321abcdef", "httpversion": "1.1", "response": "200", "bytes": "1234", "referrer": "https://www.example.com/ref?q=logtest", "agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36", "headers": { "X-Forwarded-For": "203.0.113.42", "Referrer-Policy": "strict-origin-when-cross-origin", "X-Request-ID": "req-abcdef1234567890abcdef1234567890"},"message": "192.168.1.10 - johndoe [01/Jul/2025:15:22:10 +0000] \"GET /search?... HTTP/1.1\" 200 1234 ...","logsource": "apache_access","event_type": "access","@timestamp": "2025-07-01T15:22:10Z"}'
    codec => json
    threads => 2
  }
}

output {
  sink {}
}

Related issues

…mber of matches to compure the average events and byte per batch
@andsel andsel self-assigned this Aug 18, 2025
Contributor

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

Contributor

mergify bot commented Aug 18, 2025

This pull request does not have a backport label. Could you fix it @andsel? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
  • If no backport is necessary, please add the backport-skip label

andsel added 13 commits August 18, 2025 17:15
…mup all the batch event's byte estimation. Exposed metric 'pipelines.<pipeline id>.batch.byte_size.average.lifetime' containing the average byte size of each batch
…ient can collect batch metrics related to byte size and event count, this commit spread the setting and parameter around doesn't yet implement the feature
…iltered events and not the existing 'events.in'
@andsel andsel requested a review from Copilot August 21, 2025 12:57
@Copilot Copilot AI left a comment

Pull Request Overview

This PR implements batch metrics collection in Logstash to measure average batch byte size and event count. It introduces a new setting pipeline.batch.metrics with three modes: none (disabled), minimal (1% sampling), and full (every batch).

  • Added new batch metrics collection infrastructure with configurable sampling
  • Introduced memory estimation capabilities for events and data structures
  • Exposed batch statistics through the _node/stats API

Reviewed Changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| qa/integration/specs/monitoring_api_spec.rb | Added integration tests for batch metrics with 'full' and 'none' modes |
| logstash-core/src/test/java/.../MockNamespacedMetric.java | Created mock implementation for testing metric collection |
| logstash-core/src/test/java/.../JrubyMemoryReadClientExtTest.java | Added unit tests for batch metrics collection in memory read client |
| logstash-core/src/main/java/.../MetricKeys.java | Added batch-related metric key constants |
| logstash-core/src/main/java/.../QueueFactoryExt.java | Added BatchMetricType enum and queue creation logic |
| logstash-core/src/main/java/.../QueueReadClientBase.java | Implemented batch metrics collection and memory estimation |
| logstash-core/src/main/java/.../Event.java | Added memory estimation method for events |
| logstash-core/src/main/java/.../ConvertedMap.java | Implemented memory estimation for data structures |
| config/logstash.yml | Added pipeline.batch.metrics configuration option |

    return Integer.BYTES;
}
if (o instanceof RubyBoolean) {
    return Byte.SIZE / Byte.SIZE;
Copilot AI Aug 21, 2025

Same as above, Byte.SIZE / Byte.SIZE should be simplified to 1.

Suggested change
- return Byte.SIZE / Byte.SIZE;
+ return 1;
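
The reviewed hunk maps value types to fixed byte sizes (Integer.BYTES for an integer, 1 for a boolean). A minimal sketch of that style of type-based estimation follows. It is written against plain Java types rather than the JRuby types the PR handles, and the class name SizeEstimateSketch, the choice to count strings by UTF-8 length, and the decision to ignore unknown types are all assumptions made for illustration, not the PR's actual rules.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of type-based payload size estimation over plain Java values. */
final class SizeEstimateSketch {

    /** Estimates payload bytes only; object headers and references are ignored. */
    static long estimate(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;               // the simplification suggested in the review
        if (o instanceof Integer) return Integer.BYTES;   // 4
        if (o instanceof Long) return Long.BYTES;         // 8
        if (o instanceof Double) return Double.BYTES;     // 8
        if (o instanceof String) {
            return ((String) o).getBytes(StandardCharsets.UTF_8).length;
        }
        if (o instanceof Map) {
            long total = 0;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
                total += estimate(e.getKey()) + estimate(e.getValue());
            }
            return total;
        }
        if (o instanceof List) {
            long total = 0;
            for (Object item : (List<?>) o) {
                total += estimate(item);
            }
            return total;
        }
        return 0; // unknown types are not counted in this sketch
    }

    public static void main(String[] args) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("ok", true);  // 2 bytes of key + 1
        event.put("n", 5L);     // 1 byte of key + 8
        System.out.println(estimate(event)); // prints 12
    }
}
```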

andsel and others added 2 commits August 21, 2025 15:03
random int (100) produces values in range [0..99] which means that less than 2 could contain 1 and 0, which is 2% and not 1%

Co-authored-by: Copilot <[email protected]>
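
The off-by-one described in this commit message can be checked with a small sketch. The names BatchSamplingSketch, isSampled, and shouldSample are hypothetical; the only detail taken from the commit message is that a random integer is drawn from [0..99] and compared against a threshold.

```java
import java.util.Random;

/** Hypothetical sketch of percentage-based batch sampling. */
final class BatchSamplingSketch {

    /** A draw in [0, 99] is sampled when it is strictly below the threshold. */
    static boolean isSampled(int draw, int threshold) {
        return draw < threshold;
    }

    /** Decides whether to sample one batch using a fresh random draw. */
    static boolean shouldSample(Random random, int threshold) {
        return isSampled(random.nextInt(100), threshold);
    }

    /** Counts how many of the 100 possible draws are sampled for a threshold. */
    static int sampledDraws(int threshold) {
        int hits = 0;
        for (int draw = 0; draw < 100; draw++) {
            if (isSampled(draw, threshold)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(sampledDraws(2)); // prints 2: draws 0 and 1 match, so 2%
        System.out.println(sampledDraws(1)); // prints 1: only draw 0 matches, so 1%
    }
}
```

Enumerating the 100 possible draws makes the rate exact: a threshold of 2 matches two values, while a threshold of 1 matches one, which is the 1% the minimal mode is described as targeting.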
Contributor Author

andsel commented Aug 21, 2025

@elasticmachine
Collaborator

💚 Build Succeeded

History

cc @andsel

@andsel andsel marked this pull request as ready for review August 21, 2025 14:13
Successfully merging this pull request may close these issues.

  • Extend the feature flag pipeline.batch.metrics to work both at global level
  • Implement average lifetime long batch's size and document count metric