
DOC-933 Document new consumer group lag metrics and configs #1014



Merged (12 commits) on Mar 19, 2025


@JakeSCahill (Contributor) commented Mar 14, 2025

Description

Resolves https://redpandadata.atlassian.net/browse/DOC-933
Partially resolves https://redpandadata.atlassian.net/browse/DOC-1115

Review deadline: March 17

Page previews

https://deploy-preview-1014--redpanda-docs-preview.netlify.app/25.1/manage/monitoring/#consumers

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

@JakeSCahill JakeSCahill requested a review from a team as a code owner March 14, 2025 13:47

netlify bot commented Mar 14, 2025

Deploy Preview for redpanda-docs-preview ready!

  • Latest commit: 24e87e5
  • Latest deploy log: https://app.netlify.com/sites/redpanda-docs-preview/deploys/67d841b8d3aeac0008f9f346
  • Deploy Preview: https://deploy-preview-1014--redpanda-docs-preview.netlify.app


hyperlint-ai bot commented Mar 14, 2025

PR Change Summary

Documented new consumer group lag metrics and configurations in the Redpanda documentation.

  • Introduced consumer group lag as a key performance indicator for monitoring data freshness.
  • Added dedicated gauges for consumer group lag metrics to simplify monitoring.
  • Updated configuration properties to enable consumer group lag metrics collection.
  • Provided examples for enabling consumer group metrics in both Kubernetes and Helm environments.
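Consumer group lag is usually defined per partition as the gap between the partition's high watermark and the offset the group has committed. The following minimal Python sketch shows how per-partition lag could roll up into the two gauges named later in this thread, `redpanda_kafka_consumer_group_lag_max` and `redpanda_kafka_consumer_group_lag_sum`. It assumes the max and sum are taken over every partition the group has committed offsets for; the `PartitionOffsets` type and the helper functions are hypothetical and do not reflect Redpanda's internal implementation.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    # Hypothetical container for one partition of one consumer group.
    high_watermark: int    # offset of the next record to be written
    committed_offset: int  # next offset the group will consume


def partition_lag(p: PartitionOffsets) -> int:
    # Lag is how far the committed position trails the end of the log.
    # Clamp at zero in case a stale watermark read trails the commit.
    return max(p.high_watermark - p.committed_offset, 0)


def group_lag(partitions: list[PartitionOffsets]) -> tuple[int, int]:
    # Assumed aggregation: (max lag, total lag) across the group's partitions.
    lags = [partition_lag(p) for p in partitions]
    return max(lags, default=0), sum(lags)


if __name__ == "__main__":
    parts = [
        PartitionOffsets(100, 90),   # lag 10
        PartitionOffsets(50, 50),    # lag 0
        PartitionOffsets(200, 170),  # lag 30
    ]
    print(group_lag(parts))  # (30, 40)
```

The max highlights the single worst partition, while the sum shows the total backlog the group still has to process.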

Modified Files

  • modules/manage/partials/monitor-health.adoc
  • modules/reference/pages/properties/cluster-properties.adoc
  • modules/reference/pages/public-metrics-reference.adoc


@JakeSCahill JakeSCahill requested a review from BenPope March 14, 2025 14:03

@Feediver1 Feediver1 left a comment


lgtm


*Type:* integer

*Accepted values:* [`-17179869184`, `17179869183`]
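The quoted bounds are powers of two: the range is exactly [-2^34, 2^34 - 1], the span of a 35-bit signed integer, which helps explain why they look arbitrary to the reviewer. A quick Python check of that arithmetic follows; the `is_accepted` helper is hypothetical and assumes both bounds are inclusive.

```python
# Bounds as quoted in the property reference under review.
LOWER_BOUND = -17179869184
UPPER_BOUND = 17179869183

# Both bounds come from 2**34: the range of a 35-bit signed integer.
assert LOWER_BOUND == -(2**34)
assert UPPER_BOUND == 2**34 - 1


def is_accepted(value: int) -> bool:
    """Hypothetical range check; assumes both bounds are inclusive."""
    return LOWER_BOUND <= value <= UPPER_BOUND


if __name__ == "__main__":
    print(is_accepted(0), is_accepted(2**34))  # True False
```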
Review comment (Member):

These bounds are wild; maybe I should make it bounded.

@JakeSCahill JakeSCahill requested a review from BenPope March 17, 2025 12:07
@BenPope BenPope requested a review from IoannisRP March 17, 2025 13:39
endif::[]

Enabling `consumer_lag` may add extra processing overhead to the broker, especially in environments with a high number of consumer groups or partitions.
The lower the value of `consumer_group_lag_collection_interval_sec`, the higher the frequency of metric collection, which could result in higher resource utilization. Monitor the broker's resource usage after enabling these properties to ensure that the broker can handle the additional load.
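To make the interval trade-off concrete, here is a back-of-the-envelope Python estimate. It rests on a hypothetical cost model that the PR does not describe: one collection pass every `consumer_group_lag_collection_interval_sec` seconds, with each pass evaluating every (consumer group, partition) pair once. The `lag_evaluations_per_hour` function is illustrative only.

```python
def lag_evaluations_per_hour(interval_sec: int, group_partition_pairs: int) -> int:
    """Estimate (group, partition) lag evaluations per hour.

    Hypothetical cost model: one collection pass every `interval_sec`
    seconds, each pass evaluating every (group, partition) pair once.
    """
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    # Count only complete passes within one hour.
    passes_per_hour = 3600 // interval_sec
    return passes_per_hour * group_partition_pairs


if __name__ == "__main__":
    for interval in (60, 30, 10):
        print(interval, lag_evaluations_per_hour(interval, 5_000))
```

Under this model, halving the interval from 60 to 30 seconds doubles the work for the same number of groups and partitions, which is the relationship the paragraph above describes.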
Review comment (Member):

This feels a little overstated; the overhead should be pretty minimal and is not likely to be any more than using an external tool like Burrow.

Reply from @JakeSCahill (Contributor Author):

In that case, should we recommend that users enable `consumer_lag` instead of calculating lag themselves?

Why would users choose not to? Is this not enabled by default for some reason?


@IoannisRP IoannisRP left a comment


The behavior of `enable_consumer_group_metrics` is:

  • `group`: enables `redpanda_kafka_consumer_group_consumers` and `redpanda_kafka_consumer_group_topics`
  • `partition`: enables `redpanda_kafka_consumer_group_committed_offset`
  • `consumer_lag`: enables `redpanda_kafka_consumer_group_lag_max` and `redpanda_kafka_consumer_group_lag_sum`
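The value-to-metric mapping in this comment can be expressed as a lookup table. This Python sketch returns the set of metrics that a given list of `enable_consumer_group_metrics` values would expose. The `METRICS_BY_VALUE` table and `enabled_metrics` helper are hypothetical, and the sketch assumes the property accepts several values at once and that their metrics simply combine.

```python
# Mapping taken from the review comment; helper names are hypothetical.
METRICS_BY_VALUE: dict[str, tuple[str, ...]] = {
    "group": (
        "redpanda_kafka_consumer_group_consumers",
        "redpanda_kafka_consumer_group_topics",
    ),
    "partition": ("redpanda_kafka_consumer_group_committed_offset",),
    "consumer_lag": (
        "redpanda_kafka_consumer_group_lag_max",
        "redpanda_kafka_consumer_group_lag_sum",
    ),
}


def enabled_metrics(values: list[str]) -> set[str]:
    """Return the metrics exposed for the given property values."""
    unknown = [v for v in values if v not in METRICS_BY_VALUE]
    if unknown:
        raise ValueError(f"unknown enable_consumer_group_metrics values: {unknown}")
    # Assumed behavior: each value adds its metrics independently.
    return {metric for v in values for metric in METRICS_BY_VALUE[v]}


if __name__ == "__main__":
    for name in sorted(enabled_metrics(["consumer_lag"])):
        print(name)
```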

The `group` and `partition` values are tagged against the wrong metrics in public-metrics-reference.adoc.

As @BenPope suggested, it might be helpful to add this to cluster-properties.adoc as well.


@IoannisRP IoannisRP left a comment


Lgtm

@JakeSCahill JakeSCahill requested a review from IoannisRP March 17, 2025 16:43
@JakeSCahill JakeSCahill merged commit 86fa3e1 into beta Mar 19, 2025
9 checks passed
@JakeSCahill JakeSCahill deleted the DOC-933 branch March 19, 2025 10:24

4 participants