Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize experimental Kafka scaler and fix consumer group logic #5697

Closed
wants to merge 1 commit into from

Conversation

adrien-f
Copy link
Contributor

TL;DR:

  • Only request metadata on the topics needed
  • Fix the logic to fetch topics from the consumer groups

I know we decided to implement the IAM auth in the sarama based scaler (the original) but there is still an optimization that would serve current users of the experimental scaler and potentially lower their Keda load and their broker.

Firstly, the Metadata request is not scoped for the requested topics (which can be empty):

metadata, err := s.client.Metadata(ctx, &kafka.MetadataRequest{
Addr: s.client.Addr,
})

If the list of topics is empty, we enter the branch to detect the topics & permissions based on the consumer group activity here:

https://github.com/kedacore/keda/blob/bcaf5c07e785e3e58e3be4e3707b518fdef6acde/pkg/scalers/apache_kafka_scaler.go#L441C3-L441C14

On my AWS MSK cluster, the version for the response is not supported by the segment-io library which causes the following to error out, as seen in segmentio/kafka-go#1212 (probably):

Got non-zero number of bytes remaining: 10

Alright, for the case of debugging, let's ignore the error. What happens next? From what I can see, this variable describeGrp is unused, and because the MetadataRequest is empty, the whole list of topics is processed and returned with their partitions. All the time. Which then cause Keda to request all consumer & producer offsets again and again, etc...

So to me, currently the behavior is buggy. The describeGrp should be processed to extract the list of topics and partitions for the consumer group which I also correct in this PR.

Checklist

Fixes #

Relates to #5531

@adrien-f adrien-f requested a review from a team as a code owner April 16, 2024 17:41
Copy link
Contributor

@dttung2905 dttung2905 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @adrien-f ,
Thanks for refactoring this piece of code. Generally its LGTM. Just a couple of small comments for me
Since there is a new helper function getTopicPartitionsFromConsumerGroup created, would that be possible to create a unit test for it ?

},
})

// Currently, the request could fail because of an unsupported version of the protocol
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a known error, can we put a link to the issue in the comment so that we can easily trace it back in the future ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@adrien-f
Copy link
Contributor Author

@dttung2905 thanks you for the review! I will look into adding unit tests, the issue being that this lib does not have an interface to create mock on. I will investigate and see what's possible to do.

Copy link

stale bot commented Jun 25, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Jun 25, 2024
@zroubalik
Copy link
Member

any update on this, please?

@stale stale bot removed the stale All issues that are marked as stale due to inactivity label Jun 25, 2024
Copy link

stale bot commented Aug 25, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Aug 25, 2024
Copy link

stale bot commented Sep 1, 2024

This issue has been automatically closed due to inactivity.

@stale stale bot closed this Sep 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale All issues that are marked as stale due to inactivity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants