Optimize experimental Kafka scaler and fix consumer group logic #5697

adrien-f · 2024-04-16T17:41:28Z

TL;DR:

Only request metadata on the topics needed
Fix the logic to fetch topics from the consumer groups

I know we decided to implement IAM auth in the sarama-based scaler (the original), but there is still an optimization that would help current users of the experimental scaler and potentially lower the load on both KEDA and their brokers.

Firstly, the Metadata request is not scoped to the requested topics (which can be empty):

keda/pkg/scalers/apache_kafka_scaler.go

Lines 425 to 427 in bcaf5c0

	metadata, err := s.client.Metadata(ctx, &kafka.MetadataRequest{
		Addr: s.client.Addr,
	})
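As a rough illustration of what "scoped" means here, the sketch below builds a request that only names the topics the scaler cares about. It is not the code from this PR: `metadataRequest` is a local stand-in for the library's request type (assuming that type accepts a list of topic names), and `scopedMetadataRequest` is a hypothetical helper.

```go
package main

import "fmt"

// metadataRequest is a local stand-in for kafka.MetadataRequest so the sketch
// compiles without the library; only the topic list matters here.
type metadataRequest struct {
	Topics []string
}

// scopedMetadataRequest is a hypothetical helper. It drops empty and duplicate
// topic names and refuses to build a request with no topics at all, since an
// empty list is what makes the broker return every topic.
func scopedMetadataRequest(topics []string) (metadataRequest, error) {
	seen := make(map[string]struct{}, len(topics))
	scoped := make([]string, 0, len(topics))
	for _, t := range topics {
		if t == "" {
			continue
		}
		if _, dup := seen[t]; dup {
			continue
		}
		seen[t] = struct{}{}
		scoped = append(scoped, t)
	}
	if len(scoped) == 0 {
		return metadataRequest{}, fmt.Errorf("no usable topic among %d entries, refusing to build an unscoped request", len(topics))
	}
	return metadataRequest{Topics: scoped}, nil
}

func main() {
	req, err := scopedMetadataRequest([]string{"orders", "", "orders", "payments"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Topics)
}
```

Returning an error for an empty list is a design choice of this sketch: it forces the caller to resolve the topics (for example from the consumer group) before asking for metadata.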

If the list of topics is empty, we enter the branch that detects the topics & partitions from the consumer group activity here:

https://github.com/kedacore/keda/blob/bcaf5c07e785e3e58e3be4e3707b518fdef6acde/pkg/scalers/apache_kafka_scaler.go#L441C3-L441C14

On my AWS MSK cluster, the version of the response is not supported by the segmentio/kafka-go library, which causes the request to fail with the following error, as seen in segmentio/kafka-go#1212 (probably):

Got non-zero number of bytes remaining: 10

Alright, for the sake of debugging, let's ignore the error. What happens next? From what I can see, the variable describeGrp is unused, and because the MetadataRequest is empty, the whole list of topics is processed and returned with their partitions. Every time. This then causes KEDA to request all consumer & producer offsets again and again, etc.

So to me, the current behavior is buggy. describeGrp should be processed to extract the list of topics and partitions for the consumer group, which I also fix in this PR.
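To make the intended processing concrete, here is a minimal sketch of collecting topics and partitions from a group description. The types are simplified stand-ins (assumptions, not the library's actual response types), and `topicPartitionsFromGroup` is a hypothetical name, not the helper added in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// memberTopic, groupMember and groupDescription are simplified stand-ins for
// the describe-groups response: each member lists the topics and partitions
// currently assigned to it.
type memberTopic struct {
	Topic      string
	Partitions []int
}

type groupMember struct {
	Assignments []memberTopic
}

type groupDescription struct {
	GroupID string
	Members []groupMember
}

// topicPartitionsFromGroup merges the assignments of all members into one
// map of topic name to sorted, de-duplicated partition IDs.
func topicPartitionsFromGroup(g groupDescription) (map[string][]int, error) {
	seen := make(map[string]map[int]struct{})
	for _, m := range g.Members {
		for _, a := range m.Assignments {
			if seen[a.Topic] == nil {
				seen[a.Topic] = make(map[int]struct{})
			}
			for _, p := range a.Partitions {
				seen[a.Topic][p] = struct{}{}
			}
		}
	}
	if len(seen) == 0 {
		return nil, fmt.Errorf("consumer group %q has no assigned topics", g.GroupID)
	}
	out := make(map[string][]int, len(seen))
	for topic, parts := range seen {
		ids := make([]int, 0, len(parts))
		for p := range parts {
			ids = append(ids, p)
		}
		sort.Ints(ids)
		out[topic] = ids
	}
	return out, nil
}

func main() {
	group := groupDescription{
		GroupID: "billing",
		Members: []groupMember{
			{Assignments: []memberTopic{{Topic: "orders", Partitions: []int{2, 0}}}},
			{Assignments: []memberTopic{
				{Topic: "orders", Partitions: []int{1}},
				{Topic: "payments", Partitions: []int{0}},
			}},
		},
	}
	tps, err := topicPartitionsFromGroup(group)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tps)
}
```

One caveat of this approach: it only sees what is assigned to active members, so a group with no members yields no topics, which the sketch reports as an error.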

Checklist

When introducing a new scaler, I agree with the scaling governance policy
I have verified that my change is according to the deprecations & breaking changes policy
Tests have been added
Changelog has been updated and is aligned with our changelog requirements
A PR is opened to update our Helm chart (repo) (if applicable, ie. when deployment manifests are modified)
A PR is opened to update the documentation on (repo) (if applicable)
Commits are signed with Developer Certificate of Origin (DCO - learn more)

Fixes #

Relates to #5531

dttung2905

Hi @adrien-f ,
Thanks for refactoring this piece of code. Generally it LGTM. Just a couple of small comments from me.
Since there is a new helper function getTopicPartitionsFromConsumerGroup, would it be possible to add a unit test for it?

dttung2905 · 2024-04-23T16:01:14Z

pkg/scalers/apache_kafka_scaler.go

+ },
+ })
+
+ // Currently, the request could fail because of an unsupported version of the protocol


If this is a known error, can we put a link to the issue in the comment so that we can easily trace it back in the future?

Signed-off-by: Adrien Fillon <[email protected]>

adrien-f · 2024-04-25T08:25:03Z

@dttung2905 thank you for the review! I will look into adding unit tests. The issue is that this lib does not expose an interface to mock against. I will investigate and see what's possible.

stale · 2024-06-25T00:16:01Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

zroubalik · 2024-06-25T21:33:40Z

any update on this, please?

stale · 2024-08-25T09:45:30Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2024-09-01T22:54:36Z

This issue has been automatically closed due to inactivity.

adrien-f requested a review from a team as a code owner April 16, 2024 17:41

JorTurFer requested review from zroubalik and dttung2905 April 22, 2024 21:53

dttung2905 reviewed Apr 23, 2024

Optimize experimental Kafka scaler and fix consumer group logic

b1f5874

Signed-off-by: Adrien Fillon <[email protected]>

adrien-f force-pushed the kafka-exp-update branch from c5bec6a to b1f5874 Compare April 25, 2024 08:23

stale bot added the stale label Jun 25, 2024

stale bot removed the stale label Jun 25, 2024

stale bot added the stale label Aug 25, 2024

stale bot closed this Sep 1, 2024
