Add support for partition based scaling on the kafka scaler #6558
Conversation
Scaling Kafka consumers should be done in factors of the partition count on the topic. This ensures that the partitions are evenly spread across all consumers. If the partitions are not evenly spread, we run the risk of some partitions being consumed faster than others. This PR adds a new property on the Kafka scaler, `ensureEvenDistributionOfPartitions`. When this property is set to true, the scaler ensures that the number of pods always evenly divides the number of partitions on the topic. Signed-off-by: Vishal Patel <[email protected]>
Force-pushed from a548ac9 to c46e703.
@dttung2905 @zroubalik, you are the Kafka experts xD
Thank you very much for this PR. Personally, I like the direction this PR is heading. Just one small comment for my understanding:
for _, factor := range factors {
	if factor*lagThreshold >= totalLag {
		return factor
	}
}
return totalTopicPartitions
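For context, here is a minimal sketch of what the `FindFactors` helper discussed below could look like, assuming only what the thread states (it returns every divisor of the partition count in ascending order); the actual implementation in the PR may differ:

```go
// FindFactors returns all divisors of n in ascending order,
// e.g. FindFactors(100) = [1 2 4 5 10 20 25 50 100].
func FindFactors(n int64) []int64 {
	if n <= 0 {
		return nil
	}
	var small, large []int64
	for i := int64(1); i*i <= n; i++ {
		if n%i == 0 {
			small = append(small, i)
			if i != n/i {
				large = append(large, n/i)
			}
		}
	}
	// large holds the paired divisors in descending order,
	// so append them in reverse to keep the result ascending.
	for j := len(large) - 1; j >= 0; j-- {
		small = append(small, large[j])
	}
	return small
}
```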
Just trying to understand and confirm the logic here. Are we trying to get the smallest number of pods that satisfies the condition `factor*lagThreshold >= totalLag`? I ask because the factors array is sorted in ascending order by `FindFactors`.
A follow-up question: what if it were sorted in descending order instead, to provide a more aggressive strategy to reduce lag 🤔?
> Just trying to understand and confirm the logic here. Are we trying to get the smallest number of pods that satisfies the condition `factor*lagThreshold >= totalLag`? I ask because the factors array is sorted in ascending order by `FindFactors`.

Yep, just trying to find the smallest number of pods that will satisfy the condition.

> What if it were sorted in descending order to provide a more aggressive strategy to reduce lag?

In that case, either we eventually have to flip the conditional `factor*lagThreshold >= totalLag` to end up with the same number of pods the proposed logic gives, or we risk running far more pods than we ideally want. Consider a topic with 100 partitions: it would immediately scale to 100 pods even for a lag of 100 and a `lagThreshold` of 10.
Not sure if I am missing anything about what you mean by an aggressive strategy?
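To make that trade-off concrete with the numbers from this thread (100 partitions, total lag of 100, `lagThreshold` of 10), here is a small runnable sketch; `pickFactor` mirrors the selection loop quoted above, and the factor list comes from the ascending-order assumption:

```go
package main

import "fmt"

// pickFactor scans the ascending factors and returns the first one
// whose capacity (factor * lagThreshold) covers the total lag,
// falling back to the partition count itself.
func pickFactor(factors []int64, lagThreshold, totalLag, totalPartitions int64) int64 {
	for _, factor := range factors {
		if factor*lagThreshold >= totalLag {
			return factor
		}
	}
	return totalPartitions
}

func main() {
	factors := []int64{1, 2, 4, 5, 10, 20, 25, 50, 100} // factors of 100, ascending

	// Ascending order: the first factor with factor*10 >= 100 is 10,
	// so we run 10 pods, each owning 10 partitions.
	fmt.Println(pickFactor(factors, 10, 100, 100)) // prints 10

	// Scanning the same condition in descending order would accept the
	// very first candidate (100, since 100*10 >= 100) and jump straight
	// to 100 pods for the same lag.
}
```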
Issue: #2581
Scaling Kafka consumers should be done in factors of the partition count on the topic. This ensures that the partitions are evenly spread across all consumers. If the partitions are not evenly spread, we run the risk of some partitions being consumed faster than others. This PR adds a new property on the Kafka scaler, `ensureEvenDistributionOfPartitions`. When this property is set to true, the scaler ensures that the number of pods always evenly divides the number of partitions on the topic.

Checklist
Fixes #2581
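For illustration, this is roughly how the new property could be enabled on a Kafka trigger. Only `ensureEvenDistributionOfPartitions` comes from this PR; the surrounding fields follow the existing KEDA Kafka scaler, and all names and values here are placeholders:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaledobject
spec:
  scaleTargetRef:
    name: kafka-consumer          # placeholder Deployment name
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: my-consumer-group
        topic: my-topic
        lagThreshold: "10"
        # New property from this PR: restrict replica counts to
        # factors of the topic's partition count.
        ensureEvenDistributionOfPartitions: "true"
```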
QA:

Tested this locally by pushing an image onto a local kind cluster. The Kafka consumers and producers run outside the cluster to get granular and quick control over production and consumption rates.
Plotted the pod count and Kafka partition lag in Grafana.