-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] Fix out-of-order issues with ConsistentHashingStickyKeyConsumerSelector #23327
[fix][broker] Fix out-of-order issues with ConsistentHashingStickyKeyConsumerSelector #23327
Conversation
ac7fdeb
to
cc78aec
Compare
I renamed this to "reduce" out-of-order issues since I'm not sure if the current model in this PR prevents the case that a consumer gets swapped when an unrelated consumer leaves. It seems that another hash ring layer is needed for the colliding consumers. That would prevent unnecessary changes. A hash ring inside a hash ring... |
cc78aec
to
6a1be71
Compare
06db6e3
to
bae30bc
Compare
…ame index which is sufficiently consistent
80d6451
to
22b95a7
Compare
Hash ranges are very rare with the current hash range size of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ConsumerSelector (apache#23327) (cherry picked from commit adb9014) (cherry picked from commit 7f8e508)
…ConsumerSelector (apache#23327) (cherry picked from commit adb9014) (cherry picked from commit 7f8e508)
Fixes #23315
Fixes #23321
Motivation
See #23315 and #23321. This PR fixes a bug in ConsistentHashingStickyKeyConsumerSelector where the selected consumer for a hash could change across 2 existing consumers when a consumer was removed or when a new consumer was added.
In addition it fixes the problem that ConsistentHashingStickyKeyConsumerSelector.getConsumerKeyHashRanges() returned invalid information. These are fixed together since they are in the same code area and fixing getConsumerKeyHashRanges is required for testing the consistent selection of a consumer when there are hash collisions.
Modifications
In PR #8396 a solution was added to fix a problem where all hash ring points were assigned to the last arrived consumer when there were hash collisions, for example when the consumers had the same name:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ConsistentHashingStickyKeyConsumerSelector.java
Line 127 in 4f96146
That solution causes problems since it causes the bug #23315. A different solution is needed to distribute the consumers evenly when there are hash collisions.
The solution in this PR is to assign a consumer with a "consumer name index". When there are multiple consumers with the same name, the name index will be unique. Since any consumer name could be potentially used by consumers arriving later, it is necessary to track all consumer names. When all consumers have been removed for a specific consumer name, it shouldn't result in a memory leak. This is the reason why tracking reference counts is handled in the implementation.
The implementation contains a solution where a consumer leaving and joining immediately will get assigned with the same "consumer name index". This improves stability since the hash assignment changes are minimized over time, although a better solution would be to avoid reusing the same consumer name in the first place.
Documentation
doc
doc-required
doc-not-needed
doc-complete