Description
Search before asking
- I searched in the issues and found nothing similar.
Read release policy
- I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.
Version
Warning
It's possible that the test setup contains bugs. More work is needed to confirm that the test case is valid.
Note
It's possible that multiple bugs are involved. The comments describe some observations which might be client side bugs.
This applies to both the master branch (which includes PIP-282 changes) and branch-3.3.
Minimal reproduce step
The test case sets invocationCount=50 so that it is repeated 50 times. It usually fails in about 25% of the test runs.
This is the test case:
org.apache.pulsar.client.api.KeySharedSubscriptionTest#testOrderingAfterReconnects
The test code depends on some new test utility methods.
Reproducing on the master branch (with PIP-282 changes):
git clone -b lh-key_shared-testing-2024-09-13-220512 --depth=2 --single-branch https://github.com/lhotari/pulsar
cd pulsar
mvn -Pcore-modules,-main -T 1C clean install -DskipTests -Dspotbugs.skip=true -Dcheckstyle.skip=true -Dlicense.skip=true -DnarPluginPhase=none
mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test -pl pulsar-broker "-Dtest=org.apache.pulsar.client.api.KeySharedSubscriptionTest#testOrderingAfterReconnects"
Reproducing on branch-3.3 (without PIP-282 changes):
git clone -b lh-key_shared-testing-branch-3.3-2024-09-13-220747 --depth=2 --single-branch https://github.com/lhotari/pulsar
cd pulsar
mvn -Pcore-modules,-main -T 1C clean install -DskipTests -Dspotbugs.skip=true -Dcheckstyle.skip=true -Dlicense.skip=true -DnarPluginPhase=none
mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test -pl pulsar-broker "-Dtest=org.apache.pulsar.client.api.KeySharedSubscriptionTest#testOrderingAfterReconnects"
What did you expect to see?
Key_Shared subscription in AUTO_SPLIT mode should retain ordering of messages by message key.
This should be achieved by holding back delivery of messages for keys that are still being handled by a consumer that, at the time of sending, is no longer the owner of the hash range to which the key belongs.
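The expected hold-back behavior can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how the Pulsar broker implements Key_Shared dispatching: the class `HoldBackModel` and its methods are hypothetical, ownership is passed in per key instead of being derived from hash ranges, and acknowledgements are counted per key.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of the expected behavior; not Pulsar broker code. */
final class HoldBackModel {
    // Consumer that currently has unacknowledged messages for each key.
    private final Map<String, String> inFlightConsumerByKey = new HashMap<>();
    // Number of unacknowledged messages for each key.
    private final Map<String, Integer> inFlightCountByKey = new HashMap<>();

    /**
     * Tries to dispatch one message for the key to the consumer that owns the
     * key's hash range right now. Returns false when the message must be held
     * back because another consumer still has unacknowledged messages for it.
     */
    boolean tryDispatch(String key, String currentOwner) {
        String pendingOn = inFlightConsumerByKey.get(key);
        if (pendingOn != null && !pendingOn.equals(currentOwner)) {
            return false; // previous owner has not finished this key yet
        }
        inFlightConsumerByKey.put(key, currentOwner);
        inFlightCountByKey.merge(key, 1, Integer::sum);
        return true;
    }

    /** Records the acknowledgement of one in-flight message for the key. */
    void ack(String key) {
        Integer remaining =
                inFlightCountByKey.computeIfPresent(key, (k, n) -> n > 1 ? n - 1 : null);
        if (remaining == null) {
            inFlightConsumerByKey.remove(key); // key is free for the new owner
        }
    }
}
```

In this model, a key that was dispatched to consumer c1 cannot be dispatched to a new owner c2 until c1 has acknowledged every in-flight message for that key, which is the per-key ordering guarantee described above.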
What did you see instead?
- message ordering by key isn't preserved in message processing
- sometimes message processing gets blocked and doesn't proceed
- there are duplicate messages in cases where it isn't expected
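For illustration, the first and third symptoms could be detected on the consuming side with a check along these lines. This is a minimal sketch and not the code used in the test: `KeyOrderChecker` is a hypothetical helper, and it assumes that the producer attaches a strictly increasing sequence number to the messages of each key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of the Pulsar test code. */
final class KeyOrderChecker {
    // Highest sequence number processed so far for each message key.
    private final Map<String, Long> lastSeqByKey = new HashMap<>();
    private final List<String> violations = new ArrayList<>();

    /** Records one processed message and notes any ordering violation. */
    void onMessage(String key, long seq) {
        Long last = lastSeqByKey.get(key);
        if (last != null && seq == last) {
            // Same sequence number delivered twice in a row for this key.
            violations.add("duplicate key=" + key + " seq=" + seq);
        } else if (last != null && seq < last) {
            // An older message arrived after a newer one for this key.
            violations.add("out-of-order key=" + key + " seq=" + seq + " after " + last);
        } else {
            lastSeqByKey.put(key, seq);
        }
    }

    /** Violations observed so far, in the order they were detected. */
    List<String> violations() {
        return violations;
    }
}
```

Note that a redelivered message older than the latest one seen for its key is reported as out-of-order in this sketch, because the checker only remembers the highest sequence number per key.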
Anything else?
The initial goal of the test scenario was to simulate a rolling restart of consumers. However, the test case doesn't strictly follow this scenario: it was modified until test failures started appearing.
Are you willing to submit a PR?
- I'm willing to submit a PR!