
[Bug] Message ordering isn't retained in Key_Shared AUTO_SPLIT mode in a rolling restart type of test scenario #23307

Closed
@lhotari

Description

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

Warning

It's possible that the test setup contains bugs. More work is needed to confirm that the test case is valid.

Note

It's possible that multiple bugs are involved. The comments describe some observations that might be client-side bugs.

This applies to both the master branch (which includes the PIP-282 changes) and branch-3.3.

Minimal reproduce step

The test case uses invocationCount=50 to repeat it 50 times. It usually fails in about 25% of the test runs.

This is the test case:
org.apache.pulsar.client.api.KeySharedSubscriptionTest#testOrderingAfterReconnects
The test code depends on some new test utility methods.

Reproducing based on the master branch (with PIP-282 changes)

git clone -b lh-key_shared-testing-2024-09-13-220512 --depth=2 --single-branch https://github.com/lhotari/pulsar
cd pulsar
mvn -Pcore-modules,-main -T 1C clean install -DskipTests -Dspotbugs.skip=true -Dcheckstyle.skip=true -Dlicense.skip=true -DnarPluginPhase=none
mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test -pl pulsar-broker "-Dtest=org.apache.pulsar.client.api.KeySharedSubscriptionTest#testOrderingAfterReconnects"

Reproducing based on branch-3.3 (without PIP-282)

git clone -b lh-key_shared-testing-branch-3.3-2024-09-13-220747 --depth=2 --single-branch https://github.com/lhotari/pulsar
cd pulsar
mvn -Pcore-modules,-main -T 1C clean install -DskipTests -Dspotbugs.skip=true -Dcheckstyle.skip=true -Dlicense.skip=true -DnarPluginPhase=none
mvn -DredirectTestOutputToFile=false -DtestRetryCount=0 test -pl pulsar-broker "-Dtest=org.apache.pulsar.client.api.KeySharedSubscriptionTest#testOrderingAfterReconnects"

What did you expect to see?

A Key_Shared subscription in AUTO_SPLIT mode should preserve the ordering of messages per message key.
This should be achieved by holding back delivery of messages whose key is still being handled by a consumer that, at the time of sending, is no longer the current "owner" of the hash range the key belongs to.
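The expected hold-back behavior can be illustrated with a simplified, single-threaded model. This is a minimal sketch, not Pulsar's dispatcher: the class name, the 65536-slot hash range, tracking pending messages per key, and using String.hashCode() instead of the broker's real hash function are all assumptions. A message is held back (tryDispatch returns null) while any consumer other than the key's current owner still has unacknowledged messages with that key. Later messages with the same key hit the same check, so per-key order is kept.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Simplified, hypothetical model of the expected Key_Shared AUTO_SPLIT behavior.
// Not Pulsar's implementation: the range size and hash function are assumptions.
class KeyOrderingHoldBackSketch {
    // Assumed size of the hash range that consumers split between them.
    static final int RANGE_SIZE = 65536;

    // Range start -> consumer owning [start, next range start). A range starting at 0 must exist.
    final TreeMap<Integer, String> owners = new TreeMap<>();
    // Consumer -> (key -> number of dispatched but unacknowledged messages).
    final Map<String, Map<String, Integer>> pendingByConsumer = new HashMap<>();

    static int hash(String key) {
        // Stand-in hash; the broker uses its own hash function.
        return Math.floorMod(key.hashCode(), RANGE_SIZE);
    }

    String ownerOf(String key) {
        return owners.floorEntry(hash(key)).getValue();
    }

    // Returns the consumer the message is sent to, or null when it must be held back
    // because a consumer that no longer owns the key still has unacked messages for it.
    String tryDispatch(String key) {
        String owner = ownerOf(key);
        for (Map.Entry<String, Map<String, Integer>> e : pendingByConsumer.entrySet()) {
            if (!e.getKey().equals(owner) && e.getValue().containsKey(key)) {
                return null;
            }
        }
        pendingByConsumer.computeIfAbsent(owner, c -> new HashMap<>()).merge(key, 1, Integer::sum);
        return owner;
    }

    // Acknowledges one message with the given key; drops the key once nothing is pending.
    void ack(String consumer, String key) {
        Map<String, Integer> pending = pendingByConsumer.get(consumer);
        if (pending != null) {
            pending.computeIfPresent(key, (k, n) -> n > 1 ? n - 1 : null);
        }
    }

    public static void main(String[] args) {
        KeyOrderingHoldBackSketch d = new KeyOrderingHoldBackSketch();
        d.owners.put(0, "c1");                  // c1 owns the whole range
        System.out.println(d.tryDispatch("k")); // c1, left unacknowledged
        d.owners.put(hash("k"), "c2");          // c2 joins and takes over the part of the range containing "k"
        System.out.println(d.tryDispatch("k")); // null: held back while c1 still has "k" pending
        d.ack("c1", "k");
        System.out.println(d.tryDispatch("k")); // c2
    }
}
```

Held-back messages have to be retried in their original order once the previous owner acknowledges; the sketch leaves that replay queue to the caller.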

What did you see instead?

  • message ordering by key isn't preserved during message processing
  • sometimes message processing gets blocked and doesn't proceed
  • there are duplicate messages in cases where duplicates aren't expected
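One way to make the first and third symptoms measurable is to check the received messages per key. This is a minimal sketch, assuming the producer puts a per-key sequence number in each message; that convention and the PerKeyOrderChecker name are hypothetical, and testOrderingAfterReconnects may validate ordering differently. Duplicates are reported separately because redelivery of unacknowledged messages after a reconnect can be legitimate, so each reported duplicate still needs to be judged against whether it was expected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical checker for received messages; not one of the test utilities mentioned above.
// Assumes every message carries a per-key sequence number assigned by the producer.
class PerKeyOrderChecker {
    private final Map<String, Long> highestSeqByKey = new HashMap<>();
    private final Map<String, Set<Long>> seenByKey = new HashMap<>();
    final List<String> outOfOrder = new ArrayList<>();
    final List<String> duplicates = new ArrayList<>();

    void onMessage(String key, long seq) {
        if (!seenByKey.computeIfAbsent(key, k -> new HashSet<>()).add(seq)) {
            // Same key and sequence number seen before: a duplicate (possibly a redelivery).
            duplicates.add(key + ":" + seq);
            return;
        }
        Long highest = highestSeqByKey.get(key);
        if (highest != null && seq < highest) {
            // A lower sequence number arrived after a higher one for the same key.
            outOfOrder.add(key + ":" + seq + " after " + highest);
        }
        highestSeqByKey.merge(key, seq, Long::max);
    }

    public static void main(String[] args) {
        PerKeyOrderChecker c = new PerKeyOrderChecker();
        c.onMessage("a", 1);
        c.onMessage("b", 1);
        c.onMessage("a", 3);
        c.onMessage("a", 2); // arrives after a:3
        c.onMessage("b", 1); // received twice
        System.out.println("outOfOrder=" + c.outOfOrder + " duplicates=" + c.duplicates);
        // prints: outOfOrder=[a:2 after 3] duplicates=[b:1]
    }
}
```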

Anything else?

The initial goal of the test scenario was to simulate a rolling restart of consumers. However, the test case doesn't strictly follow this. It was modified until test failures started appearing.

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Metadata

Labels

type/bug (The PR fixed a bug or issue reported a bug)
