Thank you @Apurva007, good questions.
Messages in different clusters don't share message ids. The message ids assigned in the originating cluster are independent of the message ids assigned in the remote cluster.

There are 2 parts to what you could call "offset management" across clusters.

For replication itself, messages flowing from one cluster to a remote cluster are handled by a replicator instance for each topic in the originating cluster. The replicator publishes (pushes) messages to the remote cluster and keeps its own state in a special subscription. To prevent replication loops, the message published in the remote cluster contains metadata about the originating cluster and the original message id. The replicator is special in the sense that it behaves like a consumer, but it is implemented directly in the Pulsar broker on top of the "managed ledger" layer, without an actual consumer.

For replicated subscriptions, reading "PIP 33: Replicated subscriptions", and especially the "Constructing a cursor snapshot" section, helps in understanding how "offset management" works under the covers and what the limitations are. There's also a blog post that contains a useful summary of the limitations. The subscription snapshotting seems to be an application of vector clocks, although this isn't explicitly mentioned in the PIP-33 design document. There's another discussion, #21612, which contains useful observations and details about replicated subscriptions.
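To make the replicator part more concrete, here is a toy model in plain Python. It isn't Pulsar code and doesn't use the Pulsar client; `Message`, `Cluster`, `replicated_from`, `origin_message_id` and `replicator_cursors` are hypothetical names I made up for illustration. It assumes message ids are simple per-cluster counters and that there is one replicator cursor per remote cluster. It only shows the two points above: ids are local to each cluster, and the origin metadata is what stops a replicated message from being pushed back.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    payload: str
    message_id: int                          # local to the cluster storing it
    replicated_from: Optional[str] = None    # originating cluster, if replicated
    origin_message_id: Optional[int] = None  # id in the originating cluster


@dataclass
class Cluster:
    name: str
    log: List[Message] = field(default_factory=list)
    # Stand-in for the replicator's special subscription (assumed: one per remote).
    replicator_cursors: Dict[str, int] = field(default_factory=dict)

    def publish(self, payload, replicated_from=None, origin_message_id=None):
        # The id comes from the local log only, so ids differ across clusters.
        msg = Message(payload, len(self.log), replicated_from, origin_message_id)
        self.log.append(msg)
        return msg

    def replicate_to(self, remote: "Cluster") -> int:
        """Push locally produced messages to `remote`; return how many were pushed."""
        pos = self.replicator_cursors.get(remote.name, 0)
        pushed = 0
        while pos < len(self.log):
            msg = self.log[pos]
            pos += 1
            if msg.replicated_from is not None:
                continue  # arrived via replication: skipping it prevents a loop
            remote.publish(msg.payload, self.name, msg.message_id)
            pushed += 1
        self.replicator_cursors[remote.name] = pos
        return pushed


if __name__ == "__main__":
    east, west = Cluster("east"), Cluster("west")
    west.publish("w0")
    west.publish("w1")
    east.publish("e0")
    east.replicate_to(west)
    west.replicate_to(east)
    for m in west.log:
        print(m)
```

In this model the copy of `e0` stored in `west` gets a different local id than it had in `east`, and a second `east.replicate_to(west)` pushes nothing because everything left in the `east` log is tagged as coming from `west`.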
Shared subscriptions using the same replicated subscription across geo-replicated clusters don't have consistent behavior. It "works", but the same offsets would get consumed in both clusters in nondeterministic ways. I haven't validated what I'm saying here, but my understanding is that in many cases, though not at all times, the messages would get processed by the concurrent consumers that share the same replicated subscription name in both clusters. For use cases that only require at-least-once processing in any one of the replicated clusters, this is fine as long as a lot of duplicates aren't a problem. My understanding is that replicated subscriptions are designed for active-passive configurations where some overlap isn't a problem and where an external mechanism chooses which consumer should be active for a particular replicated subscription. It seems that the documentation supports this: