subscription state not getting replicated during geo-replication #21612

wojtekkedzior · 2023-11-22T16:37:28Z

I'm trying to understand why my subscription state is not being replicated from my primary cluster over to a backup cluster. I can see a replication-related subscription being created on the backup, but during a fail-over event my consumers don't continue consuming from the point where they left off on the primary cluster. When my producer finishes before a fail-over, the replicated subscription's cursor on the backup is always set to the start and never progresses.

What is meant to happen here? My understanding is that the cursor of the replicated subscription should move along as the consumers are ack'ing messages on the primary (bar the 1 second [default] delay between snapshots plus any network RTT and the time it takes for the backup to progress the cursor).

I can prove that messages are being replicated to the backup cluster because I can point my producers to the primary and consume only from the backup. However, during a fail-over I seem to lose messages. The behavior varies based on whether a producer is producing during the fail-over or not.

Some basic config:

pulsar version: apachepulsar/pulsar-all:3.1.0
client-all.jar version: compiled locally from master (<pulsar.version>3.2.0-SNAPSHOT</pulsar.version>)
broker config on both primary and backup clusters includes:
```
enableReplicatedSubscriptions: "true"
```
The primary Pulsar cluster's proxy is on port 4003, while the backup's is on 6003. These ports are forwarded by HAProxy between my local LAN and the server.

note: full config: https://github.com/wojtekkedzior/wp-automation/tree/master/k8/pulsar3/charts/pulsar

The long story:

I'm trying to get replication set up across two Pulsar clusters running in Kubernetes on QEMU VMs (full K8s and Pulsar config: https://github.com/wojtekkedzior/wp-automation/blob/master/k8/restoreClusterRoot.sh). I installed the standalone ZooKeeper using Helm:

repository: bitnami/zookeeper
tag: 3.9.0-debian-11-r11

I then create the cluster metadata for each cluster:

kubectl exec -i primary-toolset-0 -- /bin/bash -c "/pulsar/bin/pulsar initialize-cluster-metadata \
--cluster primary \
--zookeeper primary-zookeeper.default.svc.cluster.local:2181 \
--configuration-store my-zookeeper.default.svc.cluster.local:2185 \
--web-service-url http://primary-broker.default.svc.cluster.local:8080 \
--broker-service-url pulsar://primary-broker.default.svc.cluster.local:6650"

kubectl exec -i primary-toolset-0 -- /bin/bash -c "/pulsar/bin/pulsar initialize-cluster-metadata \
--cluster backup \
--zookeeper backup-zookeeper.default.svc.cluster.local:2181 \
--configuration-store my-zookeeper.default.svc.cluster.local:2185 \
--web-service-url http://backup-broker.default.svc.cluster.local:8080 \
--broker-service-url pulsar://backup-broker.default.svc.cluster.local:6650"

Set up the primary:

kubectl exec -i primary-toolset-0 -- /bin/bash -c "/pulsar/bin/pulsar-admin clusters create \
--broker-url pulsar://backup-broker.default.svc.cluster.local:6650 \
--url http://backup-broker.default.svc.cluster.local:8080 backup"

and the backup:

kubectl exec -i backup-toolset-0 -- /bin/bash -c "/pulsar/bin/pulsar-admin clusters create \
--broker-url pulsar://primary-broker.default.svc.cluster.local:6650 \
--url http://primary-broker.default.svc.cluster.local:8080 primary"

Set up replication on the primary:

  kubectl exec -i primary-toolset-0  -- /bin/bash -c "/pulsar/bin/pulsar-admin tenants    create t --admin-roles my-admin-role --allowed-clusters primary,backup"
  kubectl exec -i primary-toolset-0  -- /bin/bash -c "/pulsar/bin/pulsar-admin namespaces create t/ns --bundles 4"
  kubectl exec -i primary-toolset-0  -- /bin/bash -c "/pulsar/bin/pulsar-admin namespaces set-clusters t/ns --clusters primary,backup"

Set up replication on the backup:

  kubectl exec -i backup-toolset-0  -- /bin/bash -c "/pulsar/bin/pulsar-admin tenants    create t --admin-roles my-admin-role --allowed-clusters primary,backup"
  kubectl exec -i backup-toolset-0  -- /bin/bash -c "/pulsar/bin/pulsar-admin namespaces create t/ns --bundles 4"
  kubectl exec -i backup-toolset-0  -- /bin/bash -c "/pulsar/bin/pulsar-admin namespaces set-clusters t/ns --clusters primary,backup"

Then I create a single topic on the primary.

kubectl exec -i primary-toolset-0 -- /bin/bash -c "/pulsar/bin/pulsar-admin topics create t/ns/sun"

At this point I have the following cursors on the primary:

$ kubectl exec -i primary-toolset-0  -- /bin/bash -c "/pulsar/bin/pulsar-admin topics stats-internal t/ns/sun"
{
  "entriesAddedCounter" : 0,
  "numberOfEntries" : 0,
  "totalSize" : 0,
  "currentLedgerEntries" : 0,
  "currentLedgerSize" : 0,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:45.12Z",
  "waitingCursorsCount" : 2,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:-1",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "sub" : {
      "markDeletePosition" : "4:-1",
      "readPosition" : "4:0",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 0,
      "cursorLedger" : -1,
      "cursorLedgerLastEntry" : -1,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.396Z",
      "state" : "NoLedger",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : true,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : {
        "pulsar.replicated.subscription" : 1
      }
    },
    "pulsar.repl.backup" : {
      "markDeletePosition" : "4:-1",
      "readPosition" : "4:0",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 0,
      "cursorLedger" : -1,
      "cursorLedgerLastEntry" : -1,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.298Z",
      "state" : "NoLedger",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
  "schemaLedgers" : [ ],
  "compactedLedger" : {
    "ledgerId" : -1,
    "entries" : -1,
    "size" : -1,
    "offloaded" : false,
    "underReplicated" : false
  }
}

while the backup has these:

kubectl exec -i backup-toolset-0  -- /bin/bash -c "/pulsar/bin/pulsar-admin topics stats-internal t/ns/sun"
{
  "entriesAddedCounter" : 0,
  "numberOfEntries" : 0,
  "totalSize" : 0,
  "currentLedgerEntries" : 0,
  "currentLedgerSize" : 0,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:50.909Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:-1",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "pulsar.repl.primary" : {
      "markDeletePosition" : "4:-1",
      "readPosition" : "4:0",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 0,
      "cursorLedger" : -1,
      "cursorLedgerLastEntry" : -1,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:51.173Z",
      "state" : "NoLedger",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
  "schemaLedgers" : [ ],
  "compactedLedger" : {
    "ledgerId" : -1,
    "entries" : -1,
    "size" : -1,
    "offloaded" : false,
    "underReplicated" : false
  }
}
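
A small helper can condense dumps like the two above down to the fields that matter for this question. This is a minimal sketch, assuming the `stats-internal` JSON has been captured as text; the function name `summarize_cursors` is hypothetical, and the sample input is a trimmed copy of the primary's output shown above.

```python
import json


def summarize_cursors(stats_internal):
    """Return a compact per-cursor view of a `stats-internal` document."""
    summary = {}
    for name, cursor in stats_internal.get("cursors", {}).items():
        props = cursor.get("properties", {})
        summary[name] = {
            "markDeletePosition": cursor.get("markDeletePosition"),
            "readPosition": cursor.get("readPosition"),
            # The dumps above mark a replicated subscription with this property.
            "replicated": props.get("pulsar.replicated.subscription") == 1,
        }
    return summary


# Trimmed copy of the primary's output (only the fields used here).
primary = json.loads("""
{
  "lastConfirmedEntry": "4:-1",
  "cursors": {
    "sub": {
      "markDeletePosition": "4:-1",
      "readPosition": "4:0",
      "properties": {"pulsar.replicated.subscription": 1}
    },
    "pulsar.repl.backup": {
      "markDeletePosition": "4:-1",
      "readPosition": "4:0",
      "properties": {}
    }
  }
}
""")

for name, info in summarize_cursors(primary).items():
    print(name, info)
```

Running the same helper against the output of both clusters makes it easier to see which cursors exist on each side and which of them carry the replicated-subscription property.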

Now I will start a consumer on the primary (from now on omitting schemaLedgers and compactedLedger from the output of 'stats-internal'):

[pulsar-client-io-2-6] INFO org.apache.pulsar.client.impl.ConsumerImpl - [t/ns/sun][sub] Subscribed to topic on 192.168.1.17/192.168.1.17:4003 -- consumer: 0
no message
no message

On the primary:

{
  "entriesAddedCounter" : 0,
  "numberOfEntries" : 0,
  "totalSize" : 0,
  "currentLedgerEntries" : 0,
  "currentLedgerSize" : 0,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:45.12Z",
  "waitingCursorsCount" : 2,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:-1",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "sub" : {
      "markDeletePosition" : "4:-1",
      "readPosition" : "4:0",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 0,
      "cursorLedger" : -1,
      "cursorLedgerLastEntry" : -1,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.396Z",
      "state" : "NoLedger",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : true,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : {
        "pulsar.replicated.subscription" : 1
      }
    },
    "pulsar.repl.backup" : {
      "markDeletePosition" : "4:-1",
      "readPosition" : "4:0",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 0,
      "cursorLedger" : -1,
      "cursorLedgerLastEntry" : -1,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.298Z",
      "state" : "NoLedger",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
}

On the backup:

{
  "entriesAddedCounter" : 0,
  "numberOfEntries" : 0,
  "totalSize" : 0,
  "currentLedgerEntries" : 0,
  "currentLedgerSize" : 0,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:50.909Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:-1",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "pulsar.repl.primary" : {
      "markDeletePosition" : "4:-1",
      "readPosition" : "4:0",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 0,
      "cursorLedger" : -1,
      "cursorLedgerLastEntry" : -1,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:51.173Z",
      "state" : "NoLedger",
      "active" : false,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
}

Now I will start a producer, which will run against the primary and will continue trickling in messages throughout the remaining steps. On the primary:

{
  "entriesAddedCounter" : 108,
  "numberOfEntries" : 108,
  "totalSize" : 7208,
  "currentLedgerEntries" : 108,
  "currentLedgerSize" : 7208,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:45.12Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:107",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "sub" : {
      "markDeletePosition" : "4:2",
      "readPosition" : "4:13",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 6,
      "cursorLedger" : 17,
      "cursorLedgerLastEntry" : 5,
      "individuallyDeletedMessages" : "[(4:7..4:8],(4:9..4:11]]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.396Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 11,
      "totalNonContiguousDeletedMessagesRange" : 2,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : {
        "pulsar.replicated.subscription" : 1
      }
    },
    "pulsar.repl.backup" : {
      "markDeletePosition" : "4:107",
      "readPosition" : "4:108",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 108,
      "cursorLedger" : 16,
      "cursorLedgerLastEntry" : 11,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.298Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
}

on the backup:

{
  "entriesAddedCounter" : 20,
  "numberOfEntries" : 20,
  "totalSize" : 1930,
  "currentLedgerEntries" : 20,
  "currentLedgerSize" : 1930,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:50.909Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:19",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "pulsar.repl.primary" : {
      "markDeletePosition" : "4:19",
      "readPosition" : "4:20",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 20,
      "cursorLedger" : 15,
      "cursorLedgerLastEntry" : 10,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:51.173Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
}

Now I will simulate a fail-over for the consumer client. This will cause the consumer to change over to the backup brokers. Here's the output:

[pulsar-service-provider-1-1] INFO org.apache.pulsar.client.impl.AutoClusterFailover - Current Pulsar service is pulsar://192.168.1.17:4003, it has been down for 3001 ms, switch to the service pulsar://192.168.1.17:6003. The current service down at 1483954643520223
[pulsar-service-provider-1-1] INFO org.apache.pulsar.client.impl.PulsarClientImpl - Updating service URL to pulsar://192.168.1.17:6003

[pulsar-client-io-2-2] INFO org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xa59f56dd, L:/192.168.1.115:48168 - R:192.168.1.17/192.168.1.17:6003]] Connected to server
[pulsar-client-io-2-4] INFO org.apache.pulsar.client.impl.ConnectionPool - [[id: 0x48cf1833, L:/192.168.1.115:48212 - R:192.168.1.17/192.168.1.17:6003]] Connected to server
[pulsar-client-io-2-4] INFO org.apache.pulsar.client.impl.ClientCnx - [id: 0x48cf1833, L:/192.168.1.115:48212 - R:192.168.1.17/192.168.1.17:6003] Connected through proxy to target broker at backup-broker-1.backup-broker.default.svc.cluster.local:6650
[pulsar-client-io-2-4] INFO org.apache.pulsar.client.impl.ConsumerImpl - [t/ns/sun][sub] Subscribing to topic on cnx [id: 0x48cf1833, L:/192.168.1.115:48212 - R:192.168.1.17/192.168.1.17:6003], consumerId 0
[pulsar-client-io-2-4] INFO org.apache.pulsar.client.impl.ConsumerImpl - [t/ns/sun][sub] Subscribed to topic on 192.168.1.17/192.168.1.17:6003 -- consumer: 0

Note: prior to the fail-over, the last message processed on the primary was 27. The primary now shows:

{
  "entriesAddedCounter" : 1939,
  "numberOfEntries" : 1939,
  "totalSize" : 143229,
  "currentLedgerEntries" : 1939,
  "currentLedgerSize" : 143229,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:45.12Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:1938",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "sub" : {
      "markDeletePosition" : "4:838",
      "readPosition" : "4:839",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 839,
      "cursorLedger" : 17,
      "cursorLedgerLastEntry" : 31,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.396Z",
      "state" : "Open",
      "active" : false,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : {
        "pulsar.replicated.subscription" : 1
      }
    },
    "pulsar.repl.backup" : {
      "markDeletePosition" : "4:1938",
      "readPosition" : "4:1939",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 1939,
      "cursorLedger" : 16,
      "cursorLedgerLastEntry" : 151,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.298Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
}

While on the backup:

{
  "entriesAddedCounter" : 1299,
  "numberOfEntries" : 1299,
  "totalSize" : 97895,
  "currentLedgerEntries" : 1299,
  "currentLedgerSize" : 97895,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:50.909Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:1298",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "sub" : {
      "markDeletePosition" : "4:178",
      "readPosition" : "4:191",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 184,
      "cursorLedger" : 16,
      "cursorLedgerLastEntry" : 36,
      "individuallyDeletedMessages" : "[(4:183..4:188]]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:53:35.603Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 13,
      "totalNonContiguousDeletedMessagesRange" : 1,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : {
        "pulsar.replicated.subscription" : 1
      }
    },
    "pulsar.repl.primary" : {
      "markDeletePosition" : "4:1298",
      "readPosition" : "4:1299",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 1299,
      "cursorLedger" : 15,
      "cursorLedgerLastEntry" : 151,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:51.173Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
}

At this point the 'sub' subscription appears on the backup (due to having the "allowAutoSubscriptionCreation" option set to true as per the default) and the consumer "appears" to go on processing. Now I will remove the fail over and run the producer again. It will once again run against the primary.

note: The last processed message ID on the backup was: 4:208:-1

Now that we are back on the primary the IDs jump to: 4:868:-1

{
  "entriesAddedCounter" : 3914,
  "numberOfEntries" : 3914,
  "totalSize" : 287833,
  "currentLedgerEntries" : 3914,
  "currentLedgerSize" : 287833,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:45.12Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:3913",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "sub" : {
      "markDeletePosition" : "4:896",
      "readPosition" : "4:903",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 897,
      "cursorLedger" : 17,
      "cursorLedgerLastEntry" : 56,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.396Z",
      "state" : "Open",
      "active" : false,
      "numberOfEntriesSinceFirstNotAckedMessage" : 7,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : {
        "pulsar.replicated.subscription" : 1
      }
    },
    "pulsar.repl.backup" : {
      "markDeletePosition" : "4:3913",
      "readPosition" : "4:3914",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 3914,
      "cursorLedger" : 16,
      "cursorLedgerLastEntry" : 304,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:45.298Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
}

backup:

{
  "entriesAddedCounter" : 2321,
  "numberOfEntries" : 2321,
  "totalSize" : 177282,
  "currentLedgerEntries" : 2321,
  "currentLedgerSize" : 177282,
  "lastLedgerCreatedTimestamp" : "2023-11-22T14:46:50.909Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 0,
  "lastConfirmedEntry" : "4:2320",
  "state" : "LedgerOpened",
  "ledgers" : [ {
    "ledgerId" : 4,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
    "sub" : {
      "markDeletePosition" : "4:226",
      "readPosition" : "4:227",
      "waitingReadOp" : false,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 227,
      "cursorLedger" : 16,
      "cursorLedgerLastEntry" : 61,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:53:35.603Z",
      "state" : "Open",
      "active" : false,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : {
        "pulsar.replicated.subscription" : 1
      }
    },
    "pulsar.repl.primary" : {
      "markDeletePosition" : "4:2320",
      "readPosition" : "4:2321",
      "waitingReadOp" : true,
      "pendingReadOps" : 0,
      "messagesConsumedCounter" : 2321,
      "cursorLedger" : 15,
      "cursorLedgerLastEntry" : 304,
      "individuallyDeletedMessages" : "[]",
      "lastLedgerSwitchTimestamp" : "2023-11-22T14:46:51.173Z",
      "state" : "Open",
      "active" : true,
      "numberOfEntriesSinceFirstNotAckedMessage" : 1,
      "totalNonContiguousDeletedMessagesRange" : 0,
      "subscriptionHavePendingRead" : false,
      "subscriptionHavePendingReplayRead" : false,
      "properties" : { }
    }
  },
}

This is the scenario where the producer keeps running throughout the fail-over, which means that as soon as the producer client switches, the 'sub' subscription on the backup starts getting messages. However, the backup starts processing from ID 136, so what happened to the messages between 27 and 135? My understanding is that during a fail-over the consumers should continue roughly where they finished on the 'other' cluster.

In the scenario where the producer writes all of its data to the primary before a fail-over, the consumers get no messages once they switch over. Does this mean that messages are only replicated while both a producer and a consumer are connected and processing data? Again, my understanding is that if the producer finishes sending its data and a fail-over happens during consumption, the consumers should be able to switch over to the backup and continue roughly from where they stopped on the primary (give or take a second or so of data). Once the primary is back up, they should flick back over to it and, again, continue from the point where they left off on the backup (give or take a second or so). Of course, all of this assumes that both clusters can communicate with each other during the fail-over event, which is exactly my case, as I'm only disconnecting the client at the proxy level, outside of the Kubernetes cluster itself.

From what I can observe, subscriptions with the same name are created, but they are not in sync across the clusters. They are two separate subscriptions that happen to have the same name, running on different clusters. I would appreciate any ideas as to what I could be missing here and any pointers to where I could troubleshoot this further.

Answered by lhotari

lhotari · 2023-11-22T20:36:54Z

lhotari
Nov 22, 2023
Collaborator

Message IDs aren't preserved across clusters in replication so that is not a way to compare the subscriptions.

There are a few known limitations in subscription replication which could result in duplicate messages. For example, replication happens only up to the mark delete position, and batch message index positions aren't replicated.
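
To make the mark delete limitation concrete, the sketch below lists the entries that were acknowledged individually but lie beyond the mark delete position, using values from one of the primary's `stats-internal` dumps in the question (`markDeletePosition` of `4:2`, `individuallyDeletedMessages` of `[(4:7..4:8],(4:9..4:11]]`). It is only an illustration: the helper names are hypothetical, and it assumes each range is printed as lower-exclusive, upper-inclusive and stays within a single ledger. Since only the mark delete position is replicated, acknowledgements like these would not be carried over, which is one way duplicates can show up after a fail-over.

```python
import re
from typing import List, Tuple


def parse_position(text: str) -> Tuple[int, int]:
    """Split a 'ledgerId:entryId' string into two integers."""
    ledger, entry = text.split(":")
    return int(ledger), int(entry)


def acked_past_mark_delete(mark_delete: str, individually_deleted: str) -> List[int]:
    """Entry ids acknowledged individually beyond the mark delete position.

    Assumption: every range stays inside the mark delete ledger.
    """
    ledger, md_entry = parse_position(mark_delete)
    entries = []
    pattern = r"\((\d+:-?\d+)\.\.(\d+:-?\d+)\]"
    for lower, upper in re.findall(pattern, individually_deleted):
        lower_ledger, lower_entry = parse_position(lower)
        upper_ledger, upper_entry = parse_position(upper)
        if lower_ledger != ledger or upper_ledger != ledger:
            raise ValueError("ranges spanning ledgers are not handled in this sketch")
        # Assumed notation (lower..upper]: lower excluded, upper included.
        for entry in range(lower_entry + 1, upper_entry + 1):
            if entry > md_entry:
                entries.append(entry)
    return entries


extra = acked_past_mark_delete("4:2", "[(4:7..4:8],(4:9..4:11]]")
print(extra)  # [8, 10, 11]
```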

7 replies

wojtekkedzior Nov 23, 2023
Author

Thanks for the reply. Fantastic input! No, no other issues for the time being.

Your PR makes sense. I have to admit that I haven't noticed this behavior yet. Not to say it's not happening, I just didn't notice it.

I also ran into snapshot creation timeouts on my env and I used "replicatedSubscriptionsSnapshotTimeoutSeconds" to increase the timeout, which helped. I have no idea why I started seeing the timeouts all of the sudden, but I put it down to my dev environment.

lhotari Nov 23, 2023
Collaborator

I have no idea why I started seeing the timeouts all of the sudden

My assumption is that when the replication links are congested, a snapshot simply takes a long time, because it can require multiple rounds of messages to be exchanged when more than 2 clusters are involved.

The algorithm is described here:
https://github.com/apache/pulsar/wiki/PIP-33%3A-Replicated-subscriptions#constructing-a-cursor-snapshot

Note, when there are more than 2 clusters involved, like in the above case with cluster a, b and c, a second round of request-response will be necessary, to ensure we are including all the message that might have been exchanged between the remote clusters.
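
As a rough mental model of how a completed snapshot is used (a heavily simplified sketch, not broker code), a snapshot can be thought of as a pair: a position in the local topic and the matching position reported by the remote cluster. The remote cursor can then only be advanced to the remote position of the newest snapshot that the local mark delete position has already passed. The `Snapshot` type, the function name, and all numbers below are hypothetical, and positions are reduced to plain entry ids within one ledger.

```python
from typing import List, NamedTuple, Optional


class Snapshot(NamedTuple):
    local_entry: int   # where the snapshot sits in the local topic
    remote_entry: int  # position the remote cluster reported for it


def remote_position_for(mark_delete_entry: int,
                        snapshots: List[Snapshot]) -> Optional[int]:
    """Pick the newest snapshot at or before the local mark delete position."""
    eligible = [s for s in snapshots if s.local_entry <= mark_delete_entry]
    if not eligible:
        # No snapshot has been passed yet, so there is nothing to propagate.
        return None
    return max(eligible, key=lambda s: s.local_entry).remote_entry


# Hypothetical snapshots, not taken from the clusters in this thread.
snapshots = [Snapshot(10, 12), Snapshot(50, 55), Snapshot(90, 97)]

print(remote_position_for(5, snapshots))   # None
print(remote_position_for(60, snapshots))  # 55
```

In this model, if snapshots time out and none complete, the remote cursor never has a position to move to, however far the local consumer has acknowledged.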

I haven't tested it, but there's a possibility that reducing replicationProducerQueueSize from the default value of 1000 could help in reducing the time it takes to do a snapshot when replication is congested.

lhotari Nov 23, 2023
Collaborator

Btw. The subscription snapshotting seems to be an application of Vector clocks.

lhotari Dec 22, 2023
Collaborator

Added a feature request about adding a metric for detecting the replicated subscription snapshot timeouts: #21793

lhotari Jun 2, 2025
Collaborator

Fix in progress for a problem with subscription replication: #24300
