[fix] Fix data loss when ledger write and ledger delete execute concurrently #4462
Conversation
The patch is LGTM, but I have a concern about the intent.
Right now we only forbid writing data to a fenced ledger, but if a ledger that is not fenced is deleted, should we still allow writes to it?
If the ledger has never been fenced, only one client holds the ledger handle, so a write after deletion would mean there is a bug on the client side.
```java
@Override
public void ledgerDeleted(long ledgerId) {
    markIfConflictWritingOccurs(ledgerId);
```
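For illustration only, a minimal sketch of the pattern this callback enables (class and field names here are hypothetical, not the PR's code): the deletion callback records the ledger id so the write path can later reject it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: remember ledgers deleted on this bookie so a later write
// attempt against the same ledger id can be failed instead of silently accepted.
class RecentlyDeletedLedgerTracker {
    private final Set<Long> deleted = ConcurrentHashMap.newKeySet();

    // Invoked from the deletion callback shown above.
    void markIfConflictWritingOccurs(long ledgerId) {
        deleted.add(ledgerId);
    }

    // Consulted on the write path before accepting an add request.
    boolean isRecentlyDeleted(long ledgerId) {
        return deleted.contains(ledgerId);
    }
}
```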
It depends on whether garbage collection has been triggered. If the GC has not been triggered, we should also fail the case, because the ledger metadata has already been deleted, correct?
- Before garbage collection: since the fenced state is still maintained by `LedgerHandleFactory.ledgers`, the BK server knows it is fenced, so the `fenced` flag of the result of `LedgerHandleFactory.getHandle(ledgerId)` will be set, and the BK client will get a `ledger fenced` error.
- After garbage collection: the ledger will be marked as `fenced & deleted`, so the BK client will also get a `ledger fenced` error.

So the current change is fine.
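On the client side, that `ledger fenced` error typically surfaces as a fenced exception from `addEntry`. A minimal sketch, assuming `lh` is a writable `LedgerHandle` and the standard BookKeeper client exception mapping:

```java
import java.nio.charset.StandardCharsets;
import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.LedgerHandle;

final class FencedWriteExample {
    // Attempt a write and treat the fenced error as the expected outcome.
    static void tryWrite(LedgerHandle lh) throws Exception {
        try {
            lh.addEntry("payload".getBytes(StandardCharsets.UTF_8));
        } catch (BKException.BKLedgerFencedException e) {
            // Expected once the bookie sees the ledger as fenced (or fenced & deleted after GC).
        }
    }
}
```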
If the BK client opens the ledger with `openLedgerNoRecovery`, the service won't mark the ledger as fenced. The `fenced` state is set when a BK client opens the ledger in recovery mode. Of course, this fix does fix the issue we encountered in Pulsar.
If the BK client opens the ledger with `openLedgerNoRecovery`, does that mean this client will only read data from the ledger and never close/delete it?
If yes, only the first writable ledger handle can delete the ledger, so we can conclude there is a bug in any client that opens the ledger with `openLedgerNoRecovery` and then deletes it. Please correct me if my understanding of the behavior is wrong.
You are correct! I checked all the use cases of `openLedgerNoRecovery` in BookKeeper and Pulsar; we never delete ledgers through an `openLedgerNoRecovery` LedgerHandle.
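For reference, a hedged sketch of the read-only usage pattern described above, using the standard client API (digest type and password are placeholders):

```java
import java.util.Enumeration;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerEntry;
import org.apache.bookkeeper.client.LedgerHandle;

final class NoRecoveryReadExample {
    static void readAll(BookKeeper bk, long ledgerId, byte[] password) throws Exception {
        // openLedgerNoRecovery does not fence the ledger; the handle is only used for reading.
        LedgerHandle lh = bk.openLedgerNoRecovery(ledgerId, BookKeeper.DigestType.CRC32, password);
        try {
            long lac = lh.getLastAddConfirmed();
            if (lac >= 0) {
                Enumeration<LedgerEntry> entries = lh.readEntries(0, lac);
                while (entries.hasMoreElements()) {
                    entries.nextElement(); // process the entry
                }
            }
        } finally {
            lh.close(); // releases the client-side handle; the ledger itself is untouched
        }
    }
}
```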
```diff
@@ -24,7 +24,7 @@
 import java.io.IOException;

 interface HandleFactory {
-    LedgerDescriptor getHandle(long ledgerId, byte[] masterKey)
+    LedgerDescriptor getHandle(long ledgerId, byte[] masterKey, boolean journalReplay)
```
Would `readOnly` be better than `journalReplay`?
- Getting a read-only handle goes through `getReadOnlyHandle`.
- There is only one scenario that calls `getHandle` for reading: journal replay.
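To make the naming discussion concrete, here is one plausible way such a flag could be consumed; this is purely illustrative and not the PR's implementation (the stub types and exception are hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the real LedgerDescriptor, for illustration only.
class LedgerDescriptorStub {
    final long ledgerId;
    LedgerDescriptorStub(long ledgerId) { this.ledgerId = ledgerId; }
}

// Illustrative sketch: journal replay is allowed through so fenced/deleted state
// can be rebuilt after a restart, while the normal write path is rejected.
class HandleFactorySketch {
    private final ConcurrentMap<Long, LedgerDescriptorStub> ledgers = new ConcurrentHashMap<>();
    private final ConcurrentMap<Long, Boolean> recentlyFencedAndDeletedLedgers = new ConcurrentHashMap<>();

    LedgerDescriptorStub getHandle(long ledgerId, byte[] masterKey, boolean journalReplay) {
        if (!journalReplay && recentlyFencedAndDeletedLedgers.containsKey(ledgerId)) {
            throw new IllegalStateException("ledger " + ledgerId + " was recently fenced and deleted");
        }
        return ledgers.computeIfAbsent(ledgerId, LedgerDescriptorStub::new);
    }
}
```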
Is there a race condition between `deleteLedger` and `getHandle` with the same ledgerId? cc @eolivelli @nicoloboschi
There is no race condition issue. See the details below. There are two thread-safe maps:
Scenario 1
Scenario 2
Scenario 3
Scenario 4
Scenario 5
Scenario 5 additional note: Since …
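As a generic illustration of the "two thread-safe maps" point (not the PR's code): per-key operations on a concurrent map are atomic, so a concurrent `deleteLedger` and `getHandle` for the same ledgerId simply serialize on the map entry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Generic illustration: a concurrent getHandle/deleteLedger pair for the same
// ledgerId observes one consistent order because each map operation is atomic.
class ConcurrentHandleMapSketch {
    private final ConcurrentMap<Long, String> ledgers = new ConcurrentHashMap<>();

    String getHandle(long ledgerId) {
        // Creates the handle once, or returns the one already present.
        return ledgers.computeIfAbsent(ledgerId, id -> "handle-" + id);
    }

    void deleteLedger(long ledgerId) {
        // Atomic with respect to computeIfAbsent on the same key.
        ledgers.remove(ledgerId);
    }
}
```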
```java
    LOG.error("Should have thrown an exception");
    fail("Should have thrown an exception when trying to write");
} catch (Exception e) {
    e.printStackTrace();
```
Please use a logger and catch the expected exception.
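For instance, the catch could be narrowed to the expected exception and routed through the logger; an illustrative sketch, assuming the fenced error reaches the client as `BKException.BKLedgerFencedException` and that `lh`, `LOG`, and `fail` come from the surrounding test:

```java
// Illustrative only: catch the specific expected exception instead of printing the stack trace.
try {
    lh.addEntry("entry".getBytes(StandardCharsets.UTF_8));
    fail("Should have thrown an exception when trying to write");
} catch (BKException.BKLedgerFencedException e) {
    // Expected: the write against the deleted ledger is rejected as fenced.
    LOG.info("Write was rejected as expected", e);
}
```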
Fixed
* "delete" now, and other clients may call "write" continuously later. We mark these ledgers can not be written | ||
* anymore. And maintains the state for 7 days is safety. | ||
*/ | ||
private final Cache<Long, Boolean> recentlyFencedAndDeletedLedgers = CacheBuilder.newBuilder() |
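For reference, a self-contained sketch of this kind of time-bounded cache using Guava's `CacheBuilder` (the 7-day window mirrors the snippet above; the helper method names are mine):

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

final class RecentlyDeletedCacheSketch {
    // Entries expire 7 days after they are written, which bounds memory use.
    private final Cache<Long, Boolean> recentlyFencedAndDeletedLedgers = CacheBuilder.newBuilder()
            .expireAfterWrite(7, TimeUnit.DAYS)
            .build();

    void markFencedAndDeleted(long ledgerId) {
        recentlyFencedAndDeletedLedgers.put(ledgerId, Boolean.TRUE);
    }

    boolean isRecentlyFencedAndDeleted(long ledgerId) {
        return recentlyFencedAndDeletedLedgers.getIfPresent(ledgerId) != null;
    }
}
```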
This is in memory; what happens if you restart the bookie?
I think scenarios 1 and 2 below are fine, and scenario 3 below is not.

Regarding scenario 3, I think there are two possible solutions; which is your preference?
- client side: After a bookie channel is reconnected, let all ledgers related to this channel change their BK ensemble.
- server side: Let the GC task skip the fenced ledger until it has not been used for a long time.
Scenario 1: the journal log is still there
- Deleted the ledger.
- Restart BK.
- Replay journal logs.
- Reset the ledger as fenced.
- Since the ledger is fenced, the client will get a fenced error when it tries to write.
- Delete the ledger after a GC.
  - Move the ledger into `recentlyFencedAndDeletedLedgers`.
  - Since the ledger is in `recentlyFencedAndDeletedLedgers`, the client will get a fenced error when it tries to write.
Scenario 2: the journal log was removed & the client switched the ensemble because pending operations failed
- Deleted the ledger.
- Journal logs were removed.
- Restart BK.
- The client tries to switch the ensemble because the previous operations failed.
- Since the ledger metadata was removed, this operation will fail.
Scenario 3: the journal log was removed & the client did not switch the ensemble because there were no pending operations
- Deleted the ledger.
- Journal logs were removed.
- Restart BK.
- The client reconnects successfully.
- If the client tries to write, the write will succeed.
@eolivelli How about we split this into 2 PRs?
- Fix it in memory, which is what the current PR does. This PR can be cherry-picked into the previous branches, and it covers almost all the scenarios (the probability of brokers and BKs crashing at the same time is quite low).
- Fix it with persistent storage: persist the state into a file and add a config to define which file will be used (sketched below). This will be released with the next feature release.
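Purely as a sketch of what the second follow-up could look like (file format, class, and method names here are all hypothetical, not a committed design):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: persist the recently fenced & deleted ledger ids so the
// state survives a bookie restart; the path would come from a new config option.
final class DeletedLedgerStateStore {
    private final Path stateFile;

    DeletedLedgerStateStore(Path stateFile) {
        this.stateFile = stateFile;
    }

    void save(Set<Long> ledgerIds) throws IOException {
        String body = ledgerIds.stream().map(String::valueOf).collect(Collectors.joining("\n"));
        Files.writeString(stateFile, body, StandardCharsets.UTF_8);
    }

    Set<Long> load() throws IOException {
        if (!Files.exists(stateFile)) {
            return Set.of();
        }
        return Files.readAllLines(stateFile, StandardCharsets.UTF_8).stream()
                .filter(line -> !line.isBlank())
                .map(Long::parseLong)
                .collect(Collectors.toSet());
    }
}
```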
LGTM
There is a small nit about a printStackTrace; please remove it before committing the patch.
Then I expect to see the follow-up work:
Motivation
(Steps table: BK client 1 and BK client 2 operating concurrently on the same ledger.)

At step 4, the write should fail, but it succeeds. This leads users to assume the data has been written, even though it can not be read. You can reproduce the issue with `testWriteAfterDeleted`.
There is a scenario that leads Pulsar to lose messages:
1. `broker-2` created a ledger.
2. `broker-2`'s ZK session expired, which caused the topic it owned to be assigned to other brokers.
3. `broker-0` owned the topic again.
4. `broker-0` …
5. `broker-2` …
6. For `broker-2`, the ledger can not be closed because the ledger metadata has been deleted.

Changes
Once the ledger is fenced, it can not be written anymore.
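A rough outline of what a reproduction like `testWriteAfterDeleted` exercises; this is not the actual test, only a sketch against the standard client API (quorum sizes, digest type, and payloads are placeholders):

```java
import java.nio.charset.StandardCharsets;
import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

final class WriteAfterDeleteOutline {
    static void reproduce(BookKeeper bk, byte[] password) throws Exception {
        // Writer creates a ledger and adds an entry.
        LedgerHandle lh = bk.createLedger(1, 1, 1, BookKeeper.DigestType.CRC32, password);
        lh.addEntry("before delete".getBytes(StandardCharsets.UTF_8));

        // A second actor deletes the ledger while the writer still holds its handle.
        bk.deleteLedger(lh.getId());

        // Before the fix this write could still be accepted and the data would be unreadable;
        // after the fix the bookie rejects it and the client sees a fenced-style error.
        try {
            lh.addEntry("after delete".getBytes(StandardCharsets.UTF_8));
        } catch (BKException e) {
            // Expected after the fix.
        }
    }
}
```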