-
Notifications
You must be signed in to change notification settings - Fork 903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support skip invalid journal record in replying journal stage #3956
Conversation
@frankjkelly I have created a new PR to fix the bookie that can't startup due to the journal reply failing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, but I would set the property to true by default
* @see #isSkipReplayJournalInvalidRecord . | ||
*/ | ||
public boolean isSkipReplayJournalInvalidRecord() { | ||
return this.getBoolean(SKIP_REPLAY_JOURNAL_INVALID_RECORD, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the previous behaviour was to skip errors, we should keep the same value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous behavior is to throw the exception out and block the bookie start-up. We should keep the same behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though the previous default behavior is false
, I changed the default value to true
, because I think it's valuable to skip the the invalid record instead of make the bookie node can't start up.
try { | ||
if (len == PADDING_MASK && journalVersion >= JournalChannel.V5) { | ||
// skip padding bytes | ||
lenBuff.clear(); | ||
fullRead(recLog, lenBuff); | ||
if (lenBuff.remaining() != 0) { | ||
break; | ||
} | ||
lenBuff.flip(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 820 may throw IOException, this may cause bookie startup failed ==> fullRead(recLog, lenBuff);
In this PR, line 835 also may throw IOException, because this future the bookie can continue starting.
Same behavior here, two results, right?
while (true) {
// entry start offset
long offset = recLog.fc.position();
// start reading entry
lenBuff.clear();
fullRead(recLog, lenBuff);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I caught all the fullRead
throw exceptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
rerun failure checks |
1 similar comment
rerun failure checks |
Hi folks - thanks for the efforts here - what are the remaining issues on this? Thanks! |
@eolivelli Would you like to take another look at this PR? |
8fb0c9c
to
a70a3b5
Compare
@frankjkelly I addressed all the comments, please help take a look, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Thanks everyone for this - much appreciated |
…#3956) Co-authored-by: zhiyuanlei <[email protected]> (cherry picked from commit 5e9fdc2)
Co-authored-by: zhiyuanlei <[email protected]> (cherry picked from commit 5e9fdc2)
Co-authored-by: zhiyuanlei <[email protected]> (cherry picked from commit 5e9fdc2)
hello community, |
…#3956) Co-authored-by: zhiyuanlei <[email protected]>
Motivation
This PR is generated from #3437
Fix #2220
When we use
kill -9
command to stop one bookie pod, the journal file may be broken. If we try to start up the bookie pod again, the pod will startup to fail with the following exception.We should have a way to make the bookie pod startup instead of decommissioning it.
Changes