Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix negative position read on BufferedChannel cause data can't read problem. #4021

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

lifepuzzlefun
Copy link
Contributor

@lifepuzzlefun lifepuzzlefun commented Jul 8, 2023

Descriptions of the changes in this PR:

Motivation

fix #4022

in brief of this change.

When do compaction. worker thread try to extract the ledgerLogger metadata header to get ledger map.
the header to point map offset is -1.

so the read request to BufferedChannel will reject this read and throw java.lang.IllegalArgumentException: Negative position

but there is a read cache in BufferedChannel the -1 is cached in the Channel which will cause following read on this channel fail to get right data.

more other background:

our production env bookie can't gc log files.

after add some log we found that

2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 50236
2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - Recovering ledgers maps for log 50236 at offset: -1
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - error when extractEntryLogMetadataFromIndex
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataFromIndex(EntryLogger.java:1124) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1085) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.EntryLogger - Failed to get ledgers map index from: 50236.log : Negative position
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 1024
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 126754820 add  entrySize 126753792 ledgerId 5593924864
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos -1787371633 add  entrySize -1914126457 ledgerId -1675451311566362959
2023-07-08T22:12:50,607+0800 [GarbageCollectorThread-19-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1018) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1191) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1090) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

but using bookkeeper tool to read the log it success without throw exception

2023-07-08T21:54:52,401+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.ReadLogCommand - Scan entry log 50236 (c43c.log)
2023-07-08T21:54:52,475+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER 50236 readBufferStartPosition -9223372036854775808
2023-07-08T21:54:52,486+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 1024
2023-07-08T21:54:52,492+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851269, Eid=20, ByteOffset=1024, EntrySize=495132 ---------
2023-07-08T21:54:52,492+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type:           DATA
2023-07-08T21:54:52,492+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed:  19
2023-07-08T21:54:52,492+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,493+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 496160 add  entrySize 495132 ledgerId 21851269
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851269, Eid=21, ByteOffset=496160, EntrySize=83751 ---------
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type:           DATA
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed:  19
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,494+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 579915 add  entrySize 83751 ledgerId 21851269
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851454, Eid=4, ByteOffset=579915, EntrySize=63 ---------
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type:           DATA
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed:  3
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,495+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 579982 add  entrySize 63 ledgerId 21851454
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851454, Eid=5, ByteOffset=579982, EntrySize=43 ---------
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type:           DATA
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed:  3
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,495+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 580029 add  entrySize 43 ledgerId 21851454
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851474, Eid=3, ByteOffset=580029, EntrySize=43 ---------

you can see the pos in BufferedChannel is changed like this which is not very good.

2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 1024
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 126754820 add  entrySize 126753792 ledgerId 5593924864
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos -1787371633 add  entrySize -1914126457 ledgerId -1675451311566362959

and we found the log header is which offset and count is both -1

2023-07-08T22:14:49,412+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - Recovering ledgers maps for log 50236 at offset: -1, header version 1 offset -1 count -1

see

public EntryLogMetadata getEntryLogMetadata(long entryLogId, AbstractLogCompactor.Throttler throttler)
throws IOException {
// First try to extract the EntryLogMetadata from the index, if there's no index then fallback to scanning the
// entry log
try {
return extractEntryLogMetadataFromIndex(entryLogId);
} catch (Exception e) {
LOG.info("Failed to get ledgers map index from: {}.log : {}", entryLogId, e.getMessage());
// Fall-back to scanning
return extractEntryLogMetadataByScanning(entryLogId, throttler);
}
}

so the log entry header offset pos is -1.

when -1 pass to BufferedReadChannel
an java.lang.IllegalArgumentException: Negative position throws first time. what is worse is that this position will be cached in BufferedReadChannel see line 92 throw exception but readPosition is not reset, so the position is -1 even after the exception is thrown.

} else {
// We don't have it in the buffer, so put necessary data in the buffer
readBufferStartPosition = currentPosition;
int readBytes = 0;
if ((readBytes = validateAndGetFileChannel().read(readBuffer.internalNioBuffer(0, readCapacity),
currentPosition)) <= 0) {
throw new IOException("Reading from filechannel returned a non-positive value. Short read.");
}
readBuffer.writerIndex(readBytes);
}
}
return (int) (currentPosition - pos);

so when the second time try to execute extractEntryLogMetadataByScanning the BufferedReadChannel is cached in ThreadLocal so the same BufferedReadChannel is return again. but this time if will hit this condition

if (readBufferStartPosition <= currentPosition
&& currentPosition < readBufferStartPosition + readBuffer.readableBytes()) {
int posInBuffer = (int) (currentPosition - readBufferStartPosition);
int bytesToCopy = Math.min(length, readBuffer.readableBytes() - posInBuffer);
dest.writeBytes(readBuffer, posInBuffer, bytesToCopy);
currentPosition += bytesToCopy;
length -= bytesToCopy;
cacheHitCount++;

also we saw this with arthas. see readBufferStartPosition the first time is 0 which is when extractEntryLogMetadataFromIndex execute first.

the second capture you can see this position is -1.

so the third time when execute extractMetaFromEntryLogs the position is a very small negivate number.

method=org.apache.bookkeeper.bookie.EntryLogger.getFromChannels location=AtExit
ts=2023-07-08 21:05:30; [cost=0.002951ms] result=@ArrayList[
    @Object[][
        @Long[50236],
    ],
    @BufferedReadChannel[
        readCapacity=@Integer[4096],
        readBuffer=@InstrumentedUnpooledUnsafeHeapByteBuf[
            alloc=@UnpooledByteBufAllocator[UnpooledByteBufAllocator(directByDefault: true)],
            array=@byte[][isEmpty=false;size=4096],
            tmpNioBuf=@HeapByteBuffer[java.nio.HeapByteBuffer[pos=4096 lim=4096 cap=4096]],
            REFCNT_FIELD_OFFSET=@Long[32],
            AIF_UPDATER=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@7e2bf17d],
            updater=@[io.netty.buffer.AbstractReferenceCountedByteBuf$1@865b4ec],
            refCnt=@Integer[2],
            logger=@LocationAwareSlf4JLogger[LocationAwareSlf4JLogger(io.netty.buffer.AbstractByteBuf)],
            LEGACY_PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.bytebuf.checkAccessible],
            PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.checkAccessible],
            checkAccessible=@Boolean[true],
            PROP_CHECK_BOUNDS=@String[io.netty.buffer.checkBounds],
            checkBounds=@Boolean[true],
            leakDetector=@ResourceLeakDetector[io.netty.util.ResourceLeakDetector@30a63b98],
            readerIndex=@Integer[0],
            writerIndex=@Integer[4096],
            markedReaderIndex=@Integer[0],
            markedWriterIndex=@Integer[0],
            maxCapacity=@Integer[2147483647],
        ],
        readBufferStartPosition=@Long[0],
        invocationCount=@Long[133],
        cacheHitCount=@Long[680901],
        fileChannel=@FileChannelImpl[
    ],
]
method=org.apache.bookkeeper.bookie.EntryLogger.getFromChannels location=AtExit
ts=2023-07-08 21:05:30; [cost=0.002436ms] result=@ArrayList[
    @Object[][
        @Long[50236],
    ],
    @BufferedReadChannel[
        readCapacity=@Integer[4096],
        readBuffer=@InstrumentedUnpooledUnsafeHeapByteBuf[
            alloc=@UnpooledByteBufAllocator[UnpooledByteBufAllocator(directByDefault: true)],
            array=@byte[][isEmpty=false;size=4096],
            tmpNioBuf=@HeapByteBuffer[java.nio.HeapByteBuffer[pos=0 lim=4096 cap=4096]],
            REFCNT_FIELD_OFFSET=@Long[32],
            AIF_UPDATER=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@7e2bf17d],
            updater=@[io.netty.buffer.AbstractReferenceCountedByteBuf$1@865b4ec],
            refCnt=@Integer[2],
            logger=@LocationAwareSlf4JLogger[LocationAwareSlf4JLogger(io.netty.buffer.AbstractByteBuf)],
            LEGACY_PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.bytebuf.checkAccessible],
            PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.checkAccessible],
            checkAccessible=@Boolean[true],
            PROP_CHECK_BOUNDS=@String[io.netty.buffer.checkBounds],
            checkBounds=@Boolean[true],
            leakDetector=@ResourceLeakDetector[io.netty.util.ResourceLeakDetector@30a63b98],
            readerIndex=@Integer[0],
            writerIndex=@Integer[4096],
            markedReaderIndex=@Integer[0],
            markedWriterIndex=@Integer[0],
            maxCapacity=@Integer[2147483647],
        ],
        readBufferStartPosition=@Long[-1],
        invocationCount=@Long[134],
        cacheHitCount=@Long[680901],
        fileChannel=@FileChannelImpl[
    ],
]
method=org.apache.bookkeeper.bookie.EntryLogger.getFromChannels location=AtExit
ts=2023-07-08 21:07:50; [cost=0.002968ms] result=@ArrayList[
    @Object[][
        @Long[50236],
    ],
    @BufferedReadChannel[
        readCapacity=@Integer[4096],
        readBuffer=@InstrumentedUnpooledUnsafeHeapByteBuf[
            alloc=@UnpooledByteBufAllocator[UnpooledByteBufAllocator(directByDefault: true)],
            array=@byte[][isEmpty=false;size=4096],
            tmpNioBuf=@HeapByteBuffer[java.nio.HeapByteBuffer[pos=0 lim=4096 cap=4096]],
            REFCNT_FIELD_OFFSET=@Long[32],
            AIF_UPDATER=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@7e2bf17d],
            updater=@[io.netty.buffer.AbstractReferenceCountedByteBuf$1@865b4ec],
            refCnt=@Integer[2],
            logger=@LocationAwareSlf4JLogger[LocationAwareSlf4JLogger(io.netty.buffer.AbstractByteBuf)],
            LEGACY_PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.bytebuf.checkAccessible],
            PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.checkAccessible],
            checkAccessible=@Boolean[true],
            PROP_CHECK_BOUNDS=@String[io.netty.buffer.checkBounds],
            checkBounds=@Boolean[true],
            leakDetector=@ResourceLeakDetector[io.netty.util.ResourceLeakDetector@30a63b98],
            readerIndex=@Integer[0],
            writerIndex=@Integer[4096],
            markedReaderIndex=@Integer[0],
            markedWriterIndex=@Integer[0],
            maxCapacity=@Integer[2147483647],
        ],
        readBufferStartPosition=@Long[-1787371633],
        invocationCount=@Long[138],
        cacheHitCount=@Long[711850],
        fileChannel=@FileChannelImpl[
    ],
]

so the wrong data is returned.

still have no idea why the entry log header save -1.

Changes

clean cached read buffer and readPosition when read from BufferedChannel throw exception

Copy link
Member

@horizonzy horizonzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very interesting; I'm tracing it now.

@horizonzy
Copy link
Member

2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 50236
2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - Recovering ledgers maps for log 50236 at offset: -1
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - error when extractEntryLogMetadataFromIndex
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataFromIndex(EntryLogger.java:1124) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1085) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

I checkout the code to release-4.15.4, but the code in my local does not match your stack. Did you modify some code in your environment?

@lifepuzzlefun
Copy link
Contributor Author

lifepuzzlefun commented Jul 16, 2023

@horizonzy hi, this should be the origin stack, the arthas tool may inject some trace. move the line number.

2023-07-08T18:53:36,200+0800 [GarbageCollectorThread-19-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1008) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1168) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1071) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-07-08T18:53:36,300+0800 [SyncThread-20-1] INFO  org.apache.bookkeeper.bookie.Journal - garbage collected journal 188800d0755.txn

50236

2023-07-08T19:35:03,633+0800 [SyncThread-20-1] INFO  org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog - Synced entry logger 53177 to disk.
2023-07-08T19:35:20,494+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 50236
2023-07-08T19:35:20,494+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.EntryLogger - Failed to get ledgers map index from: 50236.log : Negative position
2023-07-08T19:35:20,616+0800 [GarbageCollectorThread-19-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1008) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1168) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1071) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

@horizonzy
Copy link
Member

I checked the code, the ledgerMapOffset corresponds to channel position, and the ledgersCount corresponds to ledger count. Both values can't be -1. It's so strange.

Besides this pr, we can override the ledgerMapOffset and ledgersCount values for the log file after scanning the log file.

currentPosition)) <= 0) {
try {
// We don't have it in the buffer, so put necessary data in the buffer
readBufferStartPosition = currentPosition;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can check the pos in advance; if it is negative, throw an exception directly. This pr will invalidate the read buffer cache. Maybe we can reuse the read buffer cache at next time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to method early statement to protect negative pos read.

@@ -243,6 +243,11 @@ public long forceWrite(boolean forceMetadata) throws IOException {

@Override
public synchronized int read(ByteBuf dest, long pos, int length) throws IOException {
// protect negative position read
if (pos < 0) {
throw new IllegalArgumentException("Negative position pos:" + pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IllegalArgumentException is an unchecked exception, can we use IOException and catch it in the program?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review. my concern on using IOException is that current code base seems not process IOException directly. I didn't check what will happen if the IOException throws outside. For this case I think the IllegalArgumentException is the same as origin behaviour as this is the same exception throw by jdk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Entry log unknown buggy header cause data can't be read.
4 participants