Add padding before timestamp size record if it doesn't fit into a WAL block. #12614
Conversation
Force-pushed from 9e74314 to 182995b
Sorry, my bad. This time CI/CD should be OK.
Hello @andlr, thanks for this fix! Just curious: did you hit this limitation in your DB? How many column families does your DB have? I assume it will start to happen when the DB has a lot of column families; my rough calculation shows it takes about 5460 column families to encounter this.
```cpp
      recyclable_log ? kRecyclableHeaderSize : kHeaderSize;
  const size_t data_len = kBlockSize - 2 * header_size;

  const auto first_str = BigString("foo", data_len);
```
After `first_str`, there are `kBlockSize - (kBlockSize - 2 * header_size)` bytes left in the block, a.k.a. `2 * header_size`:
- for the non-recyclable format, that's `2 * (4 + 2 + 1)`
- for the recyclable format, that's `2 * (4 + 2 + 1 + 4)`

In both cases, the remaining space in the block is sufficient to hold the ts_sz record, which is:
- for the non-recyclable format: `4 + 2 + 1 + 6`
- for the recyclable format: `4 + 2 + 1 + 4 + 6`

So it seems to me this won't invoke the logic added in this PR: `if (leftover < header_size_ + (int)encoded.size())`.

If a ts_sz record can spill over multiple blocks, I think we would need corresponding changes in log_reader.cc to handle any logic added into log_writer.cc. Maybe simply padding won't fully solve this issue, since it seems to me there are no other cases in log_writer/log_reader where one logical record spans multiple blocks as multiple physical records without each physical record having its own header.
Never mind my comment about the ts_sz record spanning multiple blocks; I just realized this PR is not intended to target that.
> After first_str, there are kBlockSize - (kBlockSize - 2 * header_size) left in the block, a.k.a 2 * header_size

Hm, not exactly: it's actually `kBlockSize - (kBlockSize - header_size) == header_size`, since `Write(first_str)` writes a header plus the data into the WAL, not only the data itself.
Without the fix, this test actually fails on an assert, as I've written in the PR description.
I just realized this fix is more for when a new column family is added and a new ts_sz record needs to be emitted. If it's to be emitted in a block that does not have enough space left, it will encounter this issue. The fix is intended to account for this case rather than for the ts_sz record itself being too big and spilling over multiple blocks.
Yeah, I've started writing a comment about that 🙂

I didn't actually hit this in a running DB. I've been investigating a different WAL-related issue (for that one I'll create a separate issue and PR; it's a bit more complicated than this one), and I noticed that WAL blocks might not always be aligned by 32K, so I found this (though, as it has turned out, it's unrelated to the initial issue I've been looking into).
LGTM! Thanks for the fix @andlr
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jowlyzhang merged this pull request in b9fc13d.
If the timestamp size record doesn't fit into a block, without padding `Writer::EmitPhysicalRecord` fails on an assert (either `assert(block_offset_ + kHeaderSize + n <= kBlockSize);` or `assert(block_offset_ + kRecyclableHeaderSize + n <= kBlockSize);`, depending on whether recycling log files is enabled) in debug builds. In release builds, the current block grows beyond 32K, `block_offset_` gets reset on the next `AddRecord`, and all subsequent blocks are no longer aligned by the block size.