Add padding before timestamp size record if it doesn't fit into a WAL block. #12614
Conversation
Force-pushed from 9e74314 to 182995b
Sorry, my bad. This time CI/CD should be OK.
Hello @andlr, thanks for this fix! Just curious: did you hit this limitation in your DB? How many column families does your DB have? I assume it will start to happen when the DB has a lot of column families; my rough calculation shows it takes about 5460 column families to encounter this.
```cpp
      recyclable_log ? kRecyclableHeaderSize : kHeaderSize;
  const size_t data_len = kBlockSize - 2 * header_size;

  const auto first_str = BigString("foo", data_len);
```
After `first_str`, there are `kBlockSize - (kBlockSize - 2 * header_size)` bytes left in the block, a.k.a. `2 * header_size`:
- for the non-recyclable format, that's `2 * (4 + 2 + 1)`
- for the recyclable format, that's `2 * (4 + 2 + 1 + 4)`

In both cases, the remaining space in the block is sufficient to hold the ts_sz record, which is:
- for the non-recyclable format: `4 + 2 + 1 + 6`
- for the recyclable format: `4 + 2 + 1 + 4 + 6`

So it seems to me this won't invoke the logic added in this PR: `if (leftover < header_size_ + (int)encoded.size())`.

If a ts_sz record can spill over multiple blocks, I think we would need corresponding changes in log_reader.cc to handle any logic added into log_writer.cc. Maybe simply padding won't fully solve this issue, since it seems to me there are no other cases in log_writer/log_reader where one logical record spans multiple blocks as multiple physical records without each physical record having its own header.
Never mind my comment about the ts_sz record spanning multiple blocks; I just realized this PR is not intended to target that.
> After first_str, there are kBlockSize - (kBlockSize - 2 * header_size) left in the block, a.k.a 2 * header_size

Hm, not exactly: it's actually `kBlockSize - (kBlockSize - header_size) == header_size`, since `Write(first_str)` writes a header plus the data into the WAL, not only the data itself.
Without the fix, this test actually fails on an assert, as I've written in the PR description.
I just realized this fix is more for when a new column family is added and a new ts_sz record needs to be emitted. If it's to be emitted in a block that does not have enough space left, it will encounter this issue. The fix is intended to account for this case rather than for the ts_sz record itself being too big and spilling over multiple blocks.
Yeah, I've started writing a comment about that 🙂

I didn't actually hit this in a running DB. I've been investigating a different WAL-related issue (for that one I'll create a separate issue and PR; it's a bit more complicated than this one), and I noticed that WAL blocks might not always be aligned by 32K, so I found this (though, as it has turned out, it's unrelated to the initial issue I've been looking into).
LGTM! Thanks for the fix @andlr
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jowlyzhang merged this pull request in b9fc13d.
If the timestamp size record doesn't fit into a block, without padding `Writer::EmitPhysicalRecord` fails on an assert (either `assert(block_offset_ + kHeaderSize + n <= kBlockSize);` or `assert(block_offset_ + kRecyclableHeaderSize + n <= kBlockSize);`, depending on whether recycling log files is enabled) in debug builds. In release builds, the current block grows beyond 32K, `block_offset_` gets reset on the next `AddRecord`, and all subsequent blocks are no longer aligned by the block size.