btrfs-progs: docs: extra notes about read-only scrub on read-write fs #935

adam900710 · 2024-12-22T00:20:52Z

[BUG]
There is a bug report that a read-only scrub on a read-write fs still causes writes to the fs, and those writes are caught if there is a read-only block device in the storage stack.

This triggers kernel errors and warnings on the failed transaction commit:

 BTRFS info (device dm-3): first mount of filesystem e18f0c40-88de-413f-9d7e-dcc8136ad6dd
 BTRFS info (device dm-3): using crc32c (crc32c-intel) checksum algorithm
 BTRFS info (device dm-3): using free-space-tree
 BTRFS info (device dm-3): scrub: started on devid 1
 Trying to write to read-only block-device md127
 btrfs_dev_stat_inc_and_print: 362 callbacks suppressed
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
 BTRFS error (device dm-3): bdev /dev/mapper/data errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
 BTRFS: error (device dm-3) in btrfs_commit_transaction:2523: errno=-5 IO failure (Error while writing out transaction)
 BTRFS info (device dm-3 state E): forced readonly
 BTRFS warning (device dm-3 state E): Skipping commit of aborted transaction.
 BTRFS error (device dm-3 state EA): Transaction aborted (error -5)
 BTRFS: error (device dm-3 state EA) in cleanup_transaction:2017: errno=-5 IO failure
 BTRFS warning (device dm-3 state EA): failed setting block group ro: -5
 BTRFS info (device dm-3 state EA): scrub: not finished on devid 1 with status: -5
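
The `wr` counter in the log above climbs with every rejected write until the transaction is aborted. As a reading aid, here is a minimal Python sketch that extracts the last reported counters per device from such a log. It assumes the line format shown above; the name `parse_dev_stats` is hypothetical and not part of btrfs-progs.

```python
import re

# Matches the per-device error counters in the format of the log above.
DEV_STAT_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_dev_stats(log_text):
    """Return the last seen counters per device as {dev: {counter: int}}."""
    latest = {}
    for line in log_text.splitlines():
        match = DEV_STAT_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            # Later lines overwrite earlier ones, so the final value wins.
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest
```

A non-zero `wr` value for a device that was only scrubbed read-only is the symptom described in this report.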

[CAUSE]
The root cause is inside btrfs_inc_block_group_ro(), where we need to hold a transaction handle to prevent the transaction from being committed until we hold ro_block_group_mutex.

Holding the handle starts a new, empty transaction by itself, so even if we can mark the block group read-only without any extra work, we still need to commit that empty transaction.

Unfortunately this means a RO scrub on a RW filesystem will always cause the fs to be updated.

[FIX]
The best fix is to make btrfs avoid committing empty transactions, but even with that done, a read-only scrub on a rw mount can still cause real metadata updates (e.g. allocating new chunks and updating device error statistics).

It would be very complex to make a read-only scrub fully read-only on a read-write btrfs.

Thankfully a read-only scrub on a read-write mount with a read-only device in the storage stack is pretty rare, thus a documentation update should be enough.
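
For users who do have such a setup, one way to notice it before starting a scrub is to look for read-only block devices in the storage stack. The following Python sketch walks the Linux sysfs `ro` and `slaves` attributes of a block device. It is only an illustration under the assumption that sysfs is mounted in the usual place; the function name `read_only_devices` and the `sysfs_root` parameter are hypothetical and not part of btrfs-progs.

```python
import os


def read_only_devices(name, sysfs_root="/sys/class/block", seen=None):
    """Return names of read-only block devices at or below `name` in the stack."""
    seen = set() if seen is None else seen
    if name in seen:
        return []
    seen.add(name)

    dev_dir = os.path.join(sysfs_root, name)
    found = []
    try:
        # The `ro` attribute contains "1" for a read-only block device.
        with open(os.path.join(dev_dir, "ro")) as attr:
            if attr.read().strip() == "1":
                found.append(name)
    except OSError:
        pass  # Attribute missing or unreadable: state unknown, do not report.

    # Stacked devices (e.g. dm or md) list their underlying devices here.
    slaves_dir = os.path.join(dev_dir, "slaves")
    if os.path.isdir(slaves_dir):
        for slave in sorted(os.listdir(slaves_dir)):
            found.extend(read_only_devices(slave, sysfs_root, seen))
    return found
```

If the returned list is non-empty for the device backing a read-write mount, a read-only scrub can run into the failure described above, and mounting the filesystem read-only first is the safer choice.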

Issue: #934

The function btrfs_record_file_extent() has extra handling that is specific to convert, such as allowing the range to be split at block group boundaries and image file extent boundaries.

Any such split can only lead to corruption on a non-converted fs. The only caller outside of btrfs-convert is rootdir, which expects the file extent item insertion to respect the reserved data extent and never be split.

Thankfully this is not going to cause a huge problem, as btrfs_record_file_extent() has extra checks for whether the data extent overlaps with any existing one, and if it doesn't, the handling is the same as in the kernel.

But to avoid abuse, change btrfs_record_file_extent() by:

- Rename it to btrfs_convert_file_extent()
  And add extra comments noting that it is specific to btrfs-convert.
- Move it to convert/common.[ch]
- Introduce a helper insert_reserved_file_extent() for rootdir.c

Signed-off-by: Qu Wenruo <[email protected]>

…extent item

Just like insert_reserved_file_extent() from the kernel, we can make btrfs_insert_file_extent() accept an on-stack file extent item directly.

This makes btrfs_insert_file_extent() more flexible, and it can now handle a converted file extent that has a non-zero offset.

This also makes it much easier to extend for future compressed file extent generation.

Signed-off-by: Qu Wenruo <[email protected]>

There were two major problems with add_file_items(): it was writing all files sector-by-sector, making compression impossible, and it was assuming that pread would never do a short read.

Fix these problems, and create a new helper add_file_item_extent().

Signed-off-by: Mark Harmstone <[email protected]>
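
The short-read issue is a general property of pread: a call may return fewer bytes than requested even before end of file, so the caller has to loop. The sketch below illustrates that loop in Python using `os.pread`, which wraps the same system call. It is not the btrfs-progs implementation (that code is C), and the helper name `pread_full` is hypothetical.

```python
import os


def pread_full(fd, length, offset):
    """Read up to `length` bytes at `offset`, retrying after short reads."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = os.pread(fd, remaining, offset)
        if not chunk:
            break  # End of file: return what was read so far.
        chunks.append(chunk)
        offset += len(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The result is shorter than `length` only when the file ends first, so the caller can tell a genuine end of file apart from a short read.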

Add an option --compress to mkfs.btrfs, to allow creating files using zlib when using --rootdir.

Signed-off-by: Mark Harmstone <[email protected]>

Allow --compress to work with zstd, when compiled in.

Signed-off-by: Mark Harmstone <[email protected]>

Allow --compress to work with lzo.

Signed-off-by: Mark Harmstone <[email protected]>
[ Add extra handling when LZO support is not compiled in ]
Signed-off-by: Qu Wenruo <[email protected]>


kdave

This got mixed with the compression patches so I've commented only the documentation change that's clear.

kdave · 2025-01-07T19:51:36Z

Documentation/btrfs-scrub.rst

+		Note that a read-only scrub on a read-write filesystem can
+		still cause write into the filesystem due to some internal
+		limitations.
+		Only a read-only scrub on a read-only fs can avoid writes from


Suggested change: "fs" -> "filesystem"

kdave · 2025-01-07T19:52:24Z

Documentation/ch-scrub-intro.rst

@@ -46,6 +46,16 @@ read-write mounted filesystem.
   used, with expert guidance, to rebuild certain corrupted filesystem structures
   in the absence of any good replica.

+.. note::
+   Read-only scrub on read-write filesystem will cause some write into the


Suggested changes: "on read-write" -> "on a read-write"; "some write" -> "some writes"

Issue: #934
Pull-request: #935
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>

kdave · 2025-01-28T13:31:45Z

Merged to devel with the suggested fixups.


adam900710 and others added 7 commits December 13, 2024 15:16

btrfs-progs: mkfs: add --compress option

140b568


btrfs-progs: mkfs: add zstd to --compress option

225ecd1


btrfs-progs: mkfs: add lzo to --compress option

fe56662


adam900710 force-pushed the devel branch 2 times, most recently from d740473 to 2eb8f9f Compare January 5, 2025 23:24

kdave force-pushed the devel branch from 6e64201 to 0268a8f Compare January 6, 2025 13:19

kdave approved these changes Jan 7, 2025


kdave force-pushed the devel branch from 600c14b to b449783 Compare January 17, 2025 20:59

kdave closed this Jan 28, 2025
