Description
I need help recovering my 20 TB btrfs volume on a Synology NAS.
Unfortunately, around 130 MB at the start of the volume was zeroed out by dd if=/dev/zero.
I'm now trying to rescue the file system by running chunk-recover like this:
/volume3/btrfs/btrfs.static rescue chunk-recover -y -v /dev/dm-3 > /root/chunk-recover8.log 2>&1
However, it ALWAYS fails with:
nohup: ignoring input
All Devices:
Device: id = 1, name = /dev/dm-3
Scanning: 6869080866816 in dev0
scan chunk headers error
Chunk tree recovery aborted
Below is the comprehensive bug report generated by Claude Code. I hope it includes everything relevant for you to see what is going on.
Any help is GREATLY appreciated!
btrfs rescue chunk-recover aborts with "scan chunk headers error" on valid EXTENT_TREE metadata blocks
System Information
- btrfs-progs version: v6.17.1 (static build)
- Architecture: x86_64
- OS: Synology DSM (Linux-based NAS)
- nodesize: 16384 bytes
- sectorsize: 4096 bytes
- Filesystem features: FREE_SPACE_TREE, MIXED_BACKREF, BIG_METADATA, SKINNY_METADATA, NO_HOLES
- Filesystem size: 21,990,232,555,520 bytes (~22 TB)
- Bytes used: 14,610,736,689,152 bytes (~14.6 TB)
Description
btrfs rescue chunk-recover aborts mid-scan when encountering valid metadata blocks that belong to EXTENT_TREE (objectid=2). The tool appears to misinterpret these blocks as chunk tree data, fails to parse them, and aborts instead of skipping them.
This makes chunk-recover unusable for its intended purpose: recovering filesystems with corrupted chunk trees.
Background
My btrfs filesystem suffered corruption when approximately 128 MB at the beginning of the device was accidentally zeroed. This destroyed the chunk tree located near byte offset ~21,086,208. The RAID array, LVM, and all actual file data beyond the first 128 MB remain intact.
chunk-recover is the documented tool for exactly this scenario, but it cannot complete its scan.
Steps to Reproduce
- Have a btrfs filesystem with a corrupted/zeroed chunk tree
- Run:
btrfs rescue chunk-recover -v /dev/<device>
- Observe scan progress until it reaches certain metadata blocks
- Tool aborts with "scan chunk headers error"
Observed Behavior
$ btrfs rescue chunk-recover -v /dev/dm-3
Scanning: 6869080866816 in dev0
scan chunk headers error
The scan consistently aborts at byte offset 6,869,080,866,816 (~6.25 TiB, or ~6.87 TB). This is reproducible: 8+ attempts all fail at the identical offset.
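As a quick sanity check on the numbers, the failure offset can be converted into a 16 KiB block index and into binary and decimal units. This is only a small arithmetic sketch; the constant names are made up for illustration.

```python
NODESIZE = 16 * 1024              # nodesize from the System Information section
FAIL_OFFSET = 6_869_080_866_816   # byte offset printed by chunk-recover

# Index of the 16 KiB block that starts at the failure offset.
block_index, remainder = divmod(FAIL_OFFSET, NODESIZE)
print("block index:", block_index, "remainder:", remainder)

# The same offset in binary (TiB) and decimal (TB) units.
print("TiB:", round(FAIL_OFFSET / 2**40, 2))
print("TB: ", round(FAIL_OFFSET / 10**12, 2))
```

A remainder of zero means the offset is aligned to the nodesize, so the block index can be passed directly as a skip count to a dd call that uses bs=16384.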
Analysis of Block at Failure Offset
I extracted and analyzed the 16 KB metadata block at offset 6,869,080,866,816:
dd if=/dev/dm-3 bs=16384 skip=419255424 count=1 of=/tmp/problem_block.bin
Parsed Header
| Field | Value |
|---|---|
| Stored checksum | 0xff073405 |
| Calculated checksum | 0xff073405 |
| Checksum valid | YES |
| fsid | 7e31261a-22a6-44d4-b5f9-2b14a9aae709 (matches filesystem) |
| bytenr | 6,869,080,866,816 (matches physical offset) |
| flags | 0x01 (WRITTEN) |
| generation | 9,182,501 |
| owner | 2 (EXTENT_TREE) |
| nritems | 121 |
| level | 0 (leaf) |
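The checksum rows above can be reproduced with a few lines of Python. This is a minimal sketch rather than the exact script that was used. It assumes the filesystem uses the default CRC32C checksum, stored little-endian in the first 4 bytes of the 32-byte checksum field and computed over the rest of the block. The function names are illustrative.

```python
import struct

CSUM_SIZE = 32  # size of the checksum field at the start of a btrfs tree block


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def check_block(block: bytes) -> tuple[int, int]:
    """Return (stored, calculated) checksums for one tree block."""
    stored = struct.unpack_from("<I", block, 0)[0]
    calculated = crc32c(block[CSUM_SIZE:])
    return stored, calculated
```

To use it, read the dumped block (for example /tmp/problem_block.bin from the dd command above) as bytes, pass it to check_block, and compare the two returned values.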
Key Finding
The block is completely valid. It has:
- Correct CRC32C checksum
- Correct filesystem UUID
- Self-consistent bytenr matching physical offset
- Valid btrfs header structure
- 121 extent tree items
The only issue is that it belongs to EXTENT_TREE (objectid=2), not CHUNK_TREE (objectid=3).
Expected Behavior
When scanning for chunk tree blocks, chunk-recover should:
- Read each potential metadata block
- Verify it has a valid btrfs header (checksum, fsid)
- Check if owner == BTRFS_CHUNK_TREE_OBJECTID (3)
- If not a chunk tree block → skip and continue scanning
- If it is a chunk tree block → parse and collect chunk data
Currently, steps 3-4 appear to be missing or broken. The tool attempts to parse non-chunk blocks as chunk data, fails, and aborts the entire scan.
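The expected behavior can be sketched in Python as a scan that classifies each block and never aborts on a block from another tree. This illustrates the behavior requested here, not how btrfs-progs is implemented. The classify and scan functions are hypothetical, checksum validation is left out for brevity, and the header layout assumed is the standard 101-byte btrfs tree block header.

```python
import struct

NODESIZE = 16384
CHUNK_TREE_OBJECTID = 3  # value of BTRFS_CHUNK_TREE_OBJECTID

# csum, fsid, bytenr, flags, chunk_tree_uuid, generation, owner, nritems, level
HEADER = struct.Struct("<32s16sQQ16sQQIB")


def classify(block: bytes, fsid: bytes) -> str:
    """Classify one block as 'not-btrfs', 'other-tree' or 'chunk'."""
    fields = HEADER.unpack_from(block)
    block_fsid, owner = fields[1], fields[6]
    if block_fsid != fsid:
        return "not-btrfs"    # data block or foreign metadata: skip
    if owner != CHUNK_TREE_OBJECTID:
        return "other-tree"   # e.g. EXTENT_TREE (2): skip, do not abort
    return "chunk"


def scan(image: bytes, fsid: bytes):
    """Yield offsets of chunk tree blocks; all other blocks are skipped."""
    for offset in range(0, len(image) - NODESIZE + 1, NODESIZE):
        if classify(image[offset:offset + NODESIZE], fsid) == "chunk":
            yield offset
```

In this sketch a block owned by any other tree is only a reason to move on to the next offset, which matches steps 3 and 4 in the list above.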
Suggested Fix
In the chunk scanning code, verify the tree owner before attempting to parse block contents:
/* After validating block header */
if (btrfs_header_owner(leaf) != BTRFS_CHUNK_TREE_OBJECTID)
	continue; /* Skip non-chunk-tree blocks */
Hex Dump of Problem Block
First 512 bytes of the 16 KB block at offset 6,869,080,866,816:
00000000: 0534 07ff 0000 0000 0000 0000 0000 0000 .4..............
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 7e31 261a 22a6 44d4 b5f9 2b14 a9aa e709 ~1&.".D...+.....
00000030: 0000 a034 1f06 0000 0100 0000 0000 0001 ...4............
00000040: bafd 1258 f9d3 4823 a1f6 19d4 1d6e b7ae ...X..H#.....n..
00000050: 251d 8c00 0000 0000 0200 0000 0000 0000 %...............
00000060: 7900 0000 0000 10d4 b04b 0600 00a8 00c0 y........K......
00000070: 5700 0000 0000 663f 0000 3500 0000 00d0 W.....f?..5.....
00000080: 2bb1 4b06 0000 a800 a01c 0000 0000 0031 +.K............1
00000090: 3f00 0035 0000 0000 7048 b14b 0600 00a8 ?..5....pH.K....
000000a0: 0000 4000 0000 0000 fc3e 0000 3500 0000 ..@......>..5...
000000b0: 0070 88b1 4b06 0000 a800 2028 0000 0000 .p..K..... (....
000000c0: 00c7 3e00 0035 0000 0000 90b0 b14b 0600 ..>..5.......K..
000000d0: 00a8 00f0 2b00 0000 0000 923e 0000 3500 ....+......>..5.
000000e0: 0000 0080 dcb1 4b06 0000 a800 e02b 0000 ......K......+..
000000f0: 0000 005d 3e00 0035 0000 0000 6008 b24b ...]>..5....`..K
00000100: 0600 00a8 00a0 2b00 0000 0000 283e 0000 ......+.....(>..
00000110: 3500 0000 0000 34b2 4b06 0000 a800 306a 5.....4.K.....0j
00000120: 0000 0000 00f3 3d00 0035 0000 0000 309e ......=..5....0.
00000130: b24b 0600 00a8 00f0 7300 0000 0000 be3d .K......s......=
00000140: 0000 3500 0000 0020 12b3 4b06 0000 a800 ..5.... ..K.....
00000150: 6067 0000 0000 0089 3d00 0035 0000 0000 `g......=..5....
00000160: 8079 b34b 0600 00a8 0070 5c00 0000 0000 .y.K.....p\.....
00000170: 543d 0000 3500 0000 00f0 d5b3 4b06 0000 T=..5.......K...
00000180: a800 b078 0000 0000 001f 3d00 0035 0000 ...x......=..5..
00000190: 0000 a04e b44b 0600 00a8 0040 7400 0000 ...N.K.....@t...
000001a0: 0000 ea3c 0000 3500 0000 00e0 c2b4 4b06 ...<..5.......K.
000001b0: 0000 a800 4054 0000 0000 00b5 3c00 0035 ....@T......<..5
000001c0: 0000 0000 2017 b54b 0600 00a8 0090 1800 .... ..K........
000001d0: 0000 0000 803c 0000 3500 0000 00b0 2fb5 .....<..5...../.
000001e0: 4b06 0000 a800 b057 0000 0000 004b 3c00 K......W.....K<.
000001f0: 0035 0000 0000 6087 b54b 0600 00a8 0020 .5....`..K.....
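The header fields can be decoded directly from the first rows of this dump. The sketch below hard-codes the first 0x70 bytes shown above and unpacks them with the standard btrfs tree block header layout (32-byte checksum, fsid, bytenr, flags, chunk tree UUID, generation, owner, nritems, level), followed by the start of the first item key. The variable names are illustrative.

```python
import struct
import uuid

# First 0x70 bytes of the dump above, one string per 16-byte row.
DUMP = bytes.fromhex(
    "053407ff000000000000000000000000"
    "00000000000000000000000000000000"
    "7e31261a22a644d4b5f92b14a9aae709"
    "0000a0341f0600000100000000000001"
    "bafd1258f9d34823a1f619d41d6eb7ae"
    "251d8c00000000000200000000000000"
    "79000000000010d4b04b060000a800c0"
)

# csum, fsid, bytenr, flags, chunk_tree_uuid, generation, owner, nritems, level
HEADER = struct.Struct("<32s16sQQ16sQQIB")
(csum, fsid, bytenr, flags, chunk_uuid,
 generation, owner, nritems, level) = HEADER.unpack_from(DUMP)

# Start of the first item key, right after the header: objectid and type.
key_objectid, key_type = struct.unpack_from("<QB", DUMP, HEADER.size)

print("csum      :", hex(struct.unpack_from("<I", csum)[0]))
print("fsid      :", uuid.UUID(bytes=fsid))
print("bytenr    :", bytenr)
print("flags     :", hex(flags))
print("generation:", generation)
print("owner     :", owner)
print("nritems   :", nritems)
print("level     :", level)
print("first key type:", key_type)  # 168 is BTRFS_EXTENT_ITEM_KEY
```

Decoded this way, the checksum, fsid, generation, owner, nritems and level agree with the parsed header table. The bytenr bytes decode to 6,731,096,653,824 rather than the device offset. Since btrfs stores a tree block's logical address in that field, the two are not necessarily equal, so the bytenr row of the table is worth re-checking.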
Impact
This bug makes chunk-recover unable to complete its scan on filesystems where:
- The chunk tree is corrupted/missing (exactly the scenario it's designed for)
- Valid EXTENT_TREE or other metadata blocks exist in the scan path
The tool aborts instead of recovering the filesystem.
Additional Information Available
- Full dump-super output
- Complete 16 KB block dump
- Python checksum verification scripts
- Any other diagnostics upon request
References
- Filesystem UUID: 7e31261a-22a6-44d4-b5f9-2b14a9aae709
- Failure offset: 6,869,080,866,816 bytes (block 419,255,424 at 16 KB nodesize)
- btrfs-progs repo: https://github.com/kdave/btrfs-progs
Thanks a lot in advance to anyone who spends even a second looking into this! I have over 10 years of family photos on that volume :(