
btrfs replace on RAID1 does not copy all data when replacing a drive, causing potential data loss #1077

@rudis

Description


Hi,

While evaluating btrfs, I noticed that the following scenario leaves a broken file system:

  • create a RAID1 on two drives, disk1 and disk2 (disk0 and disk1 in the script below)
  • write data
  • remove disk2, replace it with a new drive using btrfs replace, and wait for completion
  • remove disk1
  • mounting the file system from the new drive fails

Running btrfs balance after replacing disk2 (the commented-out balance line in the script below) prevents the issue. However, this is not clearly documented in btrfs-replace(8), which states: "On a live filesystem, duplicate the data to the target device which is currently stored on the source device."
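A minimal sketch of the replace-then-balance sequence, assuming the file system is mounted degraded and the missing device has devid 2, as in the script below. The reasoning behind it: chunks allocated while a RAID1 file system is mounted degraded with one member missing get the single profile, and btrfs replace only rebuilds what belonged to the missing device, so a soft convert balance afterwards is what turns those leftovers into RAID1. The helper name replace_and_rebalance is hypothetical; the replace and balance commands are the ones from this issue, and the final check uses btrfs filesystem df.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: replace a missing RAID1 member, then convert the
# chunks that were allocated with the single profile while degraded.
replace_and_rebalance() {
    local devid=$1 target=$2 mnt=$3

    # -B keeps btrfs replace in the foreground until it has finished.
    btrfs replace start -B "$devid" "$target" "$mnt"

    # soft skips chunks that already have the RAID1 profile; btrfs-progs
    # applies the -m filters to system chunks too unless -s is given.
    btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft "$mnt"

    # GlobalReserve is always listed as single, so ignore it here.
    if btrfs filesystem df "$mnt" | grep -v '^GlobalReserve' | grep -q ', single:'; then
        echo "single-profile chunks remain on $mnt" >&2
        return 1
    fi
}

# Example, as root and with the paths from the script:
#   replace_and_rebalance 2 /dev/loop1 /tmp/mnt
```

The final check makes the function fail loudly instead of leaving the old drive as the only holder of some chunks.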

Tested on Debian Trixie running Linux kernel 6.12.63 and btrfs-progs 6.14-1.

Is this the intended behavior? If so, the documentation should be updated (I can provide a pull request). If not, this should be fixed.

The following script reproduces the issue using loop devices (it was also reproduced on real hardware):

#!/bin/sh
# Run as root (uses losetup, mount and /proc/sys/vm/drop_caches)
set -eu

cd /tmp
mkdir -p mnt

# Create two 10GiB disks
rm -f disk0; truncate -s 10G disk0
rm -f disk1; truncate -s 10G disk1
# Btrfs needs to see both devices when mounting
losetup /dev/loop0 disk0
losetup /dev/loop1 disk1

# Initialize btrfs RAID1 and create a file with random data
mkfs.btrfs --data raid1 --metadata raid1 /dev/loop0 /dev/loop1
mount /dev/loop0 mnt
# 1 GiB of random data, written in 1 MiB blocks to avoid short reads from /dev/urandom
dd if=/dev/urandom of=mnt/data bs=1M count=1024
sha512sum mnt/data > mnt/data.sha512sum
umount mnt

# Destroy data on second disk
rm -f disk1; truncate -s 10G disk1
losetup -d /dev/loop1; losetup /dev/loop1 disk1
# Not necessary, but just to make clear it's not a cache issue
echo 3 > /proc/sys/vm/drop_caches

# Replace second disk (-B waits until replace is complete)
mount -o degraded /dev/loop0 mnt
btrfs replace start -B 2 /dev/loop1 mnt
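# Optional: show chunk profiles per device (output shown below)
# btrfs filesystem usage -T mnt
# Optional: this balance prevents the issue
# btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft mnt
# Optional: writing data here changes the mount error (see below)
# echo write > mnt/test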
umount mnt

# Destroy data on first disk
rm -f disk0; truncate -s 10G disk0
losetup -d /dev/loop0; losetup /dev/loop0 disk0
# Not necessary
echo 3 > /proc/sys/vm/drop_caches

# Attempt to mount the new disk alone (this fails, and set -e then stops the script before the cleanup)
mount -o degraded /dev/loop1 mnt
sha512sum -c mnt/data.sha512sum

# Cleanup
umount mnt
losetup -d /dev/loop0; rm disk0
losetup -d /dev/loop1; rm disk1
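Because the script runs with set -e, it exits as soon as the final mount fails and never reaches the cleanup lines, which leaves both loop devices attached. A minimal sketch of a trap-based cleanup that runs on every exit path; WORKDIR is an assumption added for illustration (defaulting to /tmp, as in the script), and the loop device names are the ones the script uses.

```bash
#!/usr/bin/env bash
set -eu

# Assumed variable: directory holding mnt, disk0 and disk1 (the script uses /tmp).
WORKDIR=${WORKDIR:-/tmp}

# Undo everything the reproduction sets up; each step may fail harmlessly
# (for example when nothing is mounted), so failures are ignored.
cleanup() {
    umount "$WORKDIR/mnt" 2>/dev/null || true
    losetup -d /dev/loop0 2>/dev/null || true
    losetup -d /dev/loop1 2>/dev/null || true
    rm -f "$WORKDIR/disk0" "$WORKDIR/disk1"
}

# Runs on normal exit and when set -e aborts at the failing mount.
trap cleanup EXIT

# ... reproduction steps go here ...
```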

The mount fails with the following output from mount, followed by the kernel messages from dmesg:

mount: /tmp/mnt: can't read superblock on /dev/loop1.
       dmesg(1) may have more information after failed mount system call.

BTRFS info (device loop1): first mount of filesystem 5f1f3583-8c44-4671-9831-02bd8ff743e1
BTRFS info (device loop1): using crc32c (crc32c-intel) checksum algorithm
BTRFS warning (device loop1): devid 1 uuid 2b43640d-819f-42d2-9559-e8e284b84be1 is missing
BTRFS error (device loop1): failed to read chunk root
BTRFS error (device loop1): open_ctree failed: -5

Writing data before the unmount (uncommenting echo write > mnt/test in the script) changes the result; again mount output first, then dmesg:

mount: /tmp/mnt: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error.
       dmesg(1) may have more information after failed mount system call.

BTRFS warning (device loop1): devid 1 uuid b015b488-4a96-4c5a-81f2-cbec50dfe6ca is missing
BTRFS warning (device loop1): chunk 1372585984 missing 1 devices, max tolerance is 0 for writable mount
BTRFS warning (device loop1): writable mount is not allowed due to too many missing devices
BTRFS error (device loop1): open_ctree failed: -22
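The "max tolerance is 0" warning follows from the chunk's profile: a chunk tolerates only as many missing devices as its redundancy allows, so a single chunk tolerates none while a RAID1 chunk tolerates one. A minimal sketch of that per-chunk rule; the helper names are hypothetical and the values are the commonly documented tolerances per profile, not numbers read from the kernel.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical lookup: number of missing devices a chunk of this profile survives.
profile_tolerance() {
    local p
    p=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$p" in
        single|dup|raid0)   echo 0 ;;
        raid1|raid10|raid5) echo 1 ;;
        raid1c3|raid6)      echo 2 ;;
        raid1c4)            echo 3 ;;
        *) echo "unknown profile: $1" >&2; return 1 ;;
    esac
}

# Hypothetical check: a writable degraded mount requires every chunk to have
# no more missing devices than its profile tolerates.
writable_degraded_ok() {
    local profile=$1 missing=$2 tol
    tol=$(profile_tolerance "$profile") || return 2
    [ "$missing" -le "$tol" ]
}

writable_degraded_ok RAID1 1 && echo "RAID1 chunk, 1 device missing: writable"
writable_degraded_ok single 1 || echo "single chunk, 1 device missing: refused"
```

A single chunk whose only device is missing is enough to refuse the writable mount, regardless of how many RAID1 chunks survive.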

Mounting with -o degraded,ro works, but an attempt to replace the missing disk then fails with ERROR: ioctl(DEV_REPLACE_START) failed on "mnt": Read-only file system.

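btrfs filesystem usage -T before the unmount shows the problem: besides the RAID1 chunks present on both devices, devid 1 (/dev/loop0) holds single-profile Data, Metadata and System chunks with no copy on devid 2. The System RAID1 chunk is unused while the System single chunk has 16.00KiB in use, so the chunk tree exists only on /dev/loop0, which matches the "failed to read chunk root" error above: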

Overall:
    Device size:                  20.00GiB
    Device allocated:              3.80GiB
    Device unallocated:           16.20GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                          2.00GiB
    Free (estimated):             11.47GiB      (min: 8.77GiB)
    Free (statfs, df):             7.96GiB
    Data ratio:                       1.50
    Metadata ratio:                   1.50
    Global reserve:                5.50MiB      (used: 0.00B)
    Multiple profiles:                 yes      (data, metadata, system)

              Data    Data    Metadata  Metadata  System   System
Id Path       single  RAID1   single    RAID1     single   RAID1   Unallocated Total    Slack
-- ---------- ------- ------- --------- --------- -------- ------- ----------- -------- -----
 1 /dev/loop0 1.00GiB 1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB     7.46GiB 10.00GiB     -
 2 /dev/loop1       - 1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB 10.00GiB     -
-- ---------- ------- ------- --------- --------- -------- ------- ----------- -------- -----
   Total      1.00GiB 1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB    16.20GiB 20.00GiB 0.00B
   Used         0.00B 1.00GiB     0.00B   1.14MiB 16.00KiB   0.00B
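Two small helpers for reading this output, both hypothetical and written against the layout shown above (a chunk-type line directly above the Id/Path header, "-" for empty cells), so they may need adjusting for other btrfs-progs versions. profile_ratio reproduces the 1.50 Data and Metadata ratios as raw allocated space over logical allocated space, counting RAID1 chunks twice and single chunks once. single_chunk_devices lists devices that still hold single-profile chunks, which is worth checking before pulling the old drive.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical helper: ratio = raw allocated / logical allocated for a mix
# of single chunks (one copy) and RAID1 chunks (two copies).
profile_ratio() {
    awk -v single="$1" -v raid1="$2" \
        'BEGIN { printf "%.2f\n", (single + 2 * raid1) / (single + raid1) }'
}

# Hypothetical helper: read btrfs filesystem usage -T output on stdin and
# print every device that holds single-profile chunks, with the chunk types.
single_chunk_devices() {
    awk '
        $1 == "Id" && $2 == "Path" {
            # The line above the Id/Path header names the chunk type of each
            # profile column; it has no entries for the Id and Path columns.
            n = split(prev, kind, " ")
            for (i = 3; i <= NF; i++) {
                prof[i] = $i
                ctype[i] = (i - 2 <= n) ? kind[i - 2] : ""
            }
            in_table = 1
        }
        in_table && $1 ~ /^[0-9]+$/ {
            list = ""
            for (i = 3; i <= NF; i++)
                if (prof[i] == "single" && $i != "-")
                    list = list " " ctype[i]
            if (list != "") print $1, $2 ":" list
        }
        { prev = $0 }
    '
}

profile_ratio 1 1       # Data: 1.00GiB single + 1.00GiB RAID1, prints 1.50
profile_ratio 256 256   # Metadata in MiB, prints 1.50

# Usage on a mounted file system (as root):
#   btrfs filesystem usage -T /tmp/mnt | single_chunk_devices
# For the table above this prints: 1 /dev/loop0: Data Metadata System
```

On a two-device file system with only RAID1 chunks the Data ratio would be 2.00, so a lower value is already a hint that single chunks exist.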

Best,
Simon
