Add option to btrfs replace to remove unrecoverable files instead of aborting #932

Open
benpicco opened this issue Dec 19, 2024 · 2 comments

When a btrfs replace is unable to recover a file on a raid56 array, it will just abort:

[167573.709048] BTRFS error (device sdf): unrepaired sectors detected, full stripe 49505622556672 data stripe 2 errors 8-15
[167573.847875] BTRFS error (device sdf): btrfs_scrub_dev(/dev/sdg, 2, /dev/sdd) failed -5

There is a formula to translate those magic numbers back to a file; for convenience I moved that to a shell script:

MNT="/mnt/data"

# unrepaired sectors detected, full stripe 49505622556672 data stripe 2 errors 8-15
#                                                 |                   |        |  |
#                                                $1                  $2       $3 $4

STRIPE=$1
INDEX=$2
E_START=$3
E_END=$4

# Resolve the logical address of the first and last bad sector to file paths
sudo btrfs inspect-internal logical-resolve -o $((STRIPE + INDEX * 65536 + E_START * 4096)) "$MNT"
sudo btrfs inspect-internal logical-resolve -o $((STRIPE + INDEX * 65536 + E_END * 4096)) "$MNT"

(assuming 4k sectors and 64k stripes)
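
To make the arithmetic concrete, here is a rough sketch that parses the error line directly and prints the two logical addresses, so the four numbers don't have to be copied by hand. The function name `unrepaired_to_logical` is made up for this example, it keeps the same 4k sector / 64k stripe assumption, and it only handles a single `N-M` error range (a lone sector or a comma-separated list in the `errors` field is not matched).

```shell
#!/bin/sh
# Hypothetical helper, not part of btrfs-progs.
# Assumes 4 KiB sectors and 64 KiB stripes, and a single "N-M" error range.
unrepaired_to_logical() {
    # Extract "<full stripe> <data stripe> <first sector> <last sector>".
    fields=$(printf '%s\n' "$1" | sed -n \
        's/.*full stripe \([0-9][0-9]*\) data stripe \([0-9][0-9]*\) errors \([0-9][0-9]*\)-\([0-9][0-9]*\).*/\1 \2 \3 \4/p')
    # Bail out if the line did not match the expected format.
    [ -n "$fields" ] || return 1
    # Split the four numbers into $1..$4 (local to this function).
    set -- $fields
    # logical = full stripe start + data stripe index * 64 KiB + sector * 4 KiB
    echo $(( $1 + $2 * 65536 + $3 * 4096 ))
    echo $(( $1 + $2 * 65536 + $4 * 4096 ))
}
```

For the log line above this prints 49505622720512 and 49505622749184, which are the two values the script passes to `btrfs inspect-internal logical-resolve`.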

Now, instead of barfing fs internals at the user and having them figure out what to do with that information and restart the replace job from the beginning (until the next unrecoverable error is found), it would be much better if the filesystem could automatically remove those unrecoverable files and continue with the device replace.

KodeToad commented Feb 18, 2025

Could this be an option for check --repair too?
After all, if your fs is toast, this couldn't make it any worse, and it is likely to reduce manual errors and noise for support.

Zygo commented Feb 18, 2025

This is due to a raid56 kernel bug that should just be fixed.

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 531312efee8d..0781885c87ac 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -2058,7 +2058,7 @@ static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 "unrepaired sectors detected, full stripe %llu data stripe %u errors %*pbl",
                                  full_stripe_start, i, stripe->nr_sectors,
                                  &error);
-                       ret = -EIO;
+                       ret = 0;
                        goto out;
                }
                bitmap_or(&extent_bitmap, &extent_bitmap,
