Feature request: btrfs replace start --new-fsid #961

Closed
Winterhuman opened this issue Feb 23, 2025 · 11 comments

Comments

@Winterhuman

Winterhuman commented Feb 23, 2025

Version: v6.13

I learnt of this interesting feature, where a backing device can be transparently replaced with another, using the following commands:

$ truncate -s 4G /tmp/somedevice
$ losetup /dev/loop0 /tmp/somedevice
$ btrfs replace start /path/to/current/device /dev/loop0 /

However, if /dev/loop0 is actually a real BTRFS partition with identical content, and --force is passed so it doesn't reject the operation, that'd still mean it would overwrite the target despite it being identical.

Proposal

See #961 (comment) for the new proposal and the reasoning.

Old proposal

I propose adding something like an --if-match option (probably a better name available for this), which checks if the src and target have identical tree hashes, and transparently replaces src with the target if the check succeeds. If src and target aren't identical, or aren't mounted read-only as an extra safety measure, the command cleanly fails and no writing ever occurs (#933 is relevant here).

My idea for this feature is to cleanly transition from an initramfs/recovery UKI to a real root, where both are intended to have identical contents, but where you don't want to keep the initramfs in memory (hence the backing device transfer).

Winterhuman changed the title from "btrfs replace, but without writing" to "Feature request: btrfs replace, but without writing" on Feb 23, 2025
@adam900710
Collaborator

I believe it is a must-have safety check, not an optional --if-match.
We should just reject such an operation if the target device has the same fsid, as it can screw up the fsid and device lists.
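
Until btrfs-progs enforces this itself, a calling script could approximate the check. The following is only a sketch under assumptions: `same_fsid` and `guard_replace` are hypothetical helper names, not part of btrfs-progs, and the fsid is read through the generic `findmnt` and `blkid` tools on the assumption that it is exposed as the filesystem UUID.

```shell
#!/bin/sh
# Hypothetical wrapper-side check; not part of btrfs-progs.

# Succeeds (returns 0) when two non-empty fsid strings are equal, ignoring case.
same_fsid() {
    a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Refuses to continue when the target already carries the mounted fs's fsid.
# $1 = mount point of the running filesystem, $2 = replace target device
guard_replace() {
    mounted_fsid=$(findmnt -no UUID "$1")
    target_fsid=$(blkid -s UUID -o value "$2")
    if same_fsid "$mounted_fsid" "$target_fsid"; then
        echo "refusing: $2 has the same fsid as $1" >&2
        return 1
    fi
}
```

A caller would then run something like `guard_replace / /dev/loop0 && btrfs replace start /path/to/current/device /dev/loop0 /`.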

For your idea of transitioning from an in-memory fs to a full btrfs, I'm wondering why both the initramfs and the new root would even have the same fsid.
Shouldn't the new rootfs be empty, or be a different filesystem (contents can match, but never the fsid)?

@Winterhuman
Author

Yeah, rejecting on matching fsid is definitely safer. I'm also not sure why the initramfs and the new root would have the same fsid. Is "tree hash" actually the same as fsid, or am I misunderstanding where I had implied that?

@Winterhuman
Author

Here's a better explanation of what I'm intending:

  1. You start by booting a UKI, which contains the full root filesystem (it's not a normal, minimal initramfs).
  2. From the UKI, it then calls btrfs replace:
    1. If the on-disk root doesn't exist yet, btrfs replace is called normally like in the example from the issue (but going in reverse, since we're coming from memory), and then the backing device is transparently transferred to the on-disk filesystem. It's a seamless transition from memory to disk.
    2. Or, if the on-disk root was already created on a previous boot, the backing device is seamlessly transferred, but only if the UKI and the on-disk root have identical data (since otherwise, it means the on-disk root has been modified with content that didn't come from the UKI). If they don't have identical content, btrfs replace fails, and the UKI would then trigger its own logic for how to handle this scenario (e.g. deleting the on-disk FS).

Does this make more sense? Sorry for the confusion initially

@adam900710
Collaborator

The fsid is the btrfs-internal name of the filesystem. It's exported as the UUID, which can be verified with lsblk -f.
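
As a small illustration, the fsid can also be pulled out of `btrfs filesystem show` output in a script. This is a sketch only: `fsid_from_show` is a hypothetical helper name, and it assumes the output keeps a `uuid: <value>` field on the label line.

```shell
#!/bin/sh
# Prints the first uuid field found in `btrfs filesystem show` output on stdin.
fsid_from_show() {
    sed -n 's/.*uuid: \([0-9a-fA-F-]*\).*/\1/p' | head -n 1
}

# Typical use (needs root and a btrfs device):
#   btrfs filesystem show /dev/loop0 | fsid_from_show
#   lsblk -f /dev/loop0                  # UUID column should show the same value
#   blkid -s UUID -o value /dev/loop0    # same value again, one line
```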

I'm beginning to understand case ii). It can indeed cause problems like this:

  • The UKI boots with an empty on-disk root

  • The UKI migrates the fs to the on-disk root
    Now the on-disk root has the btrfs with the UUID from the UKI.

  • The same UKI boots again
    Now we have the UKI in-memory root and the on-disk root, both having the same UUID.
    (And btrfs can get confused in that case.)

In this particular case, I guess a btrfs seed device may be a better solution, as the sprout process (adding a writable device) generates a new fsid, so no two new filesystems end up with the same UUID and there are no conflicts.

For case i) I still do not understand why we need to use the same backing file.

Can't we create an empty file on the new rootfs, and do the replace?

@Winterhuman
Author

Winterhuman commented Feb 24, 2025

Oh, btrfs replace mirrors the fsid, gotcha. Seeding devices is also something I hadn't considered before. However, that'd still leave the seeding device from the UKI in memory (will remember it though).

I'm still not sure what you mean by "create an empty file on the new rootfs"; the idea is that the rootfs on-disk is only written once, and btrfs replace is simply being used so the backing device of the OS is transparently moved over without a switch-root involved.

Perhaps a new command like btrfs transfer-backing is a better way to describe what I'm attempting, since I'm intending the src & target to be identical except for their fsids.

@adam900710
Collaborator

> that'd still leave the seeding device from the UKI in memory (will remember it though).

You can just remove the seed device from the sprout fs, then it will become a fully independent fs.
(And since the seed device is always read-only, after removal it stays what it was, and can be reused for the next sprout.)

> I'm still not sure what you mean by "create an empty file on the new rootfs"

Sorry, I didn't understand the original case i), you mentioned "btrfs replace is called normally like in the example from the issue (but going in reverse, since we're coming from memory)".

Do you mean just call btrfs replace start <memory dev> <rootfs dev> / ?
If so, it looks like I misunderstood the original case. I thought you would mkfs a new rootfs, create a file on it, set up a loopback device, and then do the replace.

And in that case it will work as expected, although with the mentioned conflicting fsid problem.

@Winterhuman
Author

Winterhuman commented Feb 24, 2025

> You can just remove the seed device from the sprout fs, then it will become a fully independent fs.

Oh, it'd actually sync the on-disk FS when the original seed device is removed? That could work then!

> Do you mean just call btrfs replace start <memory dev> <rootfs dev> / ?

Yep!

> And in that case it will work as expected, although with the mentioned conflicting fsid problem.

Perhaps the new option I'm looking for would be --new-fsid instead? Unless that's impossible with how transparent backing device replacement works currently.

I think I actually can work with btrfs replace always overwriting, since it seems to only sync the difference between the src & target, and doesn't overwrite the entire device like I thought it did.

Winterhuman changed the title from "Feature request: btrfs replace, but without writing" to "Feature request: btrfs replace start --new-fsid" on Feb 24, 2025
@adam900710
Collaborator

Although we support changing the fsid, all the supported solutions are offline.

Meaning the fsid can only be changed with the fs unmounted, thus not suitable for seamless migration.
(btrfstune -m and btrfstune -u)

Thus the only online migration that can generate a new fsid is the seed device, at least for now.
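
For reference, the offline route could be scripted roughly as below. This is a sketch under assumptions: `is_mounted` and `change_fsid_offline` are hypothetical helper names, and the mount check only matches the device name listed in the mounts table, so it can miss additional members of a multi-device filesystem.

```shell
#!/bin/sh
# Succeeds when device $1 appears as a mount source in the mounts table $2
# (defaults to /proc/self/mounts). Only the listed source device is matched.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${2:-/proc/self/mounts}"
}

# Assigns a new random fsid with btrfstune -u, but only while unmounted.
# $1 = device; btrfstune may still ask for confirmation interactively.
change_fsid_offline() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it first" >&2
        return 1
    fi
    btrfstune -u "$1"
}
```

Swapping `-u` for `-m` would change only the metadata UUID, which is the lighter of the two offline options mentioned above.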

@Winterhuman
Author

Can't the fsid change happen after the target is generated, but before it's added to the src's device list? Or is generation something that happens while the target is in the device list, in which case, can the device be temporarily removed from the device list to make it offline, and then re-added afterwards before the transition?

@adam900710
Collaborator

Fsid change is an offline operation, and we haven't yet taken conflicting fsids into consideration.
So I wouldn't recommend going down that path.

The devices can be taken out of the device list, but that's only for the kernel btrfs module, not for offline tools like btrfstune.
Thus it's still a bad idea for now.

Overall, the seed device is still the best way, and IIRC Oracle and Meta are already using this method to generate their container/VM images.

@Winterhuman
Author

Winterhuman commented Feb 24, 2025

I managed to replace the btrfs replace start method with seeding devices, although the fsid does seem to change seamlessly from /dev/loop0's to /dev/loop1's after btrfs device add, despite you saying it shouldn't be possible (EDIT: Figured it out, see second paragraph).

$ truncate -s 1G test{,2}.img
$ mkfs.btrfs test.img
$ losetup /dev/loop0 test.img
$ losetup /dev/loop1 test2.img
$ btrfstune -S 1 /dev/loop0
$ mount /dev/loop0 /mnt
$ btrfs filesystem show /mnt
Label: none  uuid: 4364d596-b462-4db6-980e-cd0eabebb3d1
	Total devices 1 FS bytes used 144.00KiB
	devid    1 size 1.00GiB used 110.38MiB path /dev/loop0
$ btrfs device add /dev/loop1 /mnt
$ btrfs filesystem show /mnt
Label: none  uuid: 4eaa7663-236f-4b6b-b09e-a4cb85301565
	Total devices 2 FS bytes used 144.00KiB
	devid    1 size 1.00GiB used 110.38MiB path /dev/loop0
	devid    2 size 1.00GiB used 288.00MiB path /dev/loop1
$ mount -o remount,rw /mnt
$ btrfs device remove /dev/loop0 /mnt
$ btrfs filesystem show /mnt
Label: none  uuid: 4eaa7663-236f-4b6b-b09e-a4cb85301565
	Total devices 1 FS bytes used 144.00KiB
	devid    2 size 1.00GiB used 400.00MiB path /dev/loop1
$ btrfs filesystem show /dev/loop0
Label: none  uuid: 4364d596-b462-4db6-980e-cd0eabebb3d1
	Total devices 1 FS bytes used 144.00KiB
	devid    1 size 1.00GiB used 126.38MiB path /dev/loop0

It'd be interesting to know how fsid changing works in this case when it can't work with btrfs replace start, but I'm happy this is already functioning as intended. I realise what you mean now: the fsid already starts at a different value when using seeding devices, while replace fully duplicates the device. Closing issue.
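
For anyone reusing this, the sequence from the transcript above can be condensed into a dry-run helper. This is just a sketch: `sprout_plan` is a hypothetical name, it only prints the commands rather than running them, and it assumes device paths and mount points without spaces.

```shell
#!/bin/sh
# Prints (does not run) the seed/sprout migration steps.
# $1 = seed device, $2 = new writable device, $3 = mount point
sprout_plan() {
    seed=$1 target=$2 mnt=$3
    cat <<EOF
btrfstune -S 1 $seed
mount $seed $mnt
btrfs device add $target $mnt
mount -o remount,rw $mnt
btrfs device remove $seed $mnt
EOF
}

# Review the plan first, then feed it to a root shell if it looks right:
#   sprout_plan /dev/loop0 /dev/loop1 /mnt
#   sprout_plan /dev/loop0 /dev/loop1 /mnt | sudo sh -e
```

Printing instead of executing keeps the helper safe to try on a machine where the devices don't exist yet.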
