zfsvfs_hold() leaks VFS s_active reference when z_unmounted is B_TRUE, causing permanent EBUSY on pool export #18309

@mischivus

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 24.04.4 LTS (Noble Numbat) |
| Kernel Version | 6.17.0-14-generic (HWE) |
| Architecture | x86_64 |
| OpenZFS Version | 2.3.4-1ubuntu2 (kernel module), 2.2.2-0ubuntu9.4 (userspace) |

Describe the problem you're observing

zpool export fails permanently with EBUSY after ZFS ioctls race with dataset unmount operations. No userspace process, mount, file descriptor, or kernel keyring holds the pool open. The only recovery is a full system reboot.

Root cause: zfsvfs_hold() in module/zfs/zfs_ioctl.c (line ~1435) leaks a VFS s_active reference. When getzfsvfs() succeeds (incrementing s_active via zfs_vfs_ref() -> atomic_inc_not_zero()) but z_unmounted is subsequently found to be B_TRUE, the function returns EBUSY without calling zfs_vfs_rele(). The companion function zfsvfs_rele() (immediately below in the same file) correctly releases both the teardown lock and the VFS reference; the error path releases only the teardown lock.
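
The leak described above can be modeled in user space. This is a minimal sketch, not the OpenZFS source: `model_sb`, `model_zfsvfs`, `model_vfs_ref()`, `model_hold_buggy()` and `model_rele()` are hypothetical stand-ins for the superblock, zfsvfs_t, zfs_vfs_ref(), zfsvfs_hold() and zfsvfs_rele(), and the teardown lock is reduced to a plain reader counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one superblock field that matters here. */
struct model_sb {
	atomic_int s_active;	/* active references; teardown needs 0 */
};

/* Hypothetical stand-in for zfsvfs_t. */
struct model_zfsvfs {
	struct model_sb *sb;
	int teardown_readers;	/* models the teardown lock, read side only */
	bool z_unmounted;
};

/* Models atomic_inc_not_zero(): take a reference only if one exists. */
static bool
model_vfs_ref(struct model_sb *sb)
{
	int old = atomic_load(&sb->s_active);

	while (old != 0) {
		/* On failure, old is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&sb->s_active, &old, old + 1))
			return (true);
	}
	return (false);
}

/* Models the reported error path: lock dropped, reference kept. */
static int
model_hold_buggy(struct model_zfsvfs *z)
{
	if (!model_vfs_ref(z->sb))
		return (ESRCH);
	z->teardown_readers++;
	if (z->z_unmounted) {
		z->teardown_readers--;
		return (EBUSY);	/* s_active is still elevated: the leak */
	}
	return (0);
}

/* Models the companion release: drops both the lock and the reference. */
static void
model_rele(struct model_zfsvfs *z)
{
	assert(z->teardown_readers > 0);
	z->teardown_readers--;
	atomic_fetch_sub(&z->sb->s_active, 1);
}
```

In this model a successful hold followed by `model_rele()` returns `s_active` to its starting value, while a hold that fails with EBUSY leaves it one higher, and the caller has no handle with which to undo that.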

The leaked s_active permanently prevents generic_shutdown_super() -> zpl_kill_sb() -> dmu_objset_disown() -> spa_close(), keeping spa_refcount above spa_minref.
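
To illustrate why one stray reference is enough to block the chain above, here is a simplified model of the reference drop. It is a sketch under assumptions: `model_sb`, `model_kill_sb()` and `model_deactivate_super()` are hypothetical names, the kernel's locking is omitted, and the whole teardown path is collapsed into a single flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_sb {
	atomic_int s_active;	/* active references on the superblock */
	bool killed;		/* set once the modeled kill_sb has run */
};

/* Stands in for zpl_kill_sb() and everything it leads to. */
static void
model_kill_sb(struct model_sb *sb)
{
	sb->killed = true;
}

/* Drop one reference; tear down only if it was the last one. */
static void
model_deactivate_super(struct model_sb *sb)
{
	int old = atomic_fetch_sub(&sb->s_active, 1);

	assert(old > 0);
	if (old == 1)
		model_kill_sb(sb);
}
```

With one legitimate reference and one leaked reference, the final legitimate drop leaves the count at 1, so the teardown callback is never reached. That is consistent with the observation that deactivate_super() fires while generic_shutdown_super() does not.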

Reachability: z_unmounted is set to B_TRUE at three sites in module/os/linux/zfs/zfs_vfsops.c (lines ~1410, ~1920, ~1967), in functions zfsvfs_teardown(), zfs_resume_fs(), and zfs_end_fs(). All three execute from ZFS ioctl handlers that hold an active VFS reference via getzfsvfs(), guaranteeing s_active > 0 when the flag is set. Critically, zfs_resume_fs() releases the teardown write lock at its bail: label BEFORE dmu_objset_disown() runs, creating a window where a concurrent ioctl can acquire the read lock, observe z_unmounted == B_TRUE, and leak s_active.

Evidence from live system:

  • Custom kernel module read s_active = 2 on the ZFS superblock with zero mounts across all 14 mount namespaces
  • deactivate_super() fires during export attempt; generic_shutdown_super() NEVER fires
  • /proc/spl/kstat/zfs/rpool/objset-0x105 persists (dataset alive in kernel despite zero mounts)
  • Bug state persisted 20+ hours without self-resolving
  • 16 diagnostic commands confirmed zero userspace holders (lsof, fuser, mountinfo, keyrings, systemd units, ARC flush, drop_caches -- all negative)

Proposed fix (PR submitted): Add the missing zfs_vfs_rele() call, guarded by zfs_vfs_held() to handle the zfsvfs_create() fallback path. This matches the existing pattern in zfsvfs_rele().
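
The shape of the proposed fix can be sketched with a small user-space model. This is not the submitted patch: `model_zfsvfs`, `model_vfs_held()`, `model_vfs_rele()` and `model_hold_fixed()` are hypothetical names, and the sketch assumes that "held" means the zfsvfs is backed by a superblock, with a NULL `sb` standing in for the zfsvfs_create() fallback path.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_sb {
	atomic_int s_active;
};

struct model_zfsvfs {
	struct model_sb *sb;	/* NULL models the zfsvfs_create() fallback */
	int teardown_readers;	/* models the teardown lock, read side only */
	bool z_unmounted;
};

/* Models zfs_vfs_held(); assumed to mean "backed by a superblock". */
static bool
model_vfs_held(const struct model_zfsvfs *z)
{
	return (z->sb != NULL);
}

static void
model_vfs_rele(struct model_zfsvfs *z)
{
	assert(model_vfs_held(z));
	atomic_fetch_sub(&z->sb->s_active, 1);
}

static int
model_hold_fixed(struct model_zfsvfs *z)
{
	if (model_vfs_held(z))
		atomic_fetch_add(&z->sb->s_active, 1);	/* lookup took a ref */
	z->teardown_readers++;
	if (z->z_unmounted) {
		z->teardown_readers--;
		if (model_vfs_held(z))
			model_vfs_rele(z);	/* release the error path lacked */
		return (EBUSY);
	}
	return (0);
}
```

The guard matters because the fallback path never took a superblock reference, so an unconditional release there would touch a superblock that does not exist. What the fallback path should do with its own zfsvfs on error is outside the scope of this sketch.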

Describe how to reproduce the problem

The bug requires two concurrent ZFS ioctls racing with dataset unmount:

  1. IOCTL-A (e.g., zfs_ioc_recv) acquires s_active via getzfsvfs(), then internally calls zfs_resume_fs() which fails, setting z_unmounted = B_TRUE and releasing the teardown write lock
  2. IOCTL-B (e.g., zfs_ioc_objset_stats via zfs get) calls zfsvfs_hold() -> getzfsvfs() during the window after the write lock release but before dmu_objset_disown() clears os_user_ptr
  3. IOCTL-B observes z_unmounted == B_TRUE, returns EBUSY without zfs_vfs_rele()
  4. zpool export fails permanently with EBUSY

Trigger scenario: ZFS-on-root deployment via debootstrap + chroot with mount-over operations, bind mounts, and teardown running zpool set cachefile=none and zpool export -f while lazy unmount is in progress.

Reproducer script: A multi-threaded C program spawning concurrent mount/unmount/ioctl threads is attached. The race window is narrow (nanoseconds); reproduction is intermittent. Live system evidence (s_active=2 persisting 20+ hours) serves as primary proof.

repro_zfsvfs_hold.sh

tight_loop.c

Include any warning/errors/backtraces from the system logs

# zpool export attempt
$ sudo zpool export -f rpool
cannot export 'rpool': pool is busy

# Kernel module proof of leaked s_active
$ sudo insmod /tmp/sb_probe/sb_probe.ko && sudo dmesg | grep sb_probe && sudo rmmod sb_probe
[75229.126131] sb_probe: ZFS sb=ffff8e904d47c000  s_active=2  s_id="zfs"  s_flags=0x60010000

# Zero mounts across all namespaces
$ cat /proc/*/mountinfo 2>/dev/null | grep rpool | wc -l
0

# Mount namespace count
$ ls /proc/*/ns/mnt 2>/dev/null | wc -l
14

# Dataset alive in kernel despite zero mounts
$ cat /proc/spl/kstat/zfs/rpool/objset-0x105 | head -3
197 1 0x01 27 7600 123374565221 13205739627447
name                            type data
dataset_name                    7    rpool/ROOT/ubuntu

# ZFS module refcount stuck at 1
$ lsmod | grep zfs
zfs                  6823936  1

# spl_delay_taskq counter climbing (12 dispatches/min, all cancelled)
$ awk '/tasks_dispatched_delayed/ {print $3}' /proc/spl/kstat/taskq/spl_delay_taskq.0
2950
# 60 seconds later:
$ awk '/tasks_dispatched_delayed/ {print $3}' /proc/spl/kstat/taskq/spl_delay_taskq.0
2962

# bpftrace: deactivate_super fires, generic_shutdown_super never fires during export
# (kprobe_events missing on this kernel config, attach-only trace)
# Result: deactivate_super called 15+ times during export, generic_shutdown_super: 0 times

# Exhaustive elimination of userspace holders (all empty/negative):
$ zfs mount                                          # (empty)
$ lsof /dev/zfs                                      # (empty)
$ fuser -v /dev/nvme0n1p4                            # (empty)
$ ps aux | grep "[z]fs\|[z]ed"                       # (empty)
$ cat /proc/keys | grep zfs                          # (empty)
$ systemctl list-units --type=mount | grep zfs       # (empty)
$ find /proc/*/fd -lname "*rpool*" 2>/dev/null       # (empty)
$ ls -la /proc/*/root 2>/dev/null | grep optane      # (empty)
$ echo 3 | sudo tee /proc/sys/vm/drop_caches         # no effect on s_active
$ zpool get freeing,leaked rpool                     # 0, 0
$ zfs list -t snapshot -r rpool                      # (empty)
