
Files shared from ZFS are mysteriously marked "sparse", breaking Windows VHD files. #316

Open
DanaGoyette opened this issue May 13, 2024 · 13 comments


@DanaGoyette

I was having a miserable time trying to get the leftover system image backup tool in Windows 11 to work on a ksmbd-shared ZFS directory. It would repeatedly fail with various cryptic error messages, despite tcpdump showing that it was creating the VHDX file just fine.

The same backup job works just fine with the old Samba.
 
When one search result for one of the error codes suggested creating a VHD file and setting Windows to back up to that instead, I tried to create one on my ZFS share:

Start -> type "disks" -> Manage Disks and Volumes -> fill out with a share as the location -> click Create.

And voila, I finally got a useful error message!

There was an error creating the virtual disk: Error Code: 0xC03A001A The requested operation could not be completed due to a virtual disk system limitation.  Virtual hard disk files must be uncompressed and unencrypted and must not be sparse. 

Aha, there's the problem!

Double-checking with tcpdump, it looks like Windows is requesting to create the file with file attributes 0x0, and getting back a file with file attributes 0x220 (requires archived: yes, sparse: yes).

ksmbd-zfs-why-sparse.pcapng.gz
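
For reference, 0x220 is just FILE_ATTRIBUTE_ARCHIVE (0x20) plus FILE_ATTRIBUTE_SPARSE_FILE (0x200) from MS-FSCC. A tiny throwaway C sketch that decodes the two values described above (the numbers are copied from this comment, not parsed out of the pcap):

    #include <stdio.h>

    /* File attribute bits from MS-FSCC, as seen in the SMB2 CREATE exchange */
    #define FILE_ATTRIBUTE_ARCHIVE     0x00000020u
    #define FILE_ATTRIBUTE_SPARSE_FILE 0x00000200u

    int main(void)
    {
        unsigned int requested = 0x000; /* attributes in the client's CREATE request */
        unsigned int returned  = 0x220; /* attributes in the server's CREATE response */

        printf("client asked for sparse:  %s\n",
               (requested & FILE_ATTRIBUTE_SPARSE_FILE) ? "yes" : "no");
        printf("server returned sparse:   %s\n",
               (returned & FILE_ATTRIBUTE_SPARSE_FILE) ? "yes" : "no");
        printf("server returned archive:  %s\n",
               (returned & FILE_ATTRIBUTE_ARCHIVE) ? "yes" : "no");
        return 0;
    }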

I already tried setting share:fake_fscaps = 0, but that didn't change anything.

I'm using Proxmox (Debian 12), but I had to build a newer ksmbd-tools because the one included in Debian showed zero files in any of my shares.

@mmakassikis

Did you set share:fake_fscaps = 0 as a global parameter?

I recall testing the Windows 10 backup tool against ksmbd, and something like this should do the trick:

[global]
        workgroup = WORKGROUP
        server string = Test_Server
        guest user = nobody
        netbios name = Test_Server
        follow symlinks = no
        map to guest = never
        share:fake_fscaps = 0

[testshare]
        comment = test share
        path = /media/share
        writeable = yes

It was on ext4 though. Can you test on ext4 to make sure it's not some weird interaction with ZFS that is causing issues?
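
One kind of interaction that could explain it, though this is only a guess and not a claim about what ksmbd actually does: on a compressing filesystem like ZFS, a fully written file can report fewer allocated blocks than its logical size, so any server-side heuristic that infers "sparse" from st_blocks versus st_size would flag ordinary files. A minimal userspace sketch of that heuristic, for comparing files on both filesystems:

    #include <stdio.h>
    #include <sys/stat.h>

    /* Hypothetical heuristic only, not ksmbd's actual logic: treat a file as
     * "sparse" when its allocated bytes (st_blocks * 512) are smaller than its
     * logical size. ZFS compression can make that true even for fully written
     * files, while the same file on ext4 normally reports full allocation. */
    static int looks_sparse(const char *path)
    {
        struct stat st;

        if (stat(path, &st) != 0)
            return -1;
        return (long long)st.st_blocks * 512 < (long long)st.st_size;
    }

    int main(int argc, char **argv)
    {
        for (int i = 1; i < argc; i++)
            printf("%s: looks_sparse=%d\n", argv[i], looks_sparse(argv[i]));
        return 0;
    }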

@DanaGoyette
Author

I already had share:fake_fscaps = 0 set; that didn't change anything.

I tried creating a zvol for that share and formatting it as ext4, and that allows Windows to create the VHDX file without it getting marked as sparse.

The Windows Image Backup (Backup and Restore (Windows 7)) still fails. I'll have to check if it fails with plain Samba too.

ksmbd-ext4-zvol-not-sparse.pcapng.gz
ksmbd-ext4-zvol-windows-backup-fail.pcapng.gz

@DanaGoyette
Author

DanaGoyette commented May 13, 2024

I tried setting that location as the backup target again, and when it went to check the location for usability, it produced a much shorter pcap: just a session setup request.

ksmbd-ext4-zvol-session-setup-fail.pcapng.gz

The specified network location cannot be used
Verify the path points to a correct network location and that the supplied credentials can be used for write access to the folder.
The request is not supported. (0x80070032)

Here's the current config.

ksmbd.conf.txt

It also now gives a kernel oops:

[  219.036820] BUG: kernel NULL pointer dereference, address: 0000000000000042
[  219.044760] #PF: supervisor read access in kernel mode
[  219.050271] #PF: error_code(0x0000) - not-present page
[  219.055777] PGD 0 P4D 0
[  219.058672] Oops: 0000 [#1] PREEMPT SMP NOPTI
[  219.063403] CPU: 15 PID: 163 Comm: kworker/15:1 Tainted: P           OE      6.8.4-3-pve #1
[  219.072484] Hardware name: To Be Filled By O.E.M. X570D4U-2L2T/X570D4U-2L2T, BIOS L1.76 09/18/2023
[  219.082202] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
[  219.087998] RIP: 0010:smb_lazy_parent_lease_break_close+0x21/0x140 [ksmbd]
[  219.095627] Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb e8 b8 83 51 ec 4c 8b 6b 30 e8 2f c0 51 ec <45> 0f b6 65 42 41 80 fc 01 0f 87 75 1a 02 00 41 83 e4 01 74 0a 49
[  219.115896] RSP: 0018:ffffb2ce4086fc58 EFLAGS: 00010246
[  219.121497] RAX: 0000000000000000 RBX: ffff97eb74964780 RCX: 0000000000000000
[  219.129365] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  219.137234] RBP: ffffb2ce4086fc78 R08: 0000000000000000 R09: 0000000000000000
[  219.145104] R10: 0000000000000000 R11: 0000000000000000 R12: ffff97eb74965680
[  219.152974] R13: 0000000000000000 R14: ffff97eb74964780 R15: ffff97eb74964780
[  219.160870] FS:  0000000000000000(0000) GS:ffff97f54c580000(0000) knlGS:0000000000000000
[  219.169733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  219.175880] CR2: 0000000000000042 CR3: 0000000110294000 CR4: 0000000000350ef0
[  219.183784] Call Trace:
[  219.186611]  <TASK>
[  219.189079]  ? show_regs+0x6d/0x80
[  219.192842]  ? __die+0x24/0x80
[  219.196246]  ? page_fault_oops+0x176/0x500
[  219.200686]  ? do_user_addr_fault+0x2f9/0x6b0
[  219.205383]  ? exc_page_fault+0x83/0x1b0
[  219.209638]  ? asm_exc_page_fault+0x27/0x30
[  219.214145]  ? smb_lazy_parent_lease_break_close+0x21/0x140 [ksmbd]
[  219.221061]  ? smb_lazy_parent_lease_break_close+0x21/0x140 [ksmbd]
[  219.227975]  close_id_del_oplock+0x115/0x170 [ksmbd]
[  219.233281]  __ksmbd_close_fd+0xc1/0x2c0 [ksmbd]
[  219.238228]  ksmbd_close_fd+0xaf/0xd0 [ksmbd]
[  219.242990]  smb2_close+0x1a8/0x3e0 [ksmbd]
[  219.247497]  handle_ksmbd_work+0x16e/0x4d0 [ksmbd]
[  219.252600]  process_one_work+0x16d/0x350
[  219.256905]  worker_thread+0x306/0x440
[  219.260943]  ? __pfx_worker_thread+0x10/0x10
[  219.265501]  kthread+0xf2/0x120
[  219.268918]  ? __pfx_kthread+0x10/0x10
[  219.272940]  ret_from_fork+0x47/0x70
[  219.276785]  ? __pfx_kthread+0x10/0x10
[  219.280800]  ret_from_fork_asm+0x1b/0x30
[  219.284996]  </TASK>
[  219.287455] Modules linked in: xt_mark cmac nls_utf8 ebtable_filter ebtables ip_set ip6table_raw iptable_raw xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xt_addrtype scsi_transport_iscsi rdma_ucm ib_uverbs nvme_fabrics overlay qrtr nf_tables ip6table_nat ip6table_filter ip6_tables iptable_nat iptable_filter sp5100_tco binfmt_misc bonding tls openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nfnetlink_log nfnetlink joydev input_leds ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd acpi_ipmi rapl wmi_bmof ccp k10temp ipmi_si ipmi_devintf ryzen_smu(OE) ipmi_msghandler mac_hid vhost_net vhost vhost_iotlb tap nct6775 nct6775_core hwmon_vid vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd ksmbd crc32_generic cifs_arc4 nls_ucs2_utils rdma_cm msr iw_cm parport_pc ppdev ib_cm lp parport ib_core
[  219.287553]  efi_pstore nfsd auth_rpcgss nfs_acl lockd grace sunrpc dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 hid_generic usbkbd usbmouse usbhid hid nvme xhci_pci xhci_pci_renesas ixgbe nvme_core igb ahci xfrm_algo crc32_pclmul xhci_hcd ast i2c_piix4 nvme_auth libahci dca mdio i2c_algo_bit wmi
[  219.423410] CR2: 0000000000000042
[  219.427122] ---[ end trace 0000000000000000 ]---
[  221.634145] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 2123381463 wd_nsec: 2123379647
[  221.721077] RIP: 0010:smb_lazy_parent_lease_break_close+0x21/0x140 [ksmbd]
[  221.728858] Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb e8 b8 83 51 ec 4c 8b 6b 30 e8 2f c0 51 ec <45> 0f b6 65 42 41 80 fc 01 0f 87 75 1a 02 00 41 83 e4 01 74 0a 49
[  221.749258] RSP: 0018:ffffb2ce4086fc58 EFLAGS: 00010246
[  221.754883] RAX: 0000000000000000 RBX: ffff97eb74964780 RCX: 0000000000000000
[  221.762795] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  221.770709] RBP: ffffb2ce4086fc78 R08: 0000000000000000 R09: 0000000000000000
[  221.778620] R10: 0000000000000000 R11: 0000000000000000 R12: ffff97eb74965680
[  221.786533] R13: 0000000000000000 R14: ffff97eb74964780 R15: ffff97eb74964780
[  221.794447] FS:  0000000000000000(0000) GS:ffff97f54c580000(0000) knlGS:0000000000000000
[  221.803313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  221.809458] CR2: 0000000000000042 CR3: 0000000110294000 CR4: 0000000000350ef0
[  221.817371] note: kworker/15:1[163] exited with irqs disabled

@namjaejeon
Member

@DanaGoyette Can you check whether your ksmbd has this patch?
cifsd-team/ksmbd@722c824

@DanaGoyette
Author

@DanaGoyette Can you check whether your ksmbd has this patch? cifsd-team/ksmbd@722c824

If I'm guessing correctly about how to check, Proxmox may not have that change yet. See their changelog.

I'm not sure how to rebuild Proxmox's kernel, so I may have to just wait for them to pick up the fix for that null pointer dereference.
Is it useful to investigate that "sparse" flag separately before that?

@namjaejeon
Member

Is it useful to investigate that "sparse" flag separately before that?

Maybe, yes. It seems like they are different issues. Let me check your dumps first.

@namjaejeon
Member

The Windows Image Backup (Backup and Restore (Windows 7)) still fails. I'll have to check if it fails with plain Samba too.

Have you checked it with Samba?

@DanaGoyette
Author

With plain Samba, both the VHD creation tool and the Windows Image Backup work with the ext4 zvol, as well as with a normal ZFS filesystem. I'll have to grab a pcap of those sometime.

@namjaejeon
Member

@DanaGoyette Thanks for checking! Could you please help me reproduce this issue so I can look into it in detail?

  1. VHD creation tool: Could you tell me which tool to use?
  2. Windows Image Backup: How do I run an image backup on Windows? I'm not sure if it was the same thing, but last night I ran a backup to a ksmbd share from Windows and it seemed to take a very long time, so I stopped it partway through. Does it actually take that long?

@DanaGoyette
Author

@DanaGoyette Thanks for checking! Could you please help me reproduce this issue so I can look into it in detail?

  1. VHD creation tool: Could you tell me which tool to use?

It's built into Windows:
https://pureinfotech.com/create-virtual-drive-vhd-vhdx-dev-windows-11/

  2. Windows Image Backup: How do I run an image backup on Windows? I'm not sure if it was the same thing, but last night I ran a backup to a ksmbd share from Windows and it seemed to take a very long time, so I stopped it partway through. Does it actually take that long?

The system image backup does tend to take a long time, depending on source and target and network speed, and its progress bar is rather useless. I usually leave only the system image enabled, that is, only the bottommost checkbox, not any of the libraries.

https://www.ionos.ca/digitalguide/server/configuration/windows-11-backup/

The VHD creation tool will be much easier to reproduce with, though... it fails quickly, rather than much later as with the system image backup.

@DanaGoyette
Author

I finally got around to buying a second RDMA-supporting NIC, so I'd like to try it out, but what I can find online seems to say that RDMA requires ksmbd, not userspace smbd.

I'd like to try ksmbd again, but are there any updates to this issue? If nothing has changed, I won't bother swapping NICs around yet.

@namjaejeon
Member

@DanaGoyette Can you try to check it with ext4 or xfs, not ZFS?

@namjaejeon
Member

ZFS violates the kernel's license, and ZFS is not in the Linux kernel mainline. This is completely unsupported.
