Replies: 68 comments 215 replies
-
Found something interesting in a proposed patch in a discussion whose topic was "[PATCH] nvme-pci: fix host memory buffer allocation size", dating from May 10th, 2022. The starting point of the discussion is here => https://www.spinics.net/lists/kernel/msg4339024.html At some point (https://www.spinics.net/lists/kernel/msg4352567.html), it is mentioned that:
Also, in a subsequent message ( https://www.spinics.net/lists/kernel/msg4372632.html ) it is mentioned that the situation has improved drastically with the patch. Another point of the discussion is about the Host Memory Buffer being just 32MB. According to my logs, I have the same allocation. For the record, here are excerpts of some messages:
Current parameters for the nvme kernel modules on my system are at their defaults: Going through the code of The patch in question is mentioned at the very beginning of the discussion and is this one: Another related thread is here => https://lore.kernel.org/linux-nvme/f94565db-f217-4a56-83c3-c6429807185c@t-8ch.de/
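The parameter and log listings from that message did not survive the copy here; as a hedged sketch of how one can check the same things on their own system (assuming nvme-cli is installed and /dev/nvme0 is the drive of interest), the granted HMB size and the nvme module cap can be read like this:

```sh
# Sketch only: check what HMB the kernel actually allocated and what the drive asks for.
dmesg | grep -i "host memory buffer"           # e.g. "nvme nvme0: allocated 32 MiB host memory buffer"
nvme id-ctrl /dev/nvme0 | grep -i hmpre        # HMB size preferred/advertised by the drive
cat /sys/module/nvme/parameters/max_host_mem_size_mb   # module cap (the nvme.max_host_mem_size_mb knob)
```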
-
Above patch tried, but in my case, it worsens the issue :( The crash happens much earlier than before.
-
Basically, at this point I am out of options with those sticks. Those are a replacement for a trio of ADATA Gammix S70 Blade which were also problematic because their namespaces had a bad value for EUI64: basically all were set to eui64=0000000000000000, which made the system totally confused about who was who. So my only option at this point is to get another model :/ Perhaps I will keep them for a much less intensive use. Reality is: not all NVMe hardware can play nicely with ZFS. It seems that investing in higher-end hardware is not optional, especially with ZFS. I won't ever consider switching them back to 512b sectors; I don't think this will solve the issue, and even if it does, there is a significant performance penalty. Hoping my hours of investigation will help someone avoid wasting money on junk hardware. It is a bit disappointing that this junk is coming from a well-known brand. PS: Feel free to further elaborate. I will post if I get something new on this.
-
I would try to replace the PSU with another one, probably a 1000W one.
Often, mysterious problems end up being solved by replacing a faulty PSU.
…On Wed, Apr 26, 2023 at 9:23 AM admnd ***@***.***> wrote:
Above patch tried, but in my case, worsens the issue :( The crash happens
much more early than before.
Fiddling around with parameters of nvme.ko, I managed to have a higher
allocation of 200 MB with nvme.max_host_mem_size_mb=512 + the above patch
applied.
-
This might be a longshot, but where have you connected your NVMe? Did you use the onboard slots or a riser card with bifurcation? And if you used the onboard slots, which ones did you use? From the manual you can see one of the slots shares bandwidth with the SATA ports; if there's anything in there it could cause a problem. Further, X670 daisy-chains 2x the X670 chipset to give more connectivity. A guess of mine is that this issue could be caused by limited bandwidth between the chipsets and the CPU, which might make the controller look like it's dropping. My suggestion to troubleshoot this is to get a bifurcating riser card, put it in the x16 slot, and have all the NVMes directly connected to the CPU. This would eliminate going over the chipsets. Unfortunately ASUS has no block diagram of the board showing where which PCIe lanes go and at which speed. I would also see if limiting the speed of the drives changes anything. PCIe link-speed switching caused me a lot of headaches with my RX 5700 XT GPU; it caused some weird issues of it disconnecting, crashing the drivers, etc. So pretty similar to what you experience. Those two would be my guesses for this issue.
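Along that line of reasoning, the negotiated link of each drive can be compared against its capability; a small sketch (the PCI address is a placeholder, find yours with `lspci | grep -i nvme`):

```sh
# Sketch: LnkCap shows what the drive supports, LnkSta shows what was actually negotiated.
sudo lspci -vv -s 01:00.0 | grep -E 'LnkCap:|LnkSta:'
```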
-
It's interesting you're having issues with the SN770. I was having issues with mine (2TB as well) in my laptop. ZFS, Btrfs on LVM/LUKS, even ext4: my drive would reset just like yours, whether during boot or when sitting there doing nothing, or something. Seemingly random. I took it to my computer store to get it replaced. Through their testing the drive passed all tests, so they did not replace it. I believe they were testing with Windows. I am going to RMA it with WD; hopefully my replacement performs better. I have the exact same drive in my desktop (X570, 5950X), using a single ZFS vdev as root. I have not experienced these issues there. I would try putting the desktop drive in my laptop (XPS 9560) to see if it has issues, but that would be quite an inconvenience to me. So I am just going to RMA it. The previous drive in my laptop did not have these issues. This stuff occurred with both 512b and 4K sectors, I believe.
-
Other pointers (FreeBSD):
At this point, I have opened a case with WD; perhaps something can be done at their level. As I should have some free time tomorrow, I will try to exchange modules between my two machines.
-
SN770s swapped out for 3x WD SN850 configured in 4K. Day & night! My 7950X is literally breathing again! Over 100K IOPS while emerging GCC 13, and zpool scrubs easily reach 5-6 GB/s. Earlier this afternoon, I tried to swap one module at a time. Guess what? One SN770 quit the pool seconds after the resilvering started, and the second reset in the middle. I had thousands of checksum errors reported. Fortunately I have daily snapshots stored on a TrueNAS box, so not an issue. This junk is not even able to sustain a pool resilvering. So, gentlemen, moral of the story: don't use DRAM-less NVMe stuff with ZFS. Will give news on what happens with my now-famous SN770s when I get the chance :) Perhaps they will do better in my secondary machine or in the junk-box. Thank you, again, for jumping in and taking some of your time to put suggestions here. This is greatly appreciated.
-
Stumbled over this while searching for the consequences of my pool crash.
-
Hello @admnd, I'm experiencing the same problems on my server infrastructure. I recently added this WD NVMe (SN850X) just for some low-spec VMs that I preferred not to run on my main NVMe pool, which is composed of different PM9A3 drives.
-
I don't know if it's related somehow, but here's my 2 cents. I had an SN570 500GB (DRAM-less) NVMe, which was actually quite new (less than 1 year old). I never had any issues initially with ZFS and Gentoo on it, having been using ZFS for the last 5 months. Until recently, when I started noticing random kernel crashes and ZFS status reporting permanent errors while scrubbing. My RAM was perfectly fine, concluding from the fact that memtest86+ reported a pass twice consecutively. To my surprise, upon rebooting to Windows, the WD Dashboard reported that "NVM subsystem reliability has degraded", with 99% lifetime remaining. Even SMART tests started failing. And unfortunately, the drive had to be replaced.
-
Would be cool for a "ZFS NVMe Recommendations List" to come out of this discussion. I imagine SLC and MLC NVMes would be above the rest. What are the other criteria of which ZFS users should be aware when identifying the best SSD hardware?
-
I think I'm suffering from this on an 8TB Corsair MP600 PRO NH used as additional storage for a Proxmox 8 box. rsync seems to trigger it especially. The sledgehammer solution brings back the device for me, but the ZFS pool doesn't come back. I think it is because Proxmox creates the pool with a /dev/nvme0nX device name and the X changes with every "resurrection". I'm going to try ext4 next on that device and see how it goes. I wanted to post here in case there are more people with the same device and similar problems.
-
Just FYI, I had the exact same issue with a brand new WD BLACK SN770, and swapping my PSU solved the issue (while my previous one seemed perfectly fine)...
-
Last time I saw this, it was either a firmware or a hardware issue; an RMA sometimes solves it, if they return you a piece with a newer firmware version or with an internal known defect fixed. I would suggest not buying the same brand & model from the same batch for all vdevs in a pool; that might put you at risk of all disks faulting if there is ever a hardware/firmware/manufacturing issue.
-
FYI: SN580 1TB under Windows 10/11 with the latest firmware (281040wd), the same behaviour after switching to 4096-byte LBA; reproducible with fio with numjobs=4.
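The exact fio job used above is not shown; a hedged, Linux-flavoured sketch of a comparable 4-job random-write run against a test file (parameters are illustrative only):

```sh
# Illustrative reproduction attempt: 4 jobs of 4K random writes for 2 minutes.
fio --name=sn580-4jobs --filename=fio-testfile --size=8G --direct=1 \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 \
    --time_based --runtime=120 --group_reporting
```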
-
Posting here to A) thank the OP, and B) spread the word.

TL/DR SUMMARY: The OP is 100% correct, this IS some kind of a problem between ZFS, the WD drives (SN770, SN850S, SN850XE), and maybe even the underlying hardware. Better said, it's a particular chemistry of calamity that ultimately results in the problems everyone is describing. A drive will randomly drop out of the zpool, write errors will be seen, and generally nothing other than a reboot will reset the drive controller, thus allowing ZFS to resilver and heal the pool. I've spent WAY too much time on this and, ultimately, switching filesystems was the fix. So here is how I got there and maybe some help for you.

DETAILS: I started with the ol' trusty mdadm to build an array from 12 x 4TB SN850XE drives shucked from USB3 cases. Before you say shucked drives are the problem, just know I verified the controller, controller firmware, clock mechanism, and memory chips are identical to the SN850X, available as a standalone drive. For some time I thought the shucking trick was my enemy. Nope. I used some PCIe 4.0 x16 to 4x(x4) adapters found here to place the drives in three of the five PCIe x16 slots available. Supermicro H12SSL-i, AMD Epyc 7352, 256GB of memory. I wasn't happy with the contact mechanism between thermal pad and drive, but more on that later. An mdadm RAID 5 array would fail building itself around the 80-90% mark, every time, for about ten different attempts. RAID 0, 1, 10 were all fine, but not when distributed parity was a player. I changed build flags and settings, sector sizes, an array-of-partitions instead of disks. I went through lvmraid and snapraid (both of which rely on the md subsystem). Failed every time. Another factor here is heat; these little things get HOT. So I switched to these drive carriages, which, because of the screws in the middle of the heatsink, had better contact with the thermal pads used. More mdadm attempts, more failures.

Enter ZFS. I've always been a little shaky with ZFS because of its proximity to the kernel, but building a ZFS pool doesn't carry the bitmap overhead and drive geometry mapping that mdadm has. Building a RAID 5 zpool was a snap and I was mounted with encrypted and unencrypted datasets immediately. But just like all the others above me in this thread, large file transfers and even sustained small file transfers would kill the system.

So next I started digging through dmesg. Since this is a Proxmox box and I use SR-IOV and PCI passthrough religiously, PCI Advanced Error Reporting (AER) and PCI Access Control (ACS) had to be enabled. That instantly produces the below. These errors will show up every few seconds, so make sure you've got log rotation turned on, or you've turned off AER logging with a boot flag, else you're going to be exhausting drive capacity in a few hours. Please let me save you countless hours of digging through kernel dev forums and just tell you this is a complete red herring. AMD Epyc series processors are very "chatty" about the PCIe bus. The slightest re-ordering of bus transaction data results in messages similar to the above, which most CPUs DO NOT flag the kernel over. You may even wind up on a forum where an AMD engineer calls this a firmware errata since corrected in later generations of Epyc and Ryzen CPUs. It's also quite dependent on the underlying northbridge controller in the CPU. Alas, it's a red herring.
And since the hardware error (the WD drive is complaining about transactional re-ordering) is corrected by the drive controller, it's quite "normal" and NOT contributory to the problem.

BACK TO THE STORY: I went through another PCIe card from Dell that can handle the 22110 drives, but still the same failures with ZFS. Sometimes I could get 8 or 10 TB transferred (I used straight CIFS, rsync, NFS, and others), and sometimes just a few GB. Sometimes the pool would pause and the transfer would continue for a while, and then after a hard failure and reboot, resilver itself and heal for the amount of data I was able to transfer. Eventually I switched to SFF-8654 carrier cards and Silverstone active cooler carriages. Heat would not beat me! But still the same problem, nearly repeatable for every zpool flag, feature, anything I could switch on or off.
I placed thermal sensors on the drives and the Silverstone carriages are won-dee-ful. They were keeping the drives around 50 degrees or lower. Heat was thus not a factor. So, in a final effort to maintain allegiance to ZFS, I swapped the drives to a completely Samsung platform (980 Pro). Two things happened: one, the number of AER messages got cut in half; two, no more pool crashes and removed drives! All other things unchanged, that told me that some bit of chemistry between the CPU, the board, the drives, and ZFS was the problem. So I then tested on a Supermicro X10SDV board, albeit with a single PCIe card, bifurcated x4x4x4x4 and running at PCIe gen 3.0 speeds. Nope, ZFS and the WD drives still broke. Samsung drives were A-OK. That's an Intel board with a completely different IOMMU, AER, and ACS structure. So the final conclusion here, after all that testing, is that ZFS pools, definitely when the pool uses a parity structure (RAIDz.*), are not compatible with current-generation WD M.2 NVMe drives. The OP's hypothesis of burst writes might be the culprit.

FINALITY: With ZFS and mdadm cooked, I switched to RAID 5 BTRFS. Not a single problem. rsync transfer rates are 600MB/s from a pure SATA ZFS array of 24 x 2TB M.2 drives. That's less than what rsync reported on the ZFS pool, but it's also realistic. SMART load tests show 6000MB/s, on par for these drives. Nothing special, no unique flags for the BTRFS RAID 5 array, I don't even use commit=120. But I can copy hundreds of TBs back and forth without a single problem. So here's what I know to be true:
5. Use BTRFS with these drives. The write-hole problem was fixed.
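For readers wanting to reproduce that layout, a minimal sketch of what a "nothing special" BTRFS RAID 5 setup looks like (device names and mount point are placeholders; this is an illustration rather than the poster's exact commands):

```sh
# Data as raid5, metadata left at the multi-device default (raid1), then a scrub.
mkfs.btrfs -L nvmepool -d raid5 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
mkdir -p /mnt/nvmepool && mount /dev/nvme0n1 /mnt/nvmepool
btrfs scrub start -B /mnt/nvmepool    # -B: stay in the foreground and print the result
```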
-
Did you all try the latest ZFS versions, 2.2.8 or 2.3.3? Maybe you are hit by:
fixed by #16687
-
Dear all, we have several issues like those in your description, but not with WD, rather with Samsung SSD 990 Pro. Our setup: QNAP Storage TS-h1090FU (Firmware: JS06716L The setup is one ZFS RAID-6 pool with activated dedup. For the first 4 months we didn't have any problems, but then the horror began. Every week, a few minutes after the scrubbing began, a disk was disconnected due to a timeout. Not always the same disk: sometimes Disk 1, then Disk 10 and then Disk 2. If we stop the scheduled scrubbing, the storage runs over two weeks without any issues. SMART info of all disks is OK. We created a QNAP ticket and the storage was replaced by a new one in 04/2025 (QNAP thought it was the backplane). Same QNAP firmware, same NVMe with the same adapter (QDA-U2MP). The new storage ran ~5 months (like the old one) without any problems, and yesterday one disk was disconnected a few minutes after scrubbing was started, with a timeout. The temperature of the disks during scrubbing is ~30°C (all disks are cooled by the QDA-U2MP adapter and we are using air conditioning). ZFS parameters:
Here's the log: I think it's a combination of ZFS + the disks. I don't think it's the storage, because it was already replaced. It seems that the scrubbing creates so much IO that the disk gets disconnected. Does anybody have an idea how to solve it? Best regards, Rainer
-
Wanted to document that WD Red SN700 NVMe drives randomly drop out of the array (mdadm RAID-6 with BTRFS on top of it) on an Asustor Gen1 NVMe NAS. Western Digital/SanDisk provides a generic answer, and Asustor noted they do see other customers with this issue but do not have any type of fix or workaround for it currently. The workaround on my side is automating shutting down and waking up (WOL) the NAS, and it will rebuild itself automatically. This can be frustrating though, as it adds a lot of latency during the rebuilds.
[482320.067256] nvme nvme1: I/O 11 QID 0 timeout, reset controller
-
Hi guys, some tests with mdadm RAID 5 (BTRFS format) give me a faulty state under load too (only one drive)! I think we may have the following problem fields: ASPM is a big pain for me so far. Disabling it via GRUB doesn't work (besides causing long boot times), and disabling ASPM via UEFI doesn't fix the problem either. The SMART conditions look good and the firmware is at its newest on all the NVMes. I could hand out some specific logs too if needed.
-
My desktop has an NVMe drive that was randomly locking up my system with IO errors. I disabled Autonomous Power State Transitions (APST) and the problems went away. Here's how I (non-persistently) disabled it:
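The command itself was not captured above; one common way to disable APST at runtime (an assumption, not necessarily what this poster used) is to clear NVMe feature 0x0c with nvme-cli:

```sh
# Disable Autonomous Power State Transitions on this controller until the next reset.
# /dev/nvme0 is an example device; the change is not persistent across reboots.
sudo nvme set-feature /dev/nvme0 --feature-id=0x0c --value=0
sudo nvme get-feature /dev/nvme0 --feature-id=0x0c -H   # verify it now reads as disabled
```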
-
You can find some more details with links in the Arch wiki: https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Allow_drive_to_enter_low-power_states_(APST)
-
I checked how APST is set on the 4TiB WD SN700s within an Asustor NVMe NAS (12-bay/Gen1); this is off already :( (checked roughly as sketched below). Question for those that switched from ZFS/BTRFS (CoW) to EXT4 or XFS, for example: do the drive lockups persist?
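For anyone else wanting to verify the same thing, a sketch of how the APST state can be read (the device name is an example):

```sh
# Feature 0x0c is APST; -H makes nvme-cli decode whether it is enabled.
sudo nvme get-feature /dev/nvme0 --feature-id=0x0c -H
# The kernel-side knob that governs APST behaviour:
cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```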
-
Something that I found interesting since moving to mdadm with RAID-1:
Dec 21 19:37:01 box1 mdadm[1093]: mdadm: DeviceDisappeared event detected on md device /dev/md/md0
This appears to be a device changing power state; it was suggested to boot with the following options (see the sketch below). Now waiting to see if this issue recurs.
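The suggested boot options did not survive the copy above; assuming they are the power-management flags discussed elsewhere in this thread, the change would look roughly like this in /etc/default/grub:

```sh
# Assumed reconstruction, not the poster's verbatim suggestion: append the
# power-management flags discussed in this thread to the existing cmdline.
GRUB_CMDLINE_LINUX_DEFAULT="nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off"
# then regenerate grub.cfg (e.g. update-grub on Debian/Ubuntu) and reboot
```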
-
Since migrating my NAS volume from BTRFS to EXT4 a couple of days ago, I just had a random drive go offline (Dec 24, 2025) on the Asustor Flashstor Gen1, so with regard to the filesystem, BTRFS vs. EXT4, for these WD SN700 drives it did not make any difference. Does anyone on this thread have WD SN700 drives working in a stable configuration? On EXT4:
-
Researching this further, I came across a very interesting suggestion for a workaround: if these WD SN700 drives have aggressive power management (5 seconds) and do not respect the kernel options, I am going to test writing 1 byte every second to a file on the NFS share (NAS array) via a Linux systemd unit and see if this issue recurs. I have seen this issue occur almost ALWAYS when the system (Asustor Flashstor) is idle. Right now my "workaround" is automated scripting/tooling to shut down the machine and wake it up via a WOL packet, but the rebuild time takes forever on these systems and there is very bad latency during the rebuild. If there is an acceptable workaround that makes these drives usable, that would be the best-case scenario. This may be wishful thinking, but I am going to test the following next: Create a script, e.g., /usr/local/bin/nvme_heartbeat.sh:
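The script body did not make it into the comment; a minimal sketch of the behaviour described (one byte per second to a file on the array; the target path is a placeholder):

```sh
#!/bin/sh
# /usr/local/bin/nvme_heartbeat.sh (sketch): append one byte per second to a
# file on the NFS-backed array so the drives never go fully idle.
FILE="${1:-/mnt/nas/.nvme_heartbeat}"    # placeholder path, adjust to the real mount
while :; do
    printf '.' >> "$FILE"
    sync "$FILE"    # flush it to the array instead of leaving it in the page cache
    sleep 1
done
```

Wrapped in a simple systemd service (ExecStart pointing at the script, Restart=always), it can run unattended and be stopped again if it turns out not to help.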
-
Update to my 8/15/25 post.

BTRFS - Meh. It's pseudo-functional. I've had the drive drops and failed-drive problems just as with ZFS, but not nearly at the same volume/frequency. Weekly scrubbing and re-balancing helps greatly. Thrice in the last six months I've had a drive disconnect from the array, and one of those times BTRFS still reported a healthy array. Nothing in dmesg other than a missing drive during a weekly scrub. In two cases the array "healed" automatically. In the third, I lost data, though the array still continued to operate in RO mode. This was a slight improvement over ZFS, with which even after losing a single drive in a RAIDZ1 array, "recoverability" was troublesome. The good news is that BTRFS's historical performance issues seem fixed; the bad news is that it's more of the same for NVMe drives in a parity topology. For all block-level storage systems, I've had the best experiences with BTRFS (also using it as a file system), but this isn't ready for production.

Kernel flags - pcie_aspm=off. This doesn't "turn ASPM off". Instead it just tells the kernel not to enforce an ASPM policy on compatible devices, almost always on the PCI bus. To truly disable ASPM, use the BIOS if yours supports it. But I've tried that too and nothing changes. ZFS, BTRFS, CephFS, mdadm, any block-level storage (yes, some of these are file-level too) in a parity model fails for me. Interestingly, non-parity topologies such as a two-drive ZFS mirror using Samsung 980 and 990 drives are rock solid. Or a stripe. But parity is the problem for me.

pcie_port_pm=off - this tells the kernel to disable power management at the port level for each PCIe port. Essentially this forces PCI bus devices to stay in an active power state because the kernel will not honor state changes. I tested on kernels 6.8-6.17, but this never had an effect on the NVMe drives.

nvme_core.default_ps_max_latency_us=0 - this reduces the maximum power-state transition time acceptable to the kernel to 0, thus preventing any NVMe device from transitioning to a low-power state. It's another insular kernel parameter like pcie_port_pm=off. Sadly, no changes on WD SN850X drives. I also tried disabling PCIe AER, but that's needed for advanced IOMMU operations and ASPM functionality. My dmesg is clogged with corrected PCIe errors, usually TLP and DLLP errors, most of which come from these WD NVMe drives.

Da Bus - I've even tried downclocking the WD drives to PCIe 3.0. I figured maybe my system is "too fast" for these drives in such a topology. Well, after switching to a RHEL distro just to be able to use a compiled version of setpci that would reduce the PCIe version to 3.0, nothing changed. I briefly dabbled in downclocking the bus itself and rate-limiting the data transfer rate, but these changes would have affected all the bus devices, not just the NVMes. And if this is really a solution, I'd rather just buy different drives or beg WD for a firmware update.

What works - Non-parity topologies. Right now I've got a simple 12-drive RAID0 stripe on WD SN850X drives. I know... yuck. But I back up in triplicate and, though expensive, it's the most stable I've found. ZFS complains about a multi-drive stripe so I use mdadm for assembly at the block level. Though this IS technically possible with ZFS, and I HAVE made it work. Other options would be a JBOD LVM physical volume, then building an LV on top of that and placing a file system on the LV. That's worked fine for me as well (so far).

The best/fastest combo seems to be a simple multi-drive mdadm stripe, then creating a partition and formatting it as EXT4. A LUKS-encrypted partition worked fine, as well as file-overlay systems such as eCryptfs. The good news is that such a stripe on WD 850X drives is FAST... SUPER FAST. The bad news is that there is no parity, no drive redundancy, and I'm forced to have a second array for backups.
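For reference, the non-parity layout described above boils down to something like this (a sketch: device names, drive count, and mount point are placeholders, and the md device is formatted directly here rather than partitioned first):

```sh
# 12-drive RAID0 stripe via mdadm, then EXT4 on top. No redundancy at all:
# one failed drive loses the whole array, hence the triplicate backups.
mdadm --create /dev/md0 --level=0 --raid-devices=12 /dev/nvme{0..11}n1
mkfs.ext4 -L fastpool /dev/md0
mkdir -p /mnt/fastpool && mount /dev/md0 /mnt/fastpool
```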
-
tl;dr: faulty drives are faulty. Replace them with ones that don't have controller firmware with showstopping bugs.
-
Anyone try:
-
Originally started as a bug report, but after investigations and comments it is definitely more a hardware issue related to ZFS than a ZFS bug, so I open a general discussion here; feel free to put constructive observations/ideas/workarounds/suggestions.
TL;DR: Some NVMe sticks just crash with ZFS, probably because they are unable to sustain I/O bursts. It is not clear why this happens; the controller might just crash, or a combination of firmware/BIOS/hardware makes it unstable/crash when used in a ZFS pool.
Hardware
Issue observed
My system zpool is composed of a single RAID-Z1 VDEV made of 3x WD Black SN770 2TB, themselves configured with 4K logical sectors (I did not test with 512b sectors to see if the issue still happens... yet). The VDEV uses LZ4 compression and is not encrypted, nor are the underlying modules (they do not support that); standard 128K stripes are used. No L2ARC cache is used. The system has plenty of free RAM, so no RAM pressure.
Under "normal" daily usage I did not experience anything: the zpool is regularly scrubbed and there is nothing to report. No checksum errors, no frozen tasks, no crashes, nothing; the pool completes all scrubbings wonderfully well. The machine also experiences no freezes or kernel crashes/"oopses", and no stuck tasks (I had reported an issue with auditd here a couple of weeks ago, but that one is now inactive, see bug #14697). Even "emerging" big stuff like dev-qt/qtwebengine with 32 CMake jobs in parallel, or re-emerging the whole system from scratch with 32 parallel tasks with heavy packages rebuilt at the same time, succeeds. No crashes.
However, if I use `zfs send` to make a backup of the system datasets to a local TrueNAS box over a 10GbE link, this is another story: most of the time one of the NVMe modules randomly crashes. The issue also happens at different times in the data transfer: sometimes it appears after 12 GB, sometimes after 78 GB, sometimes after 93 GB and so on. If I am lucky, the operation sometimes completes successfully (less than a quarter of the time). Itchy and annoying. I have also managed to reproduce it by rsync-ing a dataset onto an empty new one in the same pool, although this happens more rarely. The TrueNAS box and network are out of concern as they run smoothly, and I can reproduce the issue locally by sending the ZFS stream to /dev/null (`zfs send .... | cat > /dev/null`).

When the crash happens, the following trace appears in the kernel logs:
At this point, if I am lucky enough, I can manage to bring it back to life using a sledgehammer:
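The sledgehammer itself is not reproduced here; one sequence matching that description (an assumption, not necessarily the exact commands used) is to force a PCI remove/rescan of the dead controller:

```sh
# Assumed reconstruction: drop the faulted controller from the PCI bus and
# re-enumerate it. 0000:01:00.0 is a placeholder for the dead NVMe's address.
echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
sleep 2
echo 1 > /sys/bus/pci/rescan
```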
If the faulted device reappears, the zpool becomes ONLINE again and completes its resilvering (a couple of KB or MB). In the worst case, another NVMe also drops off the pool, which becomes suspended, so I have to power-cycle the machine or push its reset button. Of course, doing a `nvme list` at this point either completely freezes or lists the two remaining NVMe modules, depending on what is alive.

My best guess so far is that the Western Digital SN 770 module's controller is not beefy enough to handle a burst of I/O requests (knowing they have no DRAM cache), so it is put on its knees and becomes so unresponsive that it is unable to complete a reset request on its own (no AER reported in the logs, BTW). As it is not always the same module that crashes, they do not all seem to be defective, or I am extremely unlucky. Pool scrubbing might be a bit lighter for the controller, so the scrubs/resilvers work without any issue (maximum observed speed is around 4.5~5 GB/s when scrubbing the pool according to `zpool status`).

What has been tried so far
Several things! Without any improvement, unfortunately:
- `nvme_core.default_ps_max_latency_us=0 pcie_aspm=off` on the kernel command line;
- `zfs` kernel module parameters: lowering the values of `zfs_vdev_sync_read_min_active`, `zfs_vdev_sync_read_max_active` and their `async` counterparts (I used the same values set as defaults for `zfs_vdev_scrub_min_active` and `zfs_vdev_scrub_max_active`);
- throttling: `zfs send ... | throttle -M 300 | ...`;
- a `blkio` cgroup;
- `zfs send` from a FreeBSD live media: FreeBSD allocates a 200MB host buffer for each module, but unfortunately no more success, and a `zfs send` also hangs :/
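For reference, those ZFS tunables can be changed at runtime through /sys/module/zfs/parameters; a sketch of the kind of lowering described above (the values are illustrative, not the ones actually used):

```sh
# Illustrative only: shrink the sync/async read queue depths at runtime.
echo 2 > /sys/module/zfs/parameters/zfs_vdev_sync_read_min_active
echo 4 > /sys/module/zfs/parameters/zfs_vdev_sync_read_max_active
echo 1 > /sys/module/zfs/parameters/zfs_vdev_async_read_min_active
echo 2 > /sys/module/zfs/parameters/zfs_vdev_async_read_max_active
```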
Some thoughts / ideas of tests to try

Is there a "ZFS native" way to throttle I/O operations in the case of doing a `zfs send`?

Has anybody here experienced something like this? If so, what other brands/models are subject to a similar issue?