Can't import pool read-write after PSU failure, but can read-only #17820
-
We recently had a dual PSU failure in a JBOD. We replaced one of the failed PSUs with a spare while waiting on the replacement. When we try to import the zpool (either just …), the read-write import fails, although a read-only import works. I suspect there might still be an underlying hardware issue (we are working on getting the power situation fully redundant and checking with the hardware vendor for other signs of failure), but I was hoping to run this past the community and see if there are any suggestions. Let me know if additional information is needed or would be helpful.
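(For reference, a minimal sketch of the two kinds of import attempt being described; the pool name and device directory are placeholders, not taken from the actual system.)

```sh
# Read-write import: the one that fails after the PSU incident.
zpool import -d /dev/disk/by-id tank

# Read-only import: this one succeeds.
zpool import -d /dev/disk/by-id -o readonly=on tank
```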
Replies: 3 comments
-
I've been poking at this more, and as best I can tell the underlying hardware is just fine. Tried one rewind to a previous txg with no luck, but since the failure seems to be on writes it makes sense that a rewind wouldn't fix it (I think). Forgot to add this: …
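(A sketch of what a rewind import attempt like the one mentioned above can look like; the exact flags, pool name, and txg here are assumptions for illustration, not the command that was actually run.)

```sh
# Recovery-mode import: ask ZFS to try discarding the last few transactions.
# -n makes this a dry run, so nothing on disk is modified.
zpool import -d /dev/disk/by-id -F -n tank

# An explicit rewind to a particular txg is also possible (use with care):
# zpool import -d /dev/disk/by-id -T <txg> tank
```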
-
This turns out to be a fascinating one. This is the first time I've seen this, but what happened here is that the vdev config is larger than fits in the label, and so the vdevs all fail to sync their labels out. Because they don't even issue the write, the parent IO (for the top-level vdev) sees good_writes == 0 and sets the error to be EIO. The error message for this is terrible, which is probably something we should fix.

The extreme size of the vdev config is the result of a combination of two things. First, there are 240 disks in a single top-level vdev, which is a lot. Second, the disks have a path, devid, phys_path, and enclosure path all configured.

The thing that triggered the problem to start happening is that, after the outage/reboot, ZFS is trying to set the phys_path for every single disk in the vdev; the config already in the label doesn't have it set for each disk, and the newly generated one does. Weirdly, it is setting them all to the same value: a single entry in …

The way we worked around the issue is by setting …
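(If anyone wants to look at this on their own pool: 'zdb -l' on a member disk dumps the on-disk label nvlists, including the vdev tree with every child's path, devid, and phys_path entries, which is the config described above. The device path below is a placeholder.)

```sh
# Dump the label nvlists from one member disk; the vdev_tree section lists
# every child of that top-level vdev along with its path, devid, phys_path
# and related entries, which is what can outgrow the space in the label.
zdb -l /dev/disk/by-id/SOME-MEMBER-DISK-part1

# Quick check for the "every disk has the same phys_path" symptom:
zdb -l /dev/disk/by-id/SOME-MEMBER-DISK-part1 | grep phys_path | sort | uniq -c
```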
-
Really big thanks to Paul on this. It boils down to my fault for creating a really large dRAID, apparently nearing the maximum size. We have a plan to lay out the pool better. Just wanted to add a few more bits of info here. The pool was created on 2022-11-02. I'm not sure which version of ZFS we were running then, but it has been through a few upgrades since. The child disks each ended up with a few keys, which really increased the size of the label. Unfortunately I do not have a full capture of the zdb output, but I did grab this: …

Looking at some other pools on other systems, some very old, it looks like we have some values in 'phys_path' that are really confusing. In the output above, the 'phys_path' value is the guid of the zpool. On all the EL8/9 systems I could find, '/usr/lib/udev/rules.d/13-dm-disk.rules' will create a '/dev/disk/by-uuid/ZPOOL_GUID' symlink pointing to the most recently added disk when triggered.
So all the disks that I could see in the pool had the same 'phys_path' value, which was a pointer to one disk in the pool. Looking at other, older pools, they have some 'phys_path' values that must be from a test another admin did 5+ years ago; there is no current entry for them on the system, so it seems like the value is not being updated.
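(A quick way to check for the by-uuid symlink behaviour described above; the pool GUID and disk name below are placeholders.)

```sh
# See whether a by-uuid entry named after the pool GUID exists, and which
# member disk it currently points at.
ls -l /dev/disk/by-uuid/ | grep 1234567890123456789

# Inspect the udev properties for one member disk; DEVLINKS shows every
# symlink udev has created for the device, which is a quick way to see
# where a given disk is being referenced from.
udevadm info --query=property /dev/disk/by-id/SOME-MEMBER-DISK
```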