Description
While ashift=12 (4096-byte sectors) may be generally preferable for HDDs (4k performance and migration), the choice is no longer so clear-cut for NVMe drives, because write amplification causes wear:
NVMe flash-translation-layer modes with small indirection units (IUs)
"NVMe allows even smaller pages – down to 512 bytes.
For write amplification this would be even better" (https://vldb.org/pvldb/vol16/p2090-haas.pdf)
Some NVMe drives that support a 512-byte LBA format (512 data bytes per indirection unit) can even deliver the same performance as with a 4096-byte LBA format, though the controller of course has more units to process in 512-byte mode.
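Write amplification for small changes may be reduced 8x (rewriting a full 512-byte unit instead of a 4096-byte one). In the extreme case of changing a single bit, the amplification is 32768x with 4096-byte units versus 4096x with 512-byte units.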
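The arithmetic behind these factors can be sketched as follows. This is a deliberately simplified model, not a description of any real controller: it assumes a change is contiguous, starts on a unit boundary, and forces a rewrite of every unit it touches, and it ignores flash page and erase-block effects, garbage collection, and ZFS's own copy-on-write overhead. The function name `write_amplification` is hypothetical.

```python
def write_amplification(changed_bits: int, unit_bytes: int) -> float:
    """Ratio of bits rewritten to bits logically changed.

    Simplifying assumption: the change is contiguous, aligned to a unit
    boundary, and every touched unit is rewritten in full.
    """
    if changed_bits <= 0 or unit_bytes <= 0:
        raise ValueError("changed_bits and unit_bytes must be positive")
    unit_bits = unit_bytes * 8
    units_touched = -(-changed_bits // unit_bits)  # ceiling division
    return units_touched * unit_bits / changed_bits


if __name__ == "__main__":
    for changed_bits in (1, 512 * 8):
        wa_4k = write_amplification(changed_bits, 4096)
        wa_512 = write_amplification(changed_bits, 512)
        print(f"{changed_bits} bits changed: 4096B unit {wa_4k:g}x, "
              f"512B unit {wa_512:g}x, ratio {wa_4k / wa_512:g}x")
```

Under these assumptions the ratio between the two unit sizes stays at 8x for any change that fits in one 512-byte unit; only the absolute amplification grows as the change gets smaller.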
If the LBA format is configurable, a smaller LBA format could reduce write amplification for small writes (i.e. if a modern controller can asynchronously fill its write pages, even those much larger than 16k, using smaller indirection units).
For NVMe drives with a fixed LBA format, e.g. 512 bytes, avoiding this write amplification basically requires using the proper default ashift (and thus no manual ashift=12 override).
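As a reminder of how the two quantities relate: ashift is the base-2 exponent of the sector size ZFS uses, so a 512-byte LBA format corresponds to ashift=9 and a 4096-byte one to ashift=12. A minimal sketch of that mapping, with a hypothetical helper name `ashift_for_sector_size`:

```python
def ashift_for_sector_size(sector_bytes: int) -> int:
    """Return the ashift (log2 of the sector size) for a power-of-two size."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


if __name__ == "__main__":
    for size in (512, 4096, 8192):
        print(f"{size}-byte sectors -> ashift={ashift_for_sector_size(size)}")
```

This only illustrates the arithmetic; which sector size the device actually reports, and whether that report is trustworthy, is the separate question discussed in this issue.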
On Thu, Nov 27, 2025 at 3:26 PM bar-g @.***> wrote:
> posts too often recommending manual ashift=12 for ssd/nvme
> (likely just reproducing an HDD advice/reasoning).

This is what is known as a "system tuning virus", where people copy-n-paste what they read on the internet without understanding what it does and why. The ashift tunable exists at all because, during the transition to 4kn HDDs, many disks lied.

> With: "Flash-based solid state drives [...] The early models used
> 4096-byte pages while the newer models have moved to an 8192-byte page."
> the doc could seem to suggest that larger page sizes would warrant a larger
> ashift, because it only talks about page size in conjunction with ashift.

It varies widely, but the best policy is to follow what the device says.

> But maybe it should rather explain that ashift should match the device's
> indirection unit, which is usually much smaller than the page size and
> corresponds to the configurable (or not) LBA format.

Originally posted by @richardelling in #6373
Doc example leaning towards ashift=12 overrides:
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html#alignment-shift-ashift
Also explicit install guide examples, e.g. https://openzfs.github.io/openzfs-docs/Getting%20Started/Debian/Debian%20Trixie%20Root%20on%20ZFS.html