Wrong ashift info and recommendations (from the era when HDDs introduced 4k physical blocks) #583

@bar-g

While a 4k (4096-byte) ashift=12 may be generally preferable for HDDs (4k performance and migration), the choice is no longer so one-sided for NVMe drives (due to write amplification causing wear):

NVMe flash-translation-layer modes with small indirection units (IUs)

"NVMe allows even smaller pages – down to 512 bytes.
For write amplification this would be even better" (https://vldb.org/pvldb/vol16/p2090-haas.pdf)

Some NVMe drives that support 512-byte LBA indirection units can even deliver the same performance in that mode as with, e.g., a 4096-byte LBA format (though the controller is of course processing more in 512-byte mode).

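Compared with a 4096-byte unit, a 512-byte unit writes 8x less data for any small change that fits within one 512-byte unit. Relative to the data actually changed, write amplification at 4096 bytes ranges from 8x (changing a full 512 bytes) up to 32768x (changing a single bit, since 4096 bytes is 32768 bits); at 512 bytes the same changes cost 1x and 4096x.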
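
The arithmetic behind these factors can be checked with a minimal sketch. It assumes a simplified model in which the device rewrites whole units and the change is aligned to a unit boundary; it ignores metadata, garbage collection, and whatever the controller really does internally. The function names are hypothetical and chosen only for illustration.

```python
def bits_written(changed_bits: int, unit_bytes: int) -> int:
    """Bits rewritten when only whole units of unit_bytes can be written.

    Simplified model: the change starts at a unit boundary.
    """
    unit_bits = unit_bytes * 8
    units = max(1, -(-changed_bits // unit_bits))  # ceiling division
    return units * unit_bits


def amplification(changed_bits: int, unit_bytes: int) -> float:
    """Ratio of bits written to bits actually changed."""
    return bits_written(changed_bits, unit_bytes) / changed_bits


if __name__ == "__main__":
    for unit in (512, 4096):
        print(unit, amplification(1, unit), amplification(512 * 8, unit))
```

In this model the ratio between the two unit sizes stays at 8x for any change of up to 512 bytes; only the amplification relative to the changed data grows as the change shrinks.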

If the LBA format is configurable, the write amplification for small writes could be reduced by choosing a smaller LBA format (i.e. if a modern controller can asynchronously fill its write pages (even >>16k) with smaller indirection units).

For NVMe drives with a fixed LBA format, e.g. 512 bytes, avoiding the write amplification basically requires using the proper default ashift (thus no manual ashift=12 override).
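
As a rough illustration of the relationship between a block size and ashift, the sketch below assumes that ashift is simply log2 of the block size and reads the sizes Linux exposes under `/sys/block/<device>/queue/`. The helper names and the `nvme0n1` device name are hypothetical, and the sketch does not claim to reproduce how OpenZFS picks its default.

```python
from pathlib import Path


def ashift_for(block_size: int) -> int:
    """Return log2(block_size); block_size must be a power of two."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError(f"not a power of two: {block_size}")
    return block_size.bit_length() - 1


def reported_block_sizes(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Read the logical and physical block sizes Linux reports for a device."""
    queue = Path(sysfs_root) / device / "queue"
    return {
        kind: int((queue / f"{kind}_block_size").read_text())
        for kind in ("logical", "physical")
    }


if __name__ == "__main__":
    # "nvme0n1" is a placeholder; substitute a device present on the system.
    for kind, size in reported_block_sizes("nvme0n1").items():
        print(f"{kind}: {size} bytes -> ashift {ashift_for(size)}")
```

Under that assumption a 512-byte LBA format corresponds to ashift=9 and a 4096-byte format to ashift=12, so an ashift=12 override on a 512-byte device forces 4096-byte writes.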


On Thu, Nov 27, 2025 at 3:26 PM bar-g @.***> wrote:

posts too often recommending manual ashift=12 for ssd/nvme (likely just reproducing an HDD advice/reasoning).

This is what is known as a "system tuning virus" where people copy-n-paste what they read on the internet without understanding what it does and why. The fact that the ashift tunable exists at all is due to the fact that during the transition to 4kn HDDs, many disks lied.

With: "Flash-based solid state drives [...] The early models used 4096-byte pages while the newer models have moved to an 8192-byte page."
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html#alignment-shift-ashift
the doc could seem to suggest that larger page sizes would warrant a larger ashift, because it only talks about page size in conjunction with ashift.

It varies widely, but the best policy is to follow what the device says.

But maybe it might rather explain that ashift should match the device's indirection unit, which is usually much smaller than the page size, and corresponds to the configurable (or not) LBA format.

Originally posted by @richardelling in #6373


Doc example leaning towards ashift=12 overrides:
https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html#alignment-shift-ashift

Also explicit install guide examples, e.g. https://openzfs.github.io/openzfs-docs/Getting%20Started/Debian/Debian%20Trixie%20Root%20on%20ZFS.html
