Conversation

@minwooim (Contributor) commented on Jul 21, 2025:

This patch introduces a new --verify_type option to support crash-consistent offline verification scenarios. The primary motivation is to handle cases where write operations are interrupted by device failures or unexpected power loss during the write phase, requiring verification to be limited to data that was properly persisted before the interruption.

Key features:

  • New --verify_type=flush option filters verification candidates based on fsync completion timing
  • Tracks fsync completion timestamps during write phase
  • During verification, excludes writes that completed after the last fsync
  • Always includes FUA (Force Unit Access) writes regardless of timing, as they bypass cache and are immediately persistent
  • Works with existing verify_state_save mechanism for offline verification
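The filtering rule in the feature list above can be sketched in a few lines of C. This is a simplified, hypothetical model for illustration only: the struct and function names are ours and do not match fio's internal `io_u`/verify-state structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of one completed write, as tracked during the
 * write phase. */
struct write_comp {
	uint64_t comp_nsec;	/* completion timestamp, nanoseconds */
	bool fua;		/* issued with Force Unit Access */
};

/* A write is a verification candidate only if it completed at or
 * before the last fsync completion, unless it was a FUA write, which
 * bypasses the cache and is persistent immediately. */
static bool should_verify(const struct write_comp *w, uint64_t last_flush_nsec)
{
	if (w->fua)
		return true;
	return w->comp_nsec <= last_flush_nsec;
}
```

During offline verification, a check like this would drop writes that raced past the final flush before the simulated failure, while keeping every FUA write.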

Use case:
This is particularly useful for testing storage durability guarantees in scenarios such as:

  1. Simulated power failures during write workloads
  2. Device error injection testing

and especially with ioengines that communicate directly with the device, as io_uring_cmd does.

This enhancement enables more realistic testing of data durability in storage systems by ensuring only properly synchronized data is verified after simulated failures.

Examples:

  1. write phase

[global]
ioengine=io_uring_cmd
cmd_type=nvme
filename=/dev/ng0n1
rw=write
bs=4k
verify=pattern
verify_pattern=%o

iodepth=32
size=32k
fsync=3

[test]
verify_type=flush
do_verify=0
verify_state_save=1

  2. read phase

[global]
ioengine=io_uring_cmd
cmd_type=nvme
filename=/dev/ng0n1
rw=write
bs=4k
verify=pattern
verify_pattern=%o

iodepth=32
size=32k
fsync=3

[test]
verify_type=flush
do_verify=1
verify_only=1
verify_state_load=1

@sitsofe (Collaborator) left a comment:

@minwooim:

I've very briefly looked over this and from what you're saying the purpose is to allow the verification of only those writes that happened before an fsync? How does this interact with things like trims? Do you really foresee a verify_type other than flush?

Some additions that will be needed:

@minwooim (Contributor, Author) replied:

> I've very briefly looked over this and from what you're saying the purpose is to allow the verification of only those writes that happened before an fsync? How does this interact with things like trims? Do you really foresee a verify_type other than flush?

Not this time since we already have trim_verify_zero option for the TRIM-ed offset ranges. Rather than excluding the TRIM-ed offsets, we can verify them with zero-data.

Will add these two. Thanks for catching this.
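The trim handling described above — verifying TRIM-ed ranges against zero data instead of excluding them — reduces to a zero-buffer check after reading the range back. A minimal sketch (the helper name is ours, not fio's):

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every byte of the buffer read back from a TRIM-ed
 * range is zero, which is what an option like trim_verify_zero would
 * assert on devices that guarantee deterministic zeroes after trim. */
static bool buf_is_zero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return false;
	return true;
}
```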

io_u.c (outdated diff):
if (!f->last_write_comp)
return;

now_nsec = utime_since_now(&io_u->start_time) * 1000; /* Convert to nanoseconds */
Collaborator:

ntime_since_now exists
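The suggestion matters for precision as well as style: multiplying a microsecond value by 1000 cannot recover sub-microsecond resolution, whereas computing nanoseconds directly from the timestamps keeps it. A simplified illustration — these helpers are stand-ins for fio's `ntime_since_now`/`utime_since_now`, not their actual implementations:

```c
#include <stdint.h>
#include <time.h>

/* Nanosecond delta computed directly from the two timestamps. */
static uint64_t delta_nsec(const struct timespec *a, const struct timespec *b)
{
	return (uint64_t)((int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL
			  + (b->tv_nsec - a->tv_nsec));
}

/* The utime_since_now(...) * 1000 pattern from the patch: the value is
 * first truncated to microseconds, so the low three digits are lost. */
static uint64_t delta_usec_to_nsec(const struct timespec *a,
				   const struct timespec *b)
{
	return delta_nsec(a, b) / 1000 * 1000;
}
```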

}

/* Mark FUA writes for verification state tracking */
if (io_u->ddir == DDIR_WRITE && (ld->cdw12_flags[DDIR_WRITE] & (1 << 30)))
Collaborator:

I think this would be clearer if you used o->writefua for the condition

Collaborator:

Also, the sg ioengine has a writefua flag as well.

io_u.c (outdated diff):
return;

/* Track the last FLUSH completion timestamp */
completion_time = utime_since_now(&io_u->start_time) * 1000;
Collaborator:

ntime_since_now
Also, we already calculate a completion time elsewhere for all operations. Is there a way to use that completion time instead of calculating a second one?

@vincentkfu (Collaborator):

There is also a t/verify-state.c file that will need updating.

@minwooim force-pushed the verify_type branch 2 times, most recently from 603be44 to 18c23e9 on July 24, 2025 at 07:22.
verify.h (outdated diff):
VERIFY_PATTERN, /* verify specific patterns */
VERIFY_PATTERN_NO_HDR, /* verify specific patterns, no hdr */
VERIFY_NULL, /* pretend to verify */
VERIFY_FLUSH, /* verify with flush timing consideration */
Collaborator:

Why is this change needed?

@minwooim (Contributor, Author) replied:

Missed removing this one. Thanks for catching this.

@vincentkfu (Collaborator):

For your commit message there needs to be just one Signed-off-by line.

This patch introduces a new --verify_type option to support crash-consistent
offline verification scenarios. The primary motivation is to handle cases where
write operations are interrupted by device failures or unexpected power loss
during the write phase, requiring verification to be limited to data that was
properly persisted before the interruption.

Key features:
- New --verify_type=flush option filters verification candidates based on
  fsync completion timing
- Tracks fsync completion timestamps during write phase
- During verification, excludes writes that completed after the last fsync
- Always includes FUA (Force Unit Access) writes regardless of timing, as
  they bypass cache and are immediately persistent
- Works with existing verify_state_save mechanism for offline verification

Use case:
This is particularly useful for testing storage durability guarantees in
scenarios such as:
1. Simulated power failures during write workloads
2. Device error injection testing

and especially with ioengines that communicate directly with the device,
as io_uring_cmd does.

This enhancement enables more realistic testing of data durability in storage
systems by ensuring only properly synchronized data is verified after
simulated failures.

Examples:

1. write phase

[global]
ioengine=io_uring_cmd
cmd_type=nvme
filename=/dev/ng0n1
rw=write
bs=4k
verify=pattern
verify_pattern=%o

iodepth=32
size=32k
fsync=3

[test]
verify_type=flush
do_verify=0
verify_state_save=1

2. read phase

[global]
ioengine=io_uring_cmd
cmd_type=nvme
filename=/dev/ng0n1
rw=write
bs=4k
verify=pattern
verify_pattern=%o

iodepth=32
size=32k
fsync=3

[test]
verify_type=flush
do_verify=1
verify_only=1
verify_state_load=1

Signed-off-by: Minwoo Im <[email protected]>
thread_options.h (outdated diff):
struct zone_split zone_split[DDIR_RWDIR_CNT][ZONESPLIT_MAX];
uint32_t zone_split_nr[DDIR_RWDIR_CNT];
uint32_t pad2;
uint64_t pad2;
Collaborator:

Technically you can get rid of the pad, as you will now have 64-bit alignment...
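The reviewer's point about the pad becoming redundant can be demonstrated with a standalone C snippet. On a typical LP64 ABI the compiler inserts the padding before an 8-byte-aligned member itself, so an explicit `pad2` next to a 64-bit field changes nothing. The struct names here are illustrative, not fio's actual `thread_options` layout:

```c
#include <stddef.h>
#include <stdint.h>

/* Layout with an explicit 4-byte pad before a 64-bit member. */
struct with_pad {
	uint32_t nr;
	uint32_t pad2;		/* explicit padding */
	uint64_t timestamp;	/* 8-byte aligned */
};

/* Same layout without the explicit pad: on LP64 ABIs the compiler
 * pads to 8-byte alignment for timestamp anyway, so the sizes and
 * offsets come out identical. */
struct without_pad {
	uint32_t nr;
	uint64_t timestamp;
};
```

Note that this equivalence assumes `uint64_t` has 8-byte alignment; on some 32-bit ABIs (e.g. i386, where it aligns to 4) the explicit pad still matters, which is why fixed on-disk formats like the verify state file tend to keep pads explicit.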


/* Mark FUA writes for verification state tracking */
if (io_u->ddir == DDIR_WRITE && o->writefua)
io_u_set(td, io_u, IO_U_F_FUA);
Collaborator:

Should this be hidden in the DDIR_WRITE case of fio_nvme_uring_cmd_prep() in nvme.c?

}

if (hdr->version == 0x04)
if (hdr->version == 0x05)
Collaborator:

You're going to need to bump this up even further after rebasing...

failure should be verified.
.P
This option requires \fBverify_state_save\fR to be enabled and is only
effective during offline verification (separate verify job). Default: none
Collaborator:

I think you need to explicitly list that none as one of the options above too.


if (ddir == DDIR_WRITE)
file_log_write_comp(td, f, io_u->offset, bytes);
file_log_write_comp(td, f, io_u->offset, bytes, io_u);
Collaborator:

If we're going to pass the full io_u, does it make sense to pass the offset separately rather than calculate it inside the function?


struct zone_split zone_split[DDIR_RWDIR_CNT][ZONESPLIT_MAX];
uint32_t zone_split_nr[DDIR_RWDIR_CNT];
uint32_t pad2;
Collaborator:

After rebasing you'll have to see if you still need the pad.

};

#define VSTATE_HDR_VERSION 0x04
#define VSTATE_HDR_VERSION 0x05 /* Incremented for FLUSH count support */
Collaborator:

Might need bumping after rebasing.

3 participants