Skip to content

Conversation

@minwooim
Copy link
Contributor

Patch series to support masking expected system error with --ignore_error=.

Added more error numbers(errno) after ERANGE to support various errno
string to options like `--ignore_error=ETIMEDOUT`.  unvme-cli libunvmed
ioengine returns ETIMEDOUT if a command is timed out.  To mask this
situation with `--ignore_error=` option, errno after ERANGE should be
supported in `str2errr()`.

Signed-off-by: Minwoo Im <[email protected]>
Some of ioengines (e.g., io_uring_cmd) returns negative errno to
represent system error as negative errno instead of positive values for
NVMe-specific error status.  To masking expected situations with
--ignore_error= option, added support for negative value of errno
syntax.

Signed-off-by: Minwoo Im <[email protected]>
IO_U_F_DEVICE_ERROR has been introduced in Commit ebe67b6
("io_uring: Add IO_U_F_DEVICE_ERROR to identify error types") to parse
two different types of errors can happen: (1) device error(e.g.,
NVMe-specific error status code) and (2) system error(e.g., EINVAL).

`--ignore_error=` option let user mask expected situation, but if user
wants to mask system error rather than device error, no way to represent
the system error except for such as -EINVAL.  Even if user put -EINVAL
to the option, it will be checked as a positive value since @io_u->error
will be set with abs() in io_uring_cmd ioengine.

This patch flips the given @io_u->error positive value to a negative
value if the given @io_u->flags represents system error by
IO_U_F_DEVICE_ERROR.

Signed-off-by: Minwoo Im <[email protected]>
Copy link
Collaborator

@vincentkfu vincentkfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change concerns me because it changes the default behavior for error handling. Most of the time io_u->error will be a system error since only io_uring_cmd sets IO_U_F_DEVICE_ERROR. So this will flip the sign of the error in most cases. This will break jobs that depend on fio's current behavior.

Why can't you just use --ignore_error=EINVAL to mask EINVAL instead of relying on this patch and using --ignore_error=-EINVAL?

@minwooim
Copy link
Contributor Author

The reason I wanted to mask with -EINVAL was exactly what you mentioned: with io_uring_cmd, the NVMe CQ entry’s status code can currently be masked through io_u->error. But you’re right — flipping the error-handling sign for all other cases just to accommodate a single ioengine shouldn’t happen.

Why can't you just use --ignore_error=EINVAL to mask EINVAL instead of relying on this patch and using --ignore_error=-EINVAL?

The problem is that if IO_U_F_DEVICE_ERROR is set and io_u->error contains the value 22 (0x16), we can’t tell whether that represents EINVAL (22) or an NVMe SGL Offset Invalid error (sct=0x0, sc=0x16).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants