Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(ArtifactsTest): IOTune parameters validation #10121

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

k0machi
Copy link
Contributor

@k0machi k0machi commented Feb 18, 2025

This commits adds a new subtest inside Artifacts test, that does the
check of machine image io_properties.yaml by comparing them to the
actual machine values and showing the deviation. The results are both
printed to console with deviation value and submitted to argus as a
GenericResultTable.

Fixes scylladb/qa-tasks#1787

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@k0machi k0machi self-assigned this Feb 18, 2025
@k0machi k0machi requested review from fruch and soyacz February 18, 2025 16:46
@k0machi
Copy link
Contributor Author

k0machi commented Feb 18, 2025

Demo:
image

17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:55   c:sdcm.utils.validators.iotune p:INFO  > Disk performance values validation - testing /var/lib/scylla
17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:59   c:sdcm.utils.validators.iotune p:INFO  > [/var/lib/scylla] read_iops: 109519 (-0)
17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:59   c:sdcm.utils.validators.iotune p:INFO  > [/var/lib/scylla] read_bandwidth: 806913408 (6)
17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:59   c:sdcm.utils.validators.iotune p:INFO  > [/var/lib/scylla] write_iops: 59110 (-3)
17:42:50  < t:2025-02-18 16:42:49,047 f:iotune.py       l:59   c:sdcm.utils.validators.iotune p:INFO  > [/var/lib/scylla] write_bandwidth: 559601600 (-0)
17:42:50  < t:2025-02-18 16:42:50,119 f:artifacts_test.py l:280  c:ArtifactsTest        p:INFO  > Verify XFS mount options for /var/lib/scylla contain `discard'

@@ -325,6 +326,14 @@ def test_scylla_service(self):
with self.subTest("check ENA support"):
assert self.node.ena_support, "ENA support is not enabled"

if ("gce" in backend or "aws" in backend or "azure" in backend):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it going to work with e.g. k8s-local-kind-aws?
I'd propose to use if backend in ("gce", "aws", "azure")
Also, shouldn't it be used only with predefined images (if params.get('use_preinstalled_scylla')?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I was testing and removed use_preinstalled_scylla, will put it back

diff = (val / preset_val - 1) * 100
LOGGER.info("[%s] %s: %s (%.0f)", mountpoint, key, val, diff)

def _submit_results_to_argus(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be done outside of IOTuneValidator which should do only one thing: validating correctness of predefined io_properties against actual values: all things like submitting or printing in console could be done separately to not blur purpose of this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, I'll move submission to the tester body.

Comment on lines 85 to 115
ValidationRules = {
"read_iops": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
"read_bandwidth": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
"write_iops": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
"write_bandwidth": ValidationRule(best_pct=10, fixed_limit=bottom_limit(preset_disk.get("read_iops"))),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thing best_pct rule is not right.
I'd propose different approach: instead of sending raw metrics, send differences in absolute value. This way higher_is_better could be set to False and fixed limit (calculated by predefined io_properties.yaml metric * x%). This should work for any instance size.
In that case table name should be renamed to f"{self.params.get('cluster_backend')} - {self.node.db_node_instance_type} IO Tune absolute deviation"
Drawback is only when we need to know exact value we would need to look into logs (could be tackled by publishing error event with full details in that case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can send both I think, and that's what I was thinking as well - I'll look into doing it that way.

Comment on lines +94 to +122
if tested_mountpoint != preset_disk["mountpoint"]:
LOGGER.warning("Disks differ - probably a mistake: %s vs %s, will not submit iotune results",
tested_mountpoint, preset_disk["mountpoint"])
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't fail silently here. Better raise error event

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@k0machi k0machi force-pushed the artifacts-iotune-params-verification branch from 597579b to a527d67 Compare February 20, 2025 12:16
This commits adds a new subtest inside Artifacts test, that does the
check of machine image io_properties.yaml by comparing them to the
actual machine values and showing the deviation

Fixes scylladb/qa-tasks#1787
@k0machi k0machi force-pushed the artifacts-iotune-params-verification branch from a527d67 to 791a3cc Compare February 20, 2025 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants