SCT for backup #8752

Open
regevran opened this issue Sep 18, 2024 · 20 comments · Fixed by #9307 · May be fixed by #10112
Labels: type/epic (Complex mission split into a task list) · Milestone: 4.4

Comments

@regevran commented Sep 18, 2024

  • Measure backup process performance: how long a backup takes (a rough timing sketch follows below).
  • Measure user queries during backup: what impact the backup has on the user experience.
  • Measurements inside Scylla.
  • SM (Scylla Manager) tests may or may not be combined with the tests in this issue.
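A minimal sketch of the backup-timing measurement, assuming Scylla Manager's sctool drives the backup from the test; the cluster name, location, and the task-ID/progress parsing below are illustrative assumptions, not existing SCT helpers:

```python
# Hypothetical sketch: time a Manager-driven backup end to end.
# `sctool backup` schedules the task and returns, so we poll `sctool progress`
# until the task reports completion. The output parsing is an assumption
# about the CLI's format and should be checked against the sctool in use.
import re
import subprocess
import time


def measure_backup_duration(cluster: str, location: str,
                            poll_interval: int = 30) -> float:
    out = subprocess.run(
        ["sctool", "backup", "-c", cluster, "--location", location],
        check=True, capture_output=True, text=True,
    ).stdout
    task_id = out.strip().splitlines()[-1]  # assumed: last line holds the task ID

    start = time.monotonic()
    while True:
        progress = subprocess.run(
            ["sctool", "progress", task_id, "-c", cluster],
            check=True, capture_output=True, text=True,
        ).stdout
        if re.search(r"Status:\s*DONE", progress):   # assumed status line
            return time.monotonic() - start
        if re.search(r"Status:\s*ERROR", progress):  # assumed status line
            raise RuntimeError(f"backup task {task_id} failed")
        time.sleep(poll_interval)
```

Together with the dataset size, the returned duration gives an effective backup throughput that can be tracked build by build.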

@soyacz (Contributor) commented Sep 18, 2024

I think this is on @mikliapko's plate as part of the SM tests.

@regevran (Author)

> I think this is on @mikliapko's plate as part of the SM tests.

I updated the description; does it make more sense now?

@soyacz (Contributor) commented Sep 19, 2024

> I think this is on @mikliapko's plate as part of the SM tests.

> I updated the description; does it make more sense now?

@mikliapko ^^

@mikliapko (Contributor)

@regevran Could you please provide some context on this issue?

  • Where does it come from?
  • How soon should we have such a test?
  • Is there any preferred Scylla + dataset configuration to test?

Currently, the Manager test scope has no backup performance tests, only restore ones.

cc: @rayakurl

@regevran (Author)

> Where does it come from?

The plan is to finish Q4 2024 with better backup and restore capabilities.
This issue is to make sure we get there with measurements in place.

> How soon should we have such a test?

@kreuzerkrieg is assigned to this and will start working on the details shortly.

> Is there any preferred Scylla + dataset configuration to test?

Yes. I think the focus should be on large, full-cluster backups, and we should measure heavy user loads during the backup (see the load-comparison sketch below).
The numbers (how large the cluster, what heavy load means, etc.) are to be discussed.
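A minimal sketch of that load comparison, assuming cassandra-stress as the load generator; the node address, workload parameters, and latency parsing are placeholders rather than an agreed configuration:

```python
# Hypothetical sketch: run the same read workload twice, once without and once
# with a backup in flight, and report the p99 latency degradation.
import re
import subprocess

STRESS_CMD = [
    "cassandra-stress", "read", "duration=30m", "cl=QUORUM",
    "-rate", "threads=200",
    "-node", "10.0.0.1",  # placeholder node address
]


def stress_p99_ms() -> float:
    out = subprocess.run(STRESS_CMD, check=True,
                         capture_output=True, text=True).stdout
    # Assumed summary line shape, e.g. "Latency 99th percentile : 5.8 ms"
    match = re.search(r"latency 99th percentile\s*:\s*([\d.]+)", out, re.IGNORECASE)
    if not match:
        raise RuntimeError("could not find p99 latency in cassandra-stress output")
    return float(match.group(1))


baseline_p99 = stress_p99_ms()       # no backup running
# ...start the backup here and keep it running through the second pass...
under_backup_p99 = stress_p99_ms()   # backup in flight
print(f"p99 under backup is {under_backup_p99 / baseline_p99:.2f}x the baseline")
```

The same comparison can be repeated for throughput and mean latency to get a fuller picture of the impact on user traffic.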

@rayakurl (Contributor)

@regevran - can you please add more details regarding how @mikliapko should assist?

@regevran (Author)

> @regevran - can you please add more details regarding how @mikliapko should assist?

@kreuzerkrieg is collecting the information, including the requested support 😸.

@kreuzerkrieg (Contributor) commented Oct 16, 2024

@mikliapko I have a couple of questions.
I took a look at this file, and it looks like we already have a backup/restore test that measures its performance; at least that's what the comment states:

> The test is extensively used for restore benchmarking purposes and consists of the following steps

  1. Do we track the results of these tests? Are they reported and can they be reviewed somehow?
  2. Do we have any benchmarking of read performance during a backup? Something comparable, like bench results with and without a backup running?
  3. Do we have benchmarking of read performance from a table residing on the same cluster where another table is being restored?

We need these three to compare the current state of the system with the one we will have in the near future, the Scylla-based backup, so we need these numbers. If we don't have all or any of the above, what would it take to create these metrics? (A sketch of the third scenario is at the end of this comment.)

CC: @rayakurl
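A minimal sketch of the third scenario, reading from one keyspace while another is being restored on the same cluster; the sctool restore flags, snapshot tag, keyspace names, node address, and stress profile are illustrative assumptions and should be checked against the Manager version in use:

```python
# Hypothetical sketch: start a Manager restore of keyspace_b, then measure
# read latency on keyspace_a while the restore task runs in the background.
import subprocess

CLUSTER = "perf-cluster"         # placeholder cluster name
LOCATION = "s3:backup-bucket"    # placeholder backup location
SNAPSHOT_TAG = "sm_snapshot"     # placeholder snapshot tag

# `sctool restore` schedules the task and returns, so the stress run below
# overlaps with the restore while it is in flight.
subprocess.run(
    ["sctool", "restore", "-c", CLUSTER,
     "--location", LOCATION,
     "--snapshot-tag", SNAPSHOT_TAG,
     "--keyspace", "keyspace_b",
     "--restore-tables"],
    check=True,
)

# Reads against the other keyspace via a cassandra-stress user profile
# (the profile file is a placeholder and must define a query named "read").
subprocess.run(
    ["cassandra-stress", "user", "profile=keyspace_a_read.yaml",
     "ops(read=1)", "duration=30m",
     "-node", "10.0.0.1"],  # placeholder node address
    check=True,
)
```

Comparing the latency collected here against a run with no restore in flight gives the "reads while restoring" number the third question asks about.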

@mikliapko (Contributor)

> 1. Do we track the results of these tests? Are they reported and can they be reviewed somehow?

Yes, we recently introduced two jobs.
The purpose of these jobs is to check restore performance build by build.
An important note: it's only for restore; we don't benchmark backup there.

> 2. Do we have any benchmarking of read performance during a backup? Something comparable, like bench results with and without a backup running?

No, we don't have such benchmarks; at least, I'm not aware of any.
@fruch Perhaps you know more about it?

> 3. Do we have benchmarking of read performance from a table residing on the same cluster where another table is being restored?

Nope.

> We need these three to compare the current state of the system with the one we will have in the near future, the Scylla-based backup, so we need these numbers. If we don't have all or any of the above, what would it take to create these metrics?

Actually, it's hard to give you any precise estimate without digging deeper.
@kreuzerkrieg When would you like to have these metrics? If we need them urgently, we will prioritize this work and start on it.
CC: @rayakurl

@kreuzerkrieg (Contributor)

@mikliapko As for urgency, good question. @regevran, what do you say? What is the expectation?

@regevran (Author)

We plan to start the new approach at the beginning of November 2024. It would be great if the tests are ready by then, with the current implementation already measured.

@mikliapko (Contributor)

> We plan to start the new approach at the beginning of November 2024. It would be great if the tests are ready by then, with the current implementation already measured.

I see. Alright, we will talk with @rayakurl and prioritize it accordingly.

@mikliapko (Contributor)

> • Do we have any benchmarking of read performance during a backup? Something comparable, like bench results with and without a backup running?
> • Do we have benchmarking of read performance from a table residing on the same cluster where another table is being restored?

@karol-kokoszka @Michal-Leszczynski Guys, I want to hear your opinion on the value of these tests for Manager in general, so that we don't implement something that would be used only once. I suppose they might be pretty useful.

@mikliapko (Contributor)

> We plan to start the new approach at the beginning of November 2024. It would be great if the tests are ready by then, with the current implementation already measured.

@regevran We had a discussion with @rayakurl about priorities. Is this something you have the capacity to implement?

@regevran (Author)

> @regevran We had a discussion with @rayakurl about priorities. Is this something you have the capacity to implement?

Yes, we'll do it.

@kreuzerkrieg (Contributor)

@rayakurl Who can assist (guide) with this task?

@rayakurl (Contributor)

> @rayakurl Who can assist (guide) with this task?

@kreuzerkrieg, @mikliapko can assist.

@regevran (Author)

> @rayakurl Who can assist (guide) with this task?

> @kreuzerkrieg, @mikliapko can assist.

Also, please make sure @cezarmoise is in the loop.

@Michal-Leszczynski

> • Do we have any benchmarking of read performance during a backup? Something comparable, like bench results with and without a backup running?
> • Do we have benchmarking of read performance from a table residing on the same cluster where another table is being restored?

> @karol-kokoszka @Michal-Leszczynski Guys, I want to hear your opinion on the value of these tests for Manager in general, so that we don't implement something that would be used only once. I suppose they might be pretty useful.

I would say that the first benchmark is more interesting and important, as backups run all the time and the cluster should work fine while they are running. The second one is less important: for now the main objective is to make full-cluster restore as fast as possible, which assumes the cluster does not handle any user traffic. Optimizing restore on a running cluster is left as a future effort.

@regevran (Author)

We are not done with this issue, as we still haven't managed to create a good baseline for the measurements.

@regevran regevran reopened this Nov 28, 2024
@regevran regevran added the type/epic Complex mission split into a task list label Nov 28, 2024
@regevran regevran added this to the 4.4 milestone Jan 9, 2025