
How to speed up verification through multi-threading #195

Open
wnark opened this issue Dec 23, 2022 · 11 comments

wnark commented Dec 23, 2022

I have a batch of brand-new 7.68 TB U.2 SSDs here and want to test whether writing the drives' theoretical cycle-life's worth of data under server-room temperature conditions produces failures.
When I run only one f3write & f3read, even though the drive has a theoretical write speed of 7 GB/s, the actual write speed is only 1 GB/s because a single CPU core's clock frequency is not high enough.
At that speed, the test might take a year.
With fio I can use the -thread and -numjobs parameters, but how do I achieve multi-threading with f3write & f3read?

wnark commented Dec 23, 2022

I tried creating multiple folders on one drive and then starting a set of f3write & f3read for each folder. I don't know whether this style of multi-threading makes the f3write & f3read instances interfere with each other's accuracy.
The idea is: when the disk is full, all running f3write instances stop at the same time, then f3read starts in every folder at the same time for verification. When every folder is verified, all files in the folders are deleted at the same time and f3write starts again. I don't know if this method is accurate.

wnark commented Dec 23, 2022

Suppose I use the --start-at=NUM and --end-at=NUM parameters. Although I can then run multiple f3write & f3read instances in one folder, near the end the instances will still compete for the last less-than-1 GB of free space. Are the results still accurate at that point?

AltraMayor (Owner) commented

As you have already figured out, you can have multiple instances of f3write and f3read running simultaneously to increase the write/read speeds.

You should ensure that each instance of f3write/f3read uses a unique file range; a range goes from --start-at=NUM to --end-at=NUM. This is because the report may be ambiguous if there's an issue with a duplicate file. However, it's okay for all combined ranges to have gaps; for example, the combined ranges 1...1000 and 10001...11000 are fine.

f3write will stop writing when the volume is full. It's okay for the last files to be less than 1GB.

With unique ranges, using folders is optional, but likely helpful.

If you develop a script to make it simple to run multiple instances of f3write and f3read, you may want to share it since other users could be interested in it.
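
For example, here is a minimal sketch of launching parallel instances with unique, non-overlapping ranges; the mount point /mnt/sda and the job count of 4 are placeholders to adapt:

JOBS=4
FILES_PER_JOB=1000   # f3write files are 1 GB each

# Write phase: each instance gets a unique file range.
for ((i = 0; i < JOBS; i++)); do
    f3write --start-at=$((i * FILES_PER_JOB + 1)) \
            --end-at=$(((i + 1) * FILES_PER_JOB)) /mnt/sda &
done
wait    # all writers must finish before verification starts

# Read phase: same ranges, one log per instance.
for ((i = 0; i < JOBS; i++)); do
    f3read --start-at=$((i * FILES_PER_JOB + 1)) \
           --end-at=$(((i + 1) * FILES_PER_JOB)) /mnt/sda > "f3read_$i.log" &
done
wait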

wnark commented Dec 25, 2022

disk_stress_f3.zip
I wrote a simple script. Before using it, you need to put f3write and f3read into the /bin/ directory. Later I need to use Python to analyze the generated logs and connect them to Prometheus; I haven't used Prometheus before, and it may take a week to learn how to parse logs in real time.
Usage:
./disk_stress_f3.sh -t 2 -c 100 -w randwr -s /mnt -l /var/log/
or
./disk_stress_f3.sh -t 2 -c 100 -w writeread -s /mnt -l /var/log/

wnark commented Dec 25, 2022

When multiple f3write processes write at the same time, the hard disk's slow per-process speed may cause a misjudgment that makes the writes stop early and switch to reading. Maybe this needs to be detected in the monitoring.

AltraMayor (Owner) commented

Thank you for posting your script, @wnark.

wnark commented Dec 27, 2022

> Thank you for posting your script, @wnark.

Hello, according to issue #175, f3 will not determine whether the stress-test result is OK. But I have a 7 TB hard drive, and one stress-test pass outputs at least 7000x3x1000 lines of logs. It looks like I need to analyze the logs in near real time if I want monitoring, and only write the abnormal cases to the log file rather than saving all the output.
In that case, can a shell script handle the analysis well, or do I have to call f3write/f3read from Python and then read the text from the output buffer for analysis? (It seems I need to split the output on \n, because lines like "Creating file" take a while before the trailing "ok" is printed.)
Which way is better?

wnark commented Dec 27, 2022

According to the article below, the shell is not a good way to process text, so I will try running f3write & f3read from Python and processing the output there.
https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice

AltraMayor (Owner) commented

I agree Python will be much more helpful in processing the output.

Artoria2e5 commented Mar 2, 2024

> According to this article...

Shell is fine. The issue lies in reading line by line, but since you're already using bash you can just use mapfile to read the whole thing. You probably won't even need that, considering you only need to find "Data LOST: 0.00 Byte (0 sectors)" with grep.

validate_log() {
    # Fail the run unless the f3read log reports zero data loss.
    if ! grep -F "Data LOST: 0.00 Byte (0 sectors)" "$1" &>/dev/null
    then
        printf 'likely error in f3read log file: %s\n' "$1" >&2
        exit 1
    fi
}
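
Hypothetical usage, run after each round's f3read completes (the log path is a placeholder):

validate_log "/var/log/f3read_round1.log"   # exits 1 unless the log reports zero data loss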

Here's the changed script: disk_stress_f3.sh.zip. I:

  • Added a function that checks the f3read output using grep, and made sure it runs after every f3read. I have no idea how to check f3write... yet.
    • DO NOT RUN IT DIRECTLY! The validate_log in the zip is wrong; replace it with the correct version above before you run it. Uploading new versions of a zip file is too much bother.
  • Removed the /bin/ prefix from all invocations. Hard-coding it is wrong; people have their own PATH for a reason. If you want to use only /bin, just set PATH=/bin.
  • Changed the per-round logging to use separate files; otherwise grep would keep matching earlier rounds' results.

I didn't change some other stylistic choices. Use shellcheck for those. You probably want to use ((first_group_start_num < first_group_end_num)) instead of [ "$first_group_start_num" -gt "$first_group_end_num" ] with bash, for example.

@AltraMayor
Copy link
Owner

Thank you for this contribution, @Artoria2e5!
