-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Fix incorrect DROP_SIZE usage #296
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Show more experiment results about "statistical analysis on unmeasured (zero-filled) samples, potentially skewing results or preventing detection thresholds from being reached.
To validate the claim about "statistical analysis on unmeasured (zero-filled) samples", I added debug output in Below are two outputs: Before fix (measure() only measures middle range)Note: i = 10 | class = 1 | exec_time = 0
i = 11 | class = 1 | exec_time = 0
i = 12 | class = 0 | exec_time = 0
...
i = 18 | class = 0 | exec_time = 0
i = 19 | class = 1 | exec_time = 0
i = 20 | class = 0 | exec_time = 1178
i = 21 | class = 1 | exec_time = 1140
i = 22 | class = 1 | exec_time = 1064
...
i = 129 | class = 1 | exec_time = 1064
i = 130 | class = 0 | exec_time = 0
...
i = 143 | class = 1 | exec_time = 0
i = 144 | class = 0 | exec_time = 0
i = 145 | class = 1 | exec_time = 0
i = 146 | class = 1 | exec_time = 0
i = 147 | class = 0 | exec_time = 0
i = 148 | class = 1 | exec_time = 0
i = 149 | class = 0 | exec_time = 0 This shows:
Zero-filled execution times will
Both of them will suppress the t-value, especially when the sample size is still small, and will cause the test to overestimate the number of additional measurements required. After fix (measure() records all, update_statistics() uses DROP_SIZE range)i = 0 | class = 0 | exec_time = 1102
i = 1 | class = 1 | exec_time = 1101
...
i = 149 | class = 1 | exec_time = 1063
This supports the need for consistent range handling between measurement and statistical analysis. |
Revise git commit messages accordingly. |
Previously, measure() only recorded timings for indices in the middle range [DROP_SIZE, N_MEASURES - DROP_SIZE), while update_statistics() assumed all entries were starting available from index 10. This mismatch allowed zero-valued exec_times from unmeasured head and tail indices to be included in the t-test, reducing sample means, inflating variances, and suppressing the t-value, which may lead to incorrect results or prevent detection thresholds from being reached. After the fix, all samples are measured and update_statistics() discards DROP_SIZE samples at both ends. This ensures correct sample accounting, prevents overestimating the number of measurements required, and avoids false negatives due to uninitialized timing data. Change-Id: Ibb1515043da5f56d72fe34fd5c78e2283df9a993
The commit message has been revised as requested and force-pushed to update the PR. |
Thank @Mike1117 for contributing! |
Previously, measure() only recorded timings for indices in the middle range [DROP_SIZE, N_MEASURES - DROP_SIZE), while update_statistics() assumed all entries were available from index 0.
This mismatch led to statistical analysis on unmeasured (zero-filled) samples, potentially skewing results or preventing detection thresholds from being reached.
Now:
Change-Id: Ibb1515043da5f56d72fe34fd5c78e2283df9a993