add benchmark test for ScheduleOne
#124728
base: master
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Hi @chengjoey. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Welcome @chengjoey!
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: chengjoey. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
pkg/scheduler/schedule_one_test.go
// Simulate a scenario with 5000 nodes and 5000 pods, where each pod
// corresponds to exactly one node; the fitting node is selected for each
// pod concurrently to test the performance of findNodesThatFitPod under
// high concurrency.
func Benchmark_findNodesThatFitPod(b *testing.B) {
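(For orientation, here is a minimal sketch of the shape such a benchmark can take. The findBestNode helper is a hypothetical stand-in for the scheduler's node-selection path, not the actual findNodesThatFitPod API.)

    package scheduler

    import (
        "sync"
        "testing"
    )

    // findBestNode is a hypothetical stand-in for findNodesThatFitPod:
    // pod i is assumed to fit exactly one node (node i).
    func findBestNode(pod int, nodes []int) int {
        return nodes[pod%len(nodes)]
    }

    // One pod per node, all scheduled concurrently, to stress the
    // node-selection path under high concurrency.
    func Benchmark_findNodesThatFitPod_sketch(b *testing.B) {
        const n = 5000
        nodes := make([]int, n)
        for i := range nodes {
            nodes[i] = i
        }
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            var wg sync.WaitGroup
            wg.Add(n)
            for pod := 0; pod < n; pod++ {
                go func(pod int) {
                    defer wg.Done()
                    _ = findBestNode(pod, nodes)
                }(pod)
            }
            wg.Wait()
        }
    }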
Can you change this to test at the scheduleOne level instead?
Also, maybe something like Benchmark_scheduleOne_onePodPerNode
ok, I'll give it a try
done
Force-pushed from 2b97feb to 1a477df.
Force-pushed from 1a477df to d8e2d8f.
Renamed from findNodesThatFitPod to ScheduleOne.
Force-pushed from d8e2d8f to ccaf6f5.
Instead of having this benchmark test, can we somehow improve scheduler_perf (e.g., add test cases) so that it can catch the throughput improvement of preallocation?
testing the performance of selecting the fitting node for each pod, with 5000 pods and 5000 nodes
Signed-off-by: joey <[email protected]>
Force-pushed from ccaf6f5 to bd58f0e.
    start := time.Now()
    logger := klog.FromContext(ctx)
    // Send pods to be scheduled.
    for _, p := range pods {
        sched.SchedulingQueue.Add(logger, p)
    }
    wg.Wait()
    cancel()
    cost := time.Since(start).Seconds()
    throughput := float64(nodeNum) / cost
    if throughput < benchmarkThroughput {
        b.Fatalf("ScheduleOne_onePodPerNode: %d nodes, %d pods, cost: %f seconds, throughput: %f pods/second, below the baseline throughput standard: %f pods/second",
            nodeNum, nodeNum, cost, throughput, benchmarkThroughput)
    }
@sanposhiho Is it possible to add benchmarkThroughput to this benchmark test, at least to verify that the code quality is not lower than a certain standard?

Of course, throughput in an actual production environment will be affected by more factors, such as network and memory. I wonder if we can use kwok in e2e tests to help us better simulate the actual environment; the current benchmark test focuses more on code quality.
At least in this benchmark test, we can see that preallocation for NodeToStatusMap helps improve throughput.
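(As a side note on that claim, the effect of preallocation can be demonstrated with a generic sketch; the string map below is a stand-in and does not use the scheduler's actual NodeToStatusMap type.)

    package scheduler

    import (
        "fmt"
        "testing"
    )

    // nodeNames simulates a 5000-node cluster.
    var nodeNames = func() []string {
        names := make([]string, 5000)
        for i := range names {
            names[i] = fmt.Sprintf("node-%d", i)
        }
        return names
    }()

    // Grow the map from zero capacity, paying for repeated rehashing.
    func BenchmarkStatusMap_grow(b *testing.B) {
        for i := 0; i < b.N; i++ {
            m := make(map[string]string)
            for _, n := range nodeNames {
                m[n] = "Unschedulable"
            }
        }
    }

    // Size the map to the node count up front, as in the preallocation
    // discussed above for NodeToStatusMap.
    func BenchmarkStatusMap_prealloc(b *testing.B) {
        for i := 0; i < b.N; i++ {
            m := make(map[string]string, len(nodeNames))
            for _, n := range nodeNames {
                m[n] = "Unschedulable"
            }
        }
    }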
I'm not sure why that needs to be this type of benchmark test. scheduler_perf is a basic tool we usually use to measure performance. It also has a monitoring visualization provided via perf-dash. I believe we should create this kind of benchmark tests only when it's somehow not possible to measure the performance within scheduler_perf. So, that's why I suggested we should try to seek the possibility of improving scheduler_perf first.
(Besides that, either way, IIUC, Benchmark_XXX tests aren't run in prow jobs.)
But I like your idea of enforcing a certain throughput by implementing it as tests. However, for that idea too, I'd like to see something like the following (a rough sketch follows the list):
- we define a desired throughput for each test case of scheduler_perf.
- we implement something so that we easily notice degradation based on the above, e.g., the CI fails in PRs, or we get alerts in Slack if the scheduler on the master branch is slow.
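(A rough sketch of that idea, assuming a hypothetical per-test-case baseline table and a checkThroughput helper; scheduler_perf's real configuration and metrics collection work differently.)

    package perf

    import "testing"

    // desiredThroughput holds a hypothetical per-test-case baseline,
    // in pods scheduled per second.
    var desiredThroughput = map[string]float64{
        "SchedulingBasic":       300,
        "SchedulingPodAffinity": 100,
    }

    // checkThroughput fails the test when measured throughput drops below
    // the baseline for its test case, so a regression surfaces directly in CI.
    func checkThroughput(t *testing.T, testCase string, measured float64) {
        t.Helper()
        baseline, ok := desiredThroughput[testCase]
        if !ok {
            return // no baseline defined for this case
        }
        if measured < baseline {
            t.Fatalf("%s: throughput %.1f pods/s is below baseline %.1f pods/s",
                testCase, measured, baseline)
        }
    }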
Starting a thread -
https://kubernetes.slack.com/archives/C09QZTRH7/p1715262959575039
What type of PR is this?
/kind cleanup
/kind flake
What this PR does / why we need it:
Add a benchmark test for ScheduleOne, testing the performance of selecting the fitting node for each pod with 5000 pods and 5000 nodes.
#124709
Special notes for your reviewer: @alculquicondor @sanposhiho @AxeZhan
Does this PR introduce a user-facing change?