
[BUG] deviceshare plugin not handle AddPod\RemovePod correctly #1959

Open
buptcozy opened this issue Mar 18, 2024 · 1 comment
@buptcozy
Contributor

What happened:

Generally, when the AddPod logic is executed, the pod being added may still be in a scheduling (nominated) state, so it does not yet exist in nodeDeviceCache's used map. As a result, when the framework executes RunFilterPluginsWithNominatedPods and calls AddPod for higher-priority pods, the plugin cannot reserve resources for those pods. In RDMA/VF/NVSwitch scenarios this can cause high-priority pods to fail assignment because some of the resources have already been assigned to lower-priority pods. So we reuse the "Reserve" logic to generate an assignment placement and save it in a nominator cache. We clear the nominator cache in both "Reserve" and "Unreserve", which means the cleanup happens whether the assignment succeeds or not, matching the nomination process of the upstream Kubernetes scheduler framework.
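
To make the intent concrete, here is a minimal sketch of the nominator-cache idea, written with hypothetical Go types (devicePlacement, nominator) rather than the actual koordinator code: the placement computed by the reused Reserve logic is stored per pod, and the same delete path is called from both Reserve and Unreserve so the cache is cleaned regardless of the outcome.

```go
package deviceshare

import "sync"

// devicePlacement is a hypothetical stand-in for a device assignment,
// e.g. which GPU/RDMA VF/NVSwitch minors a pod would receive on a node.
type devicePlacement map[string][]int // resource name -> device minors

// nominator caches tentative placements for nominated (preempting) pods so
// that later scheduling cycles can account for them while those pods wait in
// the backoff queue.
type nominator struct {
	sync.Mutex
	placements map[string]devicePlacement // pod UID -> placement
}

func (n *nominator) addNominatedPlacement(podUID string, p devicePlacement) {
	n.Lock()
	defer n.Unlock()
	n.placements[podUID] = p
}

// deleteNominatedPlacement is called from both Reserve and Unreserve, so the
// cache is cleared whether the assignment succeeds or fails, mirroring how the
// upstream scheduler clears its own nominated-pod bookkeeping.
func (n *nominator) deleteNominatedPlacement(podUID string) {
	n.Lock()
	defer n.Unlock()
	delete(n.placements, podUID)
}
```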

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • App version:
  • Kubernetes version (use kubectl version):
  • Install details (e.g. helm install args):
  • Node environment (for koordlet/runtime-proxy issue):
    • Containerd/Docker version:
    • OS version:
    • Kernel version:
    • Cgroup driver: cgroupfs/systemd
  • Others:
buptcozy added the kind/bug (Create a report to help us improve) label on Mar 18, 2024
buptcozy pushed commits to buptcozy/koordinator that referenced this issue on Mar 18 and Mar 20, 2024
@ZiMengSheng
Contributor

ZiMengSheng commented Mar 25, 2024

Problem Description

Let's analyze the following example.

  1. PodA with low priority requests 8 GPUs and is scheduled to node1.
  2. PodB with high priority requests 4 GPUs and preempts PodA; it is expected to reserve GPUs 0-3, status.nominatedNodeName is updated to node1, and PodB enters the backoffQ.
  3. PodC with mid priority enters the scheduling cycle, requests 4 GPUs, and is scheduled to node1 without considering PodB's preemption result, so it may unexpectedly use GPUs 0-3 (see the sketch below).
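
A minimal sketch of the gap, assuming hypothetical pared-down types (nodeDevice, nodeDeviceCache) whose used map only tracks devices of pods that are already bound or assumed on the node; since the nominated PodB never appears there, the Filter pass for PodC still sees GPUs 0-3 as free:

```go
package deviceshare

// nodeDevice and nodeDeviceCache are hypothetical stand-ins, not the real
// plugin code: used only contains pods already bound or assumed on the node.
type nodeDevice struct {
	used map[string][]int // pod UID -> GPU minors actually held on the node
}

type nodeDeviceCache struct {
	nodes map[string]*nodeDevice // node name -> device state
}

// usedDevices illustrates the gap: the nominated-but-unbound PodB has no entry
// in used, so when the framework calls AddPod(PodB) while scheduling PodC,
// nothing is subtracted and GPUs 0-3 still look free to PodC's Filter.
func (c *nodeDeviceCache) usedDevices(nodeName, podUID string) []int {
	nd := c.nodes[nodeName]
	if nd == nil {
		return nil
	}
	return nd.used[podUID]
}
```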

Suggested Proposal

Let's walk through the design with the following examples.

Example1

  1. PodA with low priority requests 8 GPUs and is scheduled to node1.
  2. PodB with high priority requests 4 GPUs and preempts PodA; ReserveNominatedPod(PodB) is invoked to reserve PodB's nominated resource (GPUs 0-3), status.nominatedNodeName is updated to node1, and PodB enters the backoffQ.
  3. PodC with mid priority enters the scheduling cycle and requests 4 GPUs. In the Filter phase, the framework invokes RunPreFilterExtensionAddPod for higher-priority pods such as PodB. This is our chance to keep PodB's nominated resource reserved within the current scheduling cycle, so PodC cannot use GPUs 0-3 (see the sketch below).
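
A minimal sketch of how step 3 could look in the plugin's AddPod extension. The AddPod signature is the standard PreFilterExtensions one from k8s.io/kubernetes/pkg/scheduler/framework; the nominator cache, preFilterState, and the state key are hypothetical and not the actual koordinator implementation:

```go
package deviceshare

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	framework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// devicePlacement repeats the hypothetical type from the earlier sketch.
type devicePlacement map[string][]int

// nominatorCache is a hypothetical lookup of cached nominated placements.
type nominatorCache interface {
	getNominatedPlacement(podUID string) (devicePlacement, bool)
}

// preFilterState is a hypothetical per-cycle state that tracks devices already
// claimed by nominated pods on each node.
type preFilterState struct {
	nominatedDevices map[string]devicePlacement // node name -> claimed devices
}

func (s *preFilterState) Clone() framework.StateData { return s }

// Plugin is a pared-down stand-in for the deviceshare plugin.
type Plugin struct {
	nominator nominatorCache
}

// AddPod is called by the framework when it "adds" the higher-priority
// nominated PodB while scheduling PodC. The plugin pulls PodB's cached
// placement and marks those devices (e.g. GPUs 0-3) as occupied for the rest
// of this cycle, so PodC's Filter cannot take them.
func (p *Plugin) AddPod(ctx context.Context, cycleState *framework.CycleState,
	podToSchedule *corev1.Pod, podInfoToAdd *framework.PodInfo,
	nodeInfo *framework.NodeInfo) *framework.Status {
	placement, ok := p.nominator.getNominatedPlacement(string(podInfoToAdd.Pod.UID))
	if !ok {
		return nil // not a nominated pod we reserved for; nothing to do
	}
	s, err := cycleState.Read("PreFilterDeviceShare") // hypothetical state key
	if err != nil {
		return framework.AsStatus(err)
	}
	state, ok := s.(*preFilterState)
	if !ok {
		return framework.NewStatus(framework.Error, "unexpected PreFilter state type")
	}
	state.nominatedDevices[nodeInfo.Node().Name] = placement
	return nil
}
```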

Example2

  1. PodA with low priority requests 8 GPUs and is scheduled to node1.
  2. PodB with high priority requests 4 GPUs and preempts PodA; ReserveNominatedPod(PodB) is invoked to reserve PodB's nominated resource (GPUs 0-3), status.nominatedNodeName is updated to node1, and PodB enters the backoffQ.
  3. PodC with higher-than-PodB priority requests 4 GPUs and is scheduled to node1, where it is normally allocated GPUs 0-3. This overlaps with PodB's nominated resource, so we need to invalidate PodB's outdated nominated resource here, which makes the fix best-effort (see the sketch below).
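
A minimal sketch, with hypothetical names, of the best-effort invalidation in step 3: when a later pod's real allocation overlaps a cached nominated placement, that placement is stale and is dropped so it no longer blocks devices it will not actually receive:

```go
package deviceshare

// devicePlacement repeats the hypothetical type from the earlier sketches.
type devicePlacement map[string][]int // resource name -> device minors

// overlaps reports whether two placements claim any device minor in common.
func overlaps(a, b devicePlacement) bool {
	for res, minors := range a {
		taken := map[int]bool{}
		for _, m := range b[res] {
			taken[m] = true
		}
		for _, m := range minors {
			if taken[m] {
				return true
			}
		}
	}
	return false
}

// invalidateOutdated removes every nominated placement on the node that
// overlaps the allocation just made for the newly scheduled pod (PodC with
// higher priority in Example2). PodB will recompute its placement when it is
// retried, which is why the overall fix is best-effort.
func invalidateOutdated(nominated map[string]devicePlacement, allocated devicePlacement) {
	for podUID, p := range nominated {
		if overlaps(p, allocated) {
			delete(nominated, podUID)
		}
	}
}
```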

Scheduling Interpretability

  1. We need sufficient metrics or a debug service to help us diagnose pending pods and explain to users why a pod is pending (a possible sketch follows).
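
As one possibility, a sketch of how such diagnostics could be exposed with prometheus/client_golang; the metric name, labels, and helper are hypothetical and not what koordinator actually exports:

```go
package deviceshare

import (
	"github.com/prometheus/client_golang/prometheus"
)

// nominatedReservations counts how often a nominated pod's device placement
// was reserved, invalidated, or cleared, so the history behind a pending pod
// can be reconstructed from the time series.
var nominatedReservations = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "deviceshare_nominated_reservation_events_total",
		Help: "Nominated device reservation events by node and outcome.",
	},
	[]string{"node", "event"}, // event: reserved | invalidated | cleared
)

func init() {
	prometheus.MustRegister(nominatedReservations)
}

// recordNominatedEvent is called wherever the nominator cache changes.
func recordNominatedEvent(node, event string) {
	nominatedReservations.WithLabelValues(node, event).Inc()
}
```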

ZiMengSheng added this to the v1.6 milestone on May 7, 2024