fix: the problem that the pending tasks cannot be scheduled during the backfill action #4029

Open
wants to merge 2 commits into
base: master

hansongChina

What type of PR is this?

What this PR does / why we need it:

During the backfill action, the loop over pending tasks stops as soon as one task fails to match a node or the call to PrePredicateFn returns an error, so none of the subsequent tasks are scheduled.
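
To make the failure mode concrete, here is a minimal sketch, assuming the loop leaves via `break` when a task cannot be placed. `Task`, `checkFn`, `failOn`, and both `schedule...` functions are hypothetical names invented for illustration; none of them is Volcano code, and the sketch does not claim to reproduce the PR's actual diff.

```go
package main

import "fmt"

// Task is a hypothetical stand-in for a pending task; it is not a Volcano type.
type Task struct {
	Name string
}

// checkFn is a hypothetical placeholder for the pre-predicate call plus
// node matching. It returns an error when the task cannot be placed.
type checkFn func(Task) error

// failOn returns a checkFn that rejects only the task with the given name.
func failOn(name string) checkFn {
	return func(t Task) error {
		if t.Name == name {
			return fmt.Errorf("task %s: no node fits", t.Name)
		}
		return nil
	}
}

// scheduleWithBreak leaves the loop at the first failing task, so every
// later task stays unscheduled.
func scheduleWithBreak(pending []Task, check checkFn) []string {
	var scheduled []string
	for _, t := range pending {
		if err := check(t); err != nil {
			break
		}
		scheduled = append(scheduled, t.Name)
	}
	return scheduled
}

// scheduleWithContinue skips only the failing task and keeps going.
func scheduleWithContinue(pending []Task, check checkFn) []string {
	var scheduled []string
	for _, t := range pending {
		if err := check(t); err != nil {
			continue
		}
		scheduled = append(scheduled, t.Name)
	}
	return scheduled
}

func main() {
	pending := []Task{{Name: "a"}, {Name: "b"}, {Name: "c"}}
	check := failOn("a")
	fmt.Println("break:   ", scheduleWithBreak(pending, check))
	fmt.Println("continue:", scheduleWithContinue(pending, check))
}
```

With the first task failing, the `break` variant schedules nothing, while the `continue` variant still schedules the two tasks behind it.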

Which issue(s) this PR fixes:

Fixes #4028

Special notes for your reviewer:

Does this PR introduce a user-facing change?


@volcano-sh-bot volcano-sh-bot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 21, 2025
@volcano-sh-bot volcano-sh-bot added area/scheduling size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 21, 2025
Member

@hwdef hwdef left a comment

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 24, 2025
@hwdef
Member

hwdef commented Feb 24, 2025

/ok-to-test

@volcano-sh-bot volcano-sh-bot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Feb 24, 2025
@volcano-sh-bot volcano-sh-bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 25, 2025
@volcano-sh-bot
Contributor

New changes are detected. LGTM label has been removed.

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign hwdef
You can assign the PR to them by writing /assign @hwdef in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JesseStutler
Member

JesseStutler commented Feb 25, 2025

Great. But I'm wondering whether there is also a problem in the allocate action. After the break here:

the job is not pushed back, and the remaining pods in this job are not scheduled.

cc @Monokaix @lowang-bh @hwdef

@lowang-bh
Member

lowang-bh commented Feb 25, 2025

> Great. But I'm wondering whether there is also a problem in the allocate action. After the break here:
>
> the job is not pushed back, and the remaining pods in this job are not scheduled.
> cc @Monokaix @lowang-bh @hwdef

It is OK when minMember equals the total number of replicas.
Otherwise, it may need to be handled like this:

if job.NeedContinueAllocating() {
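
The push-back idea raised in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions: `Job`, `needContinue`, and `allocate` are hypothetical names, and `needContinue` is a simplified stand-in (it only checks for untried tasks), not Volcano's `NeedContinueAllocating`. The sketch shows that when a pass over a job ends with `break`, the remaining tasks are only tried if the job is put back on the queue.

```go
package main

import "fmt"

// Job is a hypothetical toy model of a job; it is not a Volcano type.
type Job struct {
	Name    string
	Pending []string // tasks not yet tried
	Placed  []string // tasks that found a node
}

// String summarizes the job state for printing.
func (j *Job) String() string {
	return fmt.Sprintf("%s placed=%v pending=%v", j.Name, j.Placed, j.Pending)
}

// needContinue is a hypothetical predicate: the job still has untried tasks.
func (j *Job) needContinue() bool {
	return len(j.Pending) > 0
}

// allocate walks a queue of jobs. A pass over a job stops at the first task
// that does not fit. If requeue is true, a job with untried tasks is pushed
// back onto the queue. Each pass consumes at least one task, so the loop ends.
func allocate(jobs []*Job, fits func(task string) bool, requeue bool) {
	queue := append([]*Job(nil), jobs...)
	for len(queue) > 0 {
		job := queue[0]
		queue = queue[1:]
		for len(job.Pending) > 0 {
			task := job.Pending[0]
			job.Pending = job.Pending[1:]
			if !fits(task) {
				break // stop this pass over the job
			}
			job.Placed = append(job.Placed, task)
		}
		if requeue && job.needContinue() {
			queue = append(queue, job)
		}
	}
}

func main() {
	fits := func(task string) bool { return task != "bad" }

	a := &Job{Name: "no-requeue", Pending: []string{"t1", "bad", "t3"}}
	allocate([]*Job{a}, fits, false)
	fmt.Println(a)

	b := &Job{Name: "requeue", Pending: []string{"t1", "bad", "t3"}}
	allocate([]*Job{b}, fits, true)
	fmt.Println(b)
}
```

Without the push-back, `t3` is never tried; with it, the job gets a second pass and `t3` is placed.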

Labels
area/scheduling kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

the problem that the pending tasks cannot be scheduled during the backfill action
5 participants