
[JENKINS-53958] allow pipeline jobs to run when built-in is offline #9203

Merged
merged 1 commit into from May 8, 2024


@mawinter69 (Contributor) commented Apr 27, 2024

When a pipeline starts building, it creates a OneOffExecutor that takes care of the pipeline execution on the built-in node. The executor has some logic to avoid running things when an agent has gone offline between the task being assigned to the executor and the executor actually starting to run it. But for a pipeline job this logic wrongly terminates the executor, and the attempts to restart the task fail because the task is no longer in the queue.
This change tries to avoid that by ignoring the online state of the built-in node, as it is never really offline (it has no channel that can be closed). One can take it temporarily offline, but that should not prevent pipelines that do not explicitly use the built-in node from starting.
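The decision described above can be illustrated with a minimal, self-contained Java sketch (Java 16+ for records). It is not the actual code of this PR and uses no Jenkins core classes: `NodeKind`, `NodeState` and `shouldAbortBeforeStart` are hypothetical names chosen for illustration. It only shows the intended rule: the pre-start offline check still applies to agents, while the built-in node's offline flag is ignored.

```java
/**
 * Hypothetical model of the pre-start check discussed in JENKINS-53958.
 * None of these names are Jenkins core API.
 */
public class OfflineCheckSketch {

    /** Whether a node is the built-in node (controller) or a remote agent. */
    enum NodeKind { BUILT_IN, AGENT }

    /** Snapshot of a node at the moment the executor is about to start its task. */
    record NodeState(NodeKind kind, boolean online) {}

    /**
     * Returns true if the executor should give up on its task because the
     * node went offline after the task was assigned. The built-in node has
     * no channel that can be closed, so its offline flag is ignored.
     */
    static boolean shouldAbortBeforeStart(NodeState node) {
        if (node.kind() == NodeKind.BUILT_IN) {
            // A temporarily offline built-in node must not kill the pipeline's executor.
            return false;
        }
        // An agent that went offline really cannot run the task.
        return !node.online();
    }

    public static void main(String[] args) {
        System.out.println(shouldAbortBeforeStart(new NodeState(NodeKind.BUILT_IN, false))); // false
        System.out.println(shouldAbortBeforeStart(new NodeState(NodeKind.AGENT, false)));    // true
        System.out.println(shouldAbortBeforeStart(new NodeState(NodeKind.AGENT, true)));     // false
    }
}
```

Keeping the special case for the built-in node narrow means an agent that loses its connection in that window is still handled as before.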

See JENKINS-53958.

Testing done

Manual testing:

Scenario 1: pipeline not using built-in

  1. Take the built-in node offline.
  2. Create a second agent.
  3. Create a pipeline job that runs something on the second agent.
  4. Run the pipeline -> the pipeline run succeeds.

Scenario 2: pipeline explicitly using built-in

  1. Take the built-in node offline.
  2. Create a second agent.
  3. Create a pipeline job that runs something on the built-in node.
  4. Run the pipeline -> the pipeline run waits for the built-in node.
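The two scenarios can be modeled with a hypothetical Java sketch of how a `node` step gets matched to an executor once the pipeline itself has started. `Node`, `assign` and the single-label equality check are simplifications assumed for illustration (Jenkins actually evaluates label expressions). The sketch only shows why Scenario 1 proceeds while Scenario 2 keeps waiting: explicit use of the built-in node still requires it to be online.

```java
import java.util.List;
import java.util.Optional;

public class NodeStepSketch {

    /** Hypothetical node model: a name, a single label, and an online flag. */
    record Node(String name, String label, boolean online) {}

    /**
     * Picks the first online node whose label equals the requested one.
     * An empty result means the node step stays in the queue and waits.
     */
    static Optional<Node> assign(String label, List<Node> nodes) {
        return nodes.stream()
                .filter(n -> n.label().equals(label))
                .filter(Node::online)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("built-in", "built-in", false), // step 1: built-in taken offline
                new Node("agent2", "agent2", true));     // step 2: second agent
        // Scenario 1: a node step asking for "agent2" gets an executor.
        System.out.println(assign("agent2", nodes).map(Node::name).orElse("waiting"));   // agent2
        // Scenario 2: a node step asking for "built-in" waits until it is back online.
        System.out.println(assign("built-in", nodes).map(Node::name).orElse("waiting")); // waiting
    }
}
```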

Proposed changelog entries

  • Allow pipeline jobs to run when built-in is offline.

Proposed upgrade guidelines

N/A

Submitter checklist

  1. The Jira issue, if it exists, is well-described.
  2. The changelog entries and upgrade guidelines are appropriate for the audience affected by the change (users or developers, depending on the change) and are in the imperative mood (see examples). Fill in the Proposed upgrade guidelines section only if there are breaking changes or changes that may require extra steps from users during upgrade.
  3. There is automated testing or an explanation as to why this change has no tests.
  4. New public classes, fields, and methods are annotated with @Restricted or have @since TODO Javadocs, as appropriate.
  5. New deprecations are annotated with @Deprecated(since = "TODO") or @Deprecated(forRemoval = true, since = "TODO"), if applicable.
  6. New or substantially changed JavaScript is not defined inline and does not call eval to ease future introduction of Content Security Policy (CSP) directives (see documentation).
  7. For dependency updates, there are links to external changelogs and, if possible, full differentials.
  8. For new APIs and extension points, there is a link to at least one consumer.

Desired reviewers

@mention

Before the changes are marked as ready-for-merge:

Maintainer checklist

  1. There are at least two (2) approvals for the pull request and no outstanding requests for change.
  2. Conversations in the pull request are over, or it is explicit that a reviewer is not blocking the change.
  3. Changelog entries in the pull request title and/or Proposed changelog entries are accurate, human-readable, and in the imperative mood.
  4. Proper changelog labels are set so that the changelog can be generated automatically.
  5. If the change needs additional upgrade steps from users, the upgrade-guide-needed label is set and there is a Proposed upgrade guidelines section in the pull request title (see example).
  6. If it would make sense to backport the change to LTS, a Jira issue must exist, be a Bug or Improvement, and be labeled as lts-candidate to be considered (see query).

@NotMyFault requested a review from a team on April 29, 2024 08:20
@NotMyFault added the rfe label (For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted) on Apr 29, 2024
@NotMyFault requested a review from a team on May 1, 2024 07:58
@StefanSpieker (Contributor)

In general, I like the idea, but in the past the current behavior also helped us when the controller ran out of disk space and the built-in node was taken offline automatically. We do not run jobs on the built-in node, but since it is needed for starting jobs, its being offline prevented further jobs from running.
We have better monitoring today, so we haven't had this issue for quite some time, but it might be worth considering.

@mawinter69 (Contributor, Author)

I think that when a node is offline, it should only affect explicit use of the node, i.e. when it is matched by a label expression in a node step or by a job restriction.
The current situation is bad: pipeline jobs just fail as if they were never triggered, and the only trace is a message in the logs that does not make it immediately clear what happened. I would say it is better to have the job fail with disk problems.
Maybe one should check, before starting the flyweight task, whether the node is online and, if not, leave the task in the queue. But that is a bigger change and requires thorough testing.
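The alternative suggested here (not what this PR implements) could look roughly like the following Java sketch: a task is removed from the queue only once its node is online, so it is never handed to an executor that then terminates and loses it. `BuildQueue`, `Task` and `maintain` are hypothetical names for illustration and are not the Jenkins `Queue` API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

public class QueueSketch {

    /** Hypothetical stand-in for the pipeline's flyweight task. */
    record Task(String name) {}

    /** Hypothetical queue in which a task leaves the queue only when it can start. */
    static final class BuildQueue {
        private final Deque<Task> pending = new ArrayDeque<>();

        void schedule(Task task) {
            pending.addLast(task);
        }

        /**
         * Tries to start the task at the head of the queue. If the node is
         * offline, the task stays queued and can be retried later.
         */
        Optional<Task> maintain(boolean nodeOnline) {
            if (!nodeOnline) {
                return Optional.empty(); // leave the task in the queue
            }
            return Optional.ofNullable(pending.pollFirst()); // removed only when it can start
        }

        int size() {
            return pending.size();
        }
    }

    public static void main(String[] args) {
        BuildQueue queue = new BuildQueue();
        queue.schedule(new Task("my-pipeline #1"));
        System.out.println(queue.maintain(false).isPresent() + " " + queue.size()); // false 1
        System.out.println(queue.maintain(true).map(Task::name).orElse("none") + " " + queue.size()); // my-pipeline #1 0
    }
}
```

As noted in the comment, a real version of this would touch queue maintenance and need thorough testing; the sketch only captures the ordering of "check, then dequeue".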

@timja (Member) left a comment

/label ready-for-merge


This PR is now ready for merge. After ~24 hours, we will merge it if there's no negative feedback.

Thanks!

@comment-ops-bot (bot) added the ready-for-merge label (The PR is ready to go, and it will be merged soon if there is no negative feedback) on May 6, 2024
@Luckybangar: this comment was marked as off-topic.

@timja merged commit fa0464c into jenkinsci:master on May 8, 2024 (16 checks passed)
Labels: ready-for-merge, rfe
5 participants