NO-ISSUE: add RestartSec=10s for systemd service #4978

Open
wants to merge 2 commits into
base: main
from

Conversation

lance5890
Contributor

@lance5890 lance5890 commented May 28, 2025

Which issue(s) this PR addresses:

The default RestartSec is 100ms, so MicroShift may fail to start within the default StartLimitBurst (5 attempts). We should increase the systemd RestartSec to 10s, just as the etcd service does, as follows:

@openshift-ci openshift-ci bot requested review from copejon and eslutsky May 28, 2025 08:27
Contributor

openshift-ci bot commented May 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lance5890
Once this PR has been reviewed and has the lgtm label, please assign jerpeter1 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label May 28, 2025
Contributor

openshift-ci bot commented May 28, 2025

Hi @lance5890. Thanks for your PR.

I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@lance5890
Contributor Author

/cc @pacevedom

@openshift-ci openshift-ci bot requested a review from pacevedom May 28, 2025 08:31
@lance5890 lance5890 changed the title add RestartSec=10s for systemd service NO-ISSUE: add RestartSec=10s for systemd service May 28, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 28, 2025
@openshift-ci-robot

@lance5890: This pull request explicitly references no jira issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@pmtk
Member

pmtk commented Jun 5, 2025

Hey @lance5890, thanks for your contribution.

Could you explain how increasing time between restarts will help you?
What problem are you observing?

/label ok-to-test

Contributor

openshift-ci bot commented Jun 5, 2025

@pmtk: The label(s) /label ok-to-test cannot be applied. These labels are supported: acknowledge-critical-fixes-only, platform/aws, platform/azure, platform/baremetal, platform/google, platform/libvirt, platform/openstack, ga, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, px-approved, docs-approved, qe-approved, ux-approved, no-qe, downstream-change-needed, rebase/manual, cluster-config-api-changed, approved, backport-risk-assessed, bugzilla/valid-bug, cherry-pick-approved, jira/valid-bug, stability-fix-approved, staff-eng-approved. Is this label configured under labels -> additional_labels or labels -> restricted_labels in plugin.yaml?

@pmtk
Member

pmtk commented Jun 5, 2025

/ok-to-test

@openshift-ci openshift-ci bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 5, 2025
@lance5890
Contributor Author

lance5890 commented Jun 5, 2025

Hey @lance5890, thanks for your contribution.

Could you explain how increasing time between restarts will help you? What problem are you observing?

/label ok-to-test

We deployed MicroShift on an underpowered host. During startup, the network was not fully configured by the time systemd started the service, so MicroShift failed to start. After more than five failed attempts, systemd stopped trying to start it. The issue was resolved after we manually set RestartSec=10s.
image
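
For illustration, here is a rough model of why the default settings give up so quickly. It is a sketch under stated assumptions, not systemd's actual code: it assumes the systemd defaults (RestartSec=100ms, StartLimitIntervalSec=10s, StartLimitBurst=5), a rate-limit window that opens at the first start and resets once the interval has elapsed, and a service that fails immediately on every attempt. The function name and its parameters are hypothetical.

```python
def starts_before_limit(restart_ms, fail_ms=0, burst=5,
                        interval_ms=10_000, max_attempts=100):
    """Simulate repeated failed starts under a fixed-window start limit.

    Returns (allowed_starts, refusal_time_ms). refusal_time_ms is None
    when the limit is never hit within max_attempts.
    Assumption: every attempt fails fail_ms after it starts.
    """
    window_begin = None  # time the current rate-limit window opened
    count = 0            # starts counted in the current window
    now = 0              # simulated clock, in milliseconds
    for attempt in range(max_attempts):
        if window_begin is None or window_begin + interval_ms < now:
            # No window yet, or the old one expired: open a new window.
            window_begin, count = now, 1
        elif count < burst:
            count += 1
        else:
            # Burst exhausted inside the window: the start is refused.
            return attempt, now
        # The attempt fails, then systemd waits RestartSec before retrying.
        now += fail_ms + restart_ms
    return max_attempts, None


print(starts_before_limit(100))     # (5, 500)
print(starts_before_limit(10_000))  # (100, None)
```

Under these assumptions the default 100ms delay uses up all five starts within half a second, while a 10s delay spaces the starts far enough apart that the limit is never reached and the service keeps retrying.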

@lance5890
Contributor Author

/retest-required

@lance5890 lance5890 force-pushed the add_RestartSec_10s branch 2 times, most recently from 21a2dbd to 7f7c30c Compare June 6, 2025 05:40
@lance5890 lance5890 force-pushed the add_RestartSec_10s branch from 7f7c30c to 2449320 Compare June 6, 2025 05:41
@pmtk
Member

pmtk commented Jun 6, 2025

I noticed you've taken steps to address the problem with StartLimitIntervalSec.
However, I'm afraid this change needlessly increases a timeout that some features depend on, and I'm hesitant to apply it by default.

I would suggest instead that you use a systemd drop-in to configure required settings to match your environment.
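
A drop-in along these lines might look like the following. This is a minimal sketch, not a recommendation from the maintainers: the unit name microshift.service and the file name 10-restart-delay.conf are assumptions, and the 10s value is simply the one proposed in this PR.

```ini
# /etc/systemd/system/microshift.service.d/10-restart-delay.conf
# Assumed unit name and hypothetical file name; adjust to your system.
[Service]
# Wait 10 seconds between restart attempts instead of the 100ms default.
RestartSec=10s
```

After creating the file, run `systemctl daemon-reload` so systemd picks it up. Start-limit settings such as StartLimitIntervalSec and StartLimitBurst belong in the [Unit] section if they also need tuning.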

@lance5890
Contributor Author

I noticed you've taken steps to address the problem with StartLimitIntervalSec. However, I'm afraid this change needlessly increases a timeout that some features depend on, and I'm hesitant to apply it by default.

I would suggest instead that you use a systemd drop-in to configure required settings to match your environment.

Directly raising RestartSec to 10s might be a bit risky, but I still suggest changing this value to 1s as a first step (compared to the default of 100ms); that would resolve most issues.
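
As a rough check on candidate values, the time the service gets before systemd stops retrying can be estimated in closed form. This sketch assumes the systemd defaults StartLimitIntervalSec=10s and StartLimitBurst=5, a rate-limit window that opens at the first start, and failures that are immediate unless fail_ms says otherwise. The function name and parameters are hypothetical.

```python
def grace_window_ms(restart_ms, fail_ms=0, burst=5, interval_ms=10_000):
    """Estimate when systemd refuses the next start, in ms after the first.

    Returns None if the starts are spaced widely enough that the
    start limit is never reached under the assumed model.
    """
    spacing = restart_ms + fail_ms  # time between consecutive starts
    give_up_at = burst * spacing    # when start number burst+1 would occur
    # The extra start is refused only if it still falls inside the window.
    return give_up_at if give_up_at <= interval_ms else None


for restart_ms in (100, 1_000, 10_000):
    print(restart_ms, grace_window_ms(restart_ms))
```

Under these assumptions a 1s delay gives the network about 5s to come up, compared with 0.5s at the default, and any time a failed start takes before exiting adds to that spacing. Whether 5s is enough depends on the host.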

Contributor

openshift-ci bot commented Jun 6, 2025

@lance5890: all tests passed!

Full PR test history. Your PR dashboard.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 12, 2025
@openshift-merge-robot
Contributor

PR needs rebase.

Labels
jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. ok-to-test Indicates a non-member PR verified by an org member that is safe to test.
4 participants