Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clusterlint claims that webhook timeoutSeconds of 30 is too high #6907

Open
kyrofa opened this issue Apr 15, 2024 · 2 comments
Open

clusterlint claims that webhook timeoutSeconds of 30 is too high #6907

kyrofa opened this issue Apr 15, 2024 · 2 comments

Comments

@kyrofa
Copy link

kyrofa commented Apr 15, 2024

I've been using cert-manager for years. I really can't sing its praises enough. I've got it deployed as a helm chart. In that deployment method, the default value of the webhook timeoutSeconds is 30. According to clusterlint, this value is actually too high, and the max value should be 29. It's telling me that it will block the upgrade from k8s v1.28 to v1.29 in one of my clusters (which I host at DigitalOcean).

I guess I will change it to 29 in my local deployment to make clusterlint happy, but I'd like to get to the bottom of this. If clusterlint is correct (not saying it is, which is why I'm logging this issue in both places), you might consider making the default value 29. On the other hand, if 30 is indeed a valid value, clusterlint probably shouldn't complain about it.

@maelvls
Copy link
Member

maelvls commented Apr 18, 2024

Hey, thanks for raising this.

There is this comment in kubelint's code:

Webhooks with TimeoutSeconds set: less than 1 or greater than or equal to 30 is bad.

and

TimeoutSeconds value should be set to a non-nil value (greater than or equal to 1 and less than 30). If the TimeoutSeconds value is set to nil and the cluster version is 1.13.*, users are unable to configure the TimeoutSeconds value and this value will stay at nil, breaking upgrades. It's only for versions >= 1.14 that the value will default to 30 seconds.

I can't tell why 30 seconds is "bad" 😅

For context, I pushed for changing this value from 10 seconds to 30 seconds in #6488 with the intention of increasing the "context deadline timeout" to its maximum value so that the underlying timeout error message has more chance of being returned to the end user, thus making it easier to debug networking errors.

@kyrofa
Copy link
Author

kyrofa commented Apr 18, 2024

Ah, thank you for those additional details! I actually tried changing the timeout to 31 seconds and got an error, too. Combining what we're seeing in the code with that behavior, I believe 30 seconds to indeed be a valid value. I will bring this back to the clusterlint issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants