Issues with Cilium in eBPF replacement mode on Rocky 9.1 #32465

Closed
3 tasks done
kreeuwijk opened this issue May 10, 2024 · 1 comment
Labels
kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps.

Comments

@kreeuwijk
Contributor

kreeuwijk commented May 10, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

When trying to run Cilium in eBPF replacement mode on Rocky 9.1, I'm running into several issues:

  • The agent fails to start, logging that it could not find the kernel config file. I can work around this by manually adjusting the agent daemonset to mount the /boot directory again, a mount that was removed in 1.13.x.
  • Even with the /boot mount workaround in place, I still have to temporarily delete the cert-manager webhooks; otherwise the kube API server is unable to process the requests from the cilium operator to add the custom resources. I also needed to manually restart the kube API server container after removing the webhooks to get this to go through.
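
For anyone hitting the first symptom, a minimal sketch of how to check whether a kernel config file is visible from inside a given mount namespace (for example, from within the agent container). The candidate paths below are the conventional Linux locations and are an assumption here; the exact list the Cilium agent probes is not confirmed by this report. The function names are hypothetical.

```python
import os
import platform


def candidate_kernel_config_paths(release, boot_dir="/boot"):
    """Conventional kernel config locations (assumed, not Cilium's exact list)."""
    return [
        "/proc/config.gz",                 # present only if CONFIG_IKCONFIG_PROC is enabled
        f"{boot_dir}/config-{release}",    # typical distro location, needs /boot to be mounted
    ]


def find_kernel_config(release=None, boot_dir="/boot"):
    """Return the first candidate path that exists as a file, or None."""
    release = release or platform.release()
    for path in candidate_kernel_config_paths(release, boot_dir):
        if os.path.isfile(path):
            return path
    return None
```

Calling `find_kernel_config()` on the host and again inside the container shows whether the file is missing only in the container, which is what a missing /boot mount would look like.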

Running Cilium in regular kube-proxy mode has no issues at all.
Any idea what could be causing this?

Cilium Version

Client: 1.15.3 22dfbc58 2024-03-26T11:45:10+01:00 go version go1.21.8 linux/amd64
Daemon: 1.15.3 22dfbc58 2024-03-26T11:45:10+01:00 go version go1.21.8 linux/amd64

Kernel Version

Linux maas-pool-10.metal.dreamworx.nl 5.14.0-362.24.1.el9_3.0.1.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 4 22:31:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: v1.28.5
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.5

Regression

Unknown

Sysdump

cilium-sysdump-20240510-101723.zip

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@kreeuwijk
Contributor Author

Solved it. It turned out that /etc/resolv.conf contained a DNS server entry on its first line that was a leftover from the image-build process. This invalid DNS server entry was causing the actual issue with the kube API server, because the DNS lookup delay triggered the timeout for the cert-manager webhook.
Removing the invalid entry from /etc/resolv.conf fixed all issues.
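
In case it helps someone else check for the same problem, here is a rough sketch that lists the nameserver entries of a resolv.conf in order and probes each one with a small DNS query. The function names, the 2-second timeout, and the choice of a root NS query over UDP port 53 are illustrative assumptions, not something taken from Cilium or cert-manager.

```python
import socket
import struct


def parse_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they appear in the file."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":    # skip blank lines and comments
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def responds_to_dns(server, timeout=2.0):
    """Send one UDP query for the root NS records; True if a reply comes back."""
    # Header: ID 0x1234, flags with RD set, one question. Question: root name, NS, IN.
    query = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0) + b"\x00" + struct.pack(">HH", 2, 1)
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, 53))
            data, _ = sock.recvfrom(512)
        except OSError:                    # covers timeouts and unreachable hosts
            return False
    return len(data) >= 12 and data[:2] == query[:2]
```

Reading /etc/resolv.conf, passing its text to `parse_nameservers`, and calling `responds_to_dns` on each result should show whether the first entry is a dead server. The resolver tries servers in file order, so a dead first entry adds a delay to every lookup.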
