
Reboot on network errors #101

Closed
wants to merge 13 commits into from

majst01
Contributor

@majst01 majst01 commented Jan 28, 2023

closes #100

Maybe we should also add a DHCP client implementation (like dhclient) to metal-hammer, because kernel DHCP does not retry on errors:

https://github.com/u-root/u-root/blob/main/cmds/boot/pxeboot/pxeboot.go#L77

Current Errors

2023-01-31T07:58:04.498Z        error   failed waiting for allocation   {"retry after": 2, "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup metal.metalstack.cloud on 1.1.1.1:53: read udp 10.255.253.201:60811->1.1.1.1:53: i/o timeout\""}

2023-01-31T07:58:10.633Z        error   event   {"cannot send event": "Alive", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp: lookup metal.metalstack.cloud on 1.1.1.1:53: read udp 10.255.253.201:60811->1.1.1.1:53: i/o timeout\""}

@majst01 majst01 changed the base branch from master to sonic-support January 31, 2023 08:01
@majst01 majst01 changed the base branch from sonic-support to master January 31, 2023 08:37
@majst01 majst01 changed the base branch from master to sonic-support February 7, 2023 11:21
@majst01 majst01 marked this pull request as ready for review February 7, 2023 11:22
@majst01 majst01 requested a review from a team as a code owner February 7, 2023 11:22
Base automatically changed from sonic-support to master March 2, 2023 12:05
@Gerrit91
Contributor

Gerrit91 commented Mar 2, 2023

Needs rebase

@majst01 majst01 marked this pull request as draft March 29, 2023 12:42
@majst01
Contributor Author

majst01 commented Apr 10, 2024

Won't be taken.

@majst01 majst01 closed this Apr 10, 2024
@majst01 majst01 deleted the reboot-on-network-errors branch April 10, 2024 13:14
Successfully merging this pull request may close these issues.

If network is not reachable anymore, hammer will get stuck in an unusable state
3 participants