feature/bug: expected behavior around short circuiting a netboot request #243

Open
jacobweinstock opened this issue Mar 23, 2022 · 2 comments
Labels
area/dhcp Issues or PRs related to DHCP

Comments

@jacobweinstock
Member

Currently, there is logic to short circuit a netboot request based on some tink/cacher hardware data (see here). I looked through the Tink code base and didn't see any code paths where hardware data is updated based on workflow progression. The tink worker sends status reports as it progresses (ref here), but tink server doesn't update hardware data in any way. Taking all this into account, it appears that the code here in Boots expects some external entity to update the hardware data in conjunction with a workflow's progress. This makes the Boots -> Tink server combo always netboot unless hardware data is manually updated. This, in my opinion, is not expected behavior. This was also raised in the Tinkerbell community Slack channel, here. This feels more like a feature request than a bug, but at a bare minimum it is an undocumented quirk that affects generally expected behavior.
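
To make the behavior described above concrete, here is a minimal sketch. It is not the Boots or Tink implementation: `Hardware`, `allow_pxe`, `should_netboot`, and `report_workflow_success` are hypothetical names standing in for the hardware data, the short-circuit check, and the worker's status reports.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Hypothetical stand-in for tink/cacher hardware data."""
    mac: str
    allow_pxe: bool = True  # assumed flag; the real field name may differ


def should_netboot(hw: Hardware) -> bool:
    """Short circuit the netboot request based only on hardware data."""
    return hw.allow_pxe


def report_workflow_success(hw: Hardware) -> None:
    """Stand-in for tink worker status reports.

    As described above, the status is recorded but the hardware data is
    left untouched, so this deliberately does nothing to `hw`.
    """


hw = Hardware(mac="00:00:00:00:00:01")
report_workflow_success(hw)
print(should_netboot(hw))  # True: the machine netboots again

hw.allow_pxe = False       # the manual (external) update the check relies on
print(should_netboot(hw))  # False: the request is short circuited
```

Under these assumptions, the only way the decision changes is for something outside the workflow to flip the flag, which is the gap this issue is about.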

CC @rothgar

Expected Behaviour

After a machine has been provisioned we should be able to boot from a local disk without changing the boot order.

Current Behaviour

See above.

Possible Solution

Write a workflow action that updates tink hardware data. I mention this only to offer some kind of workaround; I don't think it is a viable mid- to long-term solution.


@jacobweinstock
Member Author

Thinking about this some more: there is currently functionality that allows a user to provide a custom iPXE script. In this custom iPXE case, tink worker is never launched. So if a solution to this issue is to have boots look for an "active" workflow, we'll need to think about how to handle the same idea with custom iPXE scripts.
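
A rough sketch of why the custom iPXE case needs separate handling. Everything here is hypothetical (`Machine`, `has_active_workflow`, `custom_ipxe_script`, and both functions); it only illustrates the concern and is not a proposed design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    mac: str
    has_active_workflow: bool = False          # hypothetical lookup result
    custom_ipxe_script: Optional[str] = None   # user-provided script, if any


def should_netboot_naive(m: Machine) -> bool:
    """Netboot only while a workflow is active."""
    return m.has_active_workflow


def should_netboot(m: Machine) -> bool:
    """One possible rule: a custom script also counts as a reason to netboot."""
    return m.has_active_workflow or m.custom_ipxe_script is not None


custom = Machine(mac="00:00:00:00:00:02", custom_ipxe_script="#!ipxe\nshell\n")
print(should_netboot_naive(custom))  # False: the custom script is never served
print(should_netboot(custom))        # True
```

Note that the second rule still has no notion of a custom-script machine being "done", since no tink worker runs to report progress. That part remains the open question.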

@rothgar

rothgar commented Apr 4, 2022

I updated my tink stack and found that once my system successfully completes the workflow, on the next boot boots doesn't answer requests from that MAC address with a PXE response. This does the intended thing but has a side effect I didn't anticipate.

Once my servers fail PXE and boot from the local disk, my systems (for reasons unknown to me) add a UEFI boot item to the top of the boot order. This means that if I ever want to PXE boot again, I have to go into the BIOS for each device and change the order.

A better approach would be to respond to all PXE events and have a default workflow/iPXE target that boots locally (this is how Cobbler and RHEL Satellite/Foreman worked in environments I worked in). There are a couple of different iPXE options to do that, and it would keep boot priority and ownership centralized in boots rather than relying on different BIOS support/configuration.

An iPXE script with this content will likely work in most situations:

sanboot --no-describe --drive 0x80

More details in the docs
https://ipxe.org/cmd/sanboot
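
As a minimal sketch of the "always respond, default to local boot" idea: the function below returns a provisioning script when one is pending for a MAC address and otherwise falls back to the `sanboot` script above. The names `ipxe_script_for` and `pending`, and the `boots.example` URL, are assumptions for illustration and not part of boots.

```python
from typing import Mapping

# Default script from the comment above: boot the first local disk.
LOCAL_BOOT_SCRIPT = "#!ipxe\nsanboot --no-describe --drive 0x80\n"


def ipxe_script_for(mac: str, pending: Mapping[str, str]) -> str:
    """Return the provisioning script for `mac`, else the local-boot default.

    `pending` is a hypothetical map of lowercase MAC address -> iPXE script
    for machines that still have work to do.
    """
    return pending.get(mac.lower(), LOCAL_BOOT_SCRIPT)


# Hypothetical pending entry; the chained URL is a placeholder.
pending = {"00:00:00:00:00:03": "#!ipxe\nchain http://boots.example/auto.ipxe\n"}
print(ipxe_script_for("00:00:00:00:00:03", pending))  # provisioning script
print(ipxe_script_for("00:00:00:00:00:04", pending))  # local-boot default
```

With this shape every PXE request gets an answer, so network boot can stay first in the boot order and the choice between provisioning and local boot is made centrally.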

@jacobweinstock jacobweinstock added the area/dhcp Issues or PRs related to DHCP label Jul 2, 2022