enhance flow for custom machineconfigs for specific machinesets #1619

Open

cgwalters opened this issue Apr 6, 2020 · 19 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@cgwalters
Member

The CI team is trying to use new AWS m5d.xlarge instances which have two NVMe disks attached. We crafted a custom RAID partition machineconfig to enable that.
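For illustration, such a config looks roughly like this (a hedged sketch, not the exact config we used - the Ignition spec version, device names, and mount point are assumptions that vary by release and instance type; the actual config is referenced later in this thread as openshift/release#8102):

```bash
# Hedged sketch only. Assumes Ignition 3.x syntax, two instance-store devices
# at /dev/nvme1n1 and /dev/nvme2n1, and /var/lib/containers as the mount point.
cat <<'EOF' | oc apply -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 98-worker-instance-store-raid
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      raid:
        - name: data
          level: raid0
          devices:
            - /dev/nvme1n1
            - /dev/nvme2n1
      filesystems:
        - device: /dev/md/data
          format: xfs
          path: /var/lib/containers
          wipeFilesystem: true
    systemd:
      units:
        # Ignition creates the array and filesystem; a mount unit keeps it
        # mounted on the running node.
        - name: var-lib-containers.mount
          enabled: true
          contents: |
            [Unit]
            Before=local-fs.target
            [Mount]
            What=/dev/md/data
            Where=/var/lib/containers
            Type=xfs
            [Install]
            WantedBy=local-fs.target
EOF
```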

We added this as part of the main worker pool - the MCD will fail to roll out the partitioning on the existing workers, but that's fine because the plan was to "roll" the worker pool. Basically get the new MC in the pool, have new workers come online with that config, then scale down the old workers.

However, there are a few issues here.

First, this whole thing would obviously be a lot better if we had machineset-specific machineconfigs. That would solve a bunch of races and be much more elegant.

What we're seeing right now is that one new m5d node went OutOfDisk=true because it was booted with just a 16G root volume from the old config. That unschedulable node then blocks rollout of further changes.

I think we can unstick ourselves here by deleting that node and getting the MCO to roll out the new config.
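Concretely, the unstick step would look something like this (a sketch with placeholder names):

```bash
# Delete the Machine backing the OutOfDisk node so the machineset provisions
# a replacement.
oc get nodes -l node-role.kubernetes.io/worker -o wide
oc -n openshift-machine-api get machines
oc -n openshift-machine-api delete machine <stuck-machine>
# The machineset controller recreates the machine; the replacement node asks
# the MCS for its Ignition config on first boot (but see the next comment for
# a caveat about which rendered config the MCS actually serves).
```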

@cgwalters
Member Author

And the real problem here is that the MCS serves:

currConf := mp.Status.Configuration.Name

So new nodes will only get the last config that successfully rolled out, and this attempt deadlocks.
We really need machineset-specific MCs.
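A quick way to see the divergence (a hedged sketch; the pool's spec points at the newly rendered config, while status is what the MCS hands to new nodes):

```bash
oc get mcp worker -o jsonpath='{.spec.configuration.name}{"\n"}'
oc get mcp worker -o jsonpath='{.status.configuration.name}{"\n"}'
```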

@cgwalters
Member Author

Another idea: some sort of "force MCS to serve mp.Spec.Configuration.Name" config option?

@cgwalters
Member Author

First, this whole thing would obviously be a lot better if we had machineset-specific machineconfigs.

This, though, gets into the "node identity" problem - see also #784.
That identity could be e.g. a secret value passed through user data and then provided to the MCS, which could link it to a machine object, and from there to a machineset.

@cgwalters
Member Author

cgwalters commented Apr 6, 2020

If someone wants to change the machine type of their workers today, I think it'd require a custom worker-new pool and also creating a custom user-data secret that does s/worker/worker-new/ and is referenced from the new machineset.

Alternatively, scaling down to 0 workers then back up should work, but obviously that's a lot more disruptive.
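Roughly (placeholder machineset name and replica count):

```bash
oc -n openshift-machine-api scale machineset <worker-machineset> --replicas=0
# wait for the old machines and their nodes to drain and be deleted, then:
oc -n openshift-machine-api scale machineset <worker-machineset> --replicas=3
```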

@cgwalters
Member Author

The other alternative would be having something like a spec.machineSet in a pool - then the MCO could understand how to say "okay, the new config has partitioning changes, so let me destroy the machine objects and provide new configs to newly provisioned nodes". That'd be the most elegant option, but it would require the tightest integration between the MCO and the machineAPI.

@cgwalters
Member Author

cgwalters commented Apr 6, 2020

An entirely different idea is for something in the stack to have a high level tunable knob like:

storage:
  instanceAttached: /var

This would be a MachineConfig fragment.

This logic could potentially live in the MCD; rather than having a machineConfig object describe a specific number of disks, the MCD would dynamically look at the node and say "oh this instance has 2" and set things up accordingly.

The tricky part with this is that simply waiting for "all disks" gets racy. It'd really be best done with knowledge of how many disks to expect. We may need some integration with the machineAPI so we know how many disks to expect per instance type (per configuration) - it feels dangerous to just unilaterally take over block devices we see, though maybe there are provider-specific ways to know they're ephemeral disks.
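For example, on AWS something like the following could identify instance-store devices without guessing device names (illustrative only; instance-store NVMe devices report a different model string than EBS volumes):

```bash
lsblk -d -n -p -o NAME,MODEL | grep 'Amazon EC2 NVMe Instance Storage' | awk '{print $1}'
```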

@michaelgugino
Contributor

The machineconfig provided in the first comment (openshift/release#8102) and the custom worker pool instructions (https://github.com/openshift/machine-config-operator/blob/master/docs/custom-pools.md) work together well.

All you need to do is create a custom user-data secret, which can be done easily.

First, find an existing machineset you want to modify, and download the user-data secret.

./oc get secrets -n openshift-machine-api worker-user-data -oyaml > worker-user-data.out.yaml

Scrape the secret contents into a new file, base64-decode those contents (you'll get some JSON), and edit the URL near the beginning of the JSON to specify the new MCP name you created. Base64 re-encode that modified JSON (make sure to use the -w0 option so you don't get line wrapping), paste it into the user-data YAML file, update the name to something sensible, and oc apply.

Update your preferred machineset to use the m5d.4xlarge instance type and the new user-data secret you created. That's it.
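Spelled out as commands, the above looks roughly like this (a hedged sketch; "worker-new" and <your-machineset> are placeholders):

```bash
oc -n openshift-machine-api get secret worker-user-data -o yaml > worker-user-data.out.yaml

# Decode the userData field; the JSON it contains points at the MCS URL,
# which ends in /config/worker.
oc -n openshift-machine-api get secret worker-user-data \
  -o jsonpath='{.data.userData}' | base64 -d > userData.json

# Point the URL at the new pool, then re-encode without line wrapping.
sed 's|/config/worker|/config/worker-new|' userData.json | base64 -w0 > userData.b64

# Paste the contents of userData.b64 over the userData value in
# worker-user-data.out.yaml, rename the secret (e.g. worker-new-user-data),
# and apply it.
oc apply -f worker-user-data.out.yaml

# Finally, edit the machineset to use the m5d.4xlarge instance type
# (providerSpec.value.instanceType) and the new secret
# (providerSpec.value.userDataSecret.name).
oc -n openshift-machine-api edit machineset <your-machineset>
```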

@cgwalters
Member Author

Scrape the secret contents into a new file, base64-decode those contents (you'll get some JSON), and edit the URL near the beginning of the JSON to specify the new MCP name you created.

This part is pretty user-hostile, I think. But doing better requires much tighter MCO/machineAPI integration; something like having the machineAPI ask the MCO "please give me the user data secret for pool X" or so.

@michaelgugino
Contributor

Scrape the secret contents into a new file, base64-decode those contents (you'll get some JSON), and edit the URL near the beginning of the JSON to specify the new MCP name you created.

This part is pretty user-hostile, I think. But doing better requires much tighter MCO/machineAPI integration; something like having the machineAPI ask the MCO "please give me the user data secret for pool X" or so.

@cgwalters

We more or less are doing that by specifying a user-data secret.

The MCO could stamp out user-data for each pool, and users could just update/create a machineset that points to that secret.

@cgwalters
Member Author

I think in general a good pattern would be putting /var on an ephemeral drive - but to make that work right we need to move e.g. the pull secret into /etc. Same reason we want to do that for #1190.

@michaelgugino
Contributor

I think /var/lib/containers should be its own drive. All of /var is too broad and might make the system impossible to recover if that volume is lost for whatever reason.

@cgwalters
Member Author

I think /var/lib/containers should be its own drive.

Agree this is the most obvious thing to start with for OpenShift.

All of /var is too broad and might make the system impossible to recover if that volume is lost for whatever reason.

Remember that on CoreOS systems, the system boots with an empty /var - the default files and directories there are populated e.g. by systemd-tmpfiles. We don't explicitly test nuking it though, and actually doing so gets into issues I raised in internal email, like the fact that it will remove your ssh keys and, for that matter, the whole /var/home/core directory.

@michaelgugino
Contributor

actually doing so gets into issues I raised in internal email, like the fact that it will remove your ssh keys and, for that matter, the whole /var/home/core directory.

Right, that's why I think maybe it's not the best idea to use all of /var as an ephemeral volume. That opens a totally different can of worms.

I think if we identify the primary purpose of using these drives, it can help us better align on what the implementation should look like. For me, the primary purpose is increased IOPS for containers. Since the containers themselves are ephemeral, an ephemeral disk seems like a good choice. If a pod needs persistent storage, it should use a PVC (static pods notwithstanding, etcd, etc.).

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) on Nov 5, 2020
@cgwalters
Member Author

/remove-lifecycle stale

openshift-ci-robot removed the lifecycle/stale label on Nov 24, 2020
@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Feb 23, 2021
@cgwalters
Member Author

/lifecycle frozen

openshift-ci-robot added the lifecycle/frozen label (Indicates that an issue or PR should not be auto-closed due to staleness.) and removed the lifecycle/stale label on Mar 22, 2021
@cgwalters
Member Author

One thing that was pointed out in a chat about this is that today, one can also do this by providing a custom user data Ignition config for the desired machineset.

A core problem today is that nothing really "owns" the pointer data. But, we could make this cleaner if we had an explicit way to include a machineconfig fragment in a particular machineset's pointer data. That would get us out of the MCO having to be aware of this. But, it would mean "day 2" changes for that machineconfig object wouldn't work, or at least would only work for new nodes.
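To make that concrete, here is a hedged sketch of what machineset-specific pointer data could look like (Ignition 3.x syntax assumed; the URL, CA, and the inline file are placeholders). The pointer config is itself an Ignition config, so a fragment can ride alongside the directive that merges the pool's rendered config from the MCS:

```bash
# Illustration only. Normally the pointer config just merges the MCS URL;
# here a machineset-specific fragment (one file) is carried inline. New nodes
# apply it at first boot; later edits would not reach existing nodes.
cat > worker-new-user-data.ign <<'EOF'
{
  "ignition": {
    "version": "3.1.0",
    "config": {
      "merge": [
        { "source": "https://api-int.<cluster-domain>:22623/config/worker" }
      ]
    },
    "security": {
      "tls": {
        "certificateAuthorities": [
          { "source": "data:text/plain;charset=utf-8;base64,<root-ca>" }
        ]
      }
    }
  },
  "storage": {
    "files": [
      {
        "path": "/etc/machineset-specific.conf",
        "mode": 420,
        "contents": { "source": "data:,example%20per-machineset%20setting%0A" }
      }
    ]
  }
}
EOF
# This file would then be base64-encoded into the machineset's user-data secret.
```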

@cgwalters
Member Author

Also tangentially related to this, I did do a spike on https://github.com/cgwalters/coreos-cloud-instance-store-provisioner/
