chore: changes for building CAPA amis #1583

Merged
merged 1 commit into from
Jan 30, 2025

Conversation

richardcase
Member

Change description

Related issues

  • Fixes #

Additional context

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 10, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 10, 2024
@holmanb

holmanb commented Oct 10, 2024

@richardcase per your comment:

> We're running into an error when the boothook script runs in the local stage as the network and AWS creds are not setup yet and so the AWS cli calls fail. However, with this CAPA branch there is still a reboot and so when the machine comes up the second time the boothook script runs (as the network and creds are setup) and we get k8s coming up.

I think we can resolve this by not attempting to get data during the local stage. To accomplish this, we can apply the following change, which overrides the local datasource so that DataSourceEc2KubernetesLocal._get_data() does nothing and returns False. During the network stage, cloud-init will then try to get data using the DataSourceEc2Kubernetes class. This should force the boothook to run during the network stage, at which point the AWS creds should presumably be set up (although I'm not sure what is responsible for that).

diff --git a/cloudinit/sources/DataSourceEc2Kubernetes.py b/cloudinit/sources/DataSourceEc2Kubernetes.py
index 50f51faf0..115bba3b6 100644
--- a/cloudinit/sources/DataSourceEc2Kubernetes.py
+++ b/cloudinit/sources/DataSourceEc2Kubernetes.py
@@ -89,7 +89,8 @@ class DataSourceEc2Kubernetes(DataSourceEc2.DataSourceEc2):
 
 
 class DataSourceEc2KubernetesLocal(DataSourceEc2Kubernetes):
-    perform_dhcp_setup = True  # Use dhcp before querying metadata
+    def _get_data(self):
+        return False
 
 
 # Used to match classes to dependencies
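
To illustrate the intent of the diff above, here is a minimal sketch of the stage fallthrough. It does not import cloud-init: the `Fake*` classes, the `network_up` flag, and `find_datasource` are hypothetical stand-ins, and the real datasource discovery in cloud-init is more involved than this loop.

```python
# Simplified stand-ins; these are NOT the real cloud-init classes.
class FakeEc2Kubernetes:
    """Plays the role of the network-stage datasource."""

    def __init__(self, network_up):
        self.network_up = network_up

    def _get_data(self):
        # Assumption: data is reachable only once networking is up.
        return self.network_up


class FakeEc2KubernetesLocal(FakeEc2Kubernetes):
    """Plays the role of the local-stage datasource from the diff."""

    def _get_data(self):
        # Always decline, so discovery is deferred to the network stage.
        return False


def find_datasource(stages):
    """Return the name of the first stage whose datasource yields data."""
    for stage_name, datasource in stages:
        if datasource._get_data():
            return stage_name
    return None


stages = [
    ("init-local", FakeEc2KubernetesLocal(network_up=False)),
    ("init", FakeEc2Kubernetes(network_up=True)),
]
selected = find_datasource(stages)
print(selected)
```

Because the local stand-in always returns False, the data (and therefore the boothook) is only picked up in the later stage, which is the behaviour the proposed change is aiming for.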

@Nalum

Nalum commented Oct 10, 2024

@holmanb does DataSourceEc2KubernetesLocal need to extend DataSourceEc2Kubernetes? We were discussing this and thought we might be able to do DataSourceEc2KubernetesLocal(DataSourceEc2.DataSourceEc2), but overriding _get_data works too.

@holmanb

holmanb commented Oct 10, 2024

> @holmanb does DataSourceEc2KubernetesLocal need to extend DataSourceEc2Kubernetes?

Not necessarily. There are a few different ways to accomplish the same thing.

> We were discussing this and thought we might be able to do DataSourceEc2KubernetesLocal(DataSourceEc2.DataSourceEc2), but overriding _get_data works too.

It really doesn't matter which you inherit from for this local datasource. Either way you still need to override _get_data to do nothing and return False. If you just inherit from DataSourceEc2.DataSourceEc2 without that override, then I would expect that getting the user-data (the MIME containing the boothook and include-url) will succeed in the local stage and it won't try again in the network stage, so it wouldn't behave any differently than the current Ec2 datasource does.
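
A small sketch of that point, using hypothetical stand-in classes rather than the real cloud-init ones: the choice of parent class does not change the outcome, only the presence of the `_get_data` override does. The assumption that the stock datasource succeeds in the local stage is taken from the comment above, not verified here.

```python
class FakeEc2:
    """Stand-in for the stock EC2 datasource (hypothetical simplification)."""

    def _get_data(self):
        # Assumption: the stock datasource can fetch user-data in the local stage.
        return True


class FakeEc2Kubernetes(FakeEc2):
    """Stand-in for the Kubernetes-specific subclass."""


class LocalNoOverride(FakeEc2):
    """Inherits the stock behaviour and claims the data in the local stage."""


class LocalFromEc2(FakeEc2):
    def _get_data(self):
        return False  # defer to the network stage


class LocalFromEc2Kubernetes(FakeEc2Kubernetes):
    def _get_data(self):
        return False  # same result, different parent


results = {
    cls.__name__: cls()._get_data()
    for cls in (LocalNoOverride, LocalFromEc2, LocalFromEc2Kubernetes)
}
print(results)
```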

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2025
@richardcase
Member Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2025
@richardcase
Member Author

@dlipovetsky - we should follow up on this so that we can stop downgrading and pinning cloud-init

@richardcase
Member Author

After chatting with @faiq i will pick this up again.

@faiq
Contributor

faiq commented Jan 23, 2025

I gave this code a try here

main...faiq:image-builder:custom-ami-ds

and then built an AMI with it

root@<redacted>:/var/snap/amazon-ssm-agent/9881# bash
root@<redacted>:/var/snap/amazon-ssm-agent/9881# cat /var/log/cloud-init-output.log 
[2025-01-23 20:04:00] Cloud-init v. 24.4-0ubuntu1~22.04.1 running 'init-local' at Thu, 23 Jan 2025 20:04:00 +0000. Up 10.33 seconds.
[2025-01-23 20:04:03] Cloud-init v. 24.4-0ubuntu1~22.04.1 running 'init' at Thu, 23 Jan 2025 20:04:03 +0000. Up 13.05 seconds.
[2025-01-23 20:04:03] ci-info: ++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
[2025-01-23 20:04:03] ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[2025-01-23 20:04:03] ci-info: | Device |  Up  |           Address           |      Mask     | Scope  |     Hw-Address    |
[2025-01-23 20:04:03] ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[2025-01-23 20:04:03] ci-info: |  ens5  | True |         <redacted>        | 255.255.192.0 | global | 06:50:f1:f8:c1:6f |
[2025-01-23 20:04:03] ci-info: |  ens5  | True | <redacted> |       .       |  link  | 06:50:f1:f8:c1:6f |
[2025-01-23 20:04:03] ci-info: |   lo   | True |          127.0.0.1          |   255.0.0.0   |  host  |         .         |
[2025-01-23 20:04:03] ci-info: |   lo   | True |           ::1/128           |       .       |  host  |         .         |
[2025-01-23 20:04:03] ci-info: +--------+------+-----------------------------+---------------+--------+-------------------+
[2025-01-23 20:04:03] ci-info: +++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++
[2025-01-23 20:04:03] ci-info: +-------+-------------+------------+-----------------+-----------+-------+
[2025-01-23 20:04:03] ci-info: | Route | Destination |  Gateway   |     Genmask     | Interface | Flags |
[2025-01-23 20:04:03] ci-info: +-------+-------------+------------+-----------------+-----------+-------+
[2025-01-23 20:04:03] ci-info: |   0   |   0.0.0.0   | 10.0.128.1 |     0.0.0.0     |    ens5   |   UG  |
[2025-01-23 20:04:03] ci-info: |   1   |   10.0.0.2  | 10.0.128.1 | 255.255.255.255 |    ens5   |  UGH  |
[2025-01-23 20:04:03] ci-info: |   2   |  10.0.128.0 |  0.0.0.0   |  255.255.192.0  |    ens5   |   U   |
[2025-01-23 20:04:03] ci-info: |   3   |  10.0.128.1 |  0.0.0.0   | 255.255.255.255 |    ens5   |   UH  |
[2025-01-23 20:04:03] ci-info: +-------+-------------+------------+-----------------+-----------+-------+
[2025-01-23 20:04:03] ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[2025-01-23 20:04:03] ci-info: +-------+-------------+---------+-----------+-------+
[2025-01-23 20:04:03] ci-info: | Route | Destination | Gateway | Interface | Flags |
[2025-01-23 20:04:03] ci-info: +-------+-------------+---------+-----------+-------+
[2025-01-23 20:04:03] ci-info: |   1   |  fe80::/64  |    ::   |    ens5   |   U   |
[2025-01-23 20:04:03] ci-info: |   3   |   anycast   |    ::   |    ens5   |   U   |
[2025-01-23 20:04:03] ci-info: |   4   |    local    |    ::   |    ens5   |   U   |
[2025-01-23 20:04:03] ci-info: |   5   |  multicast  |    ::   |    ens5   |   U   |
[2025-01-23 20:04:03] ci-info: +-------+-------------+---------+-----------+-------+
[2025-01-23 20:04:03] 2025-01-23 20:04:03,310 - user_data.py[WARNING]: [Errno 2] No such file or directory: '/etc/secret-userdata.txt' for url: file:///etc/secret-userdata.txt
[2025-01-23 20:04:03] 2025-01-23 20:04:03,312 - DataSourceEc2Kubernetes.py[WARNING]: Kubernetes is trying to restart cloud-init. This is no longer necessary and is temporarily circumvented by cloud-init. This will be a hard error in the future.
[2025-01-23 20:04:03] +++ [2025-01-23T20:04:03+00:00] aws.cluster.x-k8s.io encrypted cloud-init script /var/lib/cloud/instances/i-05e283ee175330d10/boothooks/part-001 started
[2025-01-23 20:04:03] +++ [2025-01-23T20:04:03+00:00] secret prefix: aws.cluster.x-k8s.io/9d38f6d6-6872-439d-8c95-b206a9e700fe
[2025-01-23 20:04:03] +++ [2025-01-23T20:04:03+00:00] secret count: 2
[2025-01-23 20:04:03] +++ [2025-01-23T20:04:03+00:00] getting userdata from AWS Secrets Manager
[2025-01-23 20:04:03] +++ [2025-01-23T20:04:03+00:00] getting secret value from AWS Secrets Manager
[2025-01-23 20:04:15] +++ [2025-01-23T20:04:15+00:00] AWS CLI reported successful execution for SecretsManager::GetSecretValue
[2025-01-23 20:04:15] +++ [2025-01-23T20:04:15+00:00] appending data to temporary file /etc/secret-userdata.txt.gz
[2025-01-23 20:04:15] +++ [2025-01-23T20:04:15+00:00] getting userdata from AWS Secrets Manager
[2025-01-23 20:04:15] +++ [2025-01-23T20:04:15+00:00] getting secret value from AWS Secrets Manager
[2025-01-23 20:04:16] +++ [2025-01-23T20:04:16+00:00] AWS CLI reported successful execution for SecretsManager::GetSecretValue
[2025-01-23 20:04:16] +++ [2025-01-23T20:04:16+00:00] appending data to temporary file /etc/secret-userdata.txt.gz
[2025-01-23 20:04:16] +++ [2025-01-23T20:04:16+00:00] deleting secret from AWS Secrets Manager
[2025-01-23 20:04:17] +++ [2025-01-23T20:04:17+00:00] AWS CLI reported successful execution for SecretsManager::DeleteSecret
[2025-01-23 20:04:17] +++ [2025-01-23T20:04:17+00:00] deleting secret from AWS Secrets Manager
[2025-01-23 20:04:18] +++ [2025-01-23T20:04:18+00:00] AWS CLI reported successful execution for SecretsManager::DeleteSecret
[2025-01-23 20:04:18] +++ [2025-01-23T20:04:18+00:00] decompressing userdata to /etc/secret-userdata.txt
[2025-01-23 20:04:18] +++ [2025-01-23T20:04:18+00:00] restarting cloud-init
[2025-01-23 20:04:18] +++ [2025-01-23T20:04:18+00:00] aws.cluster.x-k8s.io encrypted cloud-init script /var/lib/cloud/instances/i-05e283ee175330d10/boothooks/part-001 finished
[2025-01-23 20:04:19] +++ [2025-01-23T20:04:19+00:00] aws.cluster.x-k8s.io encrypted cloud-init script /var/lib/cloud/instances/i-05e283ee175330d10/boothooks/part-001 started
[2025-01-23 20:04:19] +++ [2025-01-23T20:04:19+00:00] secret prefix: aws.cluster.x-k8s.io/9d38f6d6-6872-439d-8c95-b206a9e700fe
[2025-01-23 20:04:19] +++ [2025-01-23T20:04:19+00:00] secret count: 2
[2025-01-23 20:04:19] +++ [2025-01-23T20:04:19+00:00] encrypted userdata already written to disk
[2025-01-23 20:04:19] +++ [2025-01-23T20:04:19+00:00] aws.cluster.x-k8s.io encrypted cloud-init script /var/lib/cloud/instances/i-05e283ee175330d10/boothooks/part-001 finished
[2025-01-23 20:04:20] Generating public/private rsa key pair.
[2025-01-23 20:04:20] Your identification has been saved in /etc/ssh/ssh_host_rsa_key
[2025-01-23 20:04:20] Your public key has been saved in /etc/ssh/ssh_host_rsa_key.pub
[2025-01-23 20:04:20] The key fingerprint is:
[2025-01-23 20:04:20] SHA256:04WxATNEf2A739Ht8f+GEhVlNF2pK7SeBjv/BByOagk root@ip-10-0-168-254
[2025-01-23 20:04:20] The key's randomart image is:
[2025-01-23 20:04:20] +---[RSA 3072]----+
[2025-01-23 20:04:20] |       o*.=    +O|
[2025-01-23 20:04:20] |         = B  .++|
[2025-01-23 20:04:20] |          *.o ooo|
[2025-01-23 20:04:20] |         .+*.o.oo|
[2025-01-23 20:04:20] |     E  S.o+o.o o|
[2025-01-23 20:04:20] |      . oo oo.  .|
[2025-01-23 20:04:20] |       +  + oo ..|
[2025-01-23 20:04:20] |      .  o +o . o|
[2025-01-23 20:04:20] |          +..o ..|
[2025-01-23 20:04:20] +----[SHA256]-----+
[2025-01-23 20:04:20] Generating public/private ecdsa key pair.
[2025-01-23 20:04:20] Your identification has been saved in /etc/ssh/ssh_host_ecdsa_key
[2025-01-23 20:04:20] Your public key has been saved in /etc/ssh/ssh_host_ecdsa_key.pub
[2025-01-23 20:04:20] The key fingerprint is:
[2025-01-23 20:04:20] SHA256:WQNF5Z01wFRBlBBeTMCa7N2IQKrdszPfBfeGGpYZ1mc root@ip-10-0-168-254
[2025-01-23 20:04:20] The key's randomart image is:
[2025-01-23 20:04:20] +---[ECDSA 256]---+
[2025-01-23 20:04:20] |        .oo.=*@O+|
[2025-01-23 20:04:20] |         o ..+.=.|
[2025-01-23 20:04:20] |        o + +.o  |
[2025-01-23 20:04:20] |       . + = .   |
[2025-01-23 20:04:20] |      o S o =.+.E|
[2025-01-23 20:04:20] |     . . o + *o=.|
[2025-01-23 20:04:20] |          o = ..o|
[2025-01-23 20:04:20] |         + ..o.. |
[2025-01-23 20:04:20] |          +...   |
[2025-01-23 20:04:20] +----[SHA256]-----+
[2025-01-23 20:04:20] Generating public/private ed25519 key pair.
[2025-01-23 20:04:20] Your identification has been saved in /etc/ssh/ssh_host_ed25519_key
[2025-01-23 20:04:20] Your public key has been saved in /etc/ssh/ssh_host_ed25519_key.pub
[2025-01-23 20:04:20] The key fingerprint is:
[2025-01-23 20:04:20] SHA256:PCMrFLxJqM34AbLCNl2YIAkjnAsvTBT1ckaOWFFx5wM root@ip-10-0-168-254
[2025-01-23 20:04:20] The key's randomart image is:
[2025-01-23 20:04:20] +--[ED25519 256]--+
[2025-01-23 20:04:20] |O=++o+.E .       |
[2025-01-23 20:04:20] |==+oB . +        |
[2025-01-23 20:04:20] |*oo=+*   o       |
[2025-01-23 20:04:20] |+Xo.=+ .  .      |
[2025-01-23 20:04:20] |==+.+ . S        |
[2025-01-23 20:04:20] |o..o   o o       |
[2025-01-23 20:04:20] |  . . .          |
[2025-01-23 20:04:20] |     .           |
[2025-01-23 20:04:20] |                 |
[2025-01-23 20:04:20] +----[SHA256]-----+
[2025-01-23 20:04:23] Cloud-init v. 24.4-0ubuntu1~22.04.1 running 'modules:config' at Thu, 23 Jan 2025 20:04:23 +0000. Up 33.75 seconds.
[2025-01-23 20:04:25] Cloud-init v. 24.4-0ubuntu1~22.04.1 running 'modules:final' at Thu, 23 Jan 2025 20:04:25 +0000. Up 35.54 seconds.
[2025-01-23 20:04:25] Cloud-init v. 24.4-0ubuntu1~22.04.1 finished at Thu, 23 Jan 2025 20:04:25 +0000. Datasource DataSourceEc2Kubernetes.  Up 35.70 seconds

these changes worked great for me.
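
For readers following the log above, the boothook flow it shows (fetch each secret chunk, append, delete the secrets, decompress, and skip everything on the second run) can be sketched roughly as follows. This is not the actual boothook script: `fetch_userdata`, the `prefix-index` secret naming, and the in-memory `store` standing in for AWS Secrets Manager are all assumptions made for illustration.

```python
import gzip
import tempfile
from pathlib import Path


def fetch_userdata(out_path, prefix, count, get_secret, delete_secret):
    """Assemble gzip-compressed user-data from numbered secret chunks.

    Returns True if data was fetched, False if it was already on disk.
    """
    out_path = Path(out_path)
    if out_path.exists():
        # Mirrors "encrypted userdata already written to disk" in the log.
        return False

    names = [f"{prefix}-{i}" for i in range(count)]  # naming is an assumption
    compressed = b"".join(get_secret(name) for name in names)
    for name in names:
        delete_secret(name)

    out_path.write_bytes(gzip.decompress(compressed))
    return True


# In-memory stand-in for the secret store, for demonstration only.
payload = gzip.compress(b"#cloud-config\nruncmd: []\n")
store = {"demo-prefix-0": payload[:10], "demo-prefix-1": payload[10:]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "secret-userdata.txt"
    first = fetch_userdata(target, "demo-prefix", 2, store.__getitem__, store.pop)
    second = fetch_userdata(target, "demo-prefix", 2, store.__getitem__, store.pop)
    content = target.read_bytes()

print(first, second, len(store))
```

The early return is what makes the second boothook invocation in the log harmless: once the decompressed file exists, no further secret lookups are attempted.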

i think we can add the following.

  1. this seems to only apply to newer versions of cloud-init and can't be expected to work in focal (20.04):
    https://cloudinit.readthedocs.io/en/latest/development/feature_flags.html#cloudinit.features.ERROR_ON_USER_DATA_FAILURE
    With that said, we should only do this when cloud-init is a newer version.

  2. right now there are changes that seem to work only for Debian, mainly the file 90_dpkg.cfg. We should make sure that this works for non-Debian distributions as well.
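
One possible way to express the version gate from the first point, sketched under assumptions: the thread does not state the actual cut-off version, so `HYPOTHETICAL_MIN` is a placeholder, and `upstream_version` and `needs_new_behaviour` are illustrative helper names. The second version string below is made up for the example.

```python
import re


def upstream_version(package_version):
    """Extract the upstream (major, minor) pair from a package version."""
    match = re.match(r"(\d+)\.(\d+)", package_version)
    if match is None:
        raise ValueError(f"unrecognised version: {package_version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_new_behaviour(package_version, min_version):
    """Return True when the installed cloud-init is at least min_version."""
    return upstream_version(package_version) >= min_version


# PLACEHOLDER threshold; the real cut-off is not stated in this thread.
HYPOTHETICAL_MIN = (23, 4)

print(needs_new_behaviour("24.4-0ubuntu1~22.04.1", HYPOTHETICAL_MIN))
print(needs_new_behaviour("20.1-0example", HYPOTHETICAL_MIN))
```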

EDIT:

Although the file /etc/secret-userdata.txt was present and written out with the expected contents, the actual bootstrapping didn't work, so I will give this another try.

@faiq
Contributor

faiq commented Jan 28, 2025

I believe this PR is ready to come out of draft. I've tested with both Ubuntu 22.04 and 24.04 images.

@faiq
Contributor

faiq commented Jan 29, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 29, 2025
@richardcase richardcase marked this pull request as ready for review January 29, 2025 17:16
@richardcase richardcase changed the title WIP: chore: changes for building CAPA amis chore: changes for building CAPA amis Jan 29, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 29, 2025
Member

@AverageMarcus AverageMarcus left a comment

/lgtm
/assign @mboersma

Contributor

@mboersma mboersma left a comment

/lgtm

@richardcase could you squash the commits?

@faiq
Contributor

faiq commented Jan 30, 2025

i can squash the commits @mboersma

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2025
Contributor

@mboersma mboersma left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mboersma

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 30, 2025
cloud-init restart

Signed-off-by: Richard Case <[email protected]>
Co-authored-by: Faiq Raza <[email protected]>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2025
@dlipovetsky

Thanks to everyone for working together to solve this 🙇

@faiq
Contributor

faiq commented Jan 30, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2025
@k8s-ci-robot k8s-ci-robot merged commit 73be926 into kubernetes-sigs:main Jan 30, 2025
11 checks passed
@SriRamanujam

Can confirm this works on the latest Ubuntu 24.04 daily images running cloud-init 24.4-0ubuntu1~24.04.2

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
10 participants