Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow scaling events to be logged during cluster validation #16354

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

rifelpet
Copy link
Member

The e2e upgrade jobs that have been migrated to the new prow cluster are failing to validate mid rolling-update

I0209 13:36:37.740937 14123 instancegroups.go:560] Cluster did not pass validation within deadline: InstanceGroup "nodes-us-west-2a" did not have enough nodes 1 vs 4.

We can get scaling activities on the ASG which should mention if the AWS autoscaling service is failing to launch nodes for some reason (resource quota, capacity, etc.)

This allows those events to be logged at --v=4 level, and sets that level on the upgrade scripts.

I included autoscaling activity for both AWS and GCE. other providers can add their implementations separately.

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. area/provider/aws Issues or PRs related to aws provider labels Feb 15, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from rifelpet. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the area/provider/gcp Issues or PRs related to gcp provider label Feb 15, 2024
@rifelpet
Copy link
Member Author

trying this out:

/test pull-kops-e2e-aws-upgrade-k127-ko127-to-klatest-kolatest-many-addons

though this job sounds invalid, since a k8s 1.27 cluster can't be upgraded to k8s latest (1.29)

@justinsb
Copy link
Member

I like the idea!

though this job sounds invalid, since a k8s 1.27 cluster can't be upgraded to k8s latest (1.29)

(Do we know that it can't? It's not considered a supported path, but in practice most skip-upgrades do work)

Copy link
Member

@justinsb justinsb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool idea!

@@ -709,6 +710,17 @@ func (c *instanceGroupManagerClientImpl) List(ctx context.Context, project, zone
return ms, nil
}

func (c *instanceGroupManagerClientImpl) ListErrors(ctx context.Context, project, zone, name string) ([]*compute.InstanceManagedByIgmError, error) {
Copy link
Member

@justinsb justinsb Feb 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside: I sort of regret our having this layer in GCE, I'm not sure it adds much!

@@ -1196,13 +1196,29 @@ func awsBuildCloudInstanceGroup(ctx context.Context, c AWSCloud, cluster *kops.C
return nil, fmt.Errorf("failed to fetch instances: %v", err)
}

scalingReq := &autoscaling.DescribeScalingActivitiesInput{
Copy link
Member

@justinsb justinsb Feb 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether this gets expensive, but I wonder if we should plumb through a flag here as to whether we want this information (or add a method or func to CloudInstanceGroup that could query it on-demand)

for _, activity := range p.Activities {
event := cloudinstances.ScalingEvent{
Timestamp: aws.TimeValue(activity.StartTime),
Description: aws.StringValue(activity.Description),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's some extra fields; I'm not sure if Description has all/most of the information, but you might consider including the Raw any field (which will then get printed!)

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/provider/aws Issues or PRs related to aws provider area/provider/gcp Issues or PRs related to gcp provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants