Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: adding testing trigger for running our e2e suite against the NodeAutoProvisioning Preview #178

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

Bryce-Soghigian
Copy link
Contributor

@Bryce-Soghigian Bryce-Soghigian commented Mar 4, 2024

Fixes #195

Description
This PR adds additional functionality to our testing toolkit to run the e2e suites on node auto provisioning, alongside the consolidation e2e suite which only works against node auto provisioning and not fully properly with unmanaged karpenter.

How was this change tested?

Then runs validating the standard e2es functionality haven't been tarnished: https://github.com/Azure/karpenter-provider-azure/actions/runs/8232044776
Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note


update

fix: fix

blah

test: nap suite consolidation

test

no need for acr

fix: adding preview cli

fix: removing make az-perm as we are using nap which should set this up
@Bryce-Soghigian Bryce-Soghigian changed the title Bsoghigian/running nap clean test: adding testing trigger for running our e2e suite against the NodeAutoProvisioning Preview Mar 4, 2024
@coveralls
Copy link

coveralls commented Mar 4, 2024

Pull Request Test Coverage Report for Build 8232173689

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 97.691%

Totals Coverage Status
Change from base Build 8194965275: 0.0%
Covered Lines: 35586
Relevant Lines: 36427

💛 - Coveralls

Copy link
Contributor Author

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test-nap

Copy link
Contributor Author

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test-nap

Copy link
Contributor Author

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/nap

Copy link
Contributor Author

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/nap

Copy link
Contributor Author

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

Copy link
Contributor Author

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/nap

@Bryce-Soghigian
Copy link
Contributor Author

Copy link
Contributor Author

@Bryce-Soghigian Bryce-Soghigian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/test

@Bryce-Soghigian
Copy link
Contributor Author

Validating standard tests wtill work and standard workflow triggers still work

@Bryce-Soghigian
Copy link
Contributor Author

running bash hack/boilerplate.sh is not generating the right configuration for the consolidation header for some reason

@Bryce-Soghigian Bryce-Soghigian marked this pull request as ready for review March 11, 2024 11:49
@Bryce-Soghigian Bryce-Soghigian self-assigned this Mar 11, 2024
@Bryce-Soghigian Bryce-Soghigian added the area/e2e-test-framework Issues or PRs related to refactoring the e2e test framework label Mar 11, 2024
Copy link
Collaborator

@charliedmcb charliedmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial review. Didn't get to go over the files within .github yet. Will be completing a second review for that.

Overall looking good, but I'd like a couple small adjustments, the ability to signal running the E2E tests for NAP or not via input params, and I think for cleanliness/focus it makes sense to split this into 3 PRs:

  • workflow change, and required supporting Makefile//test changes
  • new consolidation test suite
  • general unrelated cleanup to the e2e test files.

@@ -3,17 +3,19 @@ ifeq ($(CODESPACES),true)
AZURE_RESOURCE_GROUP ?= $(CODESPACE_NAME)
AZURE_ACR_NAME ?= $(subst -,,$(CODESPACE_NAME))
else
AZURE_RESOURCE_GROUP ?= karpenter
AZURE_RESOURCE_GROUP ?= karpeeer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why changing karpenter to karpeeer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not on purpose :O

Makefile-az.mk Show resolved Hide resolved
@@ -36,6 +38,16 @@ az-mkaks: az-mkacr ## Create test AKS cluster (with --vm-set-type AvailabilitySe
az aks get-credentials --name $(AZURE_CLUSTER_NAME) --resource-group $(AZURE_RESOURCE_GROUP) --overwrite-existing
skaffold config set default-repo $(AZURE_ACR_NAME).azurecr.io/karpenter

az-mkaks-cilium-nap:
az extension add --name aks-preview
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly question auto-adding this. It should be a pre-requirement the user has for running this command in my opinion.

Having the side effect of running this make command be that we add the aks-preview extension to the users cli seems questionable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need it because there is no way to registere specific versions of the preview cli in the github action from what I could find. Added a makefile command following the existing pattern used for creating the cluster.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still be wary on adding it to this make command. Separating it out as an az-apply-nap-prerequirements make command along with the feature registration as I mention in the below comment I think would be good. That way in the action we can call az-apply-nap-prerequirements, but anyone doing local work, or such can avoid having the prerequirements side effects.

@@ -36,6 +38,16 @@ az-mkaks: az-mkacr ## Create test AKS cluster (with --vm-set-type AvailabilitySe
az aks get-credentials --name $(AZURE_CLUSTER_NAME) --resource-group $(AZURE_RESOURCE_GROUP) --overwrite-existing
skaffold config set default-repo $(AZURE_ACR_NAME).azurecr.io/karpenter

az-mkaks-cilium-nap:
az extension add --name aks-preview
az feature register --namespace "Microsoft.ContainerService" --name "NodeAutoProvisioningPreview"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also highly question having this side effect as well. I think again this is registering NodeAutoProvisioningPreview feature for the subscription.

A possible solution here would be to have an az-apply-nap-prerequirements with a note on it that it will update the cli to have the aks-preview extension, plus register the sub.

This way it can be something that users only have to apply once, but if it comes up in some pipeline, etc can always be called as a prerequirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sub registration could be ran once, but we need the add of the aks-preview extension to be updated to the latest preview cli

--enable-oidc-issuer --enable-managed-identity --node-provisioning-mode Auto

az aks get-credentials --name $(AZURE_CLUSTER_NAME) --resource-group $(AZURE_RESOURCE_GROUP) --overwrite-existing
kubectl taint nodes CriticalAddonsOnly=true:NoSchedule --all
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't add this taint here. If you are wanting this within the e2e pipeline, put it into the workflows instead.

The makefiles I think should be a place for general commands, and if a user were to create a NAP cluster should they actually be applying this taint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should ALWAYS taint the systempool. NAP should create all of the userpool nodes imo.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if we should add something about it to the docs than?

So far tainting has been something we've done within the e2e test workflow themselves not the make commands.

However, if it is a best practice we want to recommend than I could see adding it to the make command, and/or documentation.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should! Maybe referencing the existing systempool documentation makes sense and explaining a bit more about the role of the system-surge pool

test/pkg/environment/common/expectations.go Show resolved Hide resolved
test/suites/consolidation/suite_test.go Show resolved Hide resolved
test/pkg/environment/azure/setup.go Outdated Show resolved Hide resolved
test/pkg/environment/azure/environment.go Show resolved Hide resolved
test/pkg/environment/azure/environment.go Show resolved Hide resolved
Copy link
Collaborator

@charliedmcb charliedmcb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little mixed on how much to duplicate the workflow files, but I think at least a few of them can definitely be merged with the regular flow:

  • approval-comment-nap.yaml (merge with regular)
  • cleanup-nap/action.yaml (merge with regular)

I'd try and merge based on some form of flag:

  • e2e-matrix-trigger-nap.yaml
  • e2e-matrix-nap.yaml

I think that e2e-nap.yaml might be able to be consolidated, but I'm wondering if its right to as it somewhat seems right that a "nap e2e test is different". I think at least through the point of we are selecting the matrix of tests to run, and how to run them seems reasonable.

I think keeping a separate create action for nap does make sense though.

Not certain on all the functionality available for getting some sort of flag for this, and if it'd be too messy in the code, but those are my general thoughts on reducing duplication for now.

.github/workflows/approval-comment-nap.yaml Outdated Show resolved Hide resolved
@@ -32,4 +32,4 @@ jobs:
secrets:
E2E_CLIENT_ID: ${{ secrets.E2E_CLIENT_ID }}
E2E_TENANT_ID: ${{ secrets.E2E_TENANT_ID }}
E2E_SUBSCRIPTION_ID: ${{ secrets.E2E_SUBSCRIPTION_ID }}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this was an accidental change?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, than add it to the general e2e cleanup PR to keep files being touched focused?

.github/workflows/README.md Outdated Show resolved Hide resolved
.github/workflows/e2e-matrix-trigger-nap.yaml Show resolved Hide resolved
.github/workflows/e2e-matrix-nap.yaml Show resolved Hide resolved
.github/workflows/e2e-nap.yaml Show resolved Hide resolved
.github/actions/e2e/create-nap-cluster/action.yaml Outdated Show resolved Hide resolved
run: az account set --subscription ${{ inputs.subscription-id }}
- name: delete cluster ${{ inputs.cluster_name }}
shell: bash
run: az aks delete --name ${{ inputs.cluster_name }} --resource-group ${{ inputs.resource_group }} --yes --no-wait
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just doesn't have the delete acr? set. Could simply remove that from the general cleanup since resource group delete should be enough. I think that's worth removing the need for an almost duplicate file.

@charliedmcb charliedmcb mentioned this pull request Mar 27, 2024
3 tasks
* revert changes to non .github/ files

* remove new test suite

* Revert "remove new test suite"

This reverts commit 0e7f3ab.

* Revert "revert changes to non .github/ files"

This reverts commit 40a6377.

* update nap triggers to be based on same approval comment

---------

Co-authored-by: Charlie McBride <[email protected]>
@Bryce-Soghigian
Copy link
Contributor Author

cleanup prs separated out:
#241
#240

@Bryce-Soghigian
Copy link
Contributor Author

test ping @Azure/aks-karpenter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/e2e-test-framework Issues or PRs related to refactoring the e2e test framework
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Running the e2e suite we have developed against the NodeAutoProvisioning Preview feature
3 participants