Introduce VMO based clusters
Add support for Azure virtual machine scaleset orchestration mode vm (VMO) based Shoot clusters.
Each worker pool gets its own VMO instead of sharing a single resource, as AvailabilitySet based clusters do.
The Shoot cluster needs to be non-zoned and must have the annotation "alpha.azure.provider.extensions.gardener.cloud/vmo=true" to enable VMO, which we currently treat as a preview feature.

- If the corresponding fault domain count in the CloudProfile changes, the VMO will be replaced. This leads to a rolling update of the machines in each worker pool.
- Orphaned Gardener managed VMOs will be removed during the Worker reconciliation, so it is safe to deploy into an existing resource group without leaking resources.
- VMO based clusters will make use of the Standard SKU Loadbalancer.
- The NatGateway will be available for VMO clusters. This differs from AvailabilitySet based clusters, which do not support NatGateway.
- Accelerated networking will work with VMO clusters.

Other changes:
- The Azure SDK is updated to support compute client in version 2019-12-01 (required to support VMO)
- VMO clusters will use the "vmss" version of the vm-controller in the Azure cloud-controller-manager (activated in the cloud-provider config via "vmType=vmss").
- A vmss client has been added to the client factory.
- Added mocks for the client factory and the vmss client.
- The Worker controller tests have been restructured partially to allow more reuse for the machine dependency tests.
- The machineclass chart structure to configure AvailabilitySets has been aligned with the way VMOs are configured.
dkistner committed Dec 7, 2020
1 parent 5541dfb commit d7bb323
Showing 80 changed files with 33,790 additions and 658 deletions.
@@ -15,6 +15,9 @@ loadBalancerSku: "basic"
{{- else }}
loadBalancerSku: "standard"
{{- end }}
{{- if hasKey .Values "vmType" }}
vmType: "{{ .Values.vmType }}"
{{- end }}
cloudProviderBackoff: true
cloudProviderBackoffRetries: 6
cloudProviderBackoffExponent: 1.5
1 change: 1 addition & 0 deletions charts/internal/cloud-provider-config/values.yaml
@@ -12,3 +12,4 @@ securityGroupName: sgname
region: location
maxNodes: 0
# acrIdentityClientId: identityClientID
# vmType: standard
7 changes: 4 additions & 3 deletions charts/internal/machineclass/templates/machineclass.yaml
@@ -28,9 +28,10 @@ spec:
{{- if hasKey $machineClass "zone" }}
zone: {{ $machineClass.zone }}
{{- end }}
{{- if hasKey $machineClass "availabilitySetID" }}
availabilitySet:
id: {{ $machineClass.availabilitySetID }}
{{- if hasKey $machineClass "machineSet" }}
machineSet:
id: {{ $machineClass.machineSet.id }}
kind: {{ $machineClass.machineSet.kind }}
{{- end }}
{{- if hasKey $machineClass "identityID" }}
identityID: {{ $machineClass.identityID }}
37 changes: 36 additions & 1 deletion charts/internal/machineclass/values.yaml
@@ -35,7 +35,42 @@ machineClasses:
network:
vnet: my-vnet
subnet: my-subnet-in-my-vnet
availabilitySetID: /subscriptions/subscription-id/resourceGroups/resource-group-name/providers/Microsoft.Compute/availabilitySets/availablity-set-name
machineSet:
id: /subscriptions/subscription-id/resourceGroups/resource-group-name/providers/Microsoft.Compute/availabilitySets/availablity-set-name
kind: availabilityset
tags:
Name: shoot-crazy-botany
kubernetes.io-cluster-shoot-crazy-botany: "1"
kubernetes.io-role-node: "1"
secret:
clientID: ABCD
clientSecret: ABCD
subscriptionID: abc
tenantID: abc
cloudConfig: abc
machineType: Standard_DS1_V2
image:
#urn: "CoreOS:CoreOS:Stable:1576.5.0"
id: "/subscriptions/<subscription ID where the gallery is located>/resourceGroups/myGalleryRG/providers/Microsoft.Compute/galleries/myGallery/images/myImageDefinition/versions/1.0.0"
osDisk:
size: 50
type: Standard_LRS
# dataDisks:
# - lun: 0
# caching: None
# diskSizeGB: 100
# storageAccountType: Standard_LRS
# name: sdb
sshPublicKey: ssh-rsa AAAAB3...
- name: class-3-vmo
region: westeurope
resourceGroup: my-resource-group
network:
vnet: my-vnet
subnet: my-subnet-in-my-vnet
machineSet:
id: /subscriptions/subscription-id/resourceGroups/resource-group-name/providers/Microsoft.Compute/virtualmachinescaleset/vmo-name
kind: vmo
tags:
Name: shoot-crazy-botany
kubernetes.io-cluster-shoot-crazy-botany: "1"
17 changes: 17 additions & 0 deletions docs/usage-as-end-user.md
@@ -257,3 +257,20 @@ All worker machines of the cluster will be automatically configured to use [Azur
The prerequisites are that the cluster must be zoned, and that the machine type and operating system image version in use are compatible with Accelerated Networking.
`Availability Set` based shoot clusters will not be enabled for accelerated networking even if the machine type and operating system support it. This is necessary because all machines from the availability set must be scheduled on special hardware; more details can be found [here](https://github.com/MicrosoftDocs/azure-docs/issues/10536).
Supported machine types are listed in the CloudProfile in `.spec.providerConfig.machineTypes[].acceleratedNetworking` and the supported operating system image versions are defined in `.spec.providerConfig.machineImages[].versions[].acceleratedNetworking`.

### Preview: Shoot clusters with VMSS Orchestration Mode VM (VMO)

Azure Shoot clusters can be created with machines that are attached to an [Azure Virtual Machine ScaleSet orchestration mode VM (VMO)](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/orchestration-modes).
The orchestration mode VM of Virtual Machine ScaleSet is currently in preview and not yet generally available on Azure.

Azure VMOs are intended to replace Azure AvailabilitySets for non-zoned Azure Shoot clusters in the mid-term, as VMOs come with fewer disadvantages: for example, they do not block machine operations and they are compatible with the `Standard` SKU load balancer.

To configure an Azure Shoot cluster that makes use of VMO, you need to do the following:
- The `InfrastructureConfig` of the Shoot configuration needs to contain `.zoned=false`
- The Shoot resource needs to have the following annotation assigned: `alpha.azure.provider.extensions.gardener.cloud/vmo=true`

Some key facts about VMO based clusters:
- Unlike regular non-zonal Azure Shoot clusters, which have a primary AvailabilitySet shared between all machines in all worker pools of a Shoot cluster, a VMO based cluster has its own VMO for each worker pool
- If the configuration of the VMO changes (e.g. the number of fault domains in a region changes; configured in the CloudProfile), all machines of the worker pool need to be rolled
- It is not possible to migrate an existing primary AvailabilitySet based Shoot cluster to a VMO based Shoot cluster or vice versa
- VMO based clusters use `Standard` SKU LoadBalancers instead of the `Basic` SKU LoadBalancers used for AvailabilitySet based Shoot clusters
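
As a rough sketch of the two configuration steps described above, a Shoot manifest might look like the following. Only the annotation and `zoned: false` are taken from this document; the Shoot name, the namespace, and the `apiVersion` values are assumptions for illustration, and all other required Shoot fields are left out.

```yaml
apiVersion: core.gardener.cloud/v1beta1   # assumed API version
kind: Shoot
metadata:
  name: my-vmo-shoot            # hypothetical name
  namespace: garden-my-project  # hypothetical project namespace
  annotations:
    # Enables the VMO preview feature for this Shoot
    alpha.azure.provider.extensions.gardener.cloud/vmo: "true"
spec:
  provider:
    type: azure
    infrastructureConfig:
      apiVersion: azure.provider.extensions.gardener.cloud/v1alpha1  # assumed API version
      kind: InfrastructureConfig
      zoned: false  # VMO requires a non-zoned cluster
      # networks, workers and the remaining Shoot fields are omitted here
```

Since migrating between AvailabilitySet based and VMO based clusters is not possible, both settings would need to be present when the Shoot is first created.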