Big refactoring #348

Merged on Aug 18, 2024
Changes from all commits
187 commits
33c4f91
Cleanup
vitobotta Apr 16, 2024
e0a99f3
Cleanup
vitobotta Apr 16, 2024
1f4b687
Cleanup
vitobotta Apr 16, 2024
e1c51e6
Remove duplication for: adding toleration
vitobotta Apr 16, 2024
afb9c7a
Cleanup
vitobotta Apr 16, 2024
6a8fb2a
Remove duplication for code that applies a kubectl command
vitobotta Apr 16, 2024
26db62a
Big refactoring!
vitobotta Apr 17, 2024
17c15de
Update README
vitobotta Apr 17, 2024
8000e41
Change Crystal extension
vitobotta Apr 17, 2024
0c9ec19
Limit concurrency
vitobotta Apr 17, 2024
8eb9d1c
Change max instances per placement group limit
vitobotta Apr 17, 2024
9355466
Print API response when server cannot be created
vitobotta Apr 17, 2024
7ff530d
Raise concurrency to 10
vitobotta Apr 17, 2024
c099511
Update README.md
vitobotta Apr 17, 2024
5cf5afa
Merge branch 'more-refactoring' of github.com:vitobotta/hetzner-k3s i…
vitobotta Apr 17, 2024
5745984
Raise concurrency
vitobotta Apr 17, 2024
65f4c3b
Cluster creation working fine
vitobotta Apr 17, 2024
77e970a
Cleanup
vitobotta Apr 17, 2024
a2a052c
Fix DNS issue
vitobotta Apr 18, 2024
5581c25
Fix kubelet args
vitobotta Apr 18, 2024
0f3903a
Add script to delete unscheduled pods
vitobotta Apr 18, 2024
d150ea4
Use a mutex to synchronise attaching instances to the network
vitobotta Apr 18, 2024
37d2e97
Change formatting of extra args
vitobotta Apr 18, 2024
2b66d2d
Do not sleep after attaching instance to network
vitobotta Apr 18, 2024
88d10e5
Replace single subnet setting with private network settings
vitobotta Apr 18, 2024
8bcb77f
Handle configuration when private network is disabled
vitobotta Apr 18, 2024
bf4910e
Refactor configuration
vitobotta Apr 18, 2024
09bad78
Refactor configuration of manifests
vitobotta Apr 18, 2024
b449c05
Fix CNI validation
vitobotta Apr 18, 2024
2334049
Expand paths for the SSH keys
vitobotta Apr 18, 2024
29c4e5b
Cleanup
vitobotta Apr 18, 2024
387a2d0
Add missing cni variable to worker install script
vitobotta Apr 18, 2024
5a0747b
Synchronise attaching instances to network
vitobotta Apr 18, 2024
fd5ce34
Add missing public ip detection to worker install script
vitobotta Apr 18, 2024
17cc80e
Fix network name for Hetzner secret
vitobotta Apr 18, 2024
8a72c36
Update worker install script
vitobotta Apr 18, 2024
fe66ba2
Update worker install script
vitobotta Apr 18, 2024
3e3e309
Disable network policies as they cause problems
vitobotta Apr 18, 2024
07dc0a7
Install CCM without network support if private network is disabled
vitobotta Apr 18, 2024
e868af0
Fixes for supporting disabled private network
vitobotta Apr 19, 2024
6a7d71f
Don't wait when creating workers; create placement groups concurrently
vitobotta Apr 19, 2024
f3ceb5a
Memoize token
vitobotta Apr 19, 2024
1ab889d
Disable CNI installation when CNI is disabled
vitobotta Apr 19, 2024
81ca5b1
Update log message for waiting instance to be running
vitobotta Apr 19, 2024
2c8da32
Allow dismissing output of shell command
vitobotta Apr 19, 2024
37005c5
Add script to reset Cilium network configuration
vitobotta Apr 19, 2024
a61f791
Cleanup
vitobotta Apr 19, 2024
5e133cd
Maximise use of placement groups
vitobotta Apr 19, 2024
dc48cf7
Fix issue with fetching placement group if there are more than 25
vitobotta Apr 19, 2024
4eaa120
Fix handling of placement groups
vitobotta Apr 19, 2024
6a3cf78
Delete placement groups when they are supposed to be deleted
vitobotta Apr 19, 2024
2405bf2
Apply firewall in non disruptive way
vitobotta Apr 19, 2024
323faf0
Validate presence of kubectl and helm in PATH
vitobotta Apr 19, 2024
0e85e61
Add native support for Cilium
vitobotta Apr 19, 2024
13b6f6d
Fix Cilium config
vitobotta Apr 19, 2024
496ff80
Fix support for public network-only clusters
vitobotta Apr 21, 2024
70634bb
Cleanup
vitobotta Apr 21, 2024
fda1b4f
Cleanup
vitobotta Apr 21, 2024
5c670ee
Fix Flannel config
vitobotta Apr 21, 2024
fc3f12d
Sort sans to reduce changes to the scripts
vitobotta Apr 21, 2024
69d99a8
Fix assignment to placement group
vitobotta Apr 21, 2024
fc22f6a
Reduce verbosity
vitobotta Apr 21, 2024
f1765b9
Update README with new cluster creation record
vitobotta Apr 21, 2024
92c2bc8
Update cluster creation time in README
vitobotta Apr 21, 2024
68d1ef3
Cleanup
vitobotta Apr 21, 2024
1fe608a
Change default image to Ubuntu 24.04
vitobotta Apr 27, 2024
2ee119b
Only process 10 servers per time
vitobotta Apr 27, 2024
233f5e9
Deploy k3s on ready nodes while creating other nodes
vitobotta Apr 27, 2024
d1b2a82
Limit channel sizes
vitobotta Apr 27, 2024
19d8f6b
Introduce state file
vitobotta Apr 27, 2024
da4a1aa
Add cilium and calicoctl to .gitignore
vitobotta Apr 27, 2024
d618ae0
Refactor Hetzner::Client methods to handle rate limiting and response…
vitobotta Apr 27, 2024
087b673
Open node port range in firewall
vitobotta Apr 27, 2024
e689d18
Open wireguard port for Cilium
vitobotta Apr 27, 2024
47fc698
Update default image to Ubuntu 24.04
vitobotta Apr 27, 2024
bc0d624
Refactor Hetzner API client to exit early if locations or instance ty…
vitobotta Apr 27, 2024
df110ee
Use retriable to retry requests
vitobotta Apr 27, 2024
fe87923
Cleanup
vitobotta Apr 28, 2024
fe13372
Print remaining time
vitobotta Apr 28, 2024
1ff22f3
Use a Tuple instead of Array
vitobotta Apr 28, 2024
e2a21b1
Reduce API calls by finding existing nodes with kubectl
vitobotta Apr 28, 2024
fb6a3a1
Retry request
vitobotta Apr 28, 2024
3851baf
Return result of SSH command
vitobotta Apr 28, 2024
30b59c9
Ignore cluster info if kubeconfig is not available
vitobotta Apr 28, 2024
0f9eb76
Omit output if disabled
vitobotta Apr 28, 2024
291ccb0
Rename Flannel class to CNI and add mode attribute to choose which CN…
vitobotta Apr 28, 2024
ca83227
Add CNI mode attributes and helper methods to CNI configuration class
vitobotta Apr 28, 2024
00dee0d
Refactor Kubernetes::Installer to use CNI configuration for flannel b…
vitobotta Apr 28, 2024
c678903
Remove cluster state class
vitobotta Apr 28, 2024
7263892
Refactor Kubernetes::Installer to install Cilium CNI when selected as…
vitobotta Apr 28, 2024
6334561
Make Cilium chart version configurable
vitobotta Apr 28, 2024
58f28f3
Enable Wireguard encryption for Cilium only if encryption has been en…
vitobotta Apr 28, 2024
6ddc910
Update Cilium chart version to v1.15.4
vitobotta Apr 28, 2024
1031f9d
Enable different Wireguard ports depending on CNI
vitobotta Apr 28, 2024
1ff9c01
Refactor CNI configuration to enable/disable kube-proxy based on CNI …
vitobotta Apr 28, 2024
679cca2
Refactor Kubernetes::Installer to install Spegel when enabled
vitobotta Apr 28, 2024
897e951
Add missing require
vitobotta Apr 28, 2024
840ac45
Merge branch 'main' into more-refactoring
vitobotta Apr 28, 2024
3c8580b
Add logo
vitobotta Apr 28, 2024
ee5e594
Merge branch 'main' into more-refactoring
vitobotta Apr 28, 2024
ef356ed
Refactor firewall rule description for wireguard traffic
vitobotta Apr 28, 2024
30aafc6
Fix condition in Kubernetes::Installer to check if cluster is running
vitobotta May 1, 2024
7c85ead
Fixed a recursion issue
vitobotta May 1, 2024
8dc9fde
Remove duplicate method definition
vitobotta May 1, 2024
c1cab80
Cilium: fix deletion of unmanaged pods after installing
vitobotta May 1, 2024
3ea9d95
Use internal ip in firewall when CCM hasn't been installed yet but ma…
vitobotta May 1, 2024
0af97e8
Cleanup
vitobotta May 1, 2024
9cd47af
Fix syntax
vitobotta May 1, 2024
b04f0a0
Fix issues due to concurrent access to properties
vitobotta May 1, 2024
8cd91e3
Delete kubeconfig file when deleting the cluster
vitobotta May 1, 2024
c945eef
Sort placement groups by name so the list is always consistent
vitobotta May 1, 2024
e8d857f
Remove code that kills unmanaged pods when installing cilium
vitobotta May 1, 2024
a98419c
Pass settings to instance delete
vitobotta May 1, 2024
b7302a3
Pass settings to initializer for instance finder
vitobotta May 1, 2024
74e48c4
Enable SSH support for instance deletion
vitobotta May 1, 2024
047808e
Add SSH support to instance finder
vitobotta May 1, 2024
59a77c1
Cleanup
vitobotta May 1, 2024
9b54564
Randomize script filename
vitobotta May 1, 2024
10b5c65
Fix placement group assignment
vitobotta May 1, 2024
315e7b1
Fix creation of remaining placement groups
vitobotta May 1, 2024
bdacc14
Raise concurrency to 25
vitobotta May 1, 2024
15f1246
Start installing software before setting up worker nodes
vitobotta May 1, 2024
7e4536c
Raise concurrency to 50
vitobotta May 1, 2024
3ed7647
Use a mutex for rate limit wait msg
vitobotta May 1, 2024
1a90ee0
Start the server on creation if private network is disabled
vitobotta May 1, 2024
fad7011
Check if API server is reachable before checking for existence of se…
vitobotta May 2, 2024
e1e7677
Start instance on create if there is no private network
vitobotta May 2, 2024
cf468ef
Add support for embedded registry mirror
vitobotta May 2, 2024
9fd68e3
Remove extra arg left by mistake
vitobotta May 2, 2024
d824c12
Change ipam mode so not to cause Cilium network to overlap with hetzn…
vitobotta May 2, 2024
9770705
Upgrade Alpine version
vitobotta Jun 22, 2024
ca97798
Fix deprecation
vitobotta Jul 22, 2024
f2d6875
Merge branch 'main' into more-refactoring
vitobotta Jul 22, 2024
8124b48
Retry creating the instance
vitobotta Jul 23, 2024
88e306d
Fix detection of whether the cluster is ready or not
vitobotta Jul 23, 2024
22a9fdd
Do not include instance type in instance name by default, make it opt…
vitobotta Jul 23, 2024
728cb8a
Increase timeout for SSH connections to 5 seconds
vitobotta Jul 23, 2024
bcde429
Print a message when creating an instance fails
vitobotta Jul 23, 2024
bc3e2d3
Add support for Hillsboro region
vitobotta Jul 23, 2024
aae0c98
Add current timestamp to each log line
vitobotta Jul 23, 2024
07a6aad
Fix setting network for autoscaled instances
vitobotta Jul 23, 2024
6be1c97
Upgrade CSI driver
vitobotta Jul 23, 2024
9876720
Do not open etcd ports if using external datastore
vitobotta Jul 23, 2024
958eb30
Upgrade Cilium to latest version
vitobotta Jul 23, 2024
470e3b9
Enable local path provisioner
vitobotta Jul 23, 2024
bf52992
Split documentation into multiple files
vitobotta Jul 23, 2024
ba7f5e7
Improve docs
vitobotta Jul 23, 2024
ae266ec
Add note about external datastores to the recommendations page
vitobotta Jul 23, 2024
5b72f2e
Add note about helm to prerequisites
vitobotta Jul 24, 2024
9a0b5a3
Move section to more relevant page
vitobotta Jul 24, 2024
24491a5
Add blank line to avoid confusion
vitobotta Jul 24, 2024
193c867
Update default k3s version
vitobotta Jul 24, 2024
4174189
Ensure worker scripts also wait for cloud init to finish its stuff
vitobotta Jul 24, 2024
bb0c820
Merge branch 'main' into more-refactoring
vitobotta Jul 24, 2024
b4eefbf
Fix setting names
vitobotta Jul 25, 2024
f39396e
Fix instance names for labels and taints
vitobotta Jul 25, 2024
71745b7
Fix support for multiline post create commands
vitobotta Jul 25, 2024
424d81f
Add a couple of clarifying comments in sample conf
jpetazzo Jul 30, 2024
6dcfb15
Merge pull request #392 from jpetazzo/add-clarifying-comments-in-samp…
vitobotta Jul 30, 2024
3b7762e
Timeout increase 30->60s
Jul 30, 2024
abbde98
chore: #389 Cloud init wait improvements
Jul 30, 2024
b6a05b0
Merge pull request #393 from axiros/more-refactoring
vitobotta Jul 30, 2024
df3b681
fix: #393 My wait function definition was buggy, causing autoscaler c…
Jul 31, 2024
5ca5e81
Merge pull request #394 from axiros/more-refactoring
vitobotta Jul 31, 2024
9a60d92
Use correct config setting to enable encryption with Cilium
vitobotta Aug 4, 2024
abfacb1
Merge branch 'more-refactoring' of github.com:vitobotta/hetzner-k3s i…
vitobotta Aug 4, 2024
10c5a62
Merge branch 'main' into more-refactoring
vitobotta Aug 6, 2024
b867654
fix: solve timeout when api server hostname is given
Aug 7, 2024
cfd3673
Fix SSH when using non-default port + add test harness
jpetazzo Jul 29, 2024
b8b721e
Attach instance to network automatically
vitobotta Aug 12, 2024
ae4230c
Merge pull request #391 from jpetazzo/fix-ssh-port-customization
vitobotta Aug 17, 2024
b02f9d7
Merge pull request #407 from axiros/api-server
vitobotta Aug 17, 2024
7f1442a
Restore change removed by mistake
vitobotta Aug 18, 2024
60e1126
Fix handling of custom SSH port when openssh server is configured for…
vitobotta Aug 18, 2024
b000a04
Improve handling of the script to wait for cloud init to finish
vitobotta Aug 18, 2024
d0cd104
Improve log message to tell user to delete autoscaled nodes if firewa…
vitobotta Aug 18, 2024
a1e0fff
Attach instance to network if it doesn't happen automatically as expe…
vitobotta Aug 18, 2024
6fa526b
WIP
vitobotta Aug 18, 2024
7eade3c
Use multiple contexts instead of load balancer
vitobotta Aug 18, 2024
e3d5e44
Upgrade CSI
vitobotta Aug 18, 2024
37f2016
Update CSI manifest url
vitobotta Aug 18, 2024
0007caf
Automatically delete autoscaled nodes
vitobotta Aug 18, 2024
6e4fcc6
Update version
vitobotta Aug 18, 2024
173f4d1
Update log message
vitobotta Aug 18, 2024
cc03f6a
Update version in docs
vitobotta Aug 18, 2024
674cc9f
Update version in docs
vitobotta Aug 18, 2024
b822ad8
Update README
vitobotta Aug 18, 2024
3 changes: 2 additions & 1 deletion .devcontainer.json
@@ -40,7 +40,8 @@
"redhat.vscode-yaml",
"mutantdino.resourcemonitor",
"technosophos.vscode-helm",
"crystal-lang-tools.crystal-lang"
"jgillich.crystal-lang-fixed"
// "crystal-lang-tools.crystal-lang"
],
"recommendations": [
"GitHub.copilot",
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -30,7 +30,7 @@ jobs:
binary_arch_suffix: arm64
arch: aarch64
distro: alpine_latest
- runs_on_tag: ubuntu-22.04
- runs_on_tag: ubuntu-24.04
binary_os_suffix: linux
binary_arch_suffix: amd64
arch: none
9 changes: 8 additions & 1 deletion .gitignore
@@ -14,7 +14,6 @@
dist/hetzner-k3s.jar
dist/hetzner-k3s

/docs/
/lib/
/bin/
/.shards/
@@ -28,3 +27,11 @@ actions-runner
.zsh_history
temp
docker-compose.override.yml
cilium
calicoctl

e2e-tests/env
e2e-tests/sshkey*
e2e-tests/test-*
create.log
delete.log
2 changes: 1 addition & 1 deletion Dockerfile.dev
@@ -1,4 +1,4 @@
FROM alpine:3.19.1
FROM alpine:3.20.1

RUN apk update \
&& apk add --update --no-cache gcc gmp-dev libevent-static musl-dev pcre-dev pcre2-dev libxml2-dev \
525 changes: 17 additions & 508 deletions README.md

Large diffs are not rendered by default.

81 changes: 0 additions & 81 deletions cluster_config.yaml.example

This file was deleted.

50 changes: 50 additions & 0 deletions docs/Contributing_and_support.md
@@ -0,0 +1,50 @@
# Contributing and support

Please create a PR if you want to propose any changes, or open an issue if you are having trouble with the tool - I will do my best to help if I can.

If you would like to financially support the project, consider [becoming a sponsor](https://github.com/sponsors/vitobotta).

___
## Building from source

This tool is written in [Crystal](https://crystal-lang.org/). To build it, or to make changes to the code and try them out, you will need to either install Crystal locally or work in a container.

This repository contains a Dockerfile that builds a container image with Crystal as well as the other required dependencies. There is also a Compose file to conveniently run a container using that image, and mount the source code into the container. Finally, there is a devcontainer file that you can use with compatible IDEs like Visual Studio Code and the Dev Containers extension.


### Developing with VSCode

You need [Visual Studio Code](https://code.visualstudio.com/) and the [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers&ssr=false). Open the project in VSCode (for instance, by executing `code .` in the root directory of this git repository). You should see a pop-up dialog prompting you to "Reopen in Container". Do that, then wait until the build is complete and the server has started; then click on "+" to open a terminal inside the container.

Note: if for some reason you can't find the Dev Containers extension in the Marketplace (for instance, if the first result is the Docker extension instead of Dev Containers), check that you have the official build of VSCode. It looks like if you're running an Open Source build, some extensions are disabled.


### Developing with Compose

If you can't or prefer not to install VSCode, you can develop in the exact same container using Docker and Compose.

To build and run the development container, run:
```bash
docker compose up -d
```

Then, to enter the container:
```bash
docker compose exec hetzner-k3s bash
```


### Inside the container

Once you are inside the dev container (whether you used VSCode or directly Docker Compose), you can run `hetzner-k3s` like this:
```bash
crystal run ./src/hetzner-k3s.cr -- create --config cluster_config.yaml
```

To generate a binary, you can do:
```bash
crystal build ./src/hetzner-k3s.cr --static
```

The `--static` flag ensures that the resulting binary is statically linked and has no dependencies on libraries that may or may not be available on the system where you intend to run it.
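
Assuming the build succeeds, the resulting binary is written to the current directory (Crystal names it after the source file) and can be run directly, for example to list the available k3s releases:
```bash
./hetzner-k3s releases
```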

206 changes: 206 additions & 0 deletions docs/Creating_a_cluster.md
@@ -0,0 +1,206 @@
# Creating a cluster

The tool requires a simple configuration file in YAML format in order to create, upgrade, and delete clusters, like the example below (commented lines are optional settings):

```yaml
---
hetzner_token: <your token>
cluster_name: test
kubeconfig_path: "./kubeconfig"
k3s_version: v1.30.3+k3s1

networking:
  ssh:
    port: 22
    use_agent: false # set to true if your key has a passphrase
    public_key_path: "~/.ssh/id_ed25519.pub"
    private_key_path: "~/.ssh/id_ed25519"
  allowed_networks:
    ssh:
      - 0.0.0.0/0
    api: # this will firewall port 6443 on the nodes; it will NOT firewall the API load balancer
      - 0.0.0.0/0
  public_network:
    ipv4: true
    ipv6: true
  private_network:
    enabled: true
    subnet: 10.0.0.0/16
    existing_network_name: ""
  cni:
    enabled: true
    encryption: false
    mode: flannel

  # cluster_cidr: 10.244.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for pod IPs
  # service_cidr: 10.43.0.0/16 # optional: a custom IPv4/IPv6 network CIDR to use for service IPs. Warning, if you change this, you should also change cluster_dns!
  # cluster_dns: 10.43.0.10 # optional: IPv4 Cluster IP for coredns service. Needs to be an address from the service_cidr range

# manifests:
#   cloud_controller_manager_manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.20.0/ccm-networks.yaml"
#   csi_driver_manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.9.0/deploy/kubernetes/hcloud-csi.yml"
#   system_upgrade_controller_deployment_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml"
#   system_upgrade_controller_crd_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/crd.yaml"
#   cluster_autoscaler_manifest_url: "https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/hetzner/examples/cluster-autoscaler-run-on-master.yaml"

datastore:
  mode: etcd # etcd (default) or external
  external_datastore_endpoint: postgres://....

schedule_workloads_on_masters: false

# image: rocky-9 # optional: default is ubuntu-24.04
# autoscaling_image: 103908130 # optional, defaults to the `image` setting
# snapshot_os: microos # optional: specifies the os type when using a custom snapshot

masters_pool:
  instance_type: cpx21
  instance_count: 3
  location: nbg1

worker_node_pools:
- name: small-static
  instance_type: cpx21
  instance_count: 4
  location: hel1
  # image: debian-11
  # labels:
  #   - key: purpose
  #     value: blah
  # taints:
  #   - key: something
  #     value: value1:NoSchedule
- name: medium-autoscaled
  instance_type: cpx31
  instance_count: 2
  location: fsn1
  autoscaling:
    enabled: true
    min_instances: 0
    max_instances: 3

embedded_registry_mirror:
  enabled: true

# additional_packages:
# - somepackage

# post_create_commands:
# - apt update
# - apt upgrade -y
# - apt autoremove -y

# kube_api_server_args:
# - arg1
# - ...
# kube_scheduler_args:
# - arg1
# - ...
# kube_controller_manager_args:
# - arg1
# - ...
# kube_cloud_controller_manager_args:
# - arg1
# - ...
# kubelet_args:
# - arg1
# - ...
# kube_proxy_args:
# - arg1
# - ...

# api_server_hostname: k8s.example.com # optional: DNS for the k8s API LoadBalancer. After the script has run, create a DNS record with the address of the API LoadBalancer.
```

Most settings should be self-explanatory; you can run `hetzner-k3s releases` to see a list of the available k3s releases.

If you don't want to specify the Hetzner token in the config file (for example if you want to use the tool with CI or want to safely commit the config file to a repository), then you can use the `HCLOUD_TOKEN` environment variable instead, which has precedence.
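
For example, a minimal sketch of running the tool with the token taken from the environment instead of the config file (the token value is a placeholder):

```bash
# the HCLOUD_TOKEN environment variable takes precedence over hetzner_token in the config file
export HCLOUD_TOKEN=<your token>

hetzner-k3s create --config cluster_config.yaml
```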

If you set `masters_pool.instance_count` to 1, the tool will create a control plane that is not highly available; for production clusters you should set it to a number greater than 1. This number must be odd to avoid split-brain issues with etcd, and the recommended value is 3.

You can specify any number of worker node pools, static or autoscaled, and have mixed nodes with different specs for different workloads.

Cloud init settings (`additional_packages` and `post_create_commands`) can be defined in the configuration file at the root level as well as per pool, if different pools need different settings. Settings configured for a pool override those at the root level.
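
As a sketch of how this might look (the package names and commands are purely illustrative, and the pool-level keys are assumed to mirror the root-level ones as described above):

```yaml
# root level: applies to all pools unless overridden
additional_packages:
- htop

post_create_commands:
- apt update
- apt upgrade -y

worker_node_pools:
- name: small-static
  instance_type: cpx21
  instance_count: 4
  location: hel1
  # pool level: overrides the root-level setting for this pool only
  additional_packages:
  - iotop
```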

At the moment Hetzner Cloud has five locations: two in Germany (`nbg1`, Nuremberg and `fsn1`, Falkenstein), one in Finland (`hel1`, Helsinki) and two in the USA (`ash`, Ashburn, Virginia, and `hil`, Hillsboro, Oregon). Please keep in mind that US locations only offer instances with AMD CPUs at the moment, while the newly introduced ARM instances are only available in Falkenstein-fsn1 for now.

For the available instance types and their specs, either check from inside a project when adding an instance manually or run the following with your Hetzner token:

```bash
curl -H "Authorization: Bearer $API_TOKEN" 'https://api.hetzner.cloud/v1/server_types'
```

To create the cluster run:

```bash
hetzner-k3s create --config cluster_config.yaml | tee create.log
```

This will take a few minutes depending on the number of masters and worker nodes.

### Disabling public IPs (IPv4 or IPv6 or both) on nodes

By setting `ipv4: false` and `ipv6: false` under `networking`.`public_network`, you can disable the public interface for all nodes, both for improved security and to save on IPv4 address costs. These settings are global and affect all master and worker nodes. If you disable public IPs, be sure to run hetzner-k3s from a machine that has access to the same private network as the nodes, either directly or via a VPN.
Additional networking setup is required via cloud init, so it's important that the machine from which you run hetzner-k3s has internet access and DNS configured correctly, otherwise the cluster creation process will get stuck after creating the nodes. See [this discussion](https://github.com/vitobotta/hetzner-k3s/discussions/252) for additional information and instructions.
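
A minimal sketch of the relevant part of the configuration, following the `networking` structure from the example above:

```yaml
networking:
  public_network:
    ipv4: false # no public IPv4 on any node (saves on IPv4 address costs)
    ipv6: false # no public IPv6 either
  private_network:
    enabled: true
    subnet: 10.0.0.0/16
```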

### Using alternative OS images

By default, the image in use is `ubuntu-24.04` for all the nodes, but you can specify a different default image with the root level `image` config option, or even different images for different static node pools by setting the `image` option in each node pool. This way you can, for example, have node pools with ARM instances use an OS image built for ARM: to use say Ubuntu 24.04 on ARM instances, set `image` for that pool to the specific image ID (`103908130`). As for autoscaling, due to a limitation in the Cluster Autoscaler for Hetzner it is not yet possible to specify a different image for each autoscaled pool, so for now you can specify the image for all autoscaled pools with the `autoscaling_image` setting if you want to use an image different from the one specified in `image`.
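
As a sketch combining these options (the pool name and instance type are just illustrative; the image IDs are the ones used as examples on this page):

```yaml
image: ubuntu-24.04 # default image for all pools

worker_node_pools:
- name: arm-workers # illustrative ARM pool
  instance_type: cax21
  instance_count: 2
  location: fsn1
  image: 103908130 # ID of an ARM-compatible image, e.g. Ubuntu 24.04 for ARM

# applies to all autoscaled pools, since per-pool images are not yet supported there
autoscaling_image: 103908130
```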

To see the list of available images, run the following:

```bash
export API_TOKEN=...

curl -H "Authorization: Bearer $API_TOKEN" 'https://api.hetzner.cloud/v1/images?per_page=100'
```

Besides the default OS images, it's also possible to use a snapshot that you have already created from an existing instance. With custom snapshots too, you'll need to specify the **ID** of the snapshot/image, not the description you gave when you created the template instance.

I've tested snapshots for [openSUSE MicroOS](https://microos.opensuse.org/) but others might work too. You can easily create a snapshot for MicroOS using [this tool](https://github.com/kube-hetzner/packer-hcloud-microos). Creating the snapshot takes just a couple of minutes and then you can use it with hetzner-k3s by setting the config option `image` to the **ID** of the snapshot, and `snapshot_os` to `microos`.
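
For example, a sketch of the two settings mentioned above (the image ID is a placeholder for your own snapshot's ID):

```yaml
image: 123456789 # the ID of your MicroOS snapshot, not its description
snapshot_os: microos
```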


### Keeping a project per cluster

If you want to create multiple clusters per project, see [Configuring Cluster-CIDR and Service-CIDR](#configuring-cluster-cidr-and-service-cidr). Make sure that every cluster has its own dedicated Cluster-CIDR and Service-CIDR; if they overlap, it will cause problems. That said, I still recommend keeping each cluster in its own project: this way, if you want to delete a cluster together with all the resources created for it, you can simply delete the project.

### Configuring Cluster-CIDR and Service-CIDR

Cluster-CIDR and Service-CIDR describe the IP ranges used for pods and services, respectively. Under normal circumstances you should not need to change these values; however, advanced scenarios may require changing them to avoid networking conflicts.

**Changing the Cluster-CIDR (Pod IP-Range):**

To change the Cluster-CIDR, uncomment/add the `cluster_cidr` option in your cluster configuration file and provide a valid network in CIDR notation. The provided network must not be a subnet of your private network.

**Changing the Service-CIDR (Service IP-Range):**

To change the Service-CIDR, uncomment/add the `service_cidr` option in your cluster configuration file and provide a valid network in CIDR notation. The provided network must not be a subnet of your private network.

Also uncomment the `cluster_dns` option and provide a single IP address from your `service_cidr` range; `cluster_dns` sets the IP address of the coredns service.
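
Putting the three options together, a sketch using the default values from the example configuration above (assuming, as in that example, that these options sit under `networking`):

```yaml
networking:
  # ... other networking settings ...
  cluster_cidr: 10.244.0.0/16 # pod IP range; must not be a subnet of your private network
  service_cidr: 10.43.0.0/16  # service IP range; must not be a subnet of your private network
  cluster_dns: 10.43.0.10     # IP of the coredns service, taken from the service_cidr range
```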

**Sizing the Networks**

The networks you specify should provide enough space for the expected number of pods and services. By default, `/16` networks are used. Please make sure you choose an adequate size, as changing the CIDR afterwards is not supported.
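
As a rough worked example (assuming k3s's default allocation of a `/24` pod range per node), a `/16` cluster CIDR yields 256 node-sized subnets with about 254 usable pod IPs each, and a `/16` service CIDR allows roughly 65,000 service IPs; smaller networks shrink these limits accordingly.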

### Idempotency

The `create` command can be run any number of times with the same configuration without causing any issue, since the process is idempotent. This means that if the create process gets stuck or throws errors (for example if the Hetzner API is unavailable or there are timeouts, etc.), you can just stop the current command and re-run it with the same configuration to continue from where it left off.

Note that the kubeconfig will be overwritten when you re-run the `create` command.


### Limitations:

- if possible, please use modern SSH keys since some operating systems have deprecated old crypto based on SHA1; therefore I recommend you use ECDSA keys instead of the old RSA type
- if you use a snapshot instead of one of the default images, the creation of the instances will take longer than when using a regular image
- the setting `networking`.`allowed_networks`.`api` allows specifying which networks can access the Kubernetes API, but this only works with single master clusters currently. Multi-master HA clusters require a load balancer for the API, but load balancers are not yet covered by Hetzner's firewalls
- if you enable autoscaling for one or more nodepools, do not change that setting afterwards as it can cause problems with the autoscaler
- autoscaling is only supported when using Ubuntu or one of the other default images, not snapshots
- worker nodes created by the autoscaler must be deleted manually from the Hetzner Console when deleting the cluster (this will be addressed in a future update)
- SSH keys with passphrases can only be used if you set `networking`.`ssh`.`use_agent` to `true` and use an SSH agent to access your key. To start an agent, e.g. on macOS:

```bash
eval "$(ssh-agent -s)"
ssh-add --apple-use-keychain ~/.ssh/<private key>
```

12 changes: 12 additions & 0 deletions docs/Deleting_a_cluster.md
@@ -0,0 +1,12 @@
# Deleting a cluster

To delete a cluster, run:

```bash
hetzner-k3s delete --config cluster_config.yaml
```

This will delete all the resources in the Hetzner Cloud project that were created directly by `hetzner-k3s`.

**NOTE:** at the moment, instances created by the cluster autoscaler, as well as load balancers and persistent volumes created by deploying your applications, must be deleted manually. This may be addressed in a future release.
