In this tutorial, you will learn how to deploy `TrilioVault for Kubernetes` (or `TVK`) to your `DOKS` cluster, create `backups`, and `recover` from a backup if something goes wrong. You can back up your entire cluster, or optionally choose a `namespace` or `label` based backup. `Helm release` backups are supported as well, which is a nice addition for the `Starter Kit`, where every installation is `Helm` based.
Advantages of using `Trilio`:

- Take `full` (or `incremental`) backups of your cluster and `restore` in case of data loss.
- `Migrate` from one cluster to another.
- `Helm` release backups are supported.
- Run `pre` and `post hooks` for backup and restore operations.
- Web management console, which allows you to inspect your backup/restore operations state in detail (and many other features).
- Define `retention policies` for your backups.
- Application lifecycle (meaning, TVK itself) can be managed via a dedicated `TrilioVault Operator`, if desired.
- `Velero` integration (Trilio supports monitoring Velero backups, restores, and backup/snapshot locations via the web management console).
- You can back up and restore `Operator` based applications.
`TVK` follows a `cloud native` architecture, meaning that it has several components that together form the `Control Plane` and `Data Plane` layers. Everything is managed via `CRDs`, thus making it fully `Kubernetes` native. What is nice about `Trilio` is the clear separation of concerns, and how effectively it handles backup and restore operations.

Each `TrilioVault` application consists of a bunch of `Controllers` and the associated `CRDs`. Every time a `CRD` is created or updated, the responsible controller is notified and performs cluster reconciliation. Then, the controller in charge spawns `Kubernetes` jobs that perform the real operation (like `backup`, `restore`, etc.) in parallel.
The `Control Plane` consists of:

- `Target Controller`: defines the `storage` backend (`S3`, `NFS`, etc.) via specific CRDs.
- `BackupPlan Controller`: defines the components to back up, the automated backup schedule, the retention strategy, etc. via specific CRDs.
- `Restore Controller`: defines restore operations via specific CRDs.
The `Data Plane` consists of:

- `Datamover` pods, responsible for transferring data between persistent volumes and the backup media (or `Target`). `TrilioVault` works with `Persistent Volumes` (PVs) using the `CSI` interface. For each `PV` that needs to be backed up, an ephemeral `Datamover` pod is created. After each operation finishes, the associated pod is destroyed (you can observe this behavior with the command shown right after this list).
- `Metamover` pods, responsible for transferring `Kubernetes API` objects data to the backup media (or `Target`). `Metamover` pods are `ephemeral`, just like the `Datamover` ones.
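Because these pods are ephemeral, you can watch them come and go while a backup runs (a sketch, assuming the `ambassador` namespace used later in this tutorial):

# Stream pod changes and filter for TVK data plane workers
kubectl get pods -n ambassador -w | grep -iE 'metamover|datamover'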
`TrilioVault` for Kubernetes works based on `scope`, meaning you can have a `Namespaced` or a `Cluster` type of installation.

A `Namespaced` installation allows you to `backup` and `restore` at the `namespace` level only. In other words, the backup is meant to protect a set of applications that are bound to a namespace that you own. This is how a `BackupPlan` and the corresponding `Backup` CRD work. You cannot mutate those CRDs in other namespaces; they must be created in the same namespace where the application to be backed up is located.
On the other hand, a `Cluster` type installation is not scoped or bound to any namespace or set of applications. You define cluster type backups via the `Cluster` prefixed `CRDs`, like: `ClusterBackupPlan`, `ClusterBackup`, etc. Cluster type backups are a little bit more flexible, in the sense that you are not tied to a specific namespace or set of applications to back up and restore. You can perform backup/restore operations for multiple namespaces and applications at once, including `PVs` as well (you can also back up `etcd` database content).
In order to make sure that the TVK application `scope` and `rules` are followed correctly, `TrilioVault` is using an `Admission Controller`. It `intercepts` and `validates` each `CRD` that you want to push for `TVK`, before it is actually created. In case the TVK application scope is not followed, the admission controller will reject CRD creation in the cluster.
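You can confirm that the admission webhook is registered in your cluster with a quick check (the exact webhook registration name may differ between TVK versions):

# Look for the TVK validating webhook registration
kubectl get validatingwebhookconfigurations | grep -i trilio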
Another important thing to consider and remember is that a `TVK License` is application scope specific. In other words, you need to generate one type of license for either a `Namespaced` or a `Cluster` type installation.
`Namespaced` vs `Cluster` TVK application scope - when to use one or the other? It all depends on the use case. For example, a `Namespaced` scope is a more appropriate option when you don't have access to the whole Kubernetes cluster, only to specific namespaces and applications. In most cases, you want to protect only the applications tied to a specific namespace that you own. On the other hand, a cluster scoped installation type works at the global level, meaning it can trigger backup/restore operations for any namespace or resource from a Kubernetes cluster (including `PVs` and the `etcd` database).
To summarize:

- If you are a cluster administrator, then you will most probably want to perform `cluster` level `operations` via the corresponding CRDs, like: `ClusterBackupPlan`, `ClusterBackup`, `ClusterRestore`, etc.
- If you are a regular user, then you will usually perform `namespaced` only operations (application centric) via the corresponding CRDs, like: `BackupPlan`, `Backup`, `Restore`, etc.
The application interface is very similar (or uniform) when comparing the two types: `Cluster` vs non-`Cluster` prefixed `CRDs`. So, if you're familiar with one type, it's pretty straightforward to use the counterpart.
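To see both flavors side by side on a cluster with `TVK` installed, you can list the Trilio CRDs (a quick check; the exact set may vary between TVK versions):

# List all Trilio CRDs - both Cluster prefixed and namespaced variants show up here
kubectl get crd | grep trilio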
For more information, please refer to the TVK CRDs official documentation.
Whenever you want to back up an application, you start by creating a `BackupPlan` (or `ClusterBackupPlan`) CRD, followed by a `Backup` (or `ClusterBackup`) object. The Trilio `Backup Controller` is notified about the change and performs backup object inspection and validation (i.e. whether it is a `cluster` backup, a `namespace` backup, etc.). Then, it spawns worker pods (`Metamover`, `Datamover`) responsible for moving the actual data (Kubernetes metadata, PVs data) to the backend storage (or `Target`), such as `DigitalOcean Spaces`.
Similarly, whenever you create a `Restore` object, the `Restore Controller` is notified to restore from a `Backup` object. Then, the Trilio `Restore Controller` spawns worker pods (`Metamover`, `Datamover`), responsible for moving the backup data (Kubernetes metadata, PVs data) out of the `DigitalOcean Spaces` storage. Finally, the restore process is initiated from the particular backup object.
Below is a diagram that shows the `Backup/Restore` workflow for `TVK`:
`Trilio` is ideal for the `disaster recovery` use case, as well as for `snapshotting` your application state prior to performing `system operations` on your `cluster`, like `upgrades`. For more details on this topic, please visit the Trilio Features and Trilio Use Case official pages.
After finishing this tutorial, you should be able to:

- `Configure` the `DO Spaces` storage backend for `Trilio` to use.
- `Backup` and `restore` your `applications`.
- `Backup` and `restore` your entire `DOKS` cluster.
- Create `scheduled` backups for your applications.
- Create `retention policies` for your backups.
- Introduction
- Prerequisites
- Step 1 - Installing TrilioVault for Kubernetes
- Step 2 - Creating a TrilioVault Target to Store Backups
- Step 3 - Getting to Know the TVK Web Management Console
- Step 4 - Namespaced Backup and Restore Example
- Step 5 - Backup and Restore Whole Cluster Example
- Step 6 - Scheduled Backups
- Step 7 - Backups Retention Policy
- Conclusion
To complete this tutorial, you need the following:
- A DO Spaces bucket and `access` keys. Save the `access` and `secret` keys in a safe place for later use.
- A Git client, to clone the `Starter Kit` repository.
- Helm, for managing `TrilioVault Operator` releases and upgrades.
- Doctl, for `DigitalOcean` API interaction.
- Kubectl, for `Kubernetes` interaction.
Important note:

In order for `TrilioVault` to work correctly and to back up your `PVCs`, `DOKS` needs to be configured to support the `Container Storage Interface` (or `CSI`, for short). By default, it comes with the driver already installed and configured. You can check using the below command:
kubectl get storageclass
The output should look similar to (notice the provisioner is `dobs.csi.digitalocean.com`):
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
do-block-storage (default) dobs.csi.digitalocean.com Delete Immediate true 10d
The `TrilioVault` installation also needs the `volumeSnapshot` Custom Resource Definition (CRD) for a successful installation. You can check using the below command:
kubectl get crd | grep volumesnapshot
The output should look similar to (if not installed, refer to Installing VolumeSnapshot CRDs):
volumesnapshotclasses.snapshot.storage.k8s.io 2022-02-01T06:01:14Z
volumesnapshotcontents.snapshot.storage.k8s.io 2022-02-01T06:01:14Z
volumesnapshots.snapshot.storage.k8s.io 2022-02-01T06:01:15Z
Also, make sure that the CRD supports both the `v1beta1` and `v1` API versions. You can run the below command to check:
kubectl get crd volumesnapshots.snapshot.storage.k8s.io -o yaml
At the end of the CRD YAML, you should see a `storedVersions` list containing both `v1beta1` and `v1` values (if not installed, refer to Installing VolumeSnapshot CRDs):
...
- lastTransitionTime: "2022-01-20T07:58:06Z"
message: approved in https://github.com/kubernetes-csi/external-snapshotter/pull/419
reason: ApprovedAnnotation
status: "True"
type: KubernetesAPIApprovalPolicyConformant
storedVersions:
- v1beta1
- v1
In this step, you will learn how to deploy `TrilioVault` for `DOKS`, and manage `TVK` installations via `Helm`. Backups data will be stored in the `DO Spaces` bucket created earlier in the Prerequisites section.
The `TrilioVault` application can be installed in many ways:

- Via the `tvk-oneclick` `krew` plugin. It has some interesting features, like: checking Kubernetes cluster prerequisites, post install validations, automatic licensing of the product (using the free basic license), application upgrades management, etc.
- Via the `TrilioVault Operator` (installable via `Helm`). You define a `TrilioVaultManager` CRD, which tells the `TrilioVault` operator how to handle the `installation`, the `post-configuration` steps, and future `upgrades` of the `Trilio` application components.
- Fully managed by `Helm`, via the triliovault-operator chart (covered in this tutorial).
Important note:

The `Starter Kit` tutorial is using the `Cluster` installation type for the `TVK` application (the `applicationScope` Helm value is set to `"Cluster"`). All examples from this tutorial rely on this type of installation to function properly.
Please follow the steps below to install `TrilioVault` via `Helm`:
- First, clone the `Starter Kit` Git repository and change directory to your local copy:

git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
cd Kubernetes-Starter-Kit-Developers
- Next, add the `TrilioVault` Helm repository, and list the available charts:

helm repo add triliovault-operator http://charts.k8strilio.net/trilio-stable/k8s-triliovault-operator
helm repo update triliovault-operator
helm search repo triliovault-operator

The output looks similar to the following:

NAME                                            CHART VERSION   APP VERSION     DESCRIPTION
triliovault-operator/k8s-triliovault-operator   2.9.2           2.9.2           K8s-TrilioVault-Operator is an operator designe...

Note: The chart of interest is `triliovault-operator/k8s-triliovault-operator`, which will install `TrilioVault for Kubernetes` on the cluster along with the `TrilioVault-Manager`. You can run `helm show values triliovault-operator/k8s-triliovault-operator`, and export to a file to see all the available options.
- Then, open and inspect the TrilioVault `Helm` values file provided in the `Starter Kit` repository, using an editor of your choice (preferably with `YAML` lint support). You can use VS Code for example:

code 05-setup-backup-restore/assets/manifests/triliovault-values-v2.9.2.yaml
- Finally, install `TrilioVault for Kubernetes` using `Helm`:

helm install triliovault-operator triliovault-operator/k8s-triliovault-operator \
  --namespace tvk \
  --create-namespace \
  -f 05-setup-backup-restore/assets/manifests/triliovault-values.yaml

Note: The above command installs both the `TrilioVault Operator` and the `TrilioVault Manager` (TVM) Custom Resource, using the parameters provided in `triliovault-values.yaml`. The `TVK` version is now managed by the `tag` field in the `05-setup-backup-restore/assets/manifests/triliovault-values.yaml` file, so the Helm command always has the latest version of `TVK`. You can update the below fields in values.yaml:

- `installTVK.applicationScope` for the TVK installation scope, e.g. `Cluster` or `Namespaced`.
- `installTVK.ingressConfig.host` for the TVK UI hostname, e.g. `tvk-doks.com`.
- `installTVK.ComponentConfiguration.ingressController.service.type` for the service type used to access the TVK UI, e.g. `NodePort` or `LoadBalancer`.
Now, please check your `TVK` deployment:
helm ls -n tvk
The output looks similar to the following (the `STATUS` column should display `deployed`):
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
triliovault-manager-tvk tvk 1 2022-06-08 08:30:08.490304959 +0000 UTC deployed k8s-triliovault-2.9.2 2.9.2
triliovault-operator tvk 1 2022-06-08 11:32:55.755395 +0300 EEST deployed k8s-triliovault-operator-2.9.2 2.9.2
Next, verify that `TrilioVault` is up and running:
kubectl get deployments -n tvk
The output looks similar to the following (all deployment pods must be in the `Ready` state):
NAME READY UP-TO-DATE AVAILABLE AGE
k8s-triliovault-admission-webhook 1/1 1 1 83s
k8s-triliovault-control-plane 1/1 1 1 83s
k8s-triliovault-exporter 1/1 1 1 83s
k8s-triliovault-ingress-nginx-controller 1/1 1 1 83s
k8s-triliovault-web 1/1 1 1 83s
k8s-triliovault-web-backend 1/1 1 1 83s
triliovault-operator-k8s-triliovault-operator 1/1 1 1 4m22s
If the output looks like above, you installed `TVK` successfully. Next, you will learn how to check the license type and validity, as well as how to renew it.
By default, when installing `TVK` via `Helm`, there is no `Free Trial` license installed automatically. You can always go to the `Trilio` website and generate a new license for your cluster that suits your needs (for example, you can pick the `basic license` type, which lets you run `TrilioVault` indefinitely if your cluster `capacity` doesn't exceed `10 nodes`). A free trial license lets you run `TVK` for `one month` on `unlimited` cluster nodes.
Notes:

- `TrilioVault` is free of charge for Kubernetes clusters with up to 100000 nodes for DigitalOcean users. They can follow the below steps to create a special license available for DO customers only.
- The `Starter Kit` examples rely on a `Cluster` license type to function properly.
Please run the below command to create a new license for your cluster (it is managed via the `License` CRD):
kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/tvk_install_license.yaml
The above command will create a job named `job.batch/tvk-license-digitalocean`, which will run a pod named `tvk-license-digitalocean-828rx` to pull the license from the `Trilio License Server` and install it on the DOKS cluster. After the job completes, it is deleted in 60 seconds.
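If you want to watch the licensing job while it runs, a quick check like the below should help (assuming the job is created in the `tvk` namespace):

# Observe the license installation job until it completes
kubectl get jobs -n tvk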
Note:

- If you are downloading a free license from Trilio's website, apply it using the below command:
kubectl apply -f <YOUR_LICENSE_FILE_NAME>.yaml -n tvk
Please run the below command to see if the license is installed and in an `Active` state on your cluster (it is managed via the `License` CRD):
kubectl get license -n tvk
The output looks similar to (notice the `STATUS`, which should be `Active`, as well as the license type in the `EDITION` column and the `EXPIRATION TIME`):
NAME STATUS MESSAGE CURRENT NODE COUNT GRACE PERIOD END TIME EDITION CAPACITY EXPIRATION TIME MAX NODES
test-license-1 Active Cluster License Activated successfully. 1 FreeTrial 100000 2023-02-25T00:00:00Z 1
The license is managed via a special `CRD`, namely the `License` object. You can inspect it by running the below command:
kubectl describe license test-license-1 -n tvk
The output looks similar to (notice the `Message` and `Capacity` fields, as well as the `Edition`):
Name: test-license-1
Namespace: tvk
Labels: <none>
Annotations: generation: 1
triliovault.trilio.io/creator: system:serviceaccount:tvk:k8s-triliovault
triliovault.trilio.io/instance-id: b060660d-4851-482b-8e60-4addd260e1d3
triliovault.trilio.io/updater:
[{"username":"system:serviceaccount:tvk:k8s-triliovault","lastUpdatedTimestamp":"2022-02-24T06:38:21.418828262Z"}]
API Version: triliovault.trilio.io/v1
Kind: License
Metadata:
Creation Timestamp: 2022-02-24T06:38:21Z
...
Status:
Condition:
Message: License Key changed
Timestamp: 2022-02-24T06:38:21Z
Message: Cluster License Activated successfully.
Status: Active
Timestamp: 2022-02-24T06:38:21Z
Current Node Count: 1
Max Nodes: 1
Message: Cluster License Activated successfully.
Properties:
Active: true
Capacity: 100000
Company: TRILIO-KUBERNETES-LICENSE-GEN-DIGITALOCEAN-BASIC
Creation Timestamp: 2022-02-24T00:00:00Z
Edition: FreeTrial
Expiration Timestamp: 2023-02-25T00:00:00Z
Kube UID: b060660d-4851-482b-8e60-4addd260e1d3
License ID: TVAULT-5a4b42c6-953c-11ec-8116-0cc47a9fd48e
Maintenance Expiry Timestamp: 2023-02-25T00:00:00Z
Number Of Users: -1
Purchase Timestamp: 2022-02-24T00:00:00Z
Scope: Cluster
...
The above output will also tell you when the license is going to expire (in the `Expiration Timestamp` field), and the `Scope` (`Cluster` based in this case). You can opt for a `cluster` wide license type, or for a `namespace` based license. More details can be found on the Trilio Licensing documentation page.
To renew the license, you will have to request a new one from the Trilio website by navigating to the licensing page, to replace the old one. After completing the form, you should receive the `License` YAML manifest, which can be applied to your cluster using `kubectl`. The below commands assume that TVK is installed in the default `tvk` namespace (please replace the `<>` placeholders accordingly, where required):
kubectl apply -f <YOUR_LICENSE_FILE_NAME>.yaml -n tvk
Then, you can check the new license status as you already learned via:
# List available TVK licenses first from the `tvk` namespace
kubectl get license -n tvk
# Get information about a specific license from the `tvk` namespace
kubectl describe license <YOUR_LICENSE_NAME_HERE> -n tvk
In the next step, you will learn how to define the storage backend for `TrilioVault` to store backups, called a `target`.
`TrilioVault` needs to know first where to store your backups. TrilioVault refers to the storage backend using the `target` term, and it's managed via a special `CRD` named `Target`. The following target types are supported: `S3` and `NFS`. For `DigitalOcean` and the purpose of the `Starter Kit`, it makes sense to rely on the `S3` storage type, because it's `cheap` and `scalable`. To benefit from an enhanced level of protection, you can create multiple target types (for both `S3` and `NFS`), so that your data is kept safe in multiple places, thus achieving backup redundancy.
A typical `Target` definition looks like below:
apiVersion: triliovault.trilio.io/v1
kind: Target
metadata:
name: trilio-s3-target
namespace: tvk
spec:
type: ObjectStore
vendor: Other
enableBrowsing: true
objectStoreCredentials:
bucketName: <YOUR_DO_SPACES_BUCKET_NAME_HERE>
region: <YOUR_DO_SPACES_BUCKET_REGION_HERE> # e.g.: nyc1
url: "https://<YOUR_DO_SPACES_BUCKET_ENDPOINT_HERE>" # e.g.: nyc1.digitaloceanspaces.com
credentialSecret:
name: trilio-s3-target
namespace: tvk
thresholdCapacity: 10Gi
Explanation for the above configuration:
- `spec.type`: Type of target for backup storage (S3 is an object store).
- `spec.vendor`: Third party storage vendor hosting the target (for `DigitalOcean Spaces` you need to use `Other` instead of `AWS`).
- `spec.enableBrowsing`: Enables browsing for the target.
- `spec.objectStoreCredentials`: Defines the required `credentials` (via `credentialSecret`) to access the `S3` storage, as well as other parameters such as bucket region and name.
- `spec.thresholdCapacity`: Maximum threshold capacity to store backup data.
To access `S3` storage, each target needs to know the bucket credentials. A Kubernetes `Secret` must be created as well:
apiVersion: v1
kind: Secret
metadata:
name: trilio-s3-target
namespace: tvk
type: Opaque
stringData:
  accessKey: <YOUR_DO_SPACES_ACCESS_KEY_ID_HERE> # plain text value - stringData fields are encoded automatically
  secretKey: <YOUR_DO_SPACES_SECRET_KEY_HERE> # plain text value - stringData fields are encoded automatically
Notice that the secret name is `trilio-s3-target`, and it's referenced by the `spec.objectStoreCredentials.credentialSecret` field of the `Target` CRD explained earlier. The `secret` can be in the same `namespace` where `TrilioVault` was installed (defaults to `tvk`), or in another namespace of your choice. Just make sure that you reference the namespace correctly. On the other hand, please make sure to `protect` the `namespace` where you store `TrilioVault` secrets via `RBAC`, for `security` reasons (a minimal sketch is provided below).
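Below is a minimal `RBAC` sketch illustrating the idea: a `Role` that allows reading secrets in the `tvk` namespace, bound to a single user. All names and the `backup-admin` subject are examples only; adapt them to your own access model:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tvk-secrets-reader # example name
  namespace: tvk
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"] # read-only access to secrets in the tvk namespace
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tvk-secrets-reader-binding # example name
  namespace: tvk
subjects:
  - kind: User
    name: backup-admin # hypothetical user allowed to read TVK secrets
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tvk-secrets-reader
  apiGroup: rbac.authorization.k8s.io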
Steps to create a `Target` for `TrilioVault`:
- First, change directory where the `Starter Kit` Git repository was cloned on your local machine:

cd Kubernetes-Starter-Kit-Developers
- Next, create the Kubernetes secret containing your target S3 bucket credentials (please replace the `<>` placeholders accordingly):

kubectl create secret generic trilio-s3-target \
  --namespace=tvk \
  --from-literal=accessKey="<YOUR_DO_SPACES_ACCESS_KEY_HERE>" \
  --from-literal=secretKey="<YOUR_DO_SPACES_SECRET_KEY_HERE>"
- Then, open and inspect the `Target` manifest file provided in the `Starter Kit` repository, using an editor of your choice (preferably with `YAML` lint support). You can use VS Code for example:

code 05-setup-backup-restore/assets/manifests/triliovault/triliovault-s3-target.yaml

- Now, please replace the `<>` placeholders accordingly for your DO Spaces `Trilio` bucket, like: `bucketName`, `region`, `url` and `credentialSecret`.
- Finally, save the manifest file and create the `Target` object using `kubectl`:

kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/triliovault-s3-target.yaml
What happens next is that `TrilioVault` will spawn a worker `job` named `trilio-s3-target-validator`, responsible for validating your S3 bucket (checking things like availability, permissions, etc.). If the job finishes successfully, the bucket is considered to be healthy (or available), and the `trilio-s3-target-validator` job resource is deleted afterwards. If something bad happens, the S3 target validator job is left up and running so that you can inspect the logs and find the possible issue.
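You can keep an eye on the validator job while it runs (a quick check, assuming the default `tvk` namespace):

# Watch the S3 target validator job spawned by TrilioVault
kubectl get jobs -n tvk | grep trilio-s3-target-validator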
Now, please go ahead and check if the `Target` resource created earlier is `healthy`:
kubectl get target trilio-s3-target -n tvk
The output looks similar to (notice the `STATUS` column value, which should be `Available`, meaning it's in a `healthy` state):
NAME TYPE THRESHOLD CAPACITY VENDOR STATUS BROWSING ENABLED
trilio-s3-target ObjectStore 10Gi Other Available
If the output looks like above, then you configured the S3 target object successfully.
Hint:

In case the target object fails to become healthy, you can inspect the logs from the `trilio-s3-target-validator` Pod to find the issue:
# First, you need to find the target validator
kubectl get pods -n tvk | grep trilio-s3-target-validator
# Output looks similar to:
#trilio-s3-target-validator-tio99a-6lz4q 1/1 Running 0 104s
# Now, fetch logs data
kubectl logs pod/trilio-s3-target-validator-tio99a-6lz4q -n tvk
The output looks similar to (notice the exception as an example):
...
INFO:root:2021-11-24 09:06:50.595166: waiting for mount operation to complete.
INFO:root:2021-11-24 09:06:52.595772: waiting for mount operation to complete.
ERROR:root:2021-11-24 09:06:54.598541: timeout exceeded, not able to mount within time.
ERROR:root:/triliodata is not a mountpoint. We can't proceed further.
Traceback (most recent call last):
File "/opt/tvk/datastore-attacher/mount_utility/mount_by_target_crd/mount_datastores.py", line 56, in main
utilities.mount_datastore(metadata, datastore.get(constants.DATASTORE_TYPE), base_path)
File "/opt/tvk/datastore-attacher/mount_utility/utilities.py", line 377, in mount_datastore
mount_s3_datastore(metadata_list, base_path)
File "/opt/tvk/datastore-attacher/mount_utility/utilities.py", line 306, in mount_s3_datastore
wait_until_mount(base_path)
File "/opt/tvk/datastore-attacher/mount_utility/utilities.py", line 328, in wait_until_mount
base_path))
Exception: /triliodata is not a mountpoint. We can't proceed further.
...
Next, you will discover the TVK web console, which is a really nice and useful addition that helps you manage backup and restore operations with ease, among many other things.
While you can manage backup and restore operations entirely from the `CLI` via `kubectl` and `CRDs`, `TVK` provides a Web Management Console to accomplish the same operations via the GUI. The management console simplifies common tasks via point and click operations, provides better visualization and inspection of TVK cluster objects, and lets you create disaster recovery plans (or `DRPs`).
The Helm based installation covered in Step 1 - Installing TrilioVault for Kubernetes already took care of installing the required components for the web management console.
To be able to access the console and explore the features it offers, you need to port forward the ingress controller service for TVK.
First, you need to identify the `ingress-nginx-controller` service from the `tvk` namespace:
kubectl get svc -n tvk
The output looks similar to (search for the `k8s-triliovault-ingress-nginx-controller` line, and notice that it listens on port `80` in the `PORT(S)` column):
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
k8s-triliovault-admission-webhook ClusterIP 10.245.202.17 <none> 443/TCP 13m
k8s-triliovault-ingress-nginx-controller NodePort 10.245.192.140 <none> 80:32448/TCP,443:32588/TCP 13m
k8s-triliovault-ingress-nginx-controller-admission ClusterIP 10.3.20.89 <none> 443/TCP 13m
k8s-triliovault-web ClusterIP 10.245.214.13 <none> 80/TCP 13m
k8s-triliovault-web-backend ClusterIP 10.245.10.221 <none> 80/TCP 13m
triliovault-operator-k8s-triliovault-operator-webhook-service ClusterIP 10.245.186.59 <none> 443/TCP 16m
`TVK` is using an `Nginx Ingress Controller` to route traffic to the management web console services. Routing is host based, and the host name is `tvk-doks.com`, as defined in the `Helm` values file from the `Starter Kit`:
# The host name to use when accessing the web console via the TVK ingress nginx controller
installTVK:
ingressConfig:
host: "tvk-doks.com"
Having the above information at hand, please go ahead and edit the `/etc/hosts` file, and add this entry:
127.0.0.1 tvk-doks.com
Next, create the port forward for the TVK ingress controller service:
kubectl port-forward svc/k8s-triliovault-ingress-nginx-controller 8080:80 -n tvk
Finally, export the `kubeconfig` file for your DOKS cluster. This step is required so that the web console can authenticate you:
# List the available clusters
doctl k8s cluster list
# Save cluster configuration to YAML
doctl kubernetes cluster kubeconfig show <YOUR_CLUSTER_NAME_HERE> > config_<YOUR_CLUSTER_NAME_HERE>.yaml
Hint: If you have only one cluster, the below command can be used:
DOKS_CLUSTER_NAME="$(doctl k8s cluster list --no-header --format Name)"
doctl kubernetes cluster kubeconfig show $DOKS_CLUSTER_NAME > config_${DOKS_CLUSTER_NAME}.yaml
After following the above presented steps, you can access the console in your web browser by navigating to: http://tvk-doks.com:8080. When asked for the `kubeconfig` file, please select the one that you created in the last command from above.
Note:

Please keep the generated `kubeconfig` file safe, because it contains sensitive data.
The home page looks similar to:
Go ahead and explore each section from the left, like:
- Cluster Management: This shows the list of the primary cluster and other clusters having TVK instances, added to the primary cluster using the `Multi-Cluster Management` feature.
- Backup & Recovery: This is the main dashboard, which gives you a general overview of the whole cluster, like: discovered namespaces, applications, backup plans, targets, hooks, policies, etc.
  - Namespaces:
  - Applications:
  - Backupplans:
  - Targets:
  - Scheduling Policy:
  - Retention Policy:
- Monitoring: This has two options - `TrilioVault Monitoring` and `Velero Monitoring`, if the user has Velero configured on their cluster.
  - `TrilioVault Monitoring`: Shows the backup and restore summary of the Kubernetes cluster.
  - `Velero Monitoring`:
- Disaster Recovery: Allows you to manage and perform disaster recovery operations.
You can also see the S3 Target created earlier, by navigating to `Backup & Recovery -> Targets`, then selecting the `tvk` namespace from the dropdown at the top:
Going further, you can browse the target and list the available backups by clicking on the `Actions` button from the right, and then selecting the `Launch Browser` option from the pop-up menu (for this to work, the target must have the `enableBrowsing` flag set to `true`):
For more information and available features, please consult the TVK Web Management Console User Interface official documentation.
Next, you will learn how to perform backup and restore operations for specific use cases, like:

- Specific `namespace(s)` backup and restore.
- Whole `cluster` backup and restore.
In this step, you will learn how to create a `one-time backup` for an entire `namespace` from your `DOKS` cluster and `restore` it afterwards, making sure that all the resources are re-created. The namespace in question is `ambassador`. TVK has a neat feature that allows you to perform backups at a higher level than just namespaces, meaning: `Helm Releases`. You will learn how to accomplish such a task in the steps to follow.
Next, you will perform the following tasks:
- `Create` the `ambassador` Helm release `backup`, via the `BackupPlan` and `Backup` CRDs.
- `Delete` the `ambassador` Helm release.
- `Restore` the `ambassador` Helm release, via the `Restore` CRD.
- `Check` the `ambassador` Helm release resources restoration.
To perform backups for a single application at the namespace level (or Helm release), a `BackupPlan` followed by a `Backup` CRD is required. A `BackupPlan` allows you to:

- Specify a `target` where backups should be `stored`.
- Define a set of resources to back up (e.g.: `namespace` or `Helm releases`).
- Define `encryption`, if you want to encrypt your backups on the target (this is a very nice feature for securing your backup data).
- Define `schedules` for `full` or `incremental` type backups.
- Define `retention` policies for your backups.
In other words, a `BackupPlan` is a definition of the `'what'`, `'where'`, `'to'` and `'how'` of the backup process, but it doesn't perform the actual backup. The `Backup` CRD is responsible for triggering the actual backup process, as dictated by the `BackupPlan` spec.
A typical `BackupPlan` CRD looks like below:
apiVersion: triliovault.trilio.io/v1
kind: BackupPlan
metadata:
name: ambassador-helm-release-backup-plan
namespace: ambassador
spec:
backupConfig:
target:
name: trilio-s3-target
namespace: tvk
backupPlanComponents:
helmReleases:
- ambassador
Explanation for the above configuration:
- `spec.backupConfig.target.name`: Tells `TVK` what target `name` to use for storing backups.
- `spec.backupConfig.target.namespace`: Tells `TVK` in what namespace the target was created.
- `spec.backupPlanComponents`: Defines a `list` of `resources` to back up (can be `namespaces` or `Helm releases`).
A typical `Backup` CRD looks like below:
apiVersion: triliovault.trilio.io/v1
kind: Backup
metadata:
name: ambassador-helm-release-full-backup
namespace: ambassador
spec:
type: Full
backupPlan:
name: ambassador-helm-release-backup-plan
namespace: ambassador
Explanation for the above configuration:
- `spec.type`: Specifies the backup type (e.g. `Full` or `Incremental`).
- `spec.backupPlan`: Specifies the `BackupPlan` which this `Backup` should use.
Steps to initiate the `Ambassador` Helm release one-time backup:
- First, make sure that the `Ambassador Edge Stack` is deployed in your cluster by following the steps from the Ambassador Ingress tutorial.
- Next, change directory where the `Starter Kit` Git repository was cloned on your local machine:

cd Kubernetes-Starter-Kit-Developers

- Then, open and inspect the Ambassador `BackupPlan` and `Backup` manifest files provided in the `Starter Kit` repository, using an editor of your choice (preferably with `YAML` lint support). You can use VS Code for example:

code 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-backup-plan.yaml
code 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-backup.yaml

- Finally, create the `BackupPlan` and `Backup` resources using `kubectl`. Please note that the `BackupPlan` needs to be available first, so it may take a minute before the `Backup` can be created:

kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-backup-plan.yaml
kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-backup.yaml
Now, inspect the `BackupPlan` status (targeting the `ambassador` Helm release), using `kubectl`:
kubectl get backupplan ambassador-helm-release-backup-plan -n ambassador
The output looks similar to (notice the `STATUS` column value, which should be set to `Available`):
NAME TARGET ... STATUS
ambassador-helm-release-backup-plan trilio-s3-target ... Available
Next, check the `Backup` object status, using `kubectl`:
kubectl get backup ambassador-helm-release-full-backup -n ambassador
The output looks similar to (notice the `STATUS` column value, which should be set to `InProgress`, as well as the `BACKUP TYPE` set to `Full`):
NAME BACKUPPLAN BACKUP TYPE STATUS ...
ambassador-helm-release-full-backup ambassador-helm-release-backup-plan Full InProgress ...
After all the `ambassador` Helm release components finish uploading to the `S3` target, you should get the below results:
# Inspect the cluster backup status again for the `ambassador` namespace
kubectl get backup ambassador-helm-release-full-backup -n ambassador
# The output looks similar to (notice that the `STATUS` changed to `Available`, and `PERCENTAGE` is `100`)
NAME BACKUPPLAN BACKUP TYPE STATUS ... PERCENTAGE
ambassador-helm-release-full-backup ambassador-helm-release-backup-plan Full Available ... 100
If the output looks like above, you successfully backed up the `ambassador` Helm release. You can go ahead and see how `TrilioVault` stores `Kubernetes` metadata by listing the `TrilioVault S3 Bucket` contents. For example, you can use s3cmd:
s3cmd ls s3://trilio-starter-kit --recursive
The output looks similar to (notice that the listing contains the json manifests and UIDs, representing Kubernetes objects):
2021-11-25 07:04 28 s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/
2021-11-25 07:04 28 s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/
2021-11-25 07:04 311 s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/backup-namespace.json.manifest.00000004
2021-11-25 07:04 302 s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/backup.json.manifest.00000004
2021-11-25 07:04 305 s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/backupplan.json.manifest.00000004
2021-11-25 07:04 28 s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/custom/
2021-11-25 07:04 28 s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/custom/metadata-snapshot/
2021-11-25 07:04 330 s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/custom/metadata-snapshot/metadata.json.manifest.00000002
...
Hint:

In case the backup fails to become available, you can inspect the logs from the `metamover` Pod to find the issue:
# First, you need to find the metamover pod
kubectl get pods -n ambassador | grep metamover
# Output looks similar to:
ambassador-helm-release-full-backup-metamover-mg9gl0--1-2d6wx 1/1 Running 0 4m32s
# Now, fetch logs data
kubectl logs pod/ambassador-helm-release-full-backup-metamover-mg9gl0--1-2d6wx -n ambassador -f
The output looks similar to (any errors during the backup will be shown here):
...
{"component":"meta-mover","file":"pkg/metamover/snapshot/parser/commons.go:1366","func":"github.com/trilioData/k8s-triliovault/pkg/metamover/snapshot/parser.(*Component).ParseForDataComponents","level":"info","msg":"Parsing data components of resource rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding: [edge-stack]","time":"2022-06-14T06:20:56Z"}
{"component":"meta-mover","file":"pkg/metamover/snapshot/parser/commons.go:1366","func":"github.com/trilioData/k8s-triliovault/pkg/metamover/snapshot/parser.(*Component).ParseForDataComponents","level":"info","msg":"Parsing data components of resource rbac.authorization.k8s.io/v1, Kind=RoleBinding: [edge-stack-agent-config]","time":"2022-06-14T06:20:56Z"}
...
Finally, you can check that the backup is available in the web console as well, by navigating to `Resource Management -> ambassador -> Backup Plans` (notice that it's in the `Available` state, and that the `ambassador` Helm release was backed up in the `Component Details` sub-view):
Now, go ahead and simulate a disaster by intentionally deleting the `ambassador` Helm release:
helm delete ambassador -n ambassador
Next, check that the namespace resources were deleted (listing should be empty):
kubectl get all -n ambassador
Finally, verify that the `echo` and `quote` backend services `endpoint` is `DOWN` (please refer to Creating the Ambassador Edge Stack Backend Services, regarding the `backend applications` used in the `Starter Kit` tutorial). You can use `curl` to test (or you can use your web browser):
curl -Li http://quote.starter-kit.online/quote/
curl -Li http://echo.starter-kit.online/echo/
Important notes:

- If restoring into the same namespace, ensure that the original application components have been removed. In particular, ensure that the PVCs of the application are deleted.
- If restoring to another cluster (migration scenario), ensure that TrilioVault for Kubernetes is running in the remote namespace/cluster as well. To restore into a new cluster (where the Backup CR does not exist), `source.type` must be set to `location`. Please refer to the Custom Resource Definition Restore Section to view a `restore` by `location` example.
- When you delete the `ambassador` namespace, the load balancer resource associated with the ambassador service will be deleted as well. So, when you restore the `ambassador` service, the `LB` will be recreated by `DigitalOcean`. The issue is that you will get a `NEW IP` address for your `LB`, so you will need to `adjust` the `A records` for getting `traffic` into your domains hosted on the cluster (see the command sketch right after this list).
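To find the new load balancer external IP after a restore, a check like the below should help (a sketch using `doctl`; the available columns may differ between `doctl` versions):

# List DigitalOcean load balancers along with their IP addresses
doctl compute load-balancer list --format Name,IP,Status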
To restore a specific `Backup`, you need to create a `Restore` CRD. A typical `Restore` CRD looks like below:
apiVersion: triliovault.trilio.io/v1
kind: Restore
metadata:
name: ambassador-helm-release-restore
namespace: ambassador
spec:
source:
type: Backup
backup:
name: ambassador-helm-release-full-backup
namespace: ambassador
skipIfAlreadyExists: true
Explanation for the above configuration:
- `spec.source.type`: Specifies what backup type to restore from.
- `spec.source.backup`: Contains a reference to the backup object to restore from.
- `spec.skipIfAlreadyExists`: Specifies whether to skip the restore of a resource if it already exists in the namespace being restored.
`Restore` allows you to restore the last successful `Backup` for an application. It is used to restore a single `namespace` or `Helm release`, protected by the `Backup` CRD. The `Backup` CRD is identified by its name: `ambassador-helm-release-full-backup`.
First, inspect the `Restore` CRD example from the `Starter Kit` Git repository:
code 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-restore.yaml
Then, create the `Restore` resource using `kubectl`:
kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-restore.yaml
Finally, inspect the `Restore` object status:
kubectl get restore ambassador-helm-release-restore -n ambassador
The output looks similar to (notice the `STATUS` column set to `Completed`, as well as the `PERCENTAGE COMPLETED` set to `100`):
NAME STATUS DATA SIZE START TIME END TIME PERCENTAGE COMPLETED DURATION
ambassador-helm-release-restore Completed 0 2021-11-25T15:06:52Z 2021-11-25T15:07:35Z 100 43.524191306s
If the output looks like above, then the `ambassador` Helm release `restoration` process completed successfully.
Check that all the `ambassador` namespace `resources` are in place and running:
kubectl get all -n ambassador
The output looks similar to:
NAME READY STATUS RESTARTS AGE
pod/ambassador-5bdc64f9f6-42wzr 1/1 Running 0 9m58s
pod/ambassador-5bdc64f9f6-nrkzd 1/1 Running 0 9m58s
pod/ambassador-agent-bcdd8ccc8-ktmcv 1/1 Running 0 9m58s
pod/ambassador-redis-64b7c668b9-69drs 1/1 Running 0 9m58s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/ambassador LoadBalancer 10.245.173.90 157.245.23.93 80:30304/TCP,443:30577/TCP 9m59s
service/ambassador-admin ClusterIP 10.245.217.211 <none> 8877/TCP,8005/TCP 9m59s
service/ambassador-redis ClusterIP 10.245.77.142 <none> 6379/TCP 9m59s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/ambassador 2/2 2 2 9m59s
deployment.apps/ambassador-agent 1/1 1 1 9m59s
deployment.apps/ambassador-redis 1/1 1 1 9m59s
NAME DESIRED CURRENT READY AGE
replicaset.apps/ambassador-5bdc64f9f6 2 2 2 9m59s
replicaset.apps/ambassador-agent-bcdd8ccc8 1 1 1 9m59s
replicaset.apps/ambassador-redis-64b7c668b9 1 1 1 9m59s
Ambassador `Hosts`:
kubectl get hosts -n ambassador
The output looks similar to (the `STATE` should be `Ready`, and the `HOSTNAME` column should point to the fully qualified host name):
NAME HOSTNAME STATE PHASE COMPLETED PHASE PENDING AGE
echo-host echo.starter-kit.online Ready 11m
quote-host quote.starter-kit.online Ready 11m
Ambassador `Mappings`:
kubectl get mappings -n ambassador
The output looks similar to (notice the `echo-backend`, which is mapped to the `echo.starter-kit.online` host and the `/echo/` source prefix; same for `quote-backend`):
NAME SOURCE HOST SOURCE PREFIX DEST SERVICE STATE REASON
ambassador-devportal /documentation/ 127.0.0.1:8500
ambassador-devportal-api /openapi/ 127.0.0.1:8500
ambassador-devportal-assets /documentation/(assets|styles)/(.*)(.css) 127.0.0.1:8500
ambassador-devportal-demo /docs/ 127.0.0.1:8500
echo-backend echo.starter-kit.online /echo/ echo.backend
quote-backend quote.starter-kit.online /quote/ quote.backend
Now, you need to update your DNS `A records`, because the DigitalOcean load balancer resource was recreated, and it has a new external `IP` assigned.
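A quick way to find the new external IP is to query the restored `ambassador` service (the service name as used in this tutorial) and look at the `EXTERNAL-IP` column:

kubectl get svc ambassador -n ambassador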
Finally, check if the `backend applications` respond to `HTTP` requests as well (please refer to Creating the Ambassador Edge Stack Backend Services, regarding the `backend applications` used in the `Starter Kit` tutorial):
curl -Li http://quote.starter-kit.online/quote/
curl -Li http://echo.starter-kit.online/echo/
The next step deals with whole cluster backup and restore, thus covering a disaster recovery scenario.
In this step, you will simulate a `disaster recovery` scenario. The whole `DOKS` cluster will be deleted, and then the important applications will be restored from a previous backup.
Next, you will perform the following tasks:
- `Create` the `multi-namespace backup`, using a `ClusterBackupPlan` CRD that targets `all important namespaces` from your `DOKS` cluster.
- `Delete` the `DOKS` cluster, using `doctl`.
- `Re-install` TVK and configure the S3 target (you're going to use the same S3 bucket, where your important backups are stored).
- `Restore` all the important applications by using the TVK web console.
- `Check` the `DOKS` cluster applications integrity.
The main idea here is to perform a `DOKS` cluster `backup` by including `all important namespaces` that hold your `essential applications` and `configurations`. Strictly speaking, we cannot call it a full cluster backup and restore, but rather a `multi-namespace` backup and restore operation. In practice this is all that's needed, because everything is `namespaced` in `Kubernetes`. You will also learn how to perform a cluster restore operation via `location` from the `target`. The same flow applies when you need to perform cluster migration.
A typical `ClusterBackupPlan` manifest targeting multiple namespaces looks like below:
apiVersion: triliovault.trilio.io/v1
kind: ClusterBackupPlan
metadata:
name: starter-kit-cluster-backup-plan
namespace: tvk
spec:
backupConfig:
target:
name: trilio-s3-target
namespace: tvk
backupComponents:
- namespace: ambassador
- namespace: backend
- namespace: monitoring
Notice that `kube-system` (or other DOKS cluster related namespaces) is not included in the list. Usually, those are not required, unless there is a special case requiring some settings to be persisted at that level.
Steps to initiate a backup for all important namespaces in your DOKS cluster:
- First, change directory where the `Starter Kit` Git repository was cloned on your local machine:

cd Kubernetes-Starter-Kit-Developers

- Then, open and inspect the `ClusterBackupPlan` and `ClusterBackup` manifest files provided in the `Starter Kit` repository, using an editor of your choice (preferably with `YAML` lint support). You can use VS Code for example:

code 05-setup-backup-restore/assets/manifests/triliovault/starter-kit-cluster-backup-plan.yaml
code 05-setup-backup-restore/assets/manifests/triliovault/starter-kit-cluster-backup.yaml

- Finally, create the `ClusterBackupPlan` and `ClusterBackup` resources using `kubectl`:

kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/starter-kit-cluster-backup-plan.yaml
kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/starter-kit-cluster-backup.yaml
Now, inspect the `ClusterBackupPlan` status, using `kubectl`:
kubectl get clusterbackupplan starter-kit-cluster-backup-plan -n tvk
The output looks similar to (notice the `STATUS` column value, which should be set to `Available`):
NAME TARGET ... STATUS
starter-kit-cluster-backup-plan trilio-s3-target ... Available
Next, check the `ClusterBackup` status, using `kubectl`:
kubectl get clusterbackup starter-kit-cluster-backup -n tvk
The output looks similar to (notice the `STATUS` column value, which should be set to `Available`, as well as the `PERCENTAGE COMPLETE` set to `100`):
NAME BACKUPPLAN BACKUP TYPE STATUS ... PERCENTAGE COMPLETE
starter-kit-cluster-backup starter-kit-cluster-backup-plan Full Available ... 100
If the output looks like above, then all your important application namespaces were backed up successfully.
Note:

Please bear in mind that it may take a while for the full cluster backup to finish, depending on how many namespaces and associated resources are involved in the process.
You can also open the web console main dashboard and inspect the `multi-namespace` backup (notice how all the important namespaces that were backed up are highlighted in green, in a honeycomb structure):
An important aspect to keep in mind is that whenever you destroy a `DOKS` cluster and then restore it, a new `Load Balancer` with a new external `IP` is created when `TVK` restores your `ingress` controller. So, please make sure to update your DigitalOcean DNS `A records` accordingly.
Now, delete the whole `DOKS` cluster (make sure to replace the `<>` placeholders accordingly):
doctl kubernetes cluster delete <DOKS_CLUSTER_NAME>
Next, re-create the cluster as described in Section 1 - Set up DigitalOcean Kubernetes.
To perform the restore operation, you need to install the `TVK` application as described in Step 1 - Installing TrilioVault for Kubernetes. Please make sure to use the same `Helm Chart version` - this is important!
After the installation finishes successfully, configure the `TVK` target as described in Step 2 - Creating a TrilioVault Target to Store Backups, and point it to the same `S3 bucket` where your backup data is located. Also, please make sure that `target browsing` is enabled.
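You can quickly verify this by inspecting the target resource again - the `BROWSING ENABLED` column should show `true`:

kubectl get target trilio-s3-target -n tvk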
Next, verify and activate a new license as described in the TrilioVault Application Licensing section.
To get access to the web console user interface, please consult the Getting Access to the TVK Web Management Console section.

Then, navigate to `Resource Management -> TVK Namespace -> Targets` (in case of the `Starter Kit`, the TVK namespace is `tvk`):
Going further, browse the target and list the available backups by clicking on the `Actions` button from the right. Then, select the `Launch Browser` option from the pop-up menu (for this to work, the target must have the `enableBrowsing` flag set to `true`):
Now, click on the `starter-kit-cluster-backup-plan` item from the list, and then click and expand the `starter-kit-cluster-backup` item from the right sub-window:
To start the restore process, click on the `Restore` button. A progress window will be displayed, similar to:
After a while, if the progress window looks like below, then the `multi-namespace` restore operation completed successfully:
First, verify all cluster `Kubernetes` resources (you should have everything in place):
kubectl get all --all-namespaces
Then, make sure that your DNS A records are updated to point to your new load balancer external IP.
Finally, the `backend applications` should respond to `HTTP` requests as well (please refer to Creating the Ambassador Edge Stack Backend Services, regarding the `backend applications` used in the `Starter Kit` tutorial):
curl -Li http://quote.starter-kit.online/quote/
curl -Li http://echo.starter-kit.online/echo/
In the next step, you will learn how to perform scheduled (or automatic) backups for your `DOKS` cluster applications.
Taking backups automatically based on a schedule is a really useful feature to have. It allows you to `rewind back time` and restore the system to a previous working state if something goes wrong. This section provides an example for an automatic backup on a `5 minute` schedule (the `kube-system` namespace was picked).
First, you need to create a `Policy` CRD of type `Schedule` that defines the backup schedule in `cron` format (same as `Linux` cron). Schedule policies can be used for either `BackupPlan` or `ClusterBackupPlan` CRDs. A typical schedule policy CRD looks like below (it defines a `5 minute` schedule):
kind: Policy
apiVersion: triliovault.trilio.io/v1
metadata:
name: scheduled-backup-every-5min
namespace: tvk
spec:
type: Schedule
scheduleConfig:
schedule:
- "*/5 * * * *" # trigger every 5 minutes
Next, you can apply the schedule policy to a `ClusterBackupPlan` CRD, for example, as seen below:
apiVersion: triliovault.trilio.io/v1
kind: ClusterBackupPlan
metadata:
name: kube-system-ns-backup-plan-5min-schedule
namespace: tvk
spec:
backupConfig:
target:
name: trilio-s3-target
namespace: tvk
schedulePolicy:
fullBackupPolicy:
name: scheduled-backup-every-5min
namespace: tvk
backupComponents:
- namespace: kube-system
- namespace: backend
Looking at the above, you can notice that it's a basic `ClusterBackupPlan` CRD, referencing the `Policy` CRD defined earlier via the `spec.backupConfig.schedulePolicy` field. You can have separate policies created for `full` or `incremental` backups, hence the `fullBackupPolicy` or `incrementalBackupPolicy` can be specified in the spec.
Now, please go ahead and create the schedule `Policy`, using the sample manifest provided by the `Starter Kit` tutorial (make sure to change directory first, to where the Starter Kit Git repository was cloned on your local machine):
kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/scheduled-backup-every-5min.yaml
Check that the policy resource was created:
kubectl get policies -n tvk
The output looks similar to (notice the `POLICY` type set to `Schedule`):
NAME POLICY DEFAULT
scheduled-backup-every-5min Schedule false
Finally, create the resources for the `kube-system` namespace scheduled backups:
# Create the backup plan first for kube-system namespace
kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/kube-system-ns-backup-plan-scheduled.yaml
# Create and trigger the scheduled backup for kube-system namespace
kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/kube-system-ns-backup-scheduled.yaml
Check the scheduled backup plan status for `kube-system`:
kubectl get clusterbackupplan kube-system-ns-backup-plan-5min-schedule -n tvk
The output looks similar to (notice the `FULL BACKUP POLICY` value set to the previously created `scheduled-backup-every-5min` policy resource, as well as the `STATUS`, which should be `Available`):
NAME TARGET ... FULL BACKUP POLICY STATUS
kube-system-ns-backup-plan-5min-schedule trilio-s3-target ... scheduled-backup-every-5min Available
Check the scheduled backup status for `kube-system`:
kubectl get clusterbackup kube-system-ns-full-backup-scheduled -n tvk
The output looks similar to (notice the `BACKUPPLAN` value set to the previously created backup plan resource, as well as the `STATUS`, which should be `Available`):
NAME BACKUPPLAN BACKUP TYPE STATUS ...
kube-system-ns-full-backup-scheduled kube-system-ns-backup-plan-5min-schedule Full Available ...
Now, you can check that backups are performed on a regular interval (5 minutes), by querying the cluster backup resource and inspecting the `START TIME` column (`kubectl get clusterbackup -n tvk`). It should reflect the 5 minute delta, as highlighted in the picture below:
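A convenient way to observe new scheduled backups as they appear is to watch the cluster backup resources (the `-w` flag streams updates as objects change):

kubectl get clusterbackup -n tvk -w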
In the next step, you will learn how to set up a retention policy for your backups.
The retention policy allows you to define the `number` of backups to `retain` and the `cadence` at which backups are `deleted`, as per compliance requirements. The retention policy `CRD` provides a simple `YAML` specification to define the `number` of backups to retain in terms of `days`, `weeks`, `months`, `years`, latest, etc.
Retention policies can be used for either `BackupPlan` or `ClusterBackupPlan` CRDs. A typical `Policy` manifest for the `Retention` type looks like below:
apiVersion: triliovault.trilio.io/v1
kind: Policy
metadata:
name: sample-policy
spec:
type: Retention
retentionConfig:
latest: 2
weekly: 1
dayOfWeek: Wednesday
monthly: 1
dateOfMonth: 15
monthOfYear: March
yearly: 1
Explanation for the above configuration:
- `spec.type`: Defines the policy type. Can be: `Retention` or `Schedule`.
- `spec.retentionConfig`: Describes the retention configuration, like what interval to use for backup retention and how many backups to keep.
- `spec.retentionConfig.latest`: Maximum number of latest backups to be retained.
- `spec.retentionConfig.weekly`: Maximum number of backups to be retained in a week.
- `spec.retentionConfig.dayOfWeek`: Day of the week to maintain weekly backups.
- `spec.retentionConfig.monthly`: Maximum number of backups to be retained in a month.
- `spec.retentionConfig.dateOfMonth`: Date of the month to maintain monthly backups.
- `spec.retentionConfig.monthOfYear`: Month of the backup to retain for yearly backups.
- `spec.retentionConfig.yearly`: Maximum number of backups to be retained in a year.
The above retention policy translates to:
- On a `weekly` basis, keep one backup each `Wednesday`.
- On a `monthly` basis, keep one backup on the `15th` day.
- On a `yearly` basis, keep one backup every `March`.
- `Overall`, always have the `2 most recent` backups available.
The basic flow for creating a retention policy resource goes the same way as for scheduled backups. You need a `BackupPlan` or a `ClusterBackupPlan` CRD defined to reference the retention policy, and then a `Backup` or `ClusterBackup` object to trigger the process.
A typical `ClusterBackupPlan` example configuration that has retention set looks like below:
apiVersion: triliovault.trilio.io/v1
kind: ClusterBackupPlan
metadata:
name: kube-system-ns-backup-plan-5min-schedule
namespace: tvk
spec:
backupConfig:
target:
name: trilio-s3-target
namespace: tvk
retentionPolicy:
fullBackupPolicy:
name: ambassador-backups-retention-policy
namespace: tvk
backupComponents:
- namespace: kube-system
- namespace: backend
Notice that it uses a `retentionPolicy` field to reference the policy in question. Of course, you can have a backup plan that has both types of policies set, so that it is able to perform scheduled backups, as well as to deal with retention strategies.
Having so many TVK resources, each responsible for various operations like scheduled backups, retention, etc., it is very probable for things to go wrong at some point in time. Some of the previously enumerated operations might fail due to various reasons, such as inaccessible storage, network issues for NFS, etc. What happens then is that your `DOKS` cluster will get `crowded` with many `Kubernetes objects` in a `failed state`.
You need a way to garbage collect all those objects in the end and release the associated resources, to avoid trouble in the future. Meet the `Cleanup Policy` CRD:
apiVersion: triliovault.trilio.io/v1
kind: Policy
metadata:
name: garbage-collect-policy
spec:
type: Cleanup
cleanupConfig:
backupDays: 5
The above cleanup policy must be defined in the `TVK` install namespace. Then, a `cron job` is created automatically for you that runs every `30 mins` and deletes `failed backups`, based on the value specified for `backupdays` within the spec field.
This is a very neat feature that TVK provides to help you deal with this kind of situation.
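You can verify that the cleanup cron job was created with a quick check (assuming the default `tvk` install namespace; the exact cron job name may differ between TVK versions):

kubectl get cronjobs -n tvk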
In this tutorial, you learned how to perform `one time`, as well as `scheduled` backups, and how to restore everything back. Having `scheduled` backups in place is very important, as it allows you to revert to a previous snapshot in time if something goes wrong along the way. You walked through a disaster recovery scenario as well. Backup retention also plays an important role, because storage is finite, and it can get expensive if too many objects are involved.
All the basic tasks and operations explained in this tutorial are meant to give you a basic introduction and understanding of what `TrilioVault for Kubernetes` is capable of. You can learn more about `TrilioVault for Kubernetes` and other interesting (or useful) topics by following the links below:
- TVK CRD API documentation.
- How to Integrate Pre/Post Hooks for Backup Operations, with examples given for various databases.
- Immutable Backups, which restrict backups on the target storage to be overwritten.
- Helm Releases Backup, which shows examples for Helm releases backup strategies.
- Backups Encryption, which explains how to encrypt and protect sensitive data on the target (storage).
- Disaster Recovery Plan.
- Multi-Cluster Management.
- Restore Transforms.