
How to Perform Backup and Restore Using TrilioVault for Kubernetes

Introduction

In this tutorial, you will learn how to deploy TrilioVault for Kubernetes (or TVK) to your DOKS cluster, create backups, and recover from a backup if something goes wrong. You can back up your entire cluster, or optionally choose namespace or label based backups. Helm release backups are supported as well, which is a nice addition for the Starter Kit, where every installation is Helm based.

Advantages of using Trilio:

  • Take full (or incremental) backups of your cluster and restore in case of data loss.
  • Migrate from one cluster to another.
  • Helm release backups are supported.
  • Run pre and post hooks for backup and restore operations.
  • Web management console, which allows you to inspect your backup/restore operations state in detail (and many other features).
  • Define retention policies for your backups.
  • Application lifecycle (meaning, TVK itself) can be managed via a dedicated TrilioVault Operator if desired.
  • Velero integration (Trilio supports monitoring Velero backups, restores, and backup/snapshot locations via the web management console).
  • You can backup and restore Operator based applications.

How TrilioVault for Kubernetes Works

TVK follows a cloud native architecture, meaning that it has several components that together form the Control Plane and Data Plane layers. Everything is managed via CRDs, thus making it fully Kubernetes native. What is nice about Trilio is the clear separation of concerns, and how effectively it handles backup and restore operations.

Each TrilioVault application consists of a set of Controllers and their associated CRDs. Every time a CRD is created or updated, the responsible controller is notified and performs cluster reconciliation. Then, the controller in charge spawns Kubernetes jobs that perform the real operations (backup, restore, etc.) in parallel.

Control Plane consists of:

  • Target Controller, defines the storage backend (S3, NFS, etc) via specific CRDs.
  • BackupPlan Controller, defines the components to backup, automated backups schedule, retention strategy, etc via specific CRDs.
  • Restore Controller, defines restore operations via specific CRDs.

Data Plane consists of:

  • Datamover Pods, responsible for transferring data between persistent volumes and backup media (or Target). TrilioVault works with Persistent Volumes (PVs) using the CSI interface. For each PV that needs to be backed up, an ephemeral Datamover Pod is created. After each operation finishes, the associated pod is destroyed.
  • Metamover Pods, responsible for transferring Kubernetes API object data to backup media (or Target). Metamover pods are ephemeral, just like the Datamover ones (see the quick check below).
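
Because both pod types are ephemeral, you can watch them being spawned and destroyed while a backup or restore operation is in progress. A quick, hedged check (assuming the operation targets the ambassador namespace used later in this tutorial):

kubectl get pods -n ambassador -w

Any pod whose name contains datamover or metamover belongs to the TVK Data Plane, and it disappears once the corresponding job completes.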

Understanding TrilioVault Application Scope

TrilioVault for Kubernetes works based on scope, meaning you can have a Namespaced or a Cluster type of installation.

A Namespaced installation allows you to backup and restore at the namespace level only. In other words, the backup is meant to protect a set of applications that are bound to a namespace that you own. This is how a BackupPlan and the corresponding Backup CRD work. You cannot mutate those CRDs in other namespaces; they must be created in the same namespace where the application to be backed up is located.

On the other hand, a Cluster type installation is not scoped or bound to any namespace or set of applications. You define cluster type backups via the Cluster prefixed CRDs, like ClusterBackupPlan, ClusterBackup, etc. Cluster type backups are a little more flexible, in the sense that you are not tied to a specific namespace or set of applications to backup and restore. You can perform backup/restore operations for multiple namespaces and applications at once, including PVs as well (you can also back up etcd database content).

To make sure that TVK application scope and rules are followed correctly, TrilioVault uses an Admission Controller. It intercepts and validates each CRD that you push for TVK, before it is actually created. If the TVK application scope is not followed, the admission controller rejects CRD creation in the cluster.

Another important thing to consider and remember is that a TVK license is application scope specific. In other words, you need to generate one type of license for either a Namespaced or a Cluster type installation.

Namespaced vs Cluster TVK application scope - when to use one or the other? It all depends on the use case. For example, a Namespaced scope is the more appropriate option when you don't have access to the whole Kubernetes cluster, only to specific namespaces and applications. In most cases, you want to protect only the applications tied to a specific namespace that you own. On the other hand, a cluster scoped installation works at the global level, meaning it can trigger backup/restore operations for any namespace or resource in a Kubernetes cluster (including PVs and the etcd database).

To summarize:

  • If you are a cluster administrator, then you will most probably want to perform cluster level operations via corresponding CRDs, like: ClusterBackupPlan, ClusterBackup, ClusterRestore, etc.
  • If you are a regular user, then you will usually perform namespaced only operations (application centric) via corresponding CRDs, like: BackupPlan, Backup, Restore, etc.

The application interface is very similar (or uniform) between the two types: Cluster vs non-Cluster prefixed CRDs. So, if you're familiar with one type, it's pretty straightforward to use the counterpart.

For more information, please refer to the TVK CRDs official documentation.

Backup and Restore Workflow

Whenever you want to back up an application, you start by creating a BackupPlan (or ClusterBackupPlan) CRD, followed by a Backup (or ClusterBackup) object. The Trilio Backup Controller is notified about the change and performs backup object inspection and validation (i.e. whether it is a cluster backup, namespace backup, etc.). Then, it spawns worker pods (Metamover, Datamover) responsible for moving the actual data (Kubernetes metadata, PV data) to the backend storage (or Target), such as DigitalOcean Spaces.

Similarly, whenever you create a Restore object, the Restore Controller is notified to restore from a Backup object. Then, the Trilio Restore Controller spawns worker pods (Metamover, Datamover), responsible for moving backup data (Kubernetes metadata, PV data) out of the DigitalOcean Spaces storage. Finally, the restore process is initiated from the particular backup object.

Below is a diagram that shows the Backup/Restore workflow for TVK:

Trilio Backup/Restore Workflow

Trilio is ideal for the disaster recovery use case, as well as for snapshotting your application state, prior to performing system operations on your cluster, like upgrades. For more details on this topic, please visit the Trilio Features and Trilio Use Case official page.

After finishing this tutorial, you should be able to:

  • Configure DO Spaces storage backend for Trilio to use.
  • Backup and restore your applications.
  • Backup and restore your entire DOKS cluster.
  • Create scheduled backups for your applications.
  • Create retention policies for your backups.

Table of Contents

Prerequisites

To complete this tutorial, you need the following:

  1. A DO Spaces Bucket and access keys. Save the access and secret keys in a safe place for later use.
  2. A Git client, to clone the Starter Kit repository.
  3. Helm, for managing TrilioVault Operator releases and upgrades.
  4. Doctl, for DigitalOcean API interaction.
  5. Kubectl, for Kubernetes interaction.

Important note:

For TrilioVault to work correctly and back up your PVCs, DOKS needs to be configured to support the Container Storage Interface (or CSI, for short). By default, it comes with the CSI driver already installed and configured. You can check using the command below:

kubectl get storageclass

The output should look similar to (notice the provisioner is dobs.csi.digitalocean.com):

NAME                         PROVISIONER                 RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
do-block-storage (default)   dobs.csi.digitalocean.com   Delete          Immediate           true                   10d

The TrilioVault installation also needs the VolumeSnapshot Custom Resource Definitions (CRDs) for a successful installation. You can check using the command below:

kubectl get crd | grep volumesnapshot

The output should look similar to (if not installed, refer to Installing VolumeSnapshot CRDs):

volumesnapshotclasses.snapshot.storage.k8s.io         2022-02-01T06:01:14Z
volumesnapshotcontents.snapshot.storage.k8s.io        2022-02-01T06:01:14Z
volumesnapshots.snapshot.storage.k8s.io               2022-02-01T06:01:15Z

Also make sure that the CRD supports both the v1beta1 and v1 API versions. You can run the command below to check:

kubectl get crd volumesnapshots.snapshot.storage.k8s.io -o yaml

At the end of the CRD yaml you should see a storedVersions list, containing both v1beta1 and v1 values (if not installed, refer to Installing VolumeSnapshot CRDs):

...
  - lastTransitionTime: "2022-01-20T07:58:06Z"
    message: approved in https://github.com/kubernetes-csi/external-snapshotter/pull/419
    reason: ApprovedAnnotation
    status: "True"
    type: KubernetesAPIApprovalPolicyConformant
  storedVersions:
  - v1beta1
  - v1
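
Alternatively, instead of scanning the full YAML output, you can extract just the storedVersions field via a jsonpath query (a hedged shortcut, using the same CRD name checked above):

kubectl get crd volumesnapshots.snapshot.storage.k8s.io -o jsonpath='{.status.storedVersions}'

The command should print a list containing both v1beta1 and v1.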

Step 1 - Installing TrilioVault for Kubernetes

In this step, you will learn how to deploy TrilioVault for DOKS, and manage TVK installations via Helm. Backup data will be stored in the DO Spaces bucket created earlier in the Prerequisites section.

The TrilioVault application can be installed in several ways:

  • Via the tvk-oneclick krew plugin. It has some interesting features, like: checking Kubernetes cluster prerequisites, post install validations, automatic licensing of the product (using the free basic license), application upgrades management, etc.
  • Via the TrilioVault Operator (installable via Helm). You define a TrilioVaultManager CRD, which tells TrilioVault operator how to handle the installation, post-configuration steps, and future upgrades of the Trilio application components.
  • Fully managed by Helm, via the triliovault-operator chart (covered in this tutorial).

Installing TrilioVault using Helm

Important note:

The Starter Kit tutorial is using the Cluster installation type for the TVK application (the applicationScope Helm value is set to "Cluster"). All examples from this tutorial rely on this type of installation to function properly.

Please follow the steps below to install TrilioVault via Helm:

  1. First, clone the Starter Kit Git repository and change directory to your local copy:

    git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
    cd Kubernetes-Starter-Kit-Developers
  2. Next, add the TrilioVault Helm repository, and list the available charts:

    helm repo add triliovault-operator http://charts.k8strilio.net/trilio-stable/k8s-triliovault-operator
    helm repo update triliovault-operator
    helm search repo triliovault-operator

    The output looks similar to the following:

    NAME                                            CHART VERSION   APP VERSION     DESCRIPTION
    triliovault-operator/k8s-triliovault-operator   2.9.2           2.9.2           K8s-TrilioVault-Operator is an operator designe...
    

    Note:

    The chart of interest is triliovault-operator/k8s-triliovault-operator, which will install TrilioVault for Kubernetes on the cluster along with the TrilioVault-Manager. You can run helm show values triliovault-operator/k8s-triliovault-operator, and export to a file to see all the available options.

  3. Then, open and inspect the TrilioVault Helm values file provided in the Starter Kit repository, using an editor of your choice (preferably with YAML lint support). You can use VS Code for example:

    code 05-setup-backup-restore/assets/manifests/triliovault-values-v2.9.2.yaml
  4. Finally, install TrilioVault for Kubernetes using Helm:

    helm install triliovault-operator triliovault-operator/k8s-triliovault-operator \
      --namespace tvk \
      --create-namespace \
      -f 05-setup-backup-restore/assets/manifests/triliovault-values.yaml

    Note: The above command installs both the TrilioVault Operator and the TrilioVault Manager (TVM) Custom Resource, using the parameters provided in triliovault-values.yaml. The TVK version is managed by the tag field in the 05-setup-backup-restore/assets/manifests/triliovault-values.yaml file, so the Helm command always installs the latest version of TVK. You can update the following fields in values.yaml (see the example after this list for overriding them on the command line):

    1. installTVK.applicationScope for the TVK installation scope, e.g. Cluster or Namespaced
    2. installTVK.ingressConfig.host for the TVK UI hostname, e.g. tvk-doks.com
    3. installTVK.ComponentConfiguration.ingressController.service.type for the service type used to access the TVK UI, e.g. NodePort or LoadBalancer
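
If you prefer not to edit the values file, the same fields can be overridden on the command line via Helm --set flags. Below is a hedged sketch using the keys listed above (adjust the values to your needs):

helm upgrade triliovault-operator triliovault-operator/k8s-triliovault-operator \
  --namespace tvk \
  --reuse-values \
  --set installTVK.applicationScope="Cluster" \
  --set installTVK.ingressConfig.host="tvk-doks.com" \
  --set installTVK.ComponentConfiguration.ingressController.service.type="NodePort"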

Now, please check your TVK deployment:

helm ls -n tvk

The output looks similar to the following (STATUS column should display deployed):

NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
triliovault-manager-tvk tvk             1               2022-06-08 08:30:08.490304959 +0000 UTC deployed        k8s-triliovault-2.9.2           2.9.2      
triliovault-operator    tvk             1               2022-06-08 11:32:55.755395 +0300 EEST   deployed        k8s-triliovault-operator-2.9.2  2.9.2

Next, verify that TrilioVault is up and running:

kubectl get deployments -n tvk

The output looks similar to the following (all deployments pods must be in the Ready state):

NAME                                            READY   UP-TO-DATE   AVAILABLE   AGE
k8s-triliovault-admission-webhook               1/1     1            1           83s
k8s-triliovault-control-plane                   1/1     1            1           83s
k8s-triliovault-exporter                        1/1     1            1           83s
k8s-triliovault-ingress-nginx-controller        1/1     1            1           83s
k8s-triliovault-web                             1/1     1            1           83s
k8s-triliovault-web-backend                     1/1     1            1           83s
triliovault-operator-k8s-triliovault-operator   1/1     1            1           4m22s

If the output looks like above, you installed TVK successfully. Next, you will learn how to check the license type and validity, as well as how to renew it.

TrilioVault Application Licensing

By default, when installing TVK via Helm, there is no Free Trial license installed automatically. You can always go to the Trilio website and generate a new license for your cluster that suits your needs (for example, you can pick the basic license type that lets you run TrilioVault indefinitely if your cluster capacity doesn't exceed 10 nodes). A free trial license lets you run TVK for one month on unlimited cluster nodes.

Notes:

  • TrilioVault is free of charge for Kubernetes clusters with up to 100000 nodes for DigitalOcean users, who can follow the steps below to create a special license available to DO customers only.
  • Starter Kit examples rely on a Cluster license type to function properly.

Creating and Checking TVK Application Licensing

Please run the command below to create a new license for your cluster (it is managed via the License CRD):

kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/tvk_install_license.yaml

The above command creates a job named job.batch/tvk-license-digitalocean, which runs a pod (e.g. tvk-license-digitalocean-828rx) to pull the license from the Trilio License Server and install it on the DOKS cluster. After the job completes, it is deleted within 60 seconds.
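
To watch the license job while it runs, a hedged check (assuming the job is created in the tvk namespace, and that its pods carry the standard job-name label):

kubectl get jobs -n tvk

# Follow the license pod logs, if needed
kubectl logs -f -l job-name=tvk-license-digitalocean -n tvk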

NOTE:

  • If you downloaded a free license from Trilio's website, apply it using the command below:
kubectl apply -f <YOUR_LICENSE_FILE_NAME>.yaml -n tvk

Please run the command below to see if the license is installed and in the Active state on your cluster:

kubectl get license -n tvk

The output looks similar to (notice the STATUS which should be Active, as well as the license type in the EDITION column and EXPIRATION TIME):

NAME             STATUS   MESSAGE                                   CURRENT NODE COUNT   GRACE PERIOD END TIME   EDITION     CAPACITY   EXPIRATION TIME        MAX NODES
test-license-1   Active   Cluster License Activated successfully.   1                                            FreeTrial   100000     2023-02-25T00:00:00Z   1

The license is managed via a special CRD, namely the License object. You can inspect it by running the command below:

kubectl describe license test-license-1 -n tvk 

The output looks similar to (notice the Message and Capacity fields, as well as the Edition):

Name:         test-license-1
Namespace:    tvk
Labels:       <none>
Annotations:  generation: 1
              triliovault.trilio.io/creator: system:serviceaccount:tvk:k8s-triliovault
              triliovault.trilio.io/instance-id: b060660d-4851-482b-8e60-4addd260e1d3
              triliovault.trilio.io/updater:
                [{"username":"system:serviceaccount:tvk:k8s-triliovault","lastUpdatedTimestamp":"2022-02-24T06:38:21.418828262Z"}]
API Version:  triliovault.trilio.io/v1
Kind:         License
Metadata:
  Creation Timestamp:  2022-02-24T06:38:21Z
...
Status:
  Condition:
    Message:           License Key changed
    Timestamp:         2022-02-24T06:38:21Z
    Message:           Cluster License Activated successfully.
    Status:            Active
    Timestamp:         2022-02-24T06:38:21Z
  Current Node Count:  1
  Max Nodes:           1
  Message:             Cluster License Activated successfully.
  Properties:
    Active:                        true
    Capacity:                      100000
    Company:                       TRILIO-KUBERNETES-LICENSE-GEN-DIGITALOCEAN-BASIC
    Creation Timestamp:            2022-02-24T00:00:00Z
    Edition:                       FreeTrial
    Expiration Timestamp:          2023-02-25T00:00:00Z
    Kube UID:                      b060660d-4851-482b-8e60-4addd260e1d3
    License ID:                    TVAULT-5a4b42c6-953c-11ec-8116-0cc47a9fd48e
    Maintenance Expiry Timestamp:  2023-02-25T00:00:00Z
    Number Of Users:               -1
    Purchase Timestamp:            2022-02-24T00:00:00Z
    Scope:                         Cluster
...

The above output also tells you when the license is going to expire (the Expiration Timestamp field), and its Scope (Cluster based in this case). You can opt for a cluster wide license type, or for a namespace based one. More details can be found on the Trilio Licensing documentation page.
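
If you only need the expiration date, a hedged jsonpath query saves scrolling through the full describe output (the field path is inferred from the Status section shown above, and may vary between TVK versions):

kubectl get license test-license-1 -n tvk -o jsonpath='{.status.properties.expirationTimestamp}'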

Renewing TVK Application License

To renew the license, you will have to request a new one from the Trilio website by navigating to the licensing page, to replace the old one. After completing the form, you should receive the License YAML manifest, which can be applied to your cluster using kubectl. The commands below assume that TVK is installed in the default tvk namespace (please replace the <> placeholders accordingly, where required):

kubectl apply -f <YOUR_LICENSE_FILE_NAME>.yaml -n tvk

Then, you can check the new license status as you already learned via:

# List available TVK licenses first from the `tvk` namespace
kubectl get license -n tvk

# Get information about a specific license from the `tvk` namespace
kubectl describe license <YOUR_LICENSE_NAME_HERE> -n tvk 

In the next step, you will learn how to define the storage backend for TrilioVault to store backups, called a target.

Step 2 - Creating a TrilioVault Target to Store Backups

TrilioVault needs to know first where to store your backups. TrilioVault refers to the storage backend using the term target, and it's managed via a special CRD named Target. The following target types are supported: S3 and NFS. For DigitalOcean and the purpose of the Starter Kit, it makes sense to rely on the S3 storage type because it's cheap and scalable. For an enhanced level of protection, you can create multiple targets (for both S3 and NFS), so that your data is kept safe in multiple places, thus achieving backup redundancy.

A typical Target definition looks like below:

apiVersion: triliovault.trilio.io/v1
kind: Target
metadata:
  name: trilio-s3-target
  namespace: tvk
spec:
  type: ObjectStore
  vendor: Other
  enableBrowsing: true
  objectStoreCredentials:
    bucketName: <YOUR_DO_SPACES_BUCKET_NAME_HERE>
    region: <YOUR_DO_SPACES_BUCKET_REGION_HERE>           # e.g.: nyc1
    url: "https://<YOUR_DO_SPACES_BUCKET_ENDPOINT_HERE>"  # e.g.: nyc1.digitaloceanspaces.com
    credentialSecret:
      name: trilio-s3-target
      namespace: tvk
  thresholdCapacity: 10Gi

Explanation for the above configuration:

  • spec.type: Type of target for backup storage (S3 is an object store).
  • spec.vendor: Third party storage vendor hosting the target (for DigitalOcean Spaces you need to use Other instead of AWS).
  • spec.enableBrowsing: Enable browsing for the target.
  • spec.objectStoreCredentials: Defines required credentials (via credentialSecret) to access the S3 storage, as well as other parameters such as bucket region and name.
  • spec.thresholdCapacity: Maximum threshold capacity to store backup data.

To access S3 storage, each target needs to know the bucket credentials, so a Kubernetes Secret must be created as well:

apiVersion: v1
kind: Secret
metadata:
  name: trilio-s3-target
  namespace: tvk
type: Opaque
stringData:
  accessKey: <YOUR_DO_SPACES_ACCESS_KEY_ID_HERE> # plain text value (Kubernetes encodes stringData fields automatically)
  secretKey: <YOUR_DO_SPACES_SECRET_KEY_HERE>    # plain text value (Kubernetes encodes stringData fields automatically)

Notice that the secret name is trilio-s3-target, and it's referenced by the spec.objectStoreCredentials.credentialSecret field of the Target CRD explained earlier. The secret can live in the same namespace where TrilioVault was installed (defaults to tvk), or in another namespace of your choice. Just make sure that you reference the namespace correctly. Also, for security reasons, please make sure to protect the namespace where you store TrilioVault secrets via RBAC (a minimal sketch follows below).
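
A minimal RBAC sketch for restricting secret access in the tvk namespace could look like below (the role and subject names are hypothetical; adapt them to your own users or groups):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tvk-secrets-reader   # hypothetical role name
  namespace: tvk
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: tvk-secrets-reader-binding
  namespace: tvk
subjects:
  - kind: User
    name: backup-admin   # hypothetical user entitled to read TVK secrets
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: tvk-secrets-reader
  apiGroup: rbac.authorization.k8s.io

Because Kubernetes RBAC denies by default, users without such a binding cannot read the target credentials.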

Steps to create a Target for TrilioVault:

  1. First, change directory where the Starter Kit Git repository was cloned on your local machine:

    cd Kubernetes-Starter-Kit-Developers
  2. Next, create the Kubernetes secret containing your target S3 bucket credentials (please replace the <> placeholders accordingly):

    kubectl create secret generic trilio-s3-target \
      --namespace=tvk \
      --from-literal=accessKey="<YOUR_DO_SPACES_ACCESS_KEY_HERE>" \
      --from-literal=secretKey="<YOUR_DO_SPACES_SECRET_KEY_HERE>"
  3. Then, open and inspect the Target manifest file provided in the Starter Kit repository, using an editor of your choice (preferably with YAML lint support). You can use VS Code for example:

    code 05-setup-backup-restore/assets/manifests/triliovault/triliovault-s3-target.yaml
  4. Now, please replace the <> placeholders accordingly for your DO Spaces Trilio bucket, like: bucketName, region, url and credentialSecret.

  5. Finally, save the manifest file and create the Target object using kubectl:

    kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/triliovault-s3-target.yaml

What happens next is, TrilioVault spawns a worker job named trilio-s3-target-validator, responsible for validating your S3 bucket (availability, permissions, etc.). If the job finishes successfully, the bucket is considered to be healthy (or available), and the trilio-s3-target-validator job resource is deleted afterwards. If something bad happens, the S3 target validator job is left up and running so that you can inspect the logs and find the possible issue.

Now, please go ahead and check if the Target resource created earlier is healthy:

kubectl get target trilio-s3-target -n tvk

The output looks similar to (notice the STATUS column value - should be Available, meaning it's in a healthy state):

NAME               TYPE          THRESHOLD CAPACITY   VENDOR   STATUS      BROWSING ENABLED
trilio-s3-target   ObjectStore   10Gi                 Other    Available

If the output looks like above, then you configured the S3 target object successfully.

Hint: In case the target object fails to become healthy, you can inspect the logs from the trilio-s3-target-validator Pod to find the issue:

# First, you need to find the target validator
kubectl get pods -n tvk | grep trilio-s3-target-validator

# Output looks similar to:
#trilio-s3-target-validator-tio99a-6lz4q              1/1     Running     0          104s

# Now, fetch logs data
kubectl logs pod/trilio-s3-target-validator-tio99a-6lz4q -n tvk

The output looks similar to (notice the exception as an example):

...
INFO:root:2021-11-24 09:06:50.595166: waiting for mount operation to complete.
INFO:root:2021-11-24 09:06:52.595772: waiting for mount operation to complete.
ERROR:root:2021-11-24 09:06:54.598541: timeout exceeded, not able to mount within time.
ERROR:root:/triliodata is not a mountpoint. We can't proceed further.
Traceback (most recent call last):
  File "/opt/tvk/datastore-attacher/mount_utility/mount_by_target_crd/mount_datastores.py", line 56, in main
    utilities.mount_datastore(metadata, datastore.get(constants.DATASTORE_TYPE), base_path)
  File "/opt/tvk/datastore-attacher/mount_utility/utilities.py", line 377, in mount_datastore
    mount_s3_datastore(metadata_list, base_path)
  File "/opt/tvk/datastore-attacher/mount_utility/utilities.py", line 306, in mount_s3_datastore
    wait_until_mount(base_path)
  File "/opt/tvk/datastore-attacher/mount_utility/utilities.py", line 328, in wait_until_mount
    base_path))
Exception: /triliodata is not a mountpoint. We can't proceed further.
...

Next, you will discover the TVK web console, which is a really nice and useful addition that helps you manage backup and restore operations (among many others) with ease.

Step 3 - Getting to Know the TVK Web Management Console

While you can manage backup and restore operations entirely from the CLI via kubectl and CRDs, TVK provides a Web Management Console to accomplish the same operations via the GUI. The management console simplifies common tasks via point and click operations, provides better visualization and inspection of TVK cluster objects, and lets you create disaster recovery plans (or DRPs).

The Helm based installation covered in Step 1 - Installing TrilioVault for Kubernetes already took care of installing the required components for the web management console.

Getting Access to the TVK Web Management Console

To be able to access the console and explore the features it offers, you need to port forward the ingress controller service for TVK.

First, you need to identify the ingress-nginx-controller service from the tvk namespace:

kubectl get svc -n tvk

The output looks similar to (search for the k8s-triliovault-ingress-nginx-controller line, and notice that it listens on port 80 in the PORT(S) column):

NAME                                                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
k8s-triliovault-admission-webhook                               ClusterIP   10.245.202.17    <none>        443/TCP                      13m
k8s-triliovault-ingress-nginx-controller                        NodePort    10.245.192.140   <none>        80:32448/TCP,443:32588/TCP   13m
k8s-triliovault-ingress-nginx-controller-admission              ClusterIP   10.3.20.89       <none>        443/TCP                      13m
k8s-triliovault-web                                             ClusterIP   10.245.214.13    <none>        80/TCP                       13m
k8s-triliovault-web-backend                                     ClusterIP   10.245.10.221    <none>        80/TCP                       13m
triliovault-operator-k8s-triliovault-operator-webhook-service   ClusterIP   10.245.186.59    <none>        443/TCP                      16m

TVK is using an Nginx Ingress Controller to route traffic to the management web console services. Routing is host based, and the host name is tvk-doks.com as defined in the Helm values file from the Starter Kit:

# The host name to use when accessing the web console via the TVK ingress nginx controller
installTVK:
  ingressConfig:
    host: "tvk-doks.com"

Having the above information at hand, please go ahead and edit the /etc/hosts file, and add this entry:

127.0.0.1 tvk-doks.com
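
On Linux or macOS, you can append the entry from the terminal (a hedged one-liner; it requires sudo):

echo "127.0.0.1 tvk-doks.com" | sudo tee -a /etc/hosts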

Next, create the port forward for the TVK ingress controller service:

kubectl port-forward svc/k8s-triliovault-ingress-nginx-controller 8080:80 -n tvk

Finally, export the kubeconfig file for your DOKS cluster. This step is required so that the web console can authenticate you:

# List the available clusters
doctl k8s cluster list

# Save cluster configuration to YAML
doctl kubernetes cluster kubeconfig show <YOUR_CLUSTER_NAME_HERE> > config_<YOUR_CLUSTER_NAME_HERE>.yaml

Hint: If you have only one cluster, the command below can be used:

DOKS_CLUSTER_NAME="$(doctl k8s cluster list --no-header --format Name)"
doctl kubernetes cluster kubeconfig show $DOKS_CLUSTER_NAME > config_${DOKS_CLUSTER_NAME}.yaml

After following the above presented steps, you can access the console in your web browser by navigating to: http://tvk-doks.com:8080. When asked for the kubeconfig file, please select the one that you created in the last command from above.

Note: Please keep the generated kubeconfig file safe because it contains sensitive data.

Exploring the TVK Web Console User Interface

The home page looks similar to:

TVK Home Cluster Dashboard

Go ahead and explore each section from the left, like:

  • Cluster Management: This shows the list of the primary cluster and other clusters running TVK instances, added to the primary DOKS cluster using the Multi-Cluster Management feature.

  • Backup & Recovery: This is the main dashboard, which gives you a general overview of the whole cluster, like: discovered namespaces, Applications, Backupplans, Targets, Hooks, Policies, etc.

    • Namespaces:

    TVK Cluster Namespaces

    • Applications:

    TVK Auto-discovered Applications

    • Backupplans:

    TVK Backupplans

    • Targets:

    TVK Target List

    • Scheduling Policy:

    TVK Default Scheduling Policy

    • Retention Policy:

    TVK Default Retention Policy

  • Monitoring: This has two options - TrilioVault Monitoring and Velero Monitoring, if the user has Velero configured on their DOKS cluster.

    • TrilioVault Monitoring: It shows the backup and restore summary of the Kubernetes cluster.

    TVK TrilioVault Monitoring Backups and Restores

    • Velero Monitoring:

    TVK Velero Monitoring

  • Disaster Recovery: Allows you to manage and perform disaster recovery operations.

    TVK Disaster Recovery

You can also see the S3 Target created earlier, by navigating to Backup & Recovery -> Targets -> Select the Namespace tvk from the dropdown on the top:

TVK Target List

Going further, you can browse the target and list the available backups by clicking on the Actions button from the right, and then select Launch Browser option from the pop-up menu (for this to work the target must have the enableBrowsing flag set to true):

TVK Target Browser

For more information and available features, please consult the TVK Web Management Console User Interface official documentation.

Next, you will learn how to perform backup and restore operations for specific use cases, like:

  • Specific namespace(s) backup and restore.
  • Whole cluster backup and restore.

Step 4 - Namespaced Backup and Restore Example

In this step, you will learn how to create a one-time backup for an entire namespace from your DOKS cluster and restore it afterwards, making sure that all the resources are re-created. The namespace in question is ambassador. TVK also has a neat feature that allows you to perform backups at a higher level than namespaces: Helm releases. You will learn how to accomplish such a task in the steps to follow.

Next, you will perform the following tasks:

  • Create the ambassador Helm release backup, via BackupPlan and Backup CRDs.
  • Delete the ambassador Helm release.
  • Restore the ambassador Helm release, via Restore CRD.
  • Check the ambassador Helm release resources restoration.

Creating the Ambassador Helm Release Backup

To perform backups for a single application at the namespace level (or Helm release), a BackupPlan followed by a Backup CRD is required. A BackupPlan allows you to:

  • Specify a target where backups should be stored.
  • Define a set of resources to backup (e.g.: namespace or Helm releases).
  • Enable encryption, if you want to encrypt your backups on the target (this is a very nice feature for securing your backup data).
  • Define schedules for full or incremental type backups.
  • Define retention policies for your backups.

In other words, a BackupPlan is a definition of the what, where, and how of the backup process, but it doesn't perform the actual backup. The Backup CRD is responsible for triggering the actual backup process, as dictated by the BackupPlan spec.

A typical BackupPlan CRD looks like below:

apiVersion: triliovault.trilio.io/v1
kind: BackupPlan
metadata:
  name: ambassador-helm-release-backup-plan
  namespace: ambassador
spec:
  backupConfig:
    target:
      name: trilio-s3-target
      namespace: tvk
  backupPlanComponents:
    helmReleases:
      - ambassador

Explanation for the above configuration:

  • spec.backupConfig.target.name: Tells TVK what target name to use for storing backups.
  • spec.backupConfig.target.namespace: Tells TVK in what namespace the target was created.
  • spec.backupPlanComponents: Defines a list of resources to back up (can be namespaces or Helm releases).

A typical Backup CRD looks like below:

apiVersion: triliovault.trilio.io/v1
kind: Backup
metadata:
  name: ambassador-helm-release-full-backup
  namespace: ambassador
spec:
  type: Full
  backupPlan:
    name: ambassador-helm-release-backup-plan
    namespace: ambassador

Explanation for the above configuration:

  • spec.type: Specifies backup type (e.g. Full or Incremental).
  • spec.backupPlan: Specifies the BackupPlan which this Backup should use.

Steps to initiate the Ambassador Helm release one time backup:

  1. First, make sure that the Ambassador Edge Stack is deployed in your cluster by following the steps from the Ambassador Ingress tutorial.

  2. Next, change directory where the Starter Kit Git repository was cloned on your local machine:

    cd Kubernetes-Starter-Kit-Developers
  3. Then, open and inspect the Ambassador BackupPlan and Backup manifest files provided in the Starter Kit repository, using an editor of your choice (preferably with YAML lint support). You can use VS Code for example:

    code 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-backup-plan.yaml
    code 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-backup.yaml
  4. Finally, create the BackupPlan and Backup resources using kubectl. Please note that the BackupPlan needs to be available first, so it may take a minute before the Backup can be created:

    kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-backup-plan.yaml
    kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-backup.yaml

Now, inspect the BackupPlan status (targeting the ambassador Helm release), using kubectl:

kubectl get backupplan ambassador-helm-release-backup-plan -n ambassador

The output looks similar to (notice the STATUS column value which should be set to Available):

NAME                                  TARGET             ...   STATUS
ambassador-helm-release-backup-plan   trilio-s3-target   ...   Available

Next, check the Backup object status, using kubectl:

kubectl get backup ambassador-helm-release-full-backup -n ambassador

The output looks similar to (notice the STATUS column value which should be set to InProgress, as well as the BACKUP TYPE set to Full):

NAME                                  BACKUPPLAN                            BACKUP TYPE   STATUS       ...
ambassador-helm-release-full-backup   ambassador-helm-release-backup-plan   Full          InProgress   ...                                  

After all the ambassador Helm release components finish uploading to the S3 target, you should see the results below:

# Inspect the cluster backup status again for the `ambassador` namespace
kubectl get backup ambassador-helm-release-full-backup -n ambassador

# The output looks similar to (notice that the `STATUS` changed to `Available`, and `PERCENTAGE` is `100`)
NAME                                  BACKUPPLAN                            BACKUP TYPE   STATUS      ...   PERCENTAGE
ambassador-helm-release-full-backup   ambassador-helm-release-backup-plan   Full          Available   ...   100

If the output looks like above, you successfully backed up the ambassador Helm release. You can go ahead and see how TrilioVault stores Kubernetes metadata by listing the TrilioVault S3 Bucket contents. For example, you can use s3cmd:

s3cmd ls s3://trilio-starter-kit --recursive

The output looks similar to (notice that the listing contains the json manifests and UIDs, representing Kubernetes objects):

2021-11-25 07:04           28  s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/
2021-11-25 07:04           28  s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/
2021-11-25 07:04          311  s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/backup-namespace.json.manifest.00000004
2021-11-25 07:04          302  s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/backup.json.manifest.00000004
2021-11-25 07:04          305  s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/backupplan.json.manifest.00000004
2021-11-25 07:04           28  s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/custom/
2021-11-25 07:04           28  s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/custom/metadata-snapshot/
2021-11-25 07:04          330  s3://trilio-starter-kit/6c68af15-5392-45bb-a70b-b26a93605bd9/5ebfffb5-442a-455c-b0de-1db98e18b425/custom/metadata-snapshot/metadata.json.manifest.00000002
...
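
In case the listing fails because s3cmd is not yet configured for DigitalOcean Spaces, below is a hedged sketch of the required flags (the nyc1 region is just an example - match your bucket's region, or persist the settings once via s3cmd --configure):

s3cmd ls s3://trilio-starter-kit --recursive \
  --access_key="<YOUR_DO_SPACES_ACCESS_KEY_HERE>" \
  --secret_key="<YOUR_DO_SPACES_SECRET_KEY_HERE>" \
  --host="nyc1.digitaloceanspaces.com" \
  --host-bucket="%(bucket)s.nyc1.digitaloceanspaces.com"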

Hint: In case the backup fails to become available, you can inspect the logs from the metamover Pod to find the issue:

# First, you need to find the metamover pod
kubectl get pods -n ambassador | grep metamover

# Output looks similar to:
ambassador-helm-release-full-backup-metamover-mg9gl0--1-2d6wx   1/1     Running   0          4m32s

# Now, fetch logs data
kubectl logs pod/ambassador-helm-release-full-backup-metamover-mg9gl0--1-2d6wx -n ambassador -f

The output looks similar to (any errors during the backup will be shown here):

...
{"component":"meta-mover","file":"pkg/metamover/snapshot/parser/commons.go:1366","func":"github.com/trilioData/k8s-triliovault/pkg/metamover/snapshot/parser.(*Component).ParseForDataComponents","level":"info","msg":"Parsing data components of resource rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding: [edge-stack]","time":"2022-06-14T06:20:56Z"}
{"component":"meta-mover","file":"pkg/metamover/snapshot/parser/commons.go:1366","func":"github.com/trilioData/k8s-triliovault/pkg/metamover/snapshot/parser.(*Component).ParseForDataComponents","level":"info","msg":"Parsing data components of resource rbac.authorization.k8s.io/v1, Kind=RoleBinding: [edge-stack-agent-config]","time":"2022-06-14T06:20:56Z"}
...

Finally, you can check that the backup is available in the web console as well, by navigating to Resource Management -> ambassador -> Backup Plans (notice that it's in the Available state, and that the ambassador Helm release was backed up in the Component Details sub-view):

Ambassador Helm Release Backup

Deleting the Ambassador Helm Release and Resources

Now, go ahead and simulate a disaster, by intentionally deleting the ambassador Helm release:

helm delete ambassador -n ambassador

Next, check that the namespace resources were deleted (listing should be empty):

kubectl get all -n ambassador

Finally, verify that the echo and quote backend services endpoints are DOWN (please refer to Creating the Ambassador Edge Stack Backend Services, regarding the backend applications used in the Starter Kit tutorial). You can use curl to test (or you can use your web browser):

curl -Li http://quote.starter-kit.online/quote/
curl -Li http://echo.starter-kit.online/echo/

Restoring the Ambassador Helm Release Backup

Important notes:

  • If restoring into the same namespace, ensure that the original application components have been removed. In particular, make sure the application PVCs are deleted.
  • If restoring to another cluster (migration scenario), ensure that TrilioVault for Kubernetes is running in the remote namespace/cluster as well. To restore into a new cluster (where the Backup CR does not exist), source.type must be set to location. Please refer to the Custom Resource Definition Restore Section to view a restore by location example.
  • When you delete the ambassador namespace, the load balancer resource associated with the ambassador service is deleted as well. So, when you restore the ambassador service, the LB is recreated by DigitalOcean with a NEW external IP address, and you will need to adjust the DNS A records to get traffic flowing into your domains hosted on the cluster.

To restore a specific Backup, you need to create a Restore CRD. A typical Restore CRD looks like below:

apiVersion: triliovault.trilio.io/v1
kind: Restore
metadata:
  name: ambassador-helm-release-restore
  namespace: ambassador
spec:
  source:
    type: Backup
    backup:
      name: ambassador-helm-release-full-backup
      namespace: ambassador
  skipIfAlreadyExists: true

Explanation for the above configuration:

  • spec.source.type: Specifies what backup type to restore from.
  • spec.source.backup: Contains a reference to the backup object to restore from.
  • spec.skipIfAlreadyExists: Specifies whether to skip restoring a resource if it already exists in the target namespace.

Restore allows you to restore the last successful Backup for an application. It is used to restore a single namespace or Helm release, protected by the Backup CRD. The Backup CRD is identified by its name: ambassador-helm-release-full-backup.

First, inspect the Restore CRD example from the Starter Kit Git repository:

code 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-restore.yaml

Then, create the Restore resource using kubectl:

kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/ambassador-helm-release-restore.yaml

Finally, inspect the Restore object status:

kubectl get restore ambassador-helm-release-restore -n ambassador

The output looks similar to (notice the STATUS column set to Completed, as well as the PERCENTAGE COMPLETED set to 100):

NAME                              STATUS      DATA SIZE   START TIME             END TIME               PERCENTAGE COMPLETED   DURATION
ambassador-helm-release-restore   Completed   0           2021-11-25T15:06:52Z   2021-11-25T15:07:35Z   100                    43.524191306s

If the output looks like above, then the ambassador Helm release restoration process completed successfully.

Verifying Applications Integrity after Restoration

Check that all the ambassador namespace resources are in place and running:

kubectl get all -n ambassador

The output looks similar to:

NAME                                    READY   STATUS    RESTARTS   AGE
pod/ambassador-5bdc64f9f6-42wzr         1/1     Running   0          9m58s
pod/ambassador-5bdc64f9f6-nrkzd         1/1     Running   0          9m58s
pod/ambassador-agent-bcdd8ccc8-ktmcv    1/1     Running   0          9m58s
pod/ambassador-redis-64b7c668b9-69drs   1/1     Running   0          9m58s

NAME                       TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                      AGE
service/ambassador         LoadBalancer   10.245.173.90    157.245.23.93   80:30304/TCP,443:30577/TCP   9m59s
service/ambassador-admin   ClusterIP      10.245.217.211   <none>          8877/TCP,8005/TCP            9m59s
service/ambassador-redis   ClusterIP      10.245.77.142    <none>          6379/TCP                     9m59s

NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ambassador         2/2     2            2           9m59s
deployment.apps/ambassador-agent   1/1     1            1           9m59s
deployment.apps/ambassador-redis   1/1     1            1           9m59s

NAME                                          DESIRED   CURRENT   READY   AGE
replicaset.apps/ambassador-5bdc64f9f6         2         2         2       9m59s
replicaset.apps/ambassador-agent-bcdd8ccc8    1         1         1       9m59s
replicaset.apps/ambassador-redis-64b7c668b9   1         1         1       9m59s

Ambassador Hosts:

kubectl get hosts -n ambassador

The output looks similar to (STATE should be Ready, as well as the HOSTNAME column pointing to the fully qualified host name):

NAME         HOSTNAME                   STATE   PHASE COMPLETED   PHASE PENDING   AGE
echo-host    echo.starter-kit.online    Ready                                     11m
quote-host   quote.starter-kit.online   Ready                                     11m

Ambassador Mappings:

kubectl get mappings -n ambassador

The output looks similar to (notice the echo-backend which is mapped to the echo.starter-kit.online host and /echo/ source prefix, same for quote-backend):

NAME                          SOURCE HOST                SOURCE PREFIX                               DEST SERVICE     STATE   REASON
ambassador-devportal                                     /documentation/                             127.0.0.1:8500           
ambassador-devportal-api                                 /openapi/                                   127.0.0.1:8500           
ambassador-devportal-assets                              /documentation/(assets|styles)/(.*)(.css)   127.0.0.1:8500           
ambassador-devportal-demo                                /docs/                                      127.0.0.1:8500           
echo-backend                  echo.starter-kit.online    /echo/                                      echo.backend
quote-backend                 quote.starter-kit.online   /quote/                                     quote.backend

Now, you need to update your DNS A records, because the DigitalOcean load balancer resource was recreated, and it has a new external IP assigned.
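
A hedged sketch of how to do that using doctl (starter-kit.online is the example domain used throughout this tutorial; replace the placeholders with your own values):

# Grab the new load balancer external IP from the restored ambassador service
kubectl get svc ambassador -n ambassador

# List the domain A records, and note the IDs of the ones pointing to the old IP
doctl compute domain records list starter-kit.online

# Update each affected A record with the new external IP
doctl compute domain records update starter-kit.online \
  --record-id <RECORD_ID> --record-data <NEW_LB_EXTERNAL_IP>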

Finally, check if the backend applications respond to HTTP requests as well (please refer to Creating the Ambassador Edge Stack Backend Services, regarding the backend applications used in the Starter Kit tutorial):

curl -Li http://quote.starter-kit.online/quote/
curl -Li http://echo.starter-kit.online/echo/

The next step deals with whole cluster backup and restore, thus covering a disaster recovery scenario.

Step 5 - Backup and Restore Whole Cluster Example

In this step, you will simulate a disaster recovery scenario. The whole DOKS cluster will be deleted, and then the important applications restored from a previous backup.

Next, you will perform the following tasks:

  • Create the multi-namespace backup, using a ClusterBackupPlan CRD that targets all important namespaces from your DOKS cluster.
  • Delete the DOKS cluster, using doctl.
  • Re-install TVK and configure the S3 target (you're going to use the same S3 bucket, where your important backups are stored).
  • Restore all the important applications by using the TVK web console.
  • Check the DOKS cluster applications integrity.

Creating the DOKS Cluster Backup

The main idea here is to perform a DOKS cluster backup by including all important namespaces that hold your essential applications and configurations. Strictly speaking, it is not a full cluster backup and restore, but rather a multi-namespace backup and restore operation. In practice this is all that's needed, because everything is namespaced in Kubernetes. You will also learn how to perform a cluster restore operation via location from the target. The same flow applies when you need to perform cluster migration.

A typical ClusterBackupPlan manifest targeting multiple namespaces looks like below:

apiVersion: triliovault.trilio.io/v1
kind: ClusterBackupPlan
metadata:
  name: starter-kit-cluster-backup-plan
  namespace: tvk
spec:
  backupConfig:
    target:
      name: trilio-s3-target
      namespace: tvk
  backupComponents:
    - namespace: ambassador
    - namespace: backend
    - namespace: monitoring

Notice that kube-system (or other DOKS cluster related namespaces) is not included in the list. Usually, those are not required, unless there is a special case requiring some settings to be persisted at that level.

Steps to initiate a backup for all important namespaces in your DOKS cluster:

  1. First, change directory where the Starter Kit Git repository was cloned on your local machine:

    cd Kubernetes-Starter-Kit-Developers
  2. Then, open and inspect the ClusterBackupPlan and ClusterBackup manifest files provided in the Starter Kit repository, using an editor of your choice (preferably with YAML lint support). You can use VS Code for example:

    code 05-setup-backup-restore/assets/manifests/triliovault/starter-kit-cluster-backup-plan.yaml
    code 05-setup-backup-restore/assets/manifests/triliovault/starter-kit-cluster-backup.yaml
  3. Finally, create the ClusterBackupPlan and ClusterBackup resources, using kubectl:

    kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/starter-kit-cluster-backup-plan.yaml
    kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/starter-kit-cluster-backup.yaml

Now, inspect the ClusterBackupPlan status, using kubectl:

kubectl get clusterbackupplan starter-kit-cluster-backup-plan -n tvk

The output looks similar to (notice the STATUS column value which should be set to Available):

NAME                              TARGET             ...   STATUS
starter-kit-cluster-backup-plan   trilio-s3-target   ...   Available

Next, check the ClusterBackup status, using kubectl:

kubectl get clusterbackup starter-kit-cluster-backup -n tvk

The output looks similar to (notice the STATUS column value which should be set to Available, as well as the PERCENTAGE COMPLETE set to 100):

NAME                        BACKUPPLAN                        BACKUP TYPE   STATUS      ...   PERCENTAGE COMPLETE
starter-kit-cluster-backup  starter-kit-cluster-backup-plan   Full          Available   ...   100                               

If the output looks like above, then all your important application namespaces were backed up successfully.

Note:

Please bear in mind that it may take a while for the full cluster backup to finish, depending on how many namespaces and associated resources are involved in the process.
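
You can follow the backup progress from the CLI as well, via a watch (press Ctrl+C to stop; the resource name is the one created above):

kubectl get clusterbackup starter-kit-cluster-backup -n tvk -w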

You can also open the web console main dashboard and inspect the multi-namespace backup (notice how all the important namespaces that were backed up are highlighted in green color, in a honeycomb structure):

TVK Multi-Namespace Backup Overview

Re-creating the DOKS Cluster and Restoring Applications

An important aspect to keep in mind is that whenever you destroy a DOKS cluster and then restore it, a new Load Balancer with a new external IP is created as well when TVK restores your ingress controller. So, please make sure to update your DigitalOcean DNS A records accordingly.

Now, delete the whole DOKS cluster (make sure to replace the <> placeholders accordingly):

doctl kubernetes cluster delete <DOKS_CLUSTER_NAME>

Next, re-create the cluster as described in Section 1 - Set up DigitalOcean Kubernetes.

To perform the restore operation, you need to install the TVK application as described in Step 1 - Installing TrilioVault for Kubernetes. Please make sure to use the same Helm Chart version - this is important!

After the installation finishes successfully, configure the TVK target as described in Step 2 - Creating a TrilioVault Target to Store Backups, and point it to the same S3 bucket where your backup data is located. Also, please make sure that target browsing is enabled.

Next, verify and activate a new license as described in the TrilioVault Application Licensing section.

To get access to the web console user interface, please consult Getting Access to the TVK Web Management Console section.

Then, navigate to Resource Management -> TVK Namespace -> Targets (in case of Starter Kit the TVK Namespace is tvk):

TVK Targets List

Going further, browse the target and list the available backups by clicking on the Actions button from the right. Then, select Launch Browser option from the pop-up menu (for this to work the target must have the enableBrowsing flag set to true):

TVK Target Browser

Now, click on the starter-kit-cluster-backup-plan item from the list, and then click and expand the starter-kit-cluster-backup item from the right sub-window:

Multi-Namespace Restore Phase 1

To start the restore process, click on the Restore button. A progress window will be displayed similar to:

Multi-Namespace Restore Phase 2

After a while, if the progress window looks like below, then the multi-namespace restore operation completed successfully:

Multi-Namespace Restore Phase 3

Checking DOKS Cluster Applications State

First, verify all cluster Kubernetes resources (you should have everything in place):

kubectl get all --all-namespaces

Then, make sure that your DNS A records are updated to point to your new load balancer external IP.

Finally, the backend applications should respond to HTTP requests as well (please refer to Creating the Ambassador Edge Stack Backend Services, regarding the backend applications used in the Starter Kit tutorial):

curl -Li http://quote.starter-kit.online/quote/
curl -Li http://echo.starter-kit.online/echo/

In the next step, you will learn how to perform scheduled (or automatic) backups for your DOKS cluster applications.

Step 6 - Scheduled Backups

Taking backups automatically based on a schedule is a really useful feature to have. It allows you to rewind time, and restore the system to a previous working state if something goes wrong. This section provides an example of an automatic backup on a 5 minute schedule (the kube-system namespace was picked).

First, you need to create a Policy CRD of type Schedule that defines the backup schedule in cron format (same as Linux cron). Schedule policies can be used with either BackupPlan or ClusterBackupPlan CRDs. A typical schedule policy CRD looks like below (it defines a 5 minute schedule):

kind: Policy
apiVersion: triliovault.trilio.io/v1
metadata:
  name: scheduled-backup-every-5min
  namespace: tvk
spec:
  type: Schedule
  scheduleConfig:
    schedule:
      - "*/5 * * * *" # trigger every 5 minutes

Next, you can apply the schedule policy to a ClusterBackupPlan CRD for example, as seen below:

apiVersion: triliovault.trilio.io/v1
kind: ClusterBackupPlan
metadata:
  name: kube-system-ns-backup-plan-5min-schedule
  namespace: tvk
spec:
  backupConfig:
    target:
      name: trilio-s3-target
      namespace: tvk
    schedulePolicy:
      fullBackupPolicy:
        name: scheduled-backup-every-5min
        namespace: tvk
  backupComponents:
    - namespace: kube-system
    - namespace: backend

Looking at the above, you can notice that it's a basic ClusterBackupPlan CRD, referencing the Policy CRD defined earlier via the spec.backupConfig.schedulePolicy field. You can have separate policies created for full or incremental backups, hence either fullBackupPolicy or incrementalBackupPolicy can be specified in the spec.

Now, please go ahead and create the schedule Policy, using the sample manifest provided by the Starter Kit tutorial (make sure to change directory first, where the Starter Kit Git repository was cloned on your local machine):

kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/scheduled-backup-every-5min.yaml

Check that the policy resource was created:

kubectl get policies -n tvk

The output looks similar to (notice the POLICY type set to Schedule):

NAME                          POLICY     DEFAULT
scheduled-backup-every-5min   Schedule   false

Finally, create the resources for the kube-system namespace scheduled backups:

# Create the backup plan first for kube-system namespace
kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/kube-system-ns-backup-plan-scheduled.yaml

# Create and trigger the scheduled backup for kube-system namespace
kubectl apply -f 05-setup-backup-restore/assets/manifests/triliovault/kube-system-ns-backup-scheduled.yaml

Check the scheduled backup plan status for kube-system:

kubectl get clusterbackupplan kube-system-ns-backup-plan-5min-schedule -n tvk

The output looks similar to (notice the FULL BACKUP POLICY value set to the previously created scheduled-backup-every-5min policy resource, as well as the STATUS which should be Available):

NAME                                       TARGET             ...   FULL BACKUP POLICY            STATUS
kube-system-ns-backup-plan-5min-schedule   trilio-s3-target   ...   scheduled-backup-every-5min   Available

Check the scheduled backup status for kube-system:

kubectl get clusterbackup kube-system-ns-full-backup-scheduled -n tvk

The output looks similar to (notice the BACKUPPLAN value set to the previously created backup plan resource, as well as the STATUS which should be Available):

NAME                                   BACKUPPLAN                                 BACKUP TYPE   STATUS      ...
kube-system-ns-full-backup-scheduled   kube-system-ns-backup-plan-5min-schedule   Full          Available   ...

Now, you can check that backups are performed at a regular interval (5 minutes), by querying the cluster backup resources and inspecting the START TIME column (kubectl get clusterbackup -n tvk). It should reflect the 5 minute delta, as highlighted in the picture below:

TVK Every 5 Minute Backups

In the next step, you will learn how to set up a retention policy for your backups.

Step 7 - Backups Retention Policy

The retention policy allows you to define the number of backups to retain, and the cadence at which to delete them, as per compliance requirements. The retention policy CRD provides a simple YAML specification to define the number of backups to retain in terms of days, weeks, months, years, latest, etc.

Using Retention Policies

Retention policies can be used with either BackupPlan or ClusterBackupPlan CRDs. A typical Policy manifest of the Retention type looks like below:

apiVersion: triliovault.trilio.io/v1
kind: Policy
metadata:
  name: sample-policy
spec:
  type: Retention
  retentionConfig:
    latest: 2
    weekly: 1
    dayOfWeek: Wednesday
    monthly: 1
    dateOfMonth: 15
    monthOfYear: March
    yearly: 1

Explanation for the above configuration:

  • spec.type: Defines policy type. Can be: Retention or Schedule.
  • spec.retentionConfig: Describes retention configuration, like what interval to use for backups retention and how many.
  • spec.retentionConfig.latest: Maximum number of latest backups to be retained.
  • spec.retentionConfig.weekly: Maximum number of backups to be retained in a week.
  • spec.retentionConfig.dayOfWeek: Day of the week to maintain weekly backups.
  • spec.retentionConfig.monthly: Maximum number of backups to be retained in a month.
  • spec.retentionConfig.dateOfMonth: Date of the month to maintain monthly backups.
  • spec.retentionConfig.monthOfYear: Month of the backup to retain for yearly backups.
  • spec.retentionConfig.yearly: Maximum number of backups to be retained in a year.

The above retention policy translates to:

  • On a weekly basis, keep one backup each Wednesday.
  • On a monthly basis, keep one backup on the 15th day.
  • On a yearly basis, keep one backup every March.
  • Overall, always keep the 2 most recent backups available.

The basic flow for creating a retention policy resource goes the same way as for scheduled backups. You need a BackupPlan or a ClusterBackupPlan CRD defined to reference the retention policy, and then have a Backup or ClusterBackup object to trigger the process.

A typical ClusterBackupPlan configuration with retention set looks like below:

apiVersion: triliovault.trilio.io/v1
kind: ClusterBackupPlan
metadata:
  name: kube-system-ns-backup-plan-5min-schedule
  namespace: tvk
spec:
  backupConfig:
    target:
      name: trilio-s3-target
      namespace: tvk
    retentionPolicy:
      fullBackupPolicy:
        name: ambassador-backups-retention-policy
        namespace: tvk
  backupComponents:
    - namespace: kube-system
    - namespace: backend

Notice that it uses a retentionPolicy field to reference the policy in question. Of course, you can have a backup plan that has both types of policies set, so that it is able to perform scheduled backups, as well as to deal with retention strategies, as sketched below.
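
A hedged sketch of a backupConfig combining both policy types (the policy names are the ones used earlier in this tutorial; the rest of the ClusterBackupPlan spec is omitted for brevity):

spec:
  backupConfig:
    target:
      name: trilio-s3-target
      namespace: tvk
    schedulePolicy:
      fullBackupPolicy:
        name: scheduled-backup-every-5min
        namespace: tvk
    retentionPolicy:
      fullBackupPolicy:
        name: ambassador-backups-retention-policy
        namespace: tvk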

Using Cleanup Policies

With so many TVK resources, each responsible for various operations (scheduled backups, retention, etc.), it is very probable for things to go wrong at some point in time. Some of the previously enumerated operations might fail for various reasons, like inaccessible storage, network issues for NFS, etc. What happens then is that your DOKS cluster gets crowded with many Kubernetes objects in a failed state.

You need a way to garbage collect all those objects and release the associated resources, to avoid trouble in the future. Meet the Cleanup Policy CRD:

apiVersion: triliovault.trilio.io/v1
kind: Policy
metadata:
  name: garbage-collect-policy
spec:
  type: Cleanup
  cleanupConfig:
    backupDays: 5

The above cleanup policy must be defined in the TVK install namespace. A cron job is then created automatically for you that runs every 30 minutes, and deletes failed backups based on the value specified for backupDays within the spec field.
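
You can verify that the cleanup cron job was created (a hedged check; the exact job name may differ between TVK versions):

kubectl get cronjobs -n tvk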

This is a very neat feature that TVK provides to help you deal with this kind of situation.

Conclusion

In this tutorial, you learned how to perform one time as well as scheduled backups, and how to restore everything back. Having scheduled backups in place is very important, as it allows you to revert to a previous snapshot in time if something goes wrong along the way. You walked through a disaster recovery scenario as well. Backup retention also plays an important role, because storage is finite and can get expensive if too many objects are involved.

All the basic tasks and operations explained in this tutorial are meant to give you a basic introduction and understanding of what TrilioVault for Kubernetes is capable of. You can learn more about TrilioVault for Kubernetes and other interesting (or useful) topics, by following the links below:

Go to Section 6 - Kubernetes Secrets.