Skip to content

tigera-cs/osmc_monitoring

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 

Repository files navigation

Monitoring Calico Components - DEMO

Table of Contents

Thank you!

Thank you for attending the OSMC in Nuremberg and joining the presentation on "Monitoring Calico Components" by Tigera.

We hope you enjoyed the presentation! Feel free to download the slides from here.

We would also appreciate your feedback about the presentation or Project Calico. Please leave your feedback here. Your input is valuable to us!


Overview

In this demo, we will deploy Calico Open Source using Helm and monitor core Calico components: Typha and Felix. We’ll deploy Prometheus, Grafana, and Grafana dashboards to observe key metrics and establish baselines.

Next, we’ll upgrade Calico with intentionally low limits for Typha and run a script to stress Calico components. Finally, we’ll examine how the metrics change in the Grafana dashboard.


Before you begin...

IMPORTANT

  • This demo is disruptive and should not be run in any production cluster.
  • DO NOT use the Typha limits shown in this demo in any cluster other than for this demo.
  • The Grafana dashboards provided are not maintained by Tigera. They are for demonstration purposes only. Feel free to adapt them or use the metrics as needed.

About Calico Felix and Typha

  • Felix is the primary Calico agent that runs on every node, enforcing network policy and managing routes.
  • Typha is an optional component that enhances scalability by reducing datastore traffic between Calico nodes.

Both Felix and Typha can provide metrics for Prometheus. For more details, see the official documentation: Typha | Felix.


Demo

1. Set up a test cluster and deploy Calico Open Source using Helm, following this guide. Using Helm is essential, as we will later modify Typha resource limits via Helm values.

2. Implement basic monitoring of Calico with Prometheus by following this tutorial:

  • Enable Calico metrics reporting.
  • Create the necessary namespace and service account for Prometheus.
  • Deploy and configure Prometheus.
  • View metrics in the Prometheus dashboard and create a simple graph.

3. Set up Calico metrics dashboards in Grafana by following this tutorial.

4. Import the additional Grafana dashboards using the .json files from this repo, by following these instructions.

5. Open the OSMC - Monitoring Calico Components - Live dashboard in Grafana. Set the timeframe to 15 minutes and the Auto refresh to 5 seconds, as shown below:

dashboard_1

6. Wait 5 to 10 minutes and take note of baselines of the metrics in the dashboard.

7. Upgrade Calico Open Source with very low Typha & Felix resource limits, using this Helm values_osmc.yaml file:

installation:
    enabled: true
    kubeletVolumePluginPath: "None"
    typhaDeployment:
      spec:
        template:
          spec:
            containers:
            - name: calico-typha
              resources:
                limits:
                  cpu: 1m
    calicoNodeDaemonSet:
          spec:
            template:
              spec:
                containers:
                - name: calico-node
                  resources:
                    limits:
                      cpu: 200m

Upgrade Calico with this command:

helm upgrade calico projectcalico/tigera-operator --values values_osmc.yaml -n tigera-operator

(Optional) Adjust these limits if the upgrade fails.

8. Wait 5–10 minutes and observe the new baseline metrics in the dashboard. See also how the upgrade affects metrics related with Typha breadcrumbs.

9. Deploy a test-deployment which we'll use to stress Calico components:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test-deploy
  template:
    metadata:
      labels:
        app: test-deploy
    spec:
      containers:
      - name: test-container
        image: wbitt/network-multitool
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "k8s-app"
                operator: In
                values:
                - calico-typha
            namespaceSelector:
              matchLabels:
                name: calico-system
            topologyKey: "kubernetes.io/hostname"

NOTE: Esure that calico-typha and test-deploy pods are not scheduled on the same nodes.

10. Wait 5–10 minutes, observe the new baseline metrics, and note any changes.

11. Stress Calico Typha by scaling up and down the test-deploy deployment, using this script:

while true; do kubectl scale deployment test-deploy --replicas=10; sleep 2; kubectl scale deployment test-deploy --replicas=1;sleep 5; done

12. Monitor the Grafana dashboard. Note significant metric changes and watch for a drop in the number of active Calico nodes.

13. Stop the script and cleanup the cluster.

Congratulations! You have completed 'Monitoring Calico Components' demo! Don’t forget to leave your feedback here.

About

OSMC Nuremberg 2024 - Monitoring Calico Components

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published