A curated list of Site Reliability and Production Engineering Tools
Updated Jun 11, 2024
🦥 Easy and simple Prometheus SLO (service level objectives) generator
Manage an application's SLIs and SLOs easily, alongside the application lifecycle, inside a Kubernetes cluster
Slo-exporter computes standardized SLI and SLO metrics based on events coming from various data sources.
Calculate how much downtime should be permitted in your Service Level Agreement or Objective
Calculate SLI/SLO metrics from ZMON's timeseries data
Calculate the tolerable downtime of your service
SLOs, error windows, and alerts are complicated. Here is an attempt to make them easy
CoCo: Coordinated Container Scheduling with Last-Level Cache and Memory Bandwidth Partitioning
SLOpPy - SLO demonstration on a simple Python API that designs pizza toppings based on Wikipedia articles
A framework for building the home of your microservices.
This GitHub repository contains a comprehensive tutorial on Site Reliability Engineering (SRE), covering topics such as SLAs, SLOs, SLIs, Chaos Engineering, monitoring, alerting, and much more. It also includes bonus content on SRE best practices. Follow along with the #100daysofSRE challenge and improve your reliability engineering skills.
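Several of the tools listed above calculate how much downtime an availability target permits. The arithmetic behind that can be sketched in a few lines of Python. This is a minimal illustration and is not taken from any of the listed projects; the function name `allowed_downtime` and its parameters are hypothetical.

```python
from datetime import timedelta


def allowed_downtime(slo_percent: float, window: timedelta) -> timedelta:
    """Return the downtime budget for an availability target over a window.

    Hypothetical helper for illustration only: the budget is the fraction of
    the window that the target leaves uncovered, i.e. (1 - SLO) * window.
    """
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    return window * (1 - slo_percent / 100)


# Example: a 99.9% availability target measured over a 30-day window.
budget = allowed_downtime(99.9, timedelta(days=30))
print(f"Downtime budget: {budget}")
```

With a 99.9% target over 30 days, the budget comes to roughly 43 minutes; a stricter target or a shorter window shrinks it proportionally.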