Warning
This documentation and the stacks are currently a work in progress.
An overview of the Observability, Telemetry, and Monitoring platform.
This is an architecture overview of the whole system working together.
We are using Grafana Labs' opinionated observability stack, which includes: Loki for logs, Grafana for dashboards and visualization, Tempo for traces, and Mimir for metrics.
These are the components that will be instrumented to gather Metrics, Logs, and Traces.
A proxy service for accessing the Docker Engine API via the Docker socket.
Prometheus
Prometheus can be deployed on the Docker Swarm manager nodes directly. But if you don't have access to a manager node, or wish to deploy the service on the worker nodes instead to reduce strain on the manager node, you can configure Prometheus to use the dockerswarm_sd_server.
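A minimal sketch of such a scrape configuration, assuming the proxy is reachable at a hypothetical dockerswarm-sd-server:2375 address:

```yaml
scrape_configs:
  - job_name: dockerswarm
    dockerswarm_sd_configs:
      # Point service discovery at the proxy instead of the local socket;
      # the address below is a placeholder for your deployment.
      - host: tcp://dockerswarm-sd-server:2375
        role: tasks
    relabel_configs:
      # Only scrape tasks that are expected to be running.
      - source_labels: [__meta_dockerswarm_task_desired_state]
        regex: running
        action: keep
```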
Promtail
By design, Promtail requires access to the Docker Engine API to perform Service Discovery and fetch container logs via the files in the /var/lib/docker/containers
directory on each node in the cluster.
Promtail is deployed globally (a DaemonSet in Kubernetes terms) across all nodes, both managers and workers, but access to the Docker Swarm-specific API is only possible on the Swarm manager nodes.
The dockerswarm_sd_server
provides a simple proxy to the Docker Engine API (with limited capabilities) by running an agent on one (or more) of the Docker Swarm managers.
This allows the worker nodes to perform Service Discovery, and lets Promtail discover and collect logs on each of the nodes.
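A hedged sketch of what the Promtail side could look like, again assuming the proxy is reachable at a hypothetical dockerswarm-sd-server:2375 address (the authoritative configuration lives in the scrape_configs collection referenced below):

```yaml
scrape_configs:
  - job_name: docker
    docker_sd_configs:
      # Discover containers through the proxy rather than the local socket.
      - host: tcp://dockerswarm-sd-server:2375
        refresh_interval: 15s
    relabel_configs:
      # __meta_docker_container_name has a leading slash; strip it.
      - source_labels: [__meta_docker_container_name]
        regex: /(.*)
        target_label: container
```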
The tables below show which source labels are used to add Kubernetes-compatible labels.
For advanced relabeling, check the "Prometheus/Promtail's Kubernetes compatible labels" documents.
These are the labels that can be used to create Kubernetes-compatible labels, based on how the containers are deployed: either via Docker Swarm mode or via Docker Compose.
Docker Swarm
| prometheus | promtail | dockerswarm_tasks | dockerswarm_cadvisor |
|---|---|---|---|
| cluster | cluster | | |
| replica | replica | | |
| instance | host | __meta_dockerswarm_node_hostname | |
| job | job | __meta_dockerswarm_service_label_com_docker_stack_namespace + __meta_dockerswarm_service_name (*2) | container_label_com_docker_stack_namespace + container_label_com_docker_swarm_service_name (*2) |
| namespace | namespace | __meta_dockerswarm_service_label_com_docker_stack_namespace | container_label_com_docker_stack_namespace |
| deployment | deployment | __meta_dockerswarm_service_label_com_docker_stack_namespace | container_label_com_docker_stack_namespace |
| pod | pod | __meta_dockerswarm_service_name | container_label_com_docker_swarm_service_name |
| container | container | __meta_dockerswarm_service_name + __meta_dockerswarm_task_slot + __meta_dockerswarm_task_id | name |
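As an illustration, here is a minimal relabel_configs sketch that derives a few of the Kubernetes-compatible labels above from the dockerswarm_tasks meta labels; the exact rules (including the (*2) combinations) live in the scrape_configs collection referenced below:

```yaml
relabel_configs:
  # namespace <- the com.docker.stack.namespace label on the service
  - source_labels: [__meta_dockerswarm_service_label_com_docker_stack_namespace]
    target_label: namespace
  # pod <- the Swarm service name
  - source_labels: [__meta_dockerswarm_service_name]
    target_label: pod
  # instance <- the hostname of the node running the task
  - source_labels: [__meta_dockerswarm_node_hostname]
    target_label: instance
```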
Docker and Docker Compose
| prometheus | promtail | docker | docker_cadvisor |
|---|---|---|---|
| cluster | cluster | | |
| replica | replica | | |
| instance | host | | |
| job | job | __meta_docker_container_label_com_docker_compose_project + __meta_docker_container_label_com_docker_compose_service | container_label_com_docker_compose_project + container_label_com_docker_compose_service |
| namespace | namespace | __meta_docker_container_label_com_docker_compose_project | container_label_com_docker_compose_project |
| deployment | deployment | __meta_docker_container_label_com_docker_compose_project | container_label_com_docker_compose_project |
| pod | pod | __meta_docker_container_label_com_docker_compose_project + __meta_docker_container_label_com_docker_compose_service | container_label_com_docker_compose_project + container_label_com_docker_compose_service |
| container | container | __meta_docker_container_name (*1) | name |
See scrape_configs: a collection of Prometheus/Promtail scrape_configs.
The agent collector deployment pattern consists of applications — instrumented with an OpenTelemetry Instrumentation using OpenTelemetry protocol (OTLP) — or other collectors (using the OTLP exporter) that send telemetry signals to a collector instance running with the application or on the same host as the application (such as a sidecar or a daemonset).
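A minimal OpenTelemetry Collector configuration for this agent pattern might look like the following; the upstream gateway-collector:4317 endpoint is a placeholder:

```yaml
# Receive OTLP from applications on this host and forward it upstream.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  otlp:
    # Placeholder address for the next collector hop.
    endpoint: gateway-collector:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```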
By convention, job and instance labels distinguish targets and are expected to be present on metrics exposed on a Prometheus pull exporter (a "federated" Prometheus endpoint) or pushed via Prometheus remote-write.

In OTLP, the service.name, service.namespace, and service.instance.id triplet is required to be unique, which makes them good candidates to use to construct job and instance. In the collector Prometheus exporters, the service.name and service.namespace attributes MUST be combined as <service.namespace>/<service.name>, or <service.name> if namespace is empty, to form the job metric label. The service.instance.id attribute, if present, MUST be converted to the instance label; otherwise, instance should be added with an empty value.
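For example, assuming a collector build that includes the Prometheus exporter (such as otelcol-contrib), a resource with service.namespace=mystack, service.name=app, and service.instance.id=task123 would be exposed with job="mystack/app" and instance="task123":

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  # Pull endpoint; job/instance are derived from the resource attributes
  # as described above.
  prometheus:
    endpoint: 0.0.0.0:9464
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```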
| Docker Stack | OpenTelemetry |
|---|---|
| {{.Service.Name}} | service.name |
| {{.Service.Labels["com.docker.stack.namespace"]}} | service.namespace |
| {{.Task.ID}} | service.instance.id |
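Since Docker Swarm expands Go templates such as {{.Service.Name}} and {{.Task.ID}} in service environment variables, the mapping above can be wired into a stack file along these lines (a sketch: the image name is a placeholder, and the application's OpenTelemetry SDK is assumed to honor the standard OTEL_RESOURCE_ATTRIBUTES variable):

```yaml
services:
  app:
    # Placeholder image.
    image: ghcr.io/example/app:latest
    environment:
      # Swarm expands these templates per task when it is scheduled.
      OTEL_RESOURCE_ATTRIBUTES: "service.name={{.Service.Name}},service.instance.id={{.Task.ID}}"
```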
Note: Applications running via Docker Compose require manual labeling via environment variables.
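For example, a Compose file could hard-code the triplet via the standard OTEL_RESOURCE_ATTRIBUTES variable (the values and image are placeholders):

```yaml
services:
  app:
    # Placeholder image.
    image: ghcr.io/example/app:latest
    environment:
      # Compose does not expand Swarm templates, so set the values by hand.
      OTEL_RESOURCE_ATTRIBUTES: "service.name=app,service.namespace=myproject,service.instance.id=app-1"
```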
Currently, the design primarily focuses on containers running in Swarm mode.
But Prometheus/Promtail can be configured to scrape metrics and logs from generic containers or containers running via Docker Compose as well.
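A hedged sketch of a Prometheus scrape configuration for that case, using docker_sd_configs against the local socket and the Compose labels from the table above (the separator used to combine project and service is an assumption):

```yaml
scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
    relabel_configs:
      # namespace <- the Compose project name
      - source_labels: [__meta_docker_container_label_com_docker_compose_project]
        target_label: namespace
      # pod <- project + service, joined with an assumed "-" separator
      - source_labels:
          - __meta_docker_container_label_com_docker_compose_project
          - __meta_docker_container_label_com_docker_compose_service
        separator: "-"
        target_label: pod
```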