In this tutorial you will learn about Loki, a log aggregation system inspired by Prometheus. Loki uses Promtail to fetch logs from all Pods running in your cluster. The logs are then aggregated and compressed, and sent to the configured storage. Next, you can connect the Loki data source to Grafana and view the logs.
Main reasons for using Loki:
- Horizontally scalable.
- High availability.
- Multi-tenant mode available.
- Stores and indexes data in a very efficient way.
- Cost effective.
- Easy to operate.
- LogQL DSL (offers the same functionality as PromQL from Prometheus, which is a plus if you're already familiar with it).
In the steps to follow, you will learn to:
- Install and configure Loki for your DOKS cluster.
- Configure Grafana with the Loki data source to query logs, using LogQL.
- Configure Loki to use persistent storage for logs, via DO Spaces.
After finishing all the steps, you will have a production-ready Loki setup that will help you troubleshoot any issues you might have with your running applications. Just log in to Grafana and use LogQL syntax to search, filter, and perform all kinds of queries on application logs, to fetch useful information.
- Introduction
- Prerequisites
- Step 1 - Installing Loki
- Step 2 - Configure Grafana with Loki
- Step 3 - Using LogQL
- Step 4 - Understanding Promtail
- Step 5 - Setting Persistent Storage for Loki
- Step 6 - Setting Loki Storage Retention
- Conclusion
To complete this tutorial, you will need:
- A DO Spaces bucket for Loki storage. Please follow the official DigitalOcean tutorial to create one. Make sure that it is set to restrict file listing for security reasons.
- Access keys for DO Spaces (S3-compatible object storage). Please follow the official DigitalOcean tutorial to learn how to manage access keys. Keep the access and secret keys somewhere safe for later use.
- A Git client, for cloning the Starter Kit repository.
- Helm, for installing the Loki stack chart.
- Kubectl, for Kubernetes interaction.
- A text editor with YAML lint support, for example: Visual Studio Code.
- The Emojivoto sample app deployed in the cluster. Please follow the steps in its repository README.
In this step, you will learn how to deploy Loki to your DOKS cluster, using Helm.

First, clone the Starter Kit repository, and then change directory to your local copy:
git clone https://github.com/digitalocean/Kubernetes-Starter-Kit-Developers.git
cd Kubernetes-Starter-Kit-Developers
Next, add the Grafana Helm repository and list the available charts:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update grafana
helm search repo grafana
The output looks similar to the following:
NAME CHART VERSION APP VERSION DESCRIPTION
grafana/grafana 6.20.5 8.3.4 The leading tool for querying and visualizing t...
grafana/enterprise-metrics 1.7.2 v1.6.1 Grafana Enterprise Metrics
grafana/fluent-bit 2.3.0 v2.1.0 Uses fluent-bit Loki go plugin for gathering lo...
grafana/loki-stack 2.6.4 v2.4.2 Loki: like Prometheus, but for logs.
...
Notes:
- The chart of interest is grafana/loki-stack, which will install standalone Loki on the cluster. Please visit the loki-stack page for more details about this chart.
- It's good practice in general to use a specific version for the Helm chart. This way, you can version it using Git, and target it for a specific release. In this tutorial, the Helm chart version 2.6.4 is picked for loki-stack, which maps to application version 2.4.2.
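If you want to check which chart versions are available before pinning one, you can list them with Helm (this assumes the grafana repository was already added, as shown above):
helm search repo grafana/loki-stack --versions | head -n 5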
For your convenience, there's a ready-to-use sample values file provided in the Starter Kit Git repository (loki-stack-values-v2.6.4.yaml). Please use your favorite text editor (preferably with YAML lint support) for inspection. You can use VS Code, for example:
code 04-setup-observability/assets/manifests/loki-stack-values-v2.6.4.yaml
Note:
The above values file enables Loki and Promtail for you, so no other input is required. Prometheus and Grafana installation is disabled, because Set up Prometheus Stack took care of it already. Fluent Bit is not used, so it is disabled by default as well.
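For reference, a minimal sketch of the top-level toggles involved looks like the snippet below. This is an assumption of the relevant keys only - the actual file in the Starter Kit repository is the source of truth:
loki:
  enabled: true
promtail:
  enabled: true
fluent-bit:
  enabled: false
grafana:
  enabled: false
prometheus:
  enabled: false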
Next, install the stack using Helm. The following command installs version 2.6.4 of grafana/loki-stack in your cluster, using the Starter Kit repository values file (it also creates the loki-stack namespace, if it doesn't exist):
HELM_CHART_VERSION="2.6.4"
helm install loki grafana/loki-stack --version "${HELM_CHART_VERSION}" \
--namespace=loki-stack \
--create-namespace \
-f "04-setup-observability/assets/manifests/loki-stack-values-v${HELM_CHART_VERSION}.yaml"
Finally, check the Helm release status:
helm ls -n loki-stack
The output looks similar to the following (the STATUS column should display 'deployed'):
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
loki loki-stack 1 2022-06-08 10:00:06.838665 +0300 EEST deployed loki-stack-2.6.4 v2.4.2
Next, inspect all the Kubernetes resources created for Loki:
kubectl get all -n loki-stack
You should have resources deployed for Loki itself (loki-0) and Promtail (loki-promtail). The output looks similar to:
NAME READY STATUS RESTARTS AGE
pod/loki-0 1/1 Running 0 44m
pod/loki-promtail-8dskn 1/1 Running 0 44m
pod/loki-promtail-mgb25 1/1 Running 0 44m
pod/loki-promtail-s7cp6 1/1 Running 0 44m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/loki ClusterIP 10.245.195.248 <none> 3100/TCP 44m
service/loki-headless ClusterIP None <none> 3100/TCP 44m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/loki-promtail 3 3 3 3 3 <none> 44m
NAME READY AGE
statefulset.apps/loki 1/1 44m
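Optionally, you can also check that Loki reports itself as ready, by creating a port forward for the loki service shown above, and then querying its /ready endpoint from another terminal:
kubectl port-forward -n loki-stack svc/loki 3100:3100
curl http://localhost:3100/ready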
In the next step, you will configure Grafana to use the Loki data source, and view application logs.
In this step, you will add the Loki data source to Grafana. First, you need to expose the Grafana web interface on your local machine (default credentials: admin/prom-operator):
kubectl --namespace monitoring port-forward svc/kube-prom-stack-grafana 3000:80
Note:
You should NOT expose Grafana to the public network (e.g. via an ingress mapping or a LB service) with the default login/password.
Next, open a web browser on localhost:3000, and follow the steps below:
- Click the Configuration gear from the left panel.
- Select Data sources.
- Click the Add data source blue button.
- Select Loki from the list and add the Loki URL http://loki.loki-stack:3100.
- Save and test.
If everything goes well, a green label message will appear, saying Data source connected and labels found.
Now, you can access logs from the Explore tab of Grafana. Make sure to select Loki as the data source. Use the Help button for a log search cheat sheet.
In the next step, you'll be introduced to LogQL, which is PromQL's brother, but for logs. Some basic features of LogQL will be presented as well.
In this step, you will learn how to use LogQL for querying application logs, and make use of the available features to ease your work.
Loki comes with its very own language for querying logs, called LogQL. LogQL can be considered a distributed grep with labels for filtering.
A basic LogQL query consists of two parts: the log stream selector and a filter expression. Due to Loki's design, all LogQL queries are required to contain a log stream selector.
The log stream selector reduces the number of log streams to a manageable volume. The number of labels you use to filter down the log streams affects the relative performance of the query's execution. The filter expression is then used to do a distributed grep over the retrieved log streams.
Next, a practical example is presented, for querying the Emojivoto sample application logs.
First, you need to expose the Grafana web console on your local machine (default credentials: admin/prom-operator):
kubectl --namespace monitoring port-forward svc/kube-prom-stack-grafana 3000:80
Next, point your web browser to localhost:3000, and navigate to the Explore tab from the left panel. Select Loki from the data source menu, and run this query:
{container="vote-bot", namespace="emojivoto"}
The output looks similar to the following:
Perform another query, but this time filter the results to include only the Error message:
{container="web-svc",namespace="emojivoto"} |= "Error"
The output looks similar to the following (notice how the Error word is highlighted in the query results panel):
As you can see in the above examples, each query is composed of:
- A log stream selector, such as {container="web-svc", namespace="emojivoto"}, which targets the web-svc container from the emojivoto namespace.
- A filter expression, such as |= "Error", which shows only the lines containing the Error word.
More complex queries can be created using aggregation operators. For more details on the topic, and other advanced features, please visit the official LogQL page.
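For instance, a hypothetical aggregation query that computes the per-second rate of error lines for each container in the emojivoto namespace, over the last 5 minutes, could look like this:
sum by (container) (rate({namespace="emojivoto"} |= "Error" [5m]))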
Another feature of Loki worth mentioning is Labels. Labels allow you to organize streams. In other words, labels add metadata to a log stream, so that the system can distinguish it later. Essentially, they are key-value pairs that can be anything you want, as long as they have a meaning for the data being tagged.
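For example, a single stream selector can combine multiple labels, and even use regex matchers for the values (the label values below come from the Emojivoto sample application deployed in the Prerequisites):
{namespace="emojivoto", container=~"web-svc|vote-bot"}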
Loki indexes data based on labels, allowing for more efficient storage. The picture below highlights this feature, in the Log labels panel:
In the next step, you will discover Promtail, which is the agent responsible for fetching the logs and transforming the data (labeling, adding new fields, dropping unwanted entries, etc.).
In this step, you will learn what Promtail is, as well as how it works. It is deployed as a DaemonSet, and it's an important part of your Loki stack installation, responsible for fetching the logs of all Pods running in your Kubernetes cluster.
What Promtail essentially does is:
- Discover targets that emit logs.
- Attach labels to log streams.
- Push log streams to the Loki instance.
Before Promtail can ship any data from log files to Loki, it needs to find out information about its environment. Specifically, this means discovering applications emitting log lines to files that need to be monitored.
Promtail borrows the same service discovery mechanism from Prometheus, although it currently only supports Static and Kubernetes service discovery. This limitation is due to the fact that Promtail is deployed as a daemon to every local machine and, as such, does not discover labels from other machines. Kubernetes service discovery fetches the required labels from the Kubernetes API server, while Static usually covers all other use cases.
As with every monitoring agent, you need a way to keep it up and running all the time. The Loki stack Helm deployment already makes this possible via a DaemonSet, as seen below:
kubectl get ds -n loki-stack
The output looks similar to the following (notice the loki-promtail line):
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-prom-stack-prometheus-node-exporter 2 2 2 2 2 <none> 7d4h
loki-promtail 2 2 2 2 2 <none> 5h6m
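If you want to double-check that the agents are healthy, you can also tail the logs of the Promtail DaemonSet (this shows the logs of one of the Pods behind the DaemonSet):
kubectl logs -n loki-stack daemonset/loki-promtail --tail=20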
This is great, but how does Promtail discover Kubernetes Pods and assign labels?
The scrape_configs section from the Promtail main configuration will show you the details. You can use kubectl for inspection (notice that the application configuration is stored using a Kubernetes ConfigMap):
kubectl get cm loki-stack -n loki-stack -o yaml > loki-promtail-config.yaml
Next, please open the loki-promtail-config.yaml file using a text editor of your choice (preferably with YAML support). For example, you can use VS Code:
code loki-promtail-config.yaml
Then, look for the scrape_configs section. The output should be similar to:
...
scrape_configs:
  - job_name: kubernetes-pods-name
    pipeline_stages:
      - docker: {}
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
...
Promtail knows how to scrape for logs by using scrape_configs. Each scrape configuration tells Promtail how to discover logs and extract labels. Each scrape configuration contains one or more entries (called jobs), which are executed for each discovered target. Then, each job may contain a pipeline comprised of multiple stages. The main purpose of each stage is to transform your application logs, but it can also drop (filter) unwanted log data, if needed. Jobs can contain relabel_configs stanzas as well, used to transform labels.
Explanations for the above configuration:
- job_name: Defines a new job, and its associated name.
- pipeline_stages: Can be used to add or update labels, correct the timestamp, or re-write log lines entirely. It uses stages to accomplish the previously mentioned tasks. The configuration snippet presented above is using a docker stage, which can extract data based on the standard Docker format. You can refer to the official documentation to read more about Stages and Pipelines.
- kubernetes_sd_config: Tells Promtail how to discover logs coming from Pods, via Kubernetes service discovery.
- relabel_configs: Defines a list of operations to transform the labels from discovery into another form. The configuration snippet presented above renames the __meta_kubernetes_namespace source label provided by the Kubernetes service discovery mechanism to a more human-friendly form - namespace.
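To make the pipeline idea more concrete, below is a small hypothetical example (not part of the Starter Kit configuration) that extracts a log level from each line with a regex stage, and then promotes it to a label with a labels stage:
pipeline_stages:
  - docker: {}
  - regex:
      expression: "(?P<level>INFO|WARN|ERROR)"
  - labels:
      level: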
In most cases, you may not want to fetch logs from all namespaces (and, implicitly, all Pods). This is very useful to avoid high traffic inside your Kubernetes cluster caused by Promtail, as well as to reduce the amount of data ingested by Loki. What Promtail allows you to do in this case, is to drop logs from unwanted namespaces and keep the rest. Another advantage of this approach is that the volume of data that needs to be indexed by Loki is reduced, meaning less storage used and a lower number of objects, if using a DO Spaces bucket, for example.
Promtail allows you to filter logs on a namespace basis, via the drop stage. You can use Helm to configure Promtail for namespace filtering.
First, open the 04-setup-observability/assets/manifests/loki-stack-values-v2.6.4.yaml file provided in the Starter Kit repository, using a text editor of your choice (preferably with YAML lint support). For example, you can use VS Code (please make sure to change directory to where the Starter Kit repository was cloned first):
code 04-setup-observability/assets/manifests/loki-stack-values-v2.6.4.yaml
Next, please remove the comments surrounding the pipelineStages section. In the following example, you will configure Promtail to drop all logs coming from namespaces prefixed with kube-, meaning: kube-node-lease, kube-public, kube-system. The output looks similar to:
promtail:
  enabled: true
  #
  # Enable Promtail service monitoring
  # serviceMonitor:
  #   enabled: true
  #
  # User defined pipeline stages
  pipelineStages:
    - docker: {}
    - drop:
        source: namespace
        expression: "kube-.*"
Explanations for the above configuration:
- pipelineStages: Tells Helm to insert user defined pipeline stages in each job that it creates. By default, the Loki stack Helm chart configures Promtail to fetch all the logs coming from every namespace and pod (you can inspect the ConfigMap template for details).
- docker: Tells Promtail to use a Docker stage. Helps with Docker logs formatting.
- drop: Tells Promtail to use a Drop stage. You then make use of the source field, to drop logs based on a namespace. Finally, expression is the regex selector for the source field.
Finally, save the values file and apply the changes using a Helm upgrade:
HELM_CHART_VERSION="2.6.4"
helm upgrade loki grafana/loki-stack --version "${HELM_CHART_VERSION}" \
--namespace=loki-stack \
-f "04-setup-observability/assets/manifests/loki-stack-values-v${HELM_CHART_VERSION}.yaml"
If the upgrade succeeded and no errors were reported, you can check via LogQL whether logs are still pushed to Loki from the kube-system namespace after a while. Please wait a minute or so, and then run the queries below. Make sure to adjust the time window in Grafana as well, to match the last minute(s) interval (you will want to fetch the most recent data only).
Next (if not already), please create a port forward for the Grafana web console on your local machine (default credentials: admin/prom-operator):
kubectl --namespace monitoring port-forward svc/kube-prom-stack-grafana 3000:80
Then, point your web browser to localhost:3000, and navigate to the Explore tab from the left panel. Select Loki from the data source menu, and run the queries below:
{namespace="kube-system"}
{namespace="kube-public"}
{namespace="kube-node-leases"}
The output window should not return any data for any of the above queries.
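On the other hand, logs from namespaces that were not dropped should still be ingested. You can confirm this with a query against the Emojivoto application namespace, for example:
{namespace="emojivoto"}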
Hints:
- You can have as many drop stages as you want, by using different regex expressions, or you can combine them under a single stage, via the regex OR operator (|) - a sketch is shown after this list.
- You can also leverage the power of ServiceMonitors and enable Promtail metrics collection, as learned in Step 2 - Configure Prometheus and Grafana from the Prometheus tutorial:
  - First, make sure to enable both the Loki ServiceMonitor and Promtail ServiceMonitor from Prometheus Stack, followed by a Helm upgrade.
  - Next, enable the Promtail Service from Loki Stack, followed by a Helm upgrade.
  - Then, you can install the Loki&Promtail Grafana dashboard provided by the community, and inspect the volume of data ingested by Loki. After a while, you get visual feedback of what it means to filter unnecessary application logs, as seen in the picture below (before filtering - on the left side, and after - on the right side).
- For more features and in depth explanations, please visit the Promtail official documentation page.
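As mentioned in the first hint, a single drop stage can cover several namespaces at once, via the regex OR operator. A minimal sketch follows (assuming, purely for illustration, that you also want to drop the monitoring namespace):
pipelineStages:
  - docker: {}
  - drop:
      source: namespace
      expression: "(kube-.*|monitoring)"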
In the next step, you will learn how to set up persistent storage for Loki, using DO Spaces. By default, Helm deploys Loki with ephemeral storage, meaning all your indexed log data will be lost if it restarts, or if the DOKS cluster is recreated.
In this step, you will learn how to enable persistent storage for Loki. You're going to use the DO Spaces bucket created in the Prerequisites section of the tutorial.
By default, Helm deploys Loki with ephemeral storage, via the emptyDir volume. To preserve indexed log data across Pod restarts, Loki can be set up to use DO Spaces instead.
Why use DO Spaces? Because it scales very well and it's cheaper than PVs, which rely on Block Storage. You also don't have to worry anymore about running out of disk space, as happens when relying on the PV implementation. More than that, you don't have to worry about PV sizing and doing the extra math.
In terms of performance, S3 is not as good as Block Storage, but it doesn't matter very much in this case, because Loki does a good job in general.
Next, you're going to tell Helm how to configure Loki persistent storage via DO Spaces, as well as set the correct schema.
First, change directory to where the Starter Kit repository was cloned:
cd Kubernetes-Starter-Kit-Developers
Next, open the loki-stack-values-v2.6.4.yaml file provided in the Starter Kit repository, using a text editor of your choice (preferably with YAML lint support). You can use VS Code, for example:
code 04-setup-observability/assets/manifests/loki-stack-values-v2.6.4.yaml
Then, please remove the comments surrounding the schema_config and storage_config keys. The final Loki storage setup configuration looks similar to the following (please replace the <> placeholders accordingly):
loki:
  enabled: true
  config:
    schema_config:
      configs:
        - from: "2020-10-24"
          store: boltdb-shipper
          object_store: aws
          schema: v11
          index:
            prefix: index_
            period: 24h
    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/boltdb-shipper-active
        cache_location: /data/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: aws
      aws:
        bucketnames: <YOUR_DO_SPACES_BUCKET_NAME_HERE>
        endpoint: <YOUR_DO_SPACES_BUCKET_ENDPOINT_HERE> # in the following format: <region>.digitaloceanspaces.com
        region: <YOUR_DO_SPACES_BUCKET_REGION_HERE> # short region name (e.g.: fra1)
        access_key_id: <YOUR_DO_SPACES_ACCESS_KEY_HERE>
        secret_access_key: <YOUR_DO_SPACES_SECRET_KEY_HERE>
        s3forcepathstyle: true
Explanation for the above configuration:
- schema_config: Defines a storage type and a schema version to facilitate migrations (schemas can differ between Loki installations, so consistency is the key here). In this case, BoltDB Shipper is specified as the storage implementation, and a v11 schema version. The 24h period for the index is the default and preferred value, so please don't change it. Please visit Schema Configs for more details.
- storage_config: Tells Loki about storage configuration details, like setting BoltDB Shipper parameters. It also informs Loki about the aws compatible S3 storage parameters (bucket name, credentials, region, etc.).
Apply the settings, using Helm:
HELM_CHART_VERSION="2.6.4"
helm upgrade loki grafana/loki-stack --version "${HELM_CHART_VERSION}" \
--namespace=loki-stack \
-f "04-setup-observability/assets/manifests/loki-stack-values-v${HELM_CHART_VERSION}.yaml"
Now, check if the main Loki application Pod is up and running (it may take up to 1 minute or so to start, so please be patient):
kubectl get pods -n loki-stack -l app=loki
The output looks similar to:
NAME READY STATUS RESTARTS AGE
loki-0 1/1 Running 0 13m
Hints:
- The main application Pod is called loki-0. You can check the configuration file using the following command (please note that it contains sensitive information):
kubectl exec -it loki-0 -n loki-stack -- /bin/cat /etc/loki/loki.yaml
- You can also check the logs while waiting. It's good practice in general to check the application logs, and see if something goes bad or not:
kubectl logs -n loki-stack -l app=loki
If everything goes well, you should see the DO Spaces bucket containing the index and chunks folders (the chunks folder is called fake, which is a strange name - this is by design, when not running in multi-tenant mode).
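If you already have s3cmd configured (it is needed in the next step for retention anyway), a quick way to verify this is to list the top-level contents of the bucket (please replace the <> placeholder accordingly):
s3cmd ls s3://<LOKI_STORAGE_BUCKET_NAME>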
For more advanced options, and for fine tuning the Loki storage, please visit the Loki Storage official documentation page.
Next, you will learn how to set storage retention policies for Loki.
In this step, you will learn how to set DO Spaces retention policies. Because you configured DO Spaces as the default storage backend for Loki, the same rules apply as for every S3 compatible storage type.
Retention is a very important aspect when configuring storage backends in general, because storage is finite. While S3 storage is not expensive, and is somewhat infinite (or at least it makes you think that way), it is good practice to have a retention policy set.

Although S3 is very scalable, and you don't have to worry about disk space issues, it's still a good idea to have a retention policy in place. This way, really old data can be deleted if not needed.
S3 compatible storage has its own set of policies and rules for retention. In S3 terminology, this is called object lifecycle. You can learn more about the DO Spaces bucket lifecycle options on the official documentation page.
Hint:
S3CMD is a really good utility to have, in order to inspect how many objects are present, as well as the size of the DO Spaces bucket used for Loki retention. S3CMD also helps you to see if the retention policies set so far are working or not. Please follow the DigitalOcean guide for installing and setting up s3cmd.
Setting the lifecycle for the Loki storage bucket is achieved via the s3cmd utility.
Next, you're going to use the loki_do_spaces_lifecycle.xml configuration file provided in the Starter Kit Git repository, to configure retention for the Loki bucket. The policy file contents look similar to:
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Rule>
<ID>Expire old fake data</ID>
<Prefix>fake/</Prefix>
<Status>Enabled</Status>
<Expiration>
<Days>10</Days>
</Expiration>
</Rule>
<Rule>
<ID>Expire old index data</ID>
<Prefix>index/</Prefix>
<Status>Enabled</Status>
<Expiration>
<Days>10</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
The above lifecycle configuration will automatically delete, after 10 days, all the objects from the fake/ and index/ paths in the Loki storage bucket. A 10 day lifespan is chosen in this example, because it's usually enough for development purposes. For production or other critical systems, a period of 30 days or even more is recommended.
How to configure the Loki bucket lifecycle, using s3cmd:
- First, change directory to where the Starter Kit repository was cloned:
cd Kubernetes-Starter-Kit-Developers
- Next, open and inspect the 04-setup-observability/assets/manifests/loki_do_spaces_lifecycle.xml file from the Starter Kit repository, using a text editor of your choice (preferably with XML lint support), and adjust according to your needs.
- Then, set the lifecycle policy (please replace the <> placeholders accordingly):
s3cmd setlifecycle 04-setup-observability/assets/manifests/loki_do_spaces_lifecycle.xml s3://<LOKI_STORAGE_BUCKET_NAME>
- Finally, check that the policy was set (please replace the <> placeholders accordingly):
s3cmd getlifecycle s3://<LOKI_STORAGE_BUCKET_NAME>
After finishing the above steps, you can inspect the bucket size and number of objects, via the du subcommand of s3cmd (the name is borrowed from the Linux Disk Usage utility). Please replace the <> placeholders accordingly:
s3cmd du -H s3://<LOKI_DO_SPACES_BUCKET_NAME>
The output looks similar to the following (notice that it prints the bucket size - 19M, and the number of objects present - 2799):
19M 2799 objects s3://loki-storage-test/
Next, the DO Spaces backend implementation will clean the objects for you automatically, based on the expiration date. You can always go back and edit the policy later if needed, by uploading a new one.
In this tutorial, you learned how to install Loki for logs monitoring in your DOKS cluster. Then, you configured Grafana to use Loki as a data source. You also learned about LogQL for querying logs, and how to set up persistent storage and retention for Loki.
For more details about Loki and other available features, please visit the Loki official documentation page.
Next, go to Exporting Kubernetes Events.