Commit d4b917c

Merge pull request #2 from anunarapureddy/main

Section 5: Sampling content

2 parents 899d294 + e241490 commit d4b917c

File tree: 4 files changed, +192 -27 lines changed
05-sampling.md

Lines changed: 18 additions & 14 deletions

@@ -8,15 +8,6 @@ This tutorial step covers the basic usage of the OpenTelemetry Collector on Kube
 
 [excalidraw](https://excalidraw.com/#json=15BrdSOMEkc9RA5cxeqwz,urTmfk01mbx7V-PpQI7KgA)
 
-### OpenTelemetry Collector on k8s
-
-After installing the OpenTelemetry Operator, the `v1alpha1.OpenTelemetryCollector` simplifies the operation of the OpenTelemetry Collector on Kubernetes. There are different deployment modes available, breaking config changes are migrated automatically, provides integration with Prometheus (including operating on Prometheus Operator CRs) and simplifies sidecar injection.
-
-TODO: update collector
-```yaml
-
-```
-
 ## Sampling, what does it mean and why is it important?
 
 Sampling refers to the practice of selectively capturing and recording traces of requests flowing through a distributed system, rather than capturing every single request. It is crucial in distributed tracing systems because modern distributed applications often generate a massive volume of requests and transactions, which can overwhelm the tracing infrastructure or lead to excessive storage costs if every request is
@@ -64,9 +55,16 @@ https://opentelemetry.io/docs/languages/sdk-configuration/general/#otel_traces_s
 
 Tail sampling is where the decision to sample a trace takes place by considering all or most of the spans within the trace. Tail Sampling gives you the option to sample your traces based on specific criteria derived from different parts of a trace, which isn’t an option with Head Sampling.
 
-Usecase: Sample 100% of the traces that have an error-ing span in them.
+Deploy the OpenTelemetry Collector with `tail_sampling` enabled:
+
+```shell
+kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/backend/05-tail-sampling-collector.yaml
+kubectl get pods -n observability-backend -w
+```
 
 ```yaml
+# Sample 100% of traces with ERROR-ing spans (omit traces with all OK spans)
+# and traces which have a duration longer than 500ms
 processors:
   tail_sampling:
     decision_wait: 10s # time to wait before a sampling decision is made
@@ -87,11 +85,8 @@ Usecase: Sample 100% of the traces that have an error-ing span in them.
   ]
 ```
 
-Applying this chart will start a new collector with the tailsampling processor
+<TODO: Add screenshot>
 
-```shell
-kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/backend/03-tail-sampling-config.yaml
-```
 
 -----
 ### Advanced Topic: Sampling at scale with OpenTelemetry
@@ -102,4 +97,13 @@ Requires two deployments of the Collector, the first layer routing all the spans
 
 [excalidraw](https://excalidraw.com/#room=6a15d65ba4615c535a40,xcZD6DG977owHRoxpYY4Ag)
 
+Apply the YAML below to deploy a layer of Collectors containing the load-balancing exporter in front of collectors performing tail-sampling:
+
+```shell
+kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-eu-2024-opentelemetry-kubernetes-tracing-tutorial/backend/05-scale-otel-collectors.yaml
+kubectl get pods -n observability-backend -w
+```
+
+<TODO: Add screenshot>
+
 [Next steps](./06-RED-metrics.md)
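The diff above adds two `kubectl apply` steps. Before moving on, it may help to confirm that the operator actually reconciled the tail-sampling collector. A minimal check, assuming the `OpenTelemetryCollector` resource in `05-tail-sampling-collector.yaml` is named `otel` (the operator then names the Deployment `otel-collector`):

```shell
# List the collector custom resources managed by the operator
kubectl get opentelemetrycollectors -n observability-backend

# Follow the collector logs to confirm the tail_sampling processor loaded cleanly
kubectl logs deployment/otel-collector -n observability-backend -f
```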
Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
+apiVersion: opentelemetry.io/v1alpha1
+kind: Instrumentation
+metadata:
+  name: my-instrumentation
+  namespace: tutorial-application
+spec:
+  exporter:
+    endpoint: http://otel-collector.observability-backend.svc.cluster.local:4317
+  propagators:
+    - tracecontext
+    - baggage
+    - b3
+  sampler:
+    type: parentbased_traceidratio
+    argument: "0.5"
+  resource:
+    addK8sUIDAttributes: false
+  python:
+    env:
+      # Required if endpoint is set to 4317.
+      # Python autoinstrumentation uses http/proto by default
+      # so data must be sent to 4318 instead of 4317.
+      - name: OTEL_EXPORTER_OTLP_ENDPOINT
+        value: http://otel-collector.observability-backend.svc.cluster.local:4318
+  java:
+    env:
+      - name: OTEL_LOGS_EXPORTER
+        value: otlp
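The `sampler` block in this Instrumentation resource is what turns on head sampling in the SDKs injected by the operator. For a service that is not auto-instrumented, roughly the same behaviour can be configured through the standard SDK environment variables; the values below simply mirror the CR and are illustrative only:

```shell
# Parent-based TraceID-ratio sampler: keep ~50% of new root traces,
# and follow the parent's decision for child spans
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.5
```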

backend/05-scale-otel-collectors.yaml

Lines changed: 137 additions & 0 deletions

@@ -0,0 +1,137 @@
+apiVersion: opentelemetry.io/v1alpha1
+kind: OpenTelemetryCollector
+metadata:
+  name: otel
+  namespace: observability-backend
+spec:
+  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.94.0
+  mode: deployment
+  replicas: 1
+  ports:
+    - port: 8888
+      protocol: TCP
+      name: metrics
+  config: |
+    receivers:
+      otlp:
+        protocols:
+          grpc:
+            endpoint: 0.0.0.0:4317
+          http:
+            endpoint: 0.0.0.0:4318
+
+    processors:
+      # Sample 100% of traces with ERROR-ing spans (omit traces with all OK spans)
+      # and traces which have a duration longer than 500ms
+      tail_sampling:
+        decision_wait: 10s # time to wait before a sampling decision is made
+        num_traces: 100 # number of traces to be kept in memory
+        expected_new_traces_per_sec: 10 # expected rate of new traces per second
+        policies:
+          - name: keep-errors
+            type: status_code
+            status_code: {status_codes: [ERROR]}
+          - name: keep-slow-traces
+            type: latency
+            latency: {threshold_ms: 500}
+
+    exporters:
+      debug:
+        verbosity: detailed
+
+      loadbalancing:
+        protocol:
+          otlp:
+            timeout: 1s
+            tls:
+              insecure: true
+        resolver:
+          k8s:
+            service: otel-gateway.observability-backend
+            ports:
+              - 4317
+
+      otlphttp/metrics:
+        endpoint: http://prometheus.observability-backend.svc.cluster.local:80/api/v1/otlp/
+        tls:
+          insecure: true
+
+    service:
+      pipelines:
+        traces:
+          receivers: [otlp]
+          processors: [tail_sampling]
+          exporters: [loadbalancing] # route all spans of a trace to the same gateway collector
+        metrics:
+          receivers: [otlp]
+          exporters: [otlphttp/metrics]
+        logs:
+          receivers: [otlp]
+          exporters: [debug]
+---
+apiVersion: opentelemetry.io/v1alpha1
+kind: OpenTelemetryCollector
+metadata:
+  name: otel-gateway
+  namespace: observability-backend
+spec:
+  image: ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-contrib:0.94.0
+  mode: deployment
+  replicas: 2
+  ports:
+    - port: 8888
+      protocol: TCP
+      name: metrics
+  config: |
+    receivers:
+      otlp:
+        protocols:
+          grpc:
+            endpoint: 0.0.0.0:4317
+          http:
+            endpoint: 0.0.0.0:4318
+
+    processors:
+      # Sample 100% of traces with ERROR-ing spans (omit traces with all OK spans)
+      # and traces which have a duration longer than 500ms
+      tail_sampling:
+        decision_wait: 10s # time to wait before a sampling decision is made
+        num_traces: 100 # number of traces to be kept in memory
+        expected_new_traces_per_sec: 10 # expected rate of new traces per second
+        policies:
+          - name: keep-errors
+            type: status_code
+            status_code: {status_codes: [ERROR]}
+          - name: keep-slow-traces
+            type: latency
+            latency: {threshold_ms: 500}
+
+    exporters:
+      otlp/traces:
+        endpoint: jaeger-collector:4317
+        tls:
+          insecure: true
+
+      otlphttp/metrics:
+        endpoint: http://prometheus.observability-backend.svc.cluster.local:80/api/v1/otlp/
+        tls:
+          insecure: true
+
+      debug:
+        verbosity: detailed
+
+    service:
+      pipelines:
+        traces:
+          receivers: [otlp]
+          processors: [tail_sampling]
+          exporters: [otlp/traces]
+        metrics:
+          receivers: [otlp]
+          exporters: [otlphttp/metrics]
+        logs:
+          receivers: [otlp]
+          exporters: [debug]
+---
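Once applied, the operator should create one Deployment per resource above, named `otel-collector` and `otel-gateway-collector` following its `<name>-collector` convention (an assumption worth verifying in your cluster). A quick way to check both layers:

```shell
# Expect one load-balancing pod (otel) and two tail-sampling gateway replicas (otel-gateway)
kubectl get pods -n observability-backend

# Check that both layers started without configuration errors
kubectl logs deployment/otel-collector -n observability-backend
kubectl logs deployment/otel-gateway-collector -n observability-backend
```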

backend/03-tail-sampling-config.yaml renamed to backend/05-tail-sampling-collector copy 2.yaml

Lines changed: 9 additions & 13 deletions

@@ -20,24 +20,20 @@ spec:
           http:
             endpoint: 0.0.0.0:4318
 
-    processors:
+    processors:
+      # Sample 100% of traces with ERROR-ing spans (omit traces with all OK spans)
+      # and traces which have a duration longer than 500ms
       tail_sampling:
         decision_wait: 10s # time to wait before a sampling decision is made
         num_traces: 100 # number of traces to be kept in memory
         expected_new_traces_per_sec: 10 # expected rate of new traces per second
         policies:
-          [
-            {
-              name: keep-errors,
-              type: status_code,
-              status_code: {status_codes: [ERROR]}
-            },
-            {
-              name: keep-slow-traces,
-              type: latency,
-              latency: {threshold_ms: 500}
-            }
-          ]
+          - name: keep-errors
+            type: status_code
+            status_code: {status_codes: [ERROR]}
+          - name: keep-slow-traces
+            type: latency
+            latency: {threshold_ms: 500}
 
     exporters:
       otlp/traces:
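The block-style policy list introduced here is easier to extend than the old flow-style form. If, for instance, you also wanted to keep a baseline of healthy traffic, the tail-sampling processor supports further policy types such as `probabilistic`; the extra policy below is purely illustrative and not part of this commit:

```yaml
# Hypothetical additional policy: also keep ~10% of all remaining traces
- name: keep-baseline
  type: probabilistic
  probabilistic: {sampling_percentage: 10}
```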
