
Commit 4159431

Merge pull request #50 from bsc-dom/telemetry-doc
Telemetry doc
2 parents 3417e3c + d7a0539

File tree

22 files changed: +425 / −205 lines


docs/index.rst

Lines changed: 11 additions & 3 deletions
@@ -10,8 +10,6 @@ dataClay
    main-concepts
    alien-objects
    advanced-usage
-   telemetry
-   hpc-tracing
    examples/index

 .. toctree::
@@ -25,12 +23,22 @@ dataClay
    deployment/hpc-manual-deployment
    deployment/compile-redis

+.. toctree::
+   :hidden:
+   :caption: Telemetry
+
+   telemetry/configuration
+   telemetry/offline
+   telemetry/real-time
+   telemetry/prometheus
+   telemetry/hpc-tracing
+
 .. toctree::
    :hidden:
    :caption: Release Notes

-   releasenotes/3-x
    releasenotes/4-x
+   releasenotes/3-x

 .. toctree::
    :hidden:

docs/telemetry.rst

Lines changed: 0 additions & 131 deletions
This file was deleted.

docs/telemetry/configuration.rst

Lines changed: 41 additions & 0 deletions
Telemetry Configuration
=======================

dataClay is instrumented with `OpenTelemetry <https://opentelemetry.io/>`_ to provide observability of distributed traces, metrics, and logs. You can configure tracing to export telemetry data either in real time or for post-mortem analysis. Visualizations can be performed in Grafana.

Configuration
-------------

To activate tracing in dataClay, set the following environment variables:

- ``DATACLAY_TRACING``: Set to ``true`` to enable tracing.
- ``DATACLAY_TRACING_EXPORTER``: Export traces to the OpenTelemetry Collector (``otlp``) or print them to the console (``console``). The default is ``otlp``.
- ``DATACLAY_TRACING_HOST``: Host of the OpenTelemetry Collector (default: ``localhost``).
- ``DATACLAY_TRACING_PORT``: Port of the OpenTelemetry Collector (default: ``4317``).
- ``DATACLAY_SERVICE_NAME``: The service name, which identifies dataClay components in trace data.

Metrics
-------

.. list-table::
   :header-rows: 1

   * - Metric
     - Description
     - Service
   * - dataclay_inmemory_objects
     - Number of objects in memory
     - backend, client
   * - dataclay_loaded_objects
     - Number of loaded objects
     - backend
   * - dataclay_stored_objects
     - Number of stored objects
     - backend
   * - dataclay_inmemory_misses_total
     - Number of in-memory misses
     - backend, client
   * - dataclay_inmemory_hits_total
     - Number of in-memory hits
     - backend, client
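For quick reference, the variables above could be set in a shell before launching a client. This is only a sketch with illustrative values; adjust the host and port to wherever your OpenTelemetry Collector actually runs:

```shell
# Illustrative values only -- point host/port at your own collector.
export DATACLAY_TRACING=true
export DATACLAY_TRACING_EXPORTER=otlp      # or "console" to print traces to stdout
export DATACLAY_TRACING_HOST=localhost     # OpenTelemetry Collector host
export DATACLAY_TRACING_PORT=4317          # OpenTelemetry Collector gRPC port
export DATACLAY_SERVICE_NAME=client        # label identifying this component in traces
```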
docs/telemetry/hpc-tracing.rst

Lines changed: 4 additions & 6 deletions
@@ -1,12 +1,9 @@
-===========
 HPC Tracing
 ===========

-How to generate paraver traces in MN5
-=====================================
+How to generate paraver traces in MN5 using COMPSs
+--------------------------------------------------

-Using COMPSs
-------------
 In order to get the traces we will create a script.

 - First we have to import the COMPSs and dataClay modules in order to be able to use them, as well as defining which Python version we will be using:
@@ -57,7 +54,8 @@ In order to generate the paraver files, we will call another COMPSs script, "com
 If we run this script in the same directory where we found the traces ($HOME/.COMPSs/[SLURM_JOB_ID]/trace/), the paraver files will appear.

 How to inspect the traces in Paraver
-====================================
+------------------------------------
+
 To be able to see these files we will have to open them using the following commands:

 .. code-block:: bash
.. code-block:: bash

docs/telemetry/offline.rst

Lines changed: 62 additions & 0 deletions
Offline Telemetry Example
=========================

This example demonstrates exporting OpenTelemetry traces to a JSON file for post-mortem analysis in Grafana.

1. **Activate tracing** by setting environment variables as described in the `telemetry configuration <https://dataclay.bsc.es/docs/telemetry/configuration>`_.
2. **Generate traces**:

   - Navigate to the ``json-exporter`` folder in the `offline telemetry example JSON exporter <https://github.com/bsc-dom/dataclay/tree/telemetry-doc/examples/telemetry/offline/json-exporter>`_.
   - Start the dataClay and OpenTelemetry Collector services:

     .. code-block:: bash

        docker compose up

   - Run the dataClay client:

     .. code-block:: bash

        python3 client.py

   - Traces are exported to the ``traces`` folder. You can visualize the JSON traces in Grafana.

3. **Visualize in Grafana**:

   - Navigate to the ``json-post-mortem`` folder in the `offline telemetry example post-mortem <https://github.com/bsc-dom/dataclay/tree/telemetry-doc/examples/telemetry/offline/json-post-mortem>`_.
   - Start the OpenTelemetry Collector, Tempo, and Grafana services:

     .. code-block:: bash

        docker compose up

   - Open Grafana at `http://localhost:3000 <http://localhost:3000>`_ (default username/password: ``admin``/``admin``).
   - In the ``Explore`` section, select ``Tempo`` as the data source and use the ``Trace ID`` field to query traces.

4. **Alternative trace export**:

   - Run the OpenTelemetry Collector manually:

     .. code-block:: bash

        docker run \
          -v ./config/otel-collector.yaml:/etc/otel-collector.yaml \
          otel/opentelemetry-collector-contrib \
          "--config=/etc/otel-collector.yaml"

5. **Copy traces from MareNostrum 5**:

   - To analyze traces from MareNostrum 5, copy them locally:

     .. code-block:: bash

        scp transfer1.bsc.es:~/.dataclay/otel-traces.json ./traces/otel-traces.json

6. **Troubleshooting**:

   - If permission issues arise for the ``traces`` folder, adjust its permissions:

     .. code-block:: bash

        sudo chmod -R 777 traces
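Before loading a dump into Grafana, it can help to sanity-check it locally. The sketch below assumes the collector's file exporter wrote one OTLP-JSON object per line (verify against your collector configuration); ``count_spans`` is a hypothetical helper, not part of dataClay:

```python
import json

def count_spans(path):
    """Count spans in an OTLP-JSON trace dump (one JSON object per line)."""
    total = 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            batch = json.loads(line)
            # OTLP JSON nests spans under resourceSpans -> scopeSpans -> spans
            for resource in batch.get("resourceSpans", []):
                for scope in resource.get("scopeSpans", []):
                    total += len(scope.get("spans", []))
    return total
```

A quick run such as ``count_spans("traces/otel-traces.json")`` returning zero usually means tracing was not enabled when the client ran.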

docs/telemetry/prometheus.rst

Lines changed: 78 additions & 0 deletions
Prometheus
==========

Metrics
-------

.. list-table::
   :header-rows: 1

   * - Metric
     - Description
     - Service
   * - dataclay_inmemory_objects
     - Number of objects in memory
     - backend, client
   * - dataclay_loaded_objects
     - Number of loaded objects
     - backend
   * - dataclay_stored_objects
     - Number of stored objects
     - backend
   * - dataclay_inmemory_misses_total
     - Number of in-memory misses
     - backend, client
   * - dataclay_inmemory_hits_total
     - Number of in-memory hits
     - backend, client

Deploy dataClay with Prometheus
-------------------------------

Run dataClay with Prometheus:

.. note::
   This example is available on `GitHub <https://github.com/bsc-dom/dataclay/tree/main/examples/telemetry/prometheus>`__.

.. code-block:: bash

   docker compose up -d

The ``metadata-service`` and ``backends`` will expose their metrics on port ``8000``. Prometheus is configured to scrape this port to pull the metrics.

Access Prometheus at `http://localhost:9090 <http://localhost:9090>`_. You can query the metrics defined above.

Deploy dataClay with Prometheus Pushgateway
-------------------------------------------

Run dataClay with Prometheus Pushgateway:

.. note::
   This example is available on `GitHub <https://github.com/bsc-dom/dataclay/tree/main/examples/telemetry/prometheus-pushgateway>`__.

.. code-block:: bash

   docker compose up -d

The ``metadata-service`` and ``backends`` will push their metrics to the ``pushgateway`` on port ``9091``. The ``client.py`` can also push metrics through the ``pushgateway``:

.. code-block:: bash

   export DATACLAY_METRICS=true
   export DATACLAY_METRICS_EXPORTER=pushgateway
   export DATACLAY_METRICS_HOST=localhost  # the default
   export DATACLAY_METRICS_PORT=9091
   python3 client.py

Access the Pushgateway at `http://localhost:9091 <http://localhost:9091>`_ and Prometheus at `http://localhost:9090 <http://localhost:9090>`_.

.. note::
   When using ``pushgateway``, a new Python thread will run to push the metrics every 10 seconds (the default interval).
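The Pushgateway protocol behind this setup is simple: metrics in the Prometheus text exposition format are PUT to ``/metrics/job/<job>``. The stdlib-only sketch below illustrates that protocol; it is not dataClay's exporter, and the helper names (``format_metrics``, ``push_metrics``) are hypothetical:

```python
import urllib.request

def format_metrics(metrics):
    # Prometheus text exposition format: one "name value" line per metric.
    return "".join(f"{name} {value}\n" for name, value in metrics.items())

def push_metrics(gateway, job, metrics):
    # PUT replaces all metrics previously pushed for this job on the Pushgateway.
    req = urllib.request.Request(
        f"http://{gateway}/metrics/job/{job}",
        data=format_metrics(metrics).encode(),
        method="PUT",
    )
    urllib.request.urlopen(req, timeout=5)

# Example (requires a Pushgateway running on localhost:9091):
# push_metrics("localhost:9091", "dataclay-client", {"dataclay_inmemory_objects": 42})
```

In practice a Python client would more likely use the ``prometheus_client`` library, which provides this push logic along with proper metric types and HELP/TYPE metadata.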
