Update the documentation after the otel-collector update #12

Open
1 of 11 tasks
akvlad opened this issue Apr 18, 2024 · 4 comments
Comments

@akvlad
Contributor

akvlad commented Apr 18, 2024

After the otel-collector dependencies were updated to the latest version, some processors were deprecated.
Some of them (e.g. spanmetrics) are still referenced in the qryn documentation, for example https://qryn.metrico.in/#/telemetry/ingestion
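
For context, a rough sketch of the change the documentation needs to reflect: the deprecated spanmetrics processor is replaced by the spanmetrics connector, which sits between a traces pipeline and a metrics pipeline. The receivers, buckets and the qryn exporter below are illustrative assumptions, not taken from the current docs.

# Before (deprecated): spanmetrics configured as a processor
processors:
  spanmetrics:
    metrics_exporter: prometheus

# After: spanmetrics configured as a connector bridging traces -> metrics
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 10ms, 100ms, 250ms]
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]   # the connector consumes spans here
    metrics:
      receivers: [spanmetrics]   # ...and emits the derived metrics here
      exporters: [qryn]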

Please:

Nice to have:

  • grep the documentation for every module that was deprecated during the otel-collector update
  • replace any additional deprecated modules you find

UPD:
Tasks to complete

@akvlad
Contributor Author

akvlad commented Apr 18, 2024

@afzal-qxip
Contributor

"@akvlad and @lmangani, the service graph metrics are not emitted on Qryn when using the configurations below."

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
  zipkin:
    endpoint: 0.0.0.0:9411
  fluentforward:
    endpoint: 0.0.0.0:24224
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 5s
          static_configs:
            - targets: ['exporter:8080']
processors:
  batch:
    send_batch_size: 10000
    timeout: 5s
  memory_limiter:
    check_interval: 2s
    limit_mib: 1800
    spike_limit_mib: 500
  resourcedetection/system:
    detectors: ['system']
    system:
      hostname_sources: ['os']
  resource:
    attributes:
      - key: service.name
        value: "serviceName"
        action: upsert
  metricstransform:
    transforms:
      - include: calls_total
        action: update
        new_name: traces_spanmetrics_calls_total
      - include: latency
        action: update
        new_name: traces_spanmetrics_latency
connectors:
  spanmetrics:
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions_cache_size: 1500
  servicegraph:
    metrics_exporter: qryn
    latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
    dimensions:
      - randomContainer
    store:
      ttl: 2s
      max_items: 200
exporters:
  qryn:
    dsn: clickhouse://default:**********@nl.ch.hepic.tel:****/cloki
    timeout: 10s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    logs:
      format: json
  otlp/spanmetrics:
    endpoint: localhost:4317
    tls:
      insecure: true
  prometheus/servicegraph:
    endpoint: localhost:9090
    namespace: servicegraph
extensions:
  health_check:
  pprof:
  zpages:
service:
  extensions: [pprof, zpages, health_check]
  pipelines:
    logs:
      receivers: [fluentforward, otlp]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    traces:
      receivers: [otlp, jaeger, zipkin]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    traces/spanmetrics:
      receivers: [ otlp, jaeger, zipkin ]
      exporters: [spanmetrics]
    traces/servicegraph:
      receivers: [ otlp, jaeger, zipkin ]
      exporters: [servicegraph]
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]
    metrics/servicegraph:
      receivers: [servicegraph]
      processors: [memory_limiter, resourcedetection/system, resource, batch]
      exporters: [qryn]

@lmangani
Contributor

lmangani commented Apr 18, 2024

In my opinion this requires a local Prometheus remote_write socket to ingest the service graph metrics:

receivers:
  prometheusremotewrite:
    endpoint: 0.0.0.0:9090

Fictional Example

receivers:
  otlp:
    protocols:
      grpc:
  prometheusremotewrite:
    endpoint: 0.0.0.0:9090

connectors:
  servicegraph:
    latency_histogram_buckets: [100ms, 250ms, 1s, 5s, 10s]
    dimensions:
      - dimension-1
      - dimension-2
    store:
      ttl: 1s
      max_items: 10

exporters:
  prometheus/servicegraph:
    endpoint: localhost:9090
    namespace: servicegraph
  qryn:
    dsn: tcp://clickhouse-server:9000/cloki?username=default&password=*************
    timeout: 10s
    sending_queue:
      queue_size: 100
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
    logs:
       format: json

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [servicegraph]
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [prometheus/servicegraph]
    metrics:
      receivers: [prometheusremotewrite]
      processors: ...
      exporters: [qryn]
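
To close the loop, something outside the collector still has to scrape the prometheus/servicegraph exporter and push the samples back into the prometheusremotewrite receiver. A hypothetical companion prometheus.yml sketch, assuming the exporter is moved to port 8889 so it does not collide with the receiver on 9090; the receiver's exact write path is also an assumption:

# prometheus.yml (hypothetical, not part of the thread)
scrape_configs:
  - job_name: 'servicegraph'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:8889']           # assumed prometheus/servicegraph exporter endpoint
remote_write:
  - url: http://localhost:9090/api/v1/write   # assumed prometheusremotewrite receiver endpoint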

@afzal-qxip
Contributor

@akvlad, PR for the above task, after testing:

Both spanmetrics and servicegraph are working fine after the configuration update.

#13
