Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): add otel to opentdf services #1858

Open
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

sujankota
Copy link
Contributor

@sujankota sujankota commented Jan 14, 2025

Closes: DSPX-221

Summary

This PR refactors the OpenTelemetry tracing implementation to support multiple tracing providers dynamically, moving away from the previous setup that was primarily focused on Jaeger. It addresses the suggestion in PR #1858 (comment) to make the configuration provider-agnostic, leveraging the standard OpenTelemetry Protocol (OTLP) for better interoperability.

A new configuration structure under server.trace has been introduced, allowing users to explicitly enable/disable tracing and select a provider (otlp or file) along with its specific settings.

Key Changes

  • New Configuration Structure: Introduced Config, ProviderConfig, OTLPConfig, and FileConfig structs under the server.trace key in the main configuration (opentdf.yaml).
  • Dynamic Exporter Creation: Refactored tracing.InitTracer to create the appropriate SpanExporter based on the configured provider.name.
  • OTLP Provider Support: Implemented the otlp provider, supporting:
    • Both grpc (default) and http/protobuf protocols.
    • Configurable endpoint.
    • insecure flag for disabling TLS.
    • Optional headers for authentication, etc.
  • File Provider Support: Implemented the file provider, which uses stdouttrace directed to a local file:
    • Configurable output path.
    • Log rotation handled by lumberjack (configurable maxSize, maxBackups, maxAge, compress).
    • Option for prettyPrint JSON output.
  • Removed Old Configuration: Removed the previous ExportToJaeger boolean and Folder string configuration fields.
  • Resource Creation: Updated OpenTelemetry resource creation to use standard detectors (resource.Default()) merged with explicit attributes and environment variables (resource.WithFromEnv()) for richer metadata.
  • README Update: Added a new section to the main README detailing the server.trace configuration options with examples for different providers.
  • Improved Shutdown: Ensured the file writer (if used) is closed during tracer shutdown.
  • Enhanced Logging: Added more informative logs during tracer initialization.

Configuration Changes

The tracing configuration in opentdf.yaml (or equivalent) needs to be updated from the old format to the new structure under server.trace.

New Structure Example:

server:
  # ... other server config ...
  trace:
    enabled: true # Or false to disable tracing entirely
    provider:
      # --- Provider Selection ---
      name: otlp # Required: Select provider ("otlp" or "file")

      # --- OTLP Provider Config (only used if name: otlp) ---
      otlp:
        endpoint: "localhost:4317" # Required: OTLP receiver endpoint (e.g., Jaeger, Tempo, OTEL Collector)
        protocol: grpc            # Optional: "grpc" (default) or "http/protobuf"
        insecure: true            # Optional: true (disable TLS) or false (use TLS - default)
        # headers:                # Optional: Key-value headers for the exporter
        #   Authorization: "Bearer my-token"

      # --- File Provider Config (only used if name: file) ---
      # file:
      #   path: "./traces.log"      # Required: Output file path
      #   prettyPrint: false        # Optional: Use readable JSON formatting (default: false)
      #   maxSize: 20             # Optional: Max size in MB before rotation (default: 20)
      #   maxBackups: 10            # Optional: Max rotated files to keep (default: 10)
      #   maxAge: 30              # Optional: Max days to keep rotated files (default: 30)
      #   compress: false           # Optional: Compress rotated files (default: false)

Testing

Verified configuration parsing for enabled, otlp, and file providers.
Manually tested sending traces via OTLP/gRPC and OTLP/HTTP to a local OpenTelemetry Collector (using debug exporter), confirming spans are received.
Manually tested the file provider, confirming trace output to the specified file with lumberjack rotation.
Confirmed stdouttrace works locally during debugging.
Ensured enabled: false results in a no-op tracer.
Documentation

Updated README.md with the new "Observability: Distributed Tracing (OpenTelemetry)" section explaining the configuration.

Checklist

  • I have added or updated unit tests
  • I have added or updated integration tests (if appropriate)
  • I have added or updated documentation

Testing Instructions

@sujankota sujankota marked this pull request as ready for review January 28, 2025 16:10
@sujankota sujankota requested review from a team as code owners January 28, 2025 16:10
@sujankota sujankota requested a review from a team as a code owner January 28, 2025 17:54
@dmihalcik-virtru dmihalcik-virtru self-requested a review February 3, 2025 16:27
@sujankota sujankota enabled auto-merge March 3, 2025 13:21
@pflynn-virtru pflynn-virtru changed the title feat(core): add otelto opentdf services feat(core): add otel to opentdf services Mar 11, 2025
pflynn-virtru and others added 3 commits March 17, 2025 11:22
Cleaned up go.mod by removing indirect dependencies that are no longer needed. This reduces clutter and ensures the project only tracks essential dependencies, improving maintainability.
Updated `Provider` methods to use pointer receivers for consistency and potential performance improvements. Added Tracer initialization in test cases to enable tracing functionality during tests.
opentdf-dev.yaml Outdated
trace:
enabled: true
folder: "traces"
exportToJaeger: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pflynn-virtru One of the things I talked to @sujankota about was updating the config so its on specific to jaeger. The open telemetry protocol is there so it can work across providers.

This is what I was thinking for a config setup but by no means do you have to implement this exact way.

type TracingConfig struct {
    Enabled  bool           `json:"enabled" yaml:"enabled"`
    Provider ProviderConfig `json:"provider" yaml:"provider"`
}

type ProviderConfig struct {
    Name   string        `json:"name" yaml:"name"` // "otlp", "file", "jaeger", etc.
    OTLP   *OTLPConfig   `json:"otlp,omitempty" yaml:"otlp,omitempty"`
    File   *FileConfig   `json:"file,omitempty" yaml:"file,omitempty"`
}

type OTLPConfig struct {
    Endpoint string `json:"endpoint" yaml:"endpoint"`
    Insecure bool   `json:"insecure" yaml:"insecure"`
}

type FileConfig struct {
    Path string `json:"path" yaml:"path"`
}

Introduced OpenTelemetry-based distributed tracing, configurable via `server.trace` in the main configuration. Added support for multiple providers: OTLP (gRPC/HTTP) for external backends (e.g., Jaeger) and file-based JSON logging with rotation for local debugging. Updated README with configuration details and examples.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants