Tracing: Standardize on otel tracing (grafana#75528)
marefr authored Oct 3, 2023
1 parent 4432c4c commit e4c1a7a
Showing 46 changed files with 321 additions and 439 deletions.
52 changes: 30 additions & 22 deletions contribute/backend/instrumentation.md
Expand Up @@ -156,7 +156,7 @@ A distributed trace is data that tracks an application request as it flows throu

### Usage

Grafana currently supports two tracing implementations, [OpenTelemetry](https://opentelemetry.io/) and [OpenTracing](https://opentracing.io/). OpenTracing is deprecated, but still supported until we remove it. The two different implementations implements the `Tracer` and `Span` interfaces, defined in the _pkg/infra/tracing_ package, which you can use to create traces and spans. To get a hold of a `Tracer` you would need to get it injected as dependency into your service, see [Services](services.md) for more details.
Grafana uses [OpenTelemetry](https://opentelemetry.io/) for distributed tracing. The `Tracer` interface in the _pkg/infra/tracing_ package implements the [OpenTelemetry Tracer interface](https://pkg.go.dev/go.opentelemetry.io/otel/trace), which you can use to create traces and spans. To get hold of a `Tracer`, have it injected as a dependency into your service; see [Services](services.md) for more details. For more information, see https://opentelemetry.io/docs/instrumentation/go/manual/.

Example:

Expand All @@ -166,6 +166,7 @@ import (

"github.com/grafana/grafana/pkg/infra/tracing"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/trace"
)

type MyService struct {
Expand All @@ -179,36 +180,36 @@ func ProvideService(tracer tracing.Tracer) *MyService {
}

func (s *MyService) Hello(ctx context.Context, name string) (string, error) {
ctx, span := s.tracer.Start(ctx, "MyService.Hello")
ctx, span := s.tracer.Start(ctx, "MyService.Hello", trace.WithAttributes(
attribute.String("my_attribute", "val"),
))
// This makes sure the span is marked as finished when this
// method ends, so that the span can be flushed and sent to
// the storage backend.
defer span.End()

// Add some event to show Events usage
span.AddEvents(
[]string{"message"},
[]tracing.EventValue{
{Str: "checking name..."},
})
span.AddEvent("checking name...")

if name == "" {
err := fmt.Errorf("name cannot be empty")

// Set the span’s status to Error so that a span tracking
// a failed operation is marked as an error span.
span.SetStatus(codes.Error, "failed to check name")
// record err as an exception span event for this span
span.RecordError(err)
return "", err
}

// Add some other event to show Events usage
span.AddEvents(
[]string{"message"},
[]tracing.EventValue{
{Str: "name checked"},
})
span.AddEvent("name checked")

// Add attribute to show Attributes usage
span.SetAttributes("my_service.name", name, attribute.Key("my_service.name").String(name))
span.SetAttributes(
attribute.String("my_service.name", name),
attribute.Int64("my_service.some_other", int64(1337)),
)

return fmt.Sprintf("Hello %s", name), nil
}
Expand Down Expand Up @@ -243,6 +244,22 @@ If span names, attribute or event values originates from user input they **shoul

Be **careful** to not expose any sensitive information in span names, attribute or event values, e.g. secrets, credentials etc.
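
As a rough illustration of the two cautions above, the following sketch shows one possible way to clean a user-supplied value and redact sensitive keys before attaching them to a span. The helper names (`sanitizeValue`, `safeAttrValue`), the 64-character limit, and the list of sensitive key fragments are all assumptions made for this example; none of them are part of Grafana or OpenTelemetry.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// maxAttrLen is an assumed upper bound for attribute values in this sketch.
const maxAttrLen = 64

// sensitiveFragments lists key fragments treated as sensitive (hypothetical).
var sensitiveFragments = []string{"password", "secret", "token", "credential"}

// sanitizeValue drops control characters and truncates to maxAttrLen runes.
func sanitizeValue(v string) string {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsControl(r) {
			return -1 // a negative rune removes the character
		}
		return r
	}, v)
	runes := []rune(cleaned)
	if len(runes) > maxAttrLen {
		runes = runes[:maxAttrLen]
	}
	return string(runes)
}

// safeAttrValue returns a value that is safe to record for the given key:
// a fixed placeholder for sensitive-looking keys, a sanitized value otherwise.
func safeAttrValue(key, value string) string {
	lower := strings.ToLower(key)
	for _, frag := range sensitiveFragments {
		if strings.Contains(lower, frag) {
			return "[REDACTED]"
		}
	}
	return sanitizeValue(value)
}

func main() {
	fmt.Println(safeAttrValue("my_service.name", "alice\n"))
	fmt.Println(safeAttrValue("api_token", "abc123"))
}
```

The returned string could then be passed to `attribute.String(key, value)` when setting span attributes.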

### Span attributes

Consider using `attribute.<Type>("<key>", <value>)` in favor of `attribute.Key("<key>").<Type>(<value>)` since it requires fewer characters and is therefore easier to read.

Example:

```go
attribute.String("datasource_name", proxy.ds.Name)
// vs
attribute.Key("datasource_name").String(proxy.ds.Name)

attribute.Int64("org_id", proxy.ctx.SignedInUser.OrgID)
// vs
attribute.Key("org_id").Int64(proxy.ctx.SignedInUser.OrgID)
```

### How to collect, visualize and query traces (and correlate logs with traces) locally

#### 1. Start Jaeger
Expand All @@ -255,20 +272,11 @@ make devenv sources=jaeger

To enable tracing in Grafana, you must set the address in your config.ini file:

opentelemetry tracing (recommended):

```ini
[tracing.opentelemetry.jaeger]
address = http://localhost:14268/api/traces
```

opentracing tracing (deprecated/not recommended):

```ini
[tracing.jaeger]
address = localhost:6831
```

#### 3. Search/browse collected logs and traces in Grafana Explore

You need provisioned gdev-jaeger and gdev-loki datasources, see [developer dashboard and data sources](https://github.com/grafana/grafana/tree/main/devenv#developer-dashboards-and-data-sources) for setup instructions.
4 changes: 2 additions & 2 deletions go.mod
Expand Up @@ -121,7 +121,7 @@ require (
gopkg.in/mail.v2 v2.3.1 // @grafana/backend-platform
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // @grafana/alerting-squad-backend
xorm.io/builder v0.3.6 // indirect; @grafana/backend-platform
xorm.io/builder v0.3.6 // @grafana/backend-platform
xorm.io/core v0.7.3 // @grafana/backend-platform
xorm.io/xorm v0.8.2 // @grafana/alerting-squad-backend
)
Expand Down Expand Up @@ -174,7 +174,7 @@ require (
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.1-0.20191002090509-6af20e3a5340 // indirect
github.com/hashicorp/errwrap v1.1.0 // indirect
github.com/hashicorp/go-msgpack v0.5.5 // indirect
github.com/hashicorp/go-multierror v1.1.1 // indirect; @grafana/grafana-as-code
github.com/hashicorp/go-multierror v1.1.1 // @grafana/grafana-as-code
github.com/hashicorp/go-sockaddr v1.0.2 // indirect
github.com/hashicorp/golang-lru v0.6.0 // indirect
github.com/hashicorp/yamux v0.1.1 // indirect
15 changes: 9 additions & 6 deletions pkg/api/pluginproxy/ds_proxy.go
Expand Up @@ -12,6 +12,7 @@ import (
"time"

"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/trace"

"github.com/grafana/grafana/pkg/api/datasource"
"github.com/grafana/grafana/pkg/infra/httpclient"
Expand Down Expand Up @@ -142,10 +143,12 @@ func (proxy *DataSourceProxy) HandleRequest() {

proxy.ctx.Req = proxy.ctx.Req.WithContext(ctx)

span.SetAttributes("datasource_name", proxy.ds.Name, attribute.Key("datasource_name").String(proxy.ds.Name))
span.SetAttributes("datasource_type", proxy.ds.Type, attribute.Key("datasource_type").String(proxy.ds.Type))
span.SetAttributes("user", proxy.ctx.SignedInUser.Login, attribute.Key("user").String(proxy.ctx.SignedInUser.Login))
span.SetAttributes("org_id", proxy.ctx.SignedInUser.OrgID, attribute.Key("org_id").Int64(proxy.ctx.SignedInUser.OrgID))
span.SetAttributes(
attribute.String("datasource_name", proxy.ds.Name),
attribute.String("datasource_type", proxy.ds.Type),
attribute.String("user", proxy.ctx.SignedInUser.Login),
attribute.Int64("org_id", proxy.ctx.SignedInUser.OrgID),
)

proxy.addTraceFromHeaderValue(span, "X-Panel-Id", "panel_id")
proxy.addTraceFromHeaderValue(span, "X-Dashboard-Id", "dashboard_id")
Expand All @@ -155,11 +158,11 @@ func (proxy *DataSourceProxy) HandleRequest() {
reverseProxy.ServeHTTP(proxy.ctx.Resp, proxy.ctx.Req)
}

func (proxy *DataSourceProxy) addTraceFromHeaderValue(span tracing.Span, headerName string, tagName string) {
func (proxy *DataSourceProxy) addTraceFromHeaderValue(span trace.Span, headerName string, tagName string) {
panelId := proxy.ctx.Req.Header.Get(headerName)
dashId, err := strconv.Atoi(panelId)
if err == nil {
span.SetAttributes(tagName, dashId, attribute.Key(tagName).Int(dashId))
span.SetAttributes(attribute.Int(tagName, dashId))
}
}

6 changes: 4 additions & 2 deletions pkg/api/pluginproxy/pluginproxy.go
Expand Up @@ -109,8 +109,10 @@ func (proxy *PluginProxy) HandleRequest() {

proxy.ctx.Req = proxy.ctx.Req.WithContext(ctx)

span.SetAttributes("user", proxy.ctx.SignedInUser.Login, attribute.Key("user").String(proxy.ctx.SignedInUser.Login))
span.SetAttributes("org_id", proxy.ctx.SignedInUser.OrgID, attribute.Key("org_id").Int64(proxy.ctx.SignedInUser.OrgID))
span.SetAttributes(
attribute.String("user", proxy.ctx.SignedInUser.Login),
attribute.Int64("org_id", proxy.ctx.SignedInUser.OrgID),
)

proxy.tracer.Inject(ctx, proxy.ctx.Req.Header, span)

2 changes: 1 addition & 1 deletion pkg/bus/bus.go
Expand Up @@ -55,7 +55,7 @@ func (b *InProcBus) Publish(ctx context.Context, msg Msg) error {
_, span := b.tracer.Start(ctx, "bus - "+msgName)
defer span.End()

span.SetAttributes("msg", msgName, attribute.Key("msg").String(msgName))
span.SetAttributes(attribute.String("msg", msgName))

return nil
}
4 changes: 2 additions & 2 deletions pkg/expr/commands.go
Expand Up @@ -70,7 +70,7 @@ func (gm *MathCommand) NeedsVars() []string {
// failed to execute.
func (gm *MathCommand) Execute(ctx context.Context, _ time.Time, vars mathexp.Vars, tracer tracing.Tracer) (mathexp.Results, error) {
_, span := tracer.Start(ctx, "SSE.ExecuteMath")
span.SetAttributes("expression", gm.RawExpression, attribute.Key("expression").String(gm.RawExpression))
span.SetAttributes(attribute.String("expression", gm.RawExpression))
defer span.End()
return gm.Expression.Execute(gm.refID, vars, tracer)
}
Expand Down Expand Up @@ -163,7 +163,7 @@ func (gr *ReduceCommand) Execute(ctx context.Context, _ time.Time, vars mathexp.
_, span := tracer.Start(ctx, "SSE.ExecuteReduce")
defer span.End()

span.SetAttributes("reducer", gr.Reducer, attribute.Key("reducer").String(gr.Reducer))
span.SetAttributes(attribute.String("reducer", gr.Reducer))

newRes := mathexp.Results{}
for i, val := range vars[gr.VarToReduce].Values {
2 changes: 1 addition & 1 deletion pkg/expr/dataplane.go
Expand Up @@ -61,7 +61,7 @@ func shouldUseDataplane(frames data.Frames, logger log.Logger, disable bool) (dt
func handleDataplaneFrames(ctx context.Context, tracer tracing.Tracer, t data.FrameType, frames data.Frames) (mathexp.Results, error) {
_, span := tracer.Start(ctx, "SSE.HandleDataPlaneData")
defer span.End()
span.SetAttributes("dataplane.type", t, attribute.Key("dataplane.type").String(string(t)))
span.SetAttributes(attribute.String("dataplane.type", string(t)))

switch t.Kind() {
case data.KindUnknown:
4 changes: 2 additions & 2 deletions pkg/expr/graph.go
Expand Up @@ -99,10 +99,10 @@ func (dp *DataPipeline) execute(c context.Context, now time.Time, s *Service) (m
}

c, span := s.tracer.Start(c, "SSE.ExecuteNode")
span.SetAttributes("node.refId", node.RefID(), attribute.Key("node.refId").String(node.RefID()))
span.SetAttributes(attribute.String("node.refId", node.RefID()))
if len(node.NeedsVars()) > 0 {
inputRefIDs := node.NeedsVars()
span.SetAttributes("node.inputRefIDs", inputRefIDs, attribute.Key("node.inputRefIDs").StringSlice(inputRefIDs))
span.SetAttributes(attribute.StringSlice("node.inputRefIDs", inputRefIDs))
}
defer span.End()

28 changes: 13 additions & 15 deletions pkg/expr/nodes.go
Expand Up @@ -10,12 +10,12 @@ import (
"github.com/grafana/grafana-plugin-sdk-go/backend"
"github.com/grafana/grafana-plugin-sdk-go/data"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
"gonum.org/v1/gonum/graph/simple"

"github.com/grafana/grafana/pkg/expr/classic"
"github.com/grafana/grafana/pkg/expr/mathexp"
"github.com/grafana/grafana/pkg/infra/log"
"github.com/grafana/grafana/pkg/infra/tracing"
"github.com/grafana/grafana/pkg/services/datasources"
"github.com/grafana/grafana/pkg/services/featuremgmt"
)
Expand Down Expand Up @@ -236,8 +236,10 @@ func executeDSNodesGrouped(ctx context.Context, now time.Time, vars mathexp.Vars
"datasourceVersion", firstNode.datasource.Version,
)

span.SetAttributes("datasource.type", firstNode.datasource.Type, attribute.Key("datasource.type").String(firstNode.datasource.Type))
span.SetAttributes("datasource.uid", firstNode.datasource.UID, attribute.Key("datasource.uid").String(firstNode.datasource.UID))
span.SetAttributes(
attribute.String("datasource.type", firstNode.datasource.Type),
attribute.String("datasource.uid", firstNode.datasource.UID),
)

req := &backend.QueryDataRequest{
PluginContext: pCtx,
Expand All @@ -261,11 +263,8 @@ func executeDSNodesGrouped(ctx context.Context, now time.Time, vars mathexp.Vars
if e != nil {
responseType = "error"
respStatus = "failure"
span.AddEvents([]string{"error", "message"},
[]tracing.EventValue{
{Str: fmt.Sprintf("%v", err)},
{Str: "failed to query data source"},
})
span.SetStatus(codes.Error, "failed to query data source")
span.RecordError(e)
}
logger.Debug("Data source queried", "responseType", responseType)
useDataplane := strings.HasPrefix(responseType, "dataplane-")
Expand Down Expand Up @@ -313,8 +312,10 @@ func (dn *DSNode) Execute(ctx context.Context, now time.Time, _ mathexp.Vars, s
if err != nil {
return mathexp.Results{}, err
}
span.SetAttributes("datasource.type", dn.datasource.Type, attribute.Key("datasource.type").String(dn.datasource.Type))
span.SetAttributes("datasource.uid", dn.datasource.UID, attribute.Key("datasource.uid").String(dn.datasource.UID))
span.SetAttributes(
attribute.String("datasource.type", dn.datasource.Type),
attribute.String("datasource.uid", dn.datasource.UID),
)

req := &backend.QueryDataRequest{
PluginContext: pCtx,
Expand All @@ -337,11 +338,8 @@ func (dn *DSNode) Execute(ctx context.Context, now time.Time, _ mathexp.Vars, s
if e != nil {
responseType = "error"
respStatus = "failure"
span.AddEvents([]string{"error", "message"},
[]tracing.EventValue{
{Str: fmt.Sprintf("%v", err)},
{Str: "failed to query data source"},
})
span.SetStatus(codes.Error, "failed to query data source")
span.RecordError(e)
}
logger.Debug("Data source queried", "responseType", responseType)
useDataplane := strings.HasPrefix(responseType, "dataplane-")
12 changes: 7 additions & 5 deletions pkg/infra/httpclient/httpclientprovider/tracing_middleware.go
Expand Up @@ -10,6 +10,7 @@ import (
"go.opentelemetry.io/contrib/instrumentation/net/http/httptrace/otelhttptrace"
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
semconv "go.opentelemetry.io/otel/semconv/v1.17.0"
"go.opentelemetry.io/otel/trace"

"github.com/grafana/grafana/pkg/infra/log"
Expand All @@ -30,17 +31,18 @@ func TracingMiddleware(logger log.Logger, tracer tracing.Tracer) httpclient.Midd
ctx = httptrace.WithClientTrace(ctx, otelhttptrace.NewClientTrace(ctx, otelhttptrace.WithoutSubSpans(), otelhttptrace.WithoutHeaders()))
req = req.WithContext(ctx)
for k, v := range opts.Labels {
span.SetAttributes(k, v, attribute.Key(k).String(v))
span.SetAttributes(attribute.String(k, v))
}

tracer.Inject(ctx, req.Header, span)
res, err := next.RoundTrip(req)

span.SetAttributes("http.url", req.URL.String(), attribute.String("http.url", req.URL.String()))
span.SetAttributes("http.method", req.Method, attribute.String("http.method", req.Method))
span.SetAttributes(semconv.HTTPURL(req.URL.String()))
span.SetAttributes(semconv.HTTPMethod(req.Method))
// ext.SpanKind.Set(span, ext.SpanKindRPCClientEnum)

if err != nil {
span.SetStatus(codes.Error, "request failed")
span.RecordError(err)
return res, err
}
Expand All @@ -49,10 +51,10 @@ func TracingMiddleware(logger log.Logger, tracer tracing.Tracer) httpclient.Midd
// we avoid measuring contentlength less than zero because it indicates
// that the content size is unknown. https://godoc.org/github.com/badu/http#Response
if res.ContentLength > 0 {
span.SetAttributes(httpContentLengthTagKey, res.ContentLength, attribute.Key(httpContentLengthTagKey).Int64(res.ContentLength))
span.SetAttributes(attribute.Int64(httpContentLengthTagKey, res.ContentLength))
}

span.SetAttributes("http.status_code", res.StatusCode, attribute.Int("http.status_code", res.StatusCode))
span.SetAttributes(semconv.HTTPStatusCode(res.StatusCode))
if res.StatusCode >= 400 {
span.SetStatus(codes.Error, fmt.Sprintf("error with HTTP status code %s", strconv.Itoa(res.StatusCode)))
}
4 changes: 2 additions & 2 deletions pkg/infra/serverlock/serverlock.go
Expand Up @@ -34,7 +34,7 @@ type ServerLockService struct {
func (sl *ServerLockService) LockAndExecute(ctx context.Context, actionName string, maxInterval time.Duration, fn func(ctx context.Context)) error {
start := time.Now()
ctx, span := sl.tracer.Start(ctx, "ServerLockService.LockAndExecute")
span.SetAttributes("serverlock.actionName", actionName, attribute.Key("serverlock.actionName").String(actionName))
span.SetAttributes(attribute.String("serverlock.actionName", actionName))
defer span.End()

ctxLogger := sl.log.FromContext(ctx)
Expand Down Expand Up @@ -138,7 +138,7 @@ func (sl *ServerLockService) getOrCreate(ctx context.Context, actionName string)
func (sl *ServerLockService) LockExecuteAndRelease(ctx context.Context, actionName string, maxInterval time.Duration, fn func(ctx context.Context)) error {
start := time.Now()
ctx, span := sl.tracer.Start(ctx, "ServerLockService.LockExecuteAndRelease")
span.SetAttributes("serverlock.actionName", actionName, attribute.Key("serverlock.actionName").String(actionName))
span.SetAttributes(attribute.String("serverlock.actionName", actionName))
defer span.End()

ctxLogger := sl.log.FromContext(ctx)
2 changes: 1 addition & 1 deletion pkg/infra/tracing/test_helper.go
Expand Up @@ -24,7 +24,7 @@ func InitializeTracerForTest(opts ...TracerForTestOption) Tracer {

otel.SetTracerProvider(tp)

ots := &Opentelemetry{Propagation: "jaeger,w3c", tracerProvider: tp}
ots := &TracingService{Propagation: "jaeger,w3c", tracerProvider: tp}
_ = ots.initOpentelemetryTracer()
return ots
}