Add OpenTelemetry support during function proxy #1684

Open
1 of 3 tasks
LucasRoesler opened this issue Dec 17, 2021 · 6 comments

Comments

@LucasRoesler
Member

My actions before raising this issue

Expected Behaviour

During function proxy, the Gateway should be able to produce OpenTelemetry spans.

Current Behaviour

There are no tracing spans.

List All Possible Solutions and Workarounds

Which Solution Do You Recommend?

I recently did a walk-through for integrating OpenTelemetry with OpenFaaS functions and think it would be nice if the Gateway could produce OpenTelemetry spans during function invocation. Adding tracing during the function proxy would provide a more accurate picture of the networking in the cluster and enable accurate assessments of the overhead (or lack thereof) added by the Gateway.

We previously discussed this in general in #1354, but OpenTelemetry was not an active project at the time, only OpenTracing. OpenTelemetry makes this integration much more feasible now because we can more easily provide support for multiple exporters. Additionally, the OpenTelemetry providers generally allow all of the required configuration via env variables, which means the integration should require only minimal changes to the Gateway.
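For example, with the Jaeger exporter from go.opentelemetry.io/otel/exporters/jaeger, the collector is configured entirely through environment variables. OTEL_EXPORTER is a gateway-specific switch assumed by the sketch below (not a standard OTel variable), and the values here are illustrative:

OTEL_EXPORTER=jaeger
OTEL_EXPORTER_JAEGER_ENDPOINT=http://jaeger-collector:14268/api/traces
OTEL_EXPORTER_JAEGER_USER=tracing-user
OTEL_EXPORTER_JAEGER_PASSWORD=tracing-password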

During the Gateway startup we would initialize and set the global tracing provider using something like this:

shutdownTracing, err := tracing.Provider(config.Version, config.Commit)
if err != nil {
	log.Fatal(err)
}
// Cleanly shutdown and flush telemetry when the application exits.
defer shutdownTracing(ctx)
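(Here ctx and config are assumed to come from the Gateway's existing startup code, and tracing.Provider is the new package sketched below.)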

We can then encapsulate all of the tracing-specific code in the Provider implementation:

func Provider(version, commit string) (shutdown Shutdown, err error) {
	exporter := Exporter(os.Getenv("OTEL_EXPORTER"))

	var exp tracesdk.TracerProviderOption
	switch exporter {
	case JaegerExporter:
		// configure the collector from the env variables,
		// OTEL_EXPORTER_JAEGER_ENDPOINT/USER/PASSWORD
		j, e := jaeger.New(jaeger.WithCollectorEndpoint())
		exp, err = tracesdk.WithBatcher(j), e
	case LogExporter:
		w := os.Stdout
		opts := []stdouttrace.Option{stdouttrace.WithWriter(w)}
		if truthyEnv("OTEL_EXPORTER_LOG_PRETTY_PRINT") {
			opts = append(opts, stdouttrace.WithPrettyPrint())
		}
		if !truthyEnv("OTEL_EXPORTER_LOG_TIMESTAMPS") {
			opts = append(opts, stdouttrace.WithoutTimestamps())
		}

		s, e := stdouttrace.New(opts...)
		exp, err = tracesdk.WithSyncer(s), e
	// additional exporters
	default:
		logrus.Warn("tracing disabled")
		// We explicitly DO NOT set the global TracerProvider using otel.SetTracerProvider().
		// The unset TracerProvider returns a no-op "non-recording" span, but still passes through context.
		otel.SetTextMapPropagator(
			propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}),
		)
		// return no-op shutdown function
		return func(_ context.Context) {}, nil
	}
	if err != nil {
		return nil, err
	}
	
	// Finish initializing the provider, attaching resource attributes
	// that identify this gateway build.
	provider := tracesdk.NewTracerProvider(
		exp,
		tracesdk.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("gateway"),
			semconv.ServiceVersionKey.String(version),
			attribute.String("commit", commit),
		)),
	)

	otel.SetTracerProvider(provider)
	otel.SetTextMapPropagator(
		propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}, propagation.Baggage{}),
	)

	shutdown = func(ctx context.Context) {
		// Do not let the application hang forever when it is shutdown.
		ctx, cancel := context.WithTimeout(ctx, time.Second*5)
		defer cancel()

		err := provider.Shutdown(ctx)
		if err != nil {
			logrus.WithError(err).Error("tracing provider did not gracefully shutdown")
		}
	}
	return shutdown, nil
}
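The sketch above leaves the Shutdown type and the truthyEnv helper undefined; a minimal version of each might look like the following (the set of accepted truthy values is an assumption):

// Shutdown flushes any buffered spans and stops the tracer provider.
type Shutdown func(ctx context.Context)

// truthyEnv reports whether the named environment variable is set to a
// truthy value.
func truthyEnv(name string) bool {
	switch strings.ToLower(os.Getenv(name)) {
	case "1", "true", "yes":
		return true
	}
	return false
}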

Inside the function invocation handler, here:

baseURL := baseURLResolver.Resolve(r)
we would add:

	var err error
	_, span := otel.Tracer("Gateway").Start(r.Context(), "Proxy")
	defer func() {
		if err != nil {
			span.SetStatus(codes.Error, err.Error())
			span.RecordError(err)
		}
		span.End()
	}()
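One detail worth noting: for the function's own spans (and anything downstream) to attach to this trace, the proxy would also need to use the returned context and inject it into the upstream request headers. A minimal sketch, where proxyReq stands in for however the handler builds the upstream request:

ctx, span := otel.Tracer("Gateway").Start(r.Context(), "Proxy")
// Inject the trace context into the proxied request so an instrumented
// function can continue the same trace.
otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(proxyReq.Header))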

This would then show as a new span named "Proxy" between the ingress and the function (if they have tracing enabled). There are a few other things we could do, e.g. adding the status code, original URL, and request URL as metadata to the span, but these are optional for a minimal implementation.
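As a sketch of that optional metadata, using the OpenTelemetry HTTP semantic-convention keys (the exact semconv version, and the res variable holding the upstream response, are assumptions):

span.SetAttributes(
	semconv.HTTPMethodKey.String(r.Method),
	semconv.HTTPTargetKey.String(r.URL.RequestURI()),
)
// and once the upstream response is available:
span.SetAttributes(semconv.HTTPStatusCodeKey.Int(res.StatusCode))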

Steps to Reproduce (for bugs)

  1. Follow this walkthrough https://github.com/LucasRoesler/openfaas-tracing-walkthrough

Context

https://github.com/LucasRoesler/openfaas-tracing-walkthrough

@LucasRoesler
Member Author

@alexellis I have created a preview implementation.

We could also decide to add basic tracing to the rest of the gateway endpoints as well, if you would like.

@alexellis
Member

Thanks for putting this together. It's on my radar and I've seen it.

@berylshow

Please, does the Java function template have a demo?

@LucasRoesler
Member Author

@berylshow do you mean a demo of tracing support in particular? No, I don't know of any demos for the Java template. In general, OpenFaaS will forward all of the required HTTP headers, so you can probably follow a generic Java tutorial for manually instrumenting your code. Unfortunately, I am not really a Java developer, so I am not aware of a good tutorial to recommend.

@yididiyag

When should we expect this? It has been more than a year.

@alexellis
Member

Hi folks, we'd be looking for a paying customer to have a need for this and to sponsor the initial development, along with the ongoing documentation, maintenance, testing, and explanation of how this would and should work in OpenFaaS.

Thanks again for your interest.

If you'd like to tell us more about why you need this for your functions, we have a weekly call, and would welcome your input there.

https://docs.openfaas.com/community/

If you'd like to discuss funding this work, feel free to reach out.

Alex
