fix(pdk) - respect the trace flags from the parent span (if present) #13015

Open
wants to merge 1 commit into
base: master

andyroyle

Summary

Kong now respects the `should_sample` decision from a parent span if one was provided. Previously it did not, which resulted in incomplete traces.

When using Kong as an internal API gateway, we are seeing incomplete traces. The logs suggest that the following is happening:

(This is based on a sampling rate of 0.01.)

  • Service A makes a request to Service B with no `traceparent` header specified
  • Kong generates a `traceparent` header and specifies trace flags of `"01"` (i.e. the trace should be sampled)
  • Service B handles the request from Service A and ingests the `traceparent` header
  • Service B then makes a request to Service C, setting the `traceparent` header appropriately (i.e. using flags `"01"`)
  • Kong, handling the request from Service B to Service C, then resets the trace flags to `"00"` (i.e. the trace should not be sampled)
  • Service C handles the request from Service B and, respecting the flags from the `traceparent` header, does not sample the trace

This results in Kong and Services A and B sampling the trace while Service C does not, so we get an incomplete trace.
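To make the flag change in the sequence above concrete, here is a minimal Python sketch of how the sampled bit travels in a W3C `traceparent` header (`version-traceid-parentid-flags`). It is not Kong's Lua code; `TraceParent`, `parse_traceparent`, and `format_traceparent` are hypothetical helper names, and the trace and span IDs are example values.

```python
from typing import NamedTuple

SAMPLED_FLAG = 0x01  # bit 0 of the trace-flags field marks the trace as sampled


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        return bool(self.flags & SAMPLED_FLAG)


def parse_traceparent(header: str) -> TraceParent:
    """Split a traceparent header into its four dash-separated fields."""
    parts = header.strip().split("-")
    if len(parts) != 4:
        raise ValueError("expected version-traceid-parentid-flags")
    version, trace_id, parent_id, flags = parts
    if len(trace_id) != 32 or len(parent_id) != 16 or len(flags) != 2:
        raise ValueError("unexpected field length in traceparent header")
    return TraceParent(version, trace_id, parent_id, int(flags, 16))


def format_traceparent(tp: TraceParent) -> str:
    """Serialize the fields back into header form."""
    return f"{tp.version}-{tp.trace_id}-{tp.parent_id}-{tp.flags:02x}"


# Header as sent by Service B: flags "01", so the trace is sampled.
incoming = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)

# The behaviour described above: an intermediate hop clears the sampled bit,
# so the next service sees flags "00" and does not sample.
rewritten = incoming._replace(flags=incoming.flags & ~SAMPLED_FLAG)
outgoing_header = format_traceparent(rewritten)
```

Here `incoming.sampled` is true, while `outgoing_header` ends in `-00`, which is what Service C would observe in the scenario described.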

I identified the `get_sampling_decision` method as the most likely place where the issue occurs. Since `parent_should_sample` is a tri-state boolean (i.e. it can be `true`, `false`, or `nil`), it makes most sense (to me) to respect the value of `parent_should_sample` if it exists.
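The tri-state logic can be sketched roughly as follows. This is a Python illustration of the idea only, not the actual Lua change: `decide_sampling` and `local_sampler` are hypothetical names, and `None` stands in for Lua's `nil`.

```python
from typing import Callable, Optional


def decide_sampling(
    parent_should_sample: Optional[bool],
    trace_id: str,
    local_sampler: Callable[[str], bool],
) -> bool:
    """Prefer the parent's decision; fall back to local sampling if absent."""
    # Compare against None explicitly: a parent decision of False is a real
    # decision and must not be treated the same as "no parent".
    if parent_should_sample is not None:
        return parent_should_sample
    return local_sampler(trace_id)
```

The explicit `is not None` check matters: a plain truthiness test would let a parent's `False` fall through to the local sampler.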

I tested this locally and it behaves as I would expect: when a `traceparent` header is included in the request, its flags are respected; if not, probabilistic sampling is applied.

However, rereading the code, I can't work out why that should be the case, since the probabilistic sampler is deterministic based on trace ID (i.e. the same trace ID will always return the same outcome). Even stepping through the code, I couldn't work out exactly what was going on. It seems likely to me that it's something to do with how the root span is constructed, but I couldn't tell for sure.
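For reference, the determinism property mentioned above can be illustrated with a generic trace-ID-ratio sampler. This Python sketch shows one common approach (comparing the low 64 bits of the trace ID against a threshold derived from the rate). It is an assumption for illustration, not a description of how Kong's sampler is implemented, and `make_ratio_sampler` is a hypothetical name.

```python
from typing import Callable


def make_ratio_sampler(rate: float) -> Callable[[str], bool]:
    """Build a sampler whose decision depends only on the trace ID and rate."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Trace IDs whose low 64 bits fall below this threshold are sampled.
    threshold = int(rate * (1 << 64))

    def should_sample(trace_id_hex: str) -> bool:
        low_64_bits = int(trace_id_hex[-16:], 16)
        return low_64_bits < threshold

    return should_sample


sampler = make_ratio_sampler(0.01)
trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"  # example value

# The same trace ID yields the same decision on every call.
first = sampler(trace_id)
second = sampler(trace_id)
```

With a sampler of this shape, two hops evaluating the same trace ID at the same rate would agree, so a flipped flag would point to a difference in the inputs to the decision rather than to randomness.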

Checklist

  • The Pull Request has tests
  • A changelog file has been created under `changelog/unreleased/kong`, or the `skip-changelog` label has been added to the PR if a changelog is unnecessary (see README.md)
  • There is a user-facing docs PR against https://github.com/Kong/docs.konghq.com - PUT DOCS PR HERE


CLAassistant commented May 10, 2024

CLA assistant check
All committers have signed the CLA.

github-actions bot added the core/pdk, core/tracing, and cherry-pick kong-ee (schedule this PR for cherry-picking to kong/kong-ee) labels on May 10, 2024
team-eng-enablement added the author/community (PRs from the open-source community, not Kong Inc) label on May 10, 2024
Labels: author/community, cherry-pick kong-ee, core/pdk, core/tracing, size/S