Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory leak and failure to restore the observation scope when using a JPA repository from a GraphQL controller #41

Open
nikb1tamerfdp opened this issue Jun 25, 2024 · 4 comments

Comments

@nikb1tamerfdp
Copy link

The context

In my application (Spring Boot 3.2.6), I'm using the GraphQL, Web and JPA starters.
I have created a GraphQL controller, in which I interact with a JPA repository.
I have also enabled observability with Micrometer.

The issue

When I query my controller, everything executes successfully, but I can find this log line (formatted for clarity purpose) :

[...] WARN [...] --- [nio-8080-exec-3] [app=My Application, traceId=7f1a080fea0a3662e74ce0f85d00664b, spanId=3f4b468cb5bccffd] i.m.o.c.ObservationThreadLocalAccessor :
Observation <{
	name=http.server.requests(null),
	error=null,
	context=name='http.server.requests',
	contextualName='null',
	error='null',
	lowCardinalityKeyValues=[
		exception='none',
		method='POST',
		outcome='SUCCESS',
		status='200',
		uri='UNKNOWN'
	],
	highCardinalityKeyValues=[
		http.url='/graphql'
	],
	map=[
		class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{
			span=ImmutableSpanContext{
				traceId=7f1a080fea0a3662e74ce0f85d00664b,
				spanId=6664922c03ad4f85,
				traceFlags=01,
				traceState=ArrayBasedTraceState{entries=[]}, 
				remote=false, 
				valid=true
			}
		}',
		class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{
			duration(seconds)=0.0147542, 
			duration(nanos)=1.47542E7, 
			startTimeNanos=29155328641800
		}',
		class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@3b0b7fc2'
	],
	parentObservation=null
}>
to which we're restoring is not the same as the one set as this scope's parent observation
<{
	name=graphql.datafetcher(null), 
	error=null, 
	context=name='graphql.datafetcher', 
	contextualName='null', 
	error='null', 
	lowCardinalityKeyValues=[
		graphql.error.type='NONE', 
		graphql.field.name='retrieveBookById', 
		graphql.outcome='SUCCESS'
	], 
	highCardinalityKeyValues=[
		graphql.field.path='/retrieveBookById'
	],
	map=[
		class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{
			span=ImmutableSpanContext{
				traceId=7f1a080fea0a3662e74ce0f85d00664b, 
				spanId=a4204ff90062fd5a, 
				traceFlags=01, 
				traceState=ArrayBasedTraceState{entries=[]}, 
				remote=false, 
				valid=true
			}
		}', 
		class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{
			duration(seconds)=0.0108987, 
			duration(nanos)=1.08987E7, 
			startTimeNanos=29155332569100
		}', 
		class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@48ad6d4a'
	], 
	parentObservation={
		name=graphql.request(null), 
		error=null, 
		context=name='graphql.request', 
		contextualName='null', 
		error='null', 
		lowCardinalityKeyValues=[
			graphql.operation='query', 
			graphql.outcome='INTERNAL_ERROR'
		], 
		highCardinalityKeyValues=[
			graphql.execution.id='3bb1672e-660d-bb96-d7c5-4bd6ceb56cd6'
		], 
		map=[
			class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{
				span=ImmutableSpanContext{
					traceId=7f1a080fea0a3662e74ce0f85d00664b, 
					spanId=deeb0dd86af7c990, 
					traceFlags=01, 
					traceState=ArrayBasedTraceState{entries=[]}, 
					remote=false, 
					valid=true
				}
			}', 
			class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{
				duration(seconds)=0.0138164, 
				duration(nanos)=1.38164E7, 
				startTimeNanos=29155329694600
			}', 
			class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@3930f5ed'
		], 
		parentObservation={
			name=http.server.requests(null), 
			error=null, 
			context=name='http.server.requests', 
			contextualName='null', 
			error='null', 
			lowCardinalityKeyValues=[
				exception='none', 
				method='POST', 
				outcome='SUCCESS', 
				status='200', 
				uri='UNKNOWN'
			], 
			highCardinalityKeyValues=[
				http.url='/graphql'
			], 
			map=[
				class io.micrometer.tracing.handler.TracingObservationHandler$TracingContext='TracingContext{
					span=ImmutableSpanContext{
						traceId=7f1a080fea0a3662e74ce0f85d00664b, 
						spanId=6664922c03ad4f85, 
						traceFlags=01, 
						traceState=ArrayBasedTraceState{entries=[]}, 
						remote=false, valid=true
					}
				}',
				class io.micrometer.core.instrument.LongTaskTimer$Sample='SampleImpl{
					duration(seconds)=0.0149397, 
					duration(nanos)=1.49397E7, 
					startTimeNanos=29155328641800
				}', 
				class io.micrometer.core.instrument.Timer$Sample='io.micrometer.core.instrument.Timer$Sample@3b0b7fc2'
			], 
			parentObservation=null
		}
	}
}>.
Most likely a manually created Observation has a scope opened that was never closed. This may lead to thread polluting and memory leaks.

When I connect my application to an Open Telemetry with a Jaeger backend, just to check if everything is ok in terms of trace and spans, it seems that yeah, everything is ok :
Issue - Jaeger

This log is not generated if I remove interaction with my JPA repository.
It's also not generated if I annotate the method of my controller (the one annotated with @QueryMapping) with @Observed, or if I set the property spring.jpa.open-in-view to false (something I prefer not to do).

How to reproduce

You can find a sample application right here.

You can query the controller with these information :

  • URL : http://localhost:8080/graphql
  • Query :
    retrieveBookById(id: "book-1") {
      id
      name
    }

Additional comments

In the original ticket that I've created on Spring Boot project side, @bclozel added this comment, maybe it can give more info :

I'm not sure about the actual source of the problem, but I think that the JPA instrumentation assumes something that is invalid in this case here. First, a bit a background:

  • Spring for GraphQL instruments your controller methods by setting/restoring thread locals before/after calling it. Around this method execution, things can be executed asynchronously on various threads, so we don't rely on threadlocals but we store the current observation in the GraphQL context. I have checked that this behavior is correct.
  • when a controller method is executed, child observations can be started and they will be shown in the resulting traces

In this case, I think that a specific observation is started with the call to the JPA repository and that somehow this child observation is not closed before the GraphQL data fetching observation (your controller call) is ended. I think this observation is stopped later. While this probably won't cause leaks, the trace might look strange.

I suspect this is due to the datasource observation, keeping things opened as long as the JPA session is opened.

@ttddyy
Copy link
Member

ttddyy commented Jun 28, 2024

Thanks for the report.

I ran the code with ObservationTextPublisher, and here are the trimmed outputs for the observation lifecycle:

With spring.jpa.open-in-view=true (default)

START - name='http.server.requests'
 OPEN - name='http.server.requests'
START - name='graphql.request'
START - name='graphql.datafetcher'
 OPEN - name='graphql.datafetcher'
... my.issue.controllers.BooksController     : Trying to retrieve book with ID 'book-1'.
START - name='jdbc.connection'
 OPEN - name='jdbc.connection'
EVENT - CONNECTION_ACQUIRED
START - name='jdbc.query'
 OPEN - name='jdbc.query'
 STOP - name='jdbc.query'
CLOSE - name='jdbc.query'
START - name='jdbc.result-set'
 OPEN - name='jdbc.result-set'
 STOP - name='jdbc.result-set'
CLOSE - name='jdbc.result-set'
EVENT - CONNECTION_COMMIT
... Observation <name=http.server.requests(null),...    <=== WARNING
CLOSE - name='jdbc.connection'
 STOP - name='graphql.datafetcher'
 STOP - name='graphql.request'
 STOP - name='jdbc.connection'
CLOSE - name='jdbc.connection'
CLOSE - name='http.server.requests'
 STOP - name='http.server.requests'

With spring.jpa.open-in-view=false:

START - name='http.server.requests'
 OPEN - name='http.server.requests'
START - name='graphql.request'
START - name='graphql.datafetcher'
 OPEN - name='graphql.datafetcher'
... my.issue.controllers.BooksController     : Trying to retrieve book with ID 'book
START - name='jdbc.connection'
 OPEN - name='jdbc.connection'
EVENT - CONNECTION_ACQUIRED
START - name='jdbc.query'
 OPEN - name='jdbc.query'
 STOP - name='jdbc.query'
CLOSE - name='jdbc.query'
START - name='jdbc.result-set'
 OPEN - name='jdbc.result-set'
 STOP - name='jdbc.result-set'
CLOSE - name='jdbc.result-set'
EVENT - CONNECTION_COMMIT
 STOP - name='jdbc.connection'
CLOSE - name='jdbc.connection'
CLOSE - name='graphql.datafetcher'
 STOP - name='graphql.datafetcher'
 STOP - name='graphql.request'
CLOSE - name='http.server.requests'
 STOP - name='http.server.requests'

With spring.jpa.open-in-view=true (default), it delays closing the entity manager, which means the JDBC connection remains open for lazy-loading entities in the view.

As the output shows, observations start in the order of http.server.requests -> graphql.request -> graphql.datafetcher -> jdbc.connection -> ...
The observations should stop in the reverse order, but due to the open session in view pattern, jdbc.connection stops after graphql.datafetcher and graphql.request.

I think this may be causing the ObservationThreadLocalAccessor to print out the WARN log.
Though, I'm not sure what would be a solution here. At DataSource level observation is started/stopped when JDBC connection is acquired and closed. At spring controller level, an observation is started/closed based on the http request lifecycle. The open-session/entity-manger-in-view pattern kinds of breaking the way they are invoked.

@bclozel I also noticed from the output that the graphql.datafetcher observation is not CLOSED when open-in-view is enabled. Could it be related?

@bclozel
Copy link

bclozel commented Jun 28, 2024

@ttddyy I'm not familiar with the difference between CLOSE and STOP. From an instrumentation perspective, GraphQlObservationInstrumentation is always calling observation.stop() and the context decorator is always wrapping the execution as it should for context propagation.

@codingjourney
Copy link

We also had problems in a small Spring Boot app when we left spring.jpa.open-in-view set to true by mistake. This is what was happening:

  • OpenEntityManagerInViewInterceptor started a global transaction
  • our code started a @NewSpan around a business service method
  • the method was @Transactional so TransactionInterceptor opened a JDBC Connection
  • Datasource Micrometer reacted by starting a connection span with our @NewSpan as parent
  • upon returning from the method, TransactionInterceptor didn't close the Connection (presumably so that other methods may use it)
  • our @NewSpan ended but the connection span was left running
  • OpenEntityManagerInViewInterceptor closed the connection when committing the global transaction
  • Datasource Micrometer reacted by ending the connection span

This left our @NewSpan in an inconsistent state: it was ended but somehow not closed. Micrometer managed to attach other, unrelated spans to it even minutes later (including a subsequent http post ... span) before it was definitely closed and sent out to the OTLP collector. It really messed the traces up.

Given that the connection gets opened deep in the stack and closed several layers up, connection should be a sibling of @NewSpan and not its child. This could be achieved by noting the current span in OpenEntityManagerInViewInterceptor and later starting the connection span within that.

Or you could decide you don't support the Open Session In View (anti-)pattern and include a warning to that effect in your documentation.

@ttddyy
Copy link
Member

ttddyy commented Nov 14, 2024

@codingjourney Thanks for the details.
I haven't had a chance to dive into the details yet.
Assuming it is happening on the same thread, my hunch is that it's caused by the order of ending the spans being different from the the order in which they are opened.
The open-in-view causes the DB connection to close after your business service @NewSpan.
I'm not sure what can be done to prevent the issue, given that the starting and closing order is indeed different.
I'll give this more thought, but it might be best to note that open session/entity-manager in view is an unsupported pattern due to these ordering challenges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants