Performance Degradation Under Load From Fragments (Blueprint Walk Churns Memory?) #1352
Comments
Hey Jeff, thanks for the very thorough issue and investigation. For the sake of expediting our discussion of this, would you be able to provide a pair of queries and perhaps a
Hi @benwilson512, thanks for the quick response. While I can't share our actual queries publicly (they are so massive it's really difficult to learn anything from them anyway), I can share the repo I've been using to drive the investigation and isolate the issue: https://github.com/jeffutter/absinthe_test_fragment_performance
It isn't set up perfectly to illustrate my hypothesis, since that requires forking and instrumenting Absinthe, but hopefully it can give you an idea of what I'm doing. In that repo you can generate one of the schemas I'm using by doing something like this:
Here's a trimmed-down version:
defmodule FragmentBench.Schema.F do
use Absinthe.Schema
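# Minimal schema: a :thing interface with five implementing objects (:a through :e).
# The :things root field resolves to an empty list, so only document processing is exercised.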
interface(:thing) do
field(:id, :id)
resolve_type(fn _, _ -> nil end)
end
object(:a) do
field(:id, :id)
interface(:thing)
end
object(:b) do
field(:id, :id)
interface(:thing)
end
object(:c) do
field(:id, :id)
interface(:thing)
end
object(:d) do
field(:id, :id)
interface(:thing)
end
object(:e) do
field(:id, :id)
interface(:thing)
end
query do
field(:things, list_of(:thing)) do
resolve(fn _, _ -> {:ok, []} end)
end
end
def query_skip_spread do
AbsintheTestFragmentPerformance.SchemaBuilder.query_skip_spread(5, 0)
end
def query_include_spread do
AbsintheTestFragmentPerformance.SchemaBuilder.query_include_spread(5, 0)
end
def query_skip_inline do
AbsintheTestFragmentPerformance.SchemaBuilder.query_skip_inline(5, 0)
end
def query_include_inline do
AbsintheTestFragmentPerformance.SchemaBuilder.query_include_inline(5, 0)
end
end
Likewise, you can generate the queries with the SchemaBuilder helpers shown above. Here are the @skip versions, first with named/spread fragments and then with inline fragments:
query ($skip: Boolean!) {
things {
id
__typename
... AFragment @skip(if: $skip)
... BFragment @skip(if: $skip)
... CFragment @skip(if: $skip)
... DFragment @skip(if: $skip)
... EFragment @skip(if: $skip)
}
}
fragment AFragment on A { id __typename }
fragment BFragment on B { id __typename }
fragment CFragment on C { id __typename }
fragment DFragment on D { id __typename }
fragment EFragment on E { id __typename }

query ($skip: Boolean!) {
things {
id
__typename
... on A @skip(if: $skip) { id __typename }
... on B @skip(if: $skip) { id __typename }
... on C @skip(if: $skip) { id __typename }
... on D @skip(if: $skip) { id __typename }
... on E @skip(if: $skip) { id __typename }
}
}
The hypothesis I laid out can be seen with setups that use larger numbers of fragments, flipping between the variants. Any kind of benchmarking, and even the profiles, shows the "inline" versions as being faster, taking less time and using fewer function calls. Even looking at memory usage it looks like "inline" is more efficient, but what I think none of those methods show is the garbage generated by executing the document that must be GC'd afterwards. Please let me know if there's a different way you'd like me to present this and I can try to whip something up.
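As an illustrative sketch, one way to see that garbage is to compare the VM's garbage-collection statistics around a run of the document:
# :erlang.statistics(:garbage_collection) returns {number_of_gcs, words_reclaimed, 0}
# for the whole node, so run this on an otherwise quiet VM.
{gcs_before, words_before, _} = :erlang.statistics(:garbage_collection)
# ... execute the document under test here, e.g. via Absinthe.run/3 ...
{gcs_after, words_after, _} = :erlang.statistics(:garbage_collection)
IO.puts("GC runs: #{gcs_after - gcs_before}, words reclaimed: #{words_after - words_before}")
Two runs that finish in similar time can still differ a lot in words reclaimed, which is the kind of difference ordinary benchmarks tend to hide.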
Incredible, this will be a huge help. I hope to work on this this weekend. Please feel free to ping me Monday if that hasn't happened!
Summary
My team and I have been struggling to track down a GraphQL-related performance 'regression' for the past couple of weeks. We have two versions of a query that are effectively identical: one has more of its fragments as spread/named fragments, the other has more of them as inline fragments. The text documents for both are > 200kb and the Blueprints are very large (> 5MB from :erts_debug.size, IIRC). The version with more inline fragments has a slightly smaller text size and blueprint size.
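As a quick aside on how a number like that can be obtained (a sketch, not necessarily the exact measurement used here): :erts_debug.size/1 reports the heap size of a term in words, so multiplying by the VM word size gives bytes.
# Size of any term (such as a parsed Blueprint) in bytes.
# :erts_debug.size/1 is sharing-aware; :erts_debug.flat_size/1 counts as if nothing were shared.
term_bytes = fn term -> :erts_debug.size(term) * :erlang.system_info(:wordsize) end
On a 64-bit VM the word size is 8, so a blueprint of roughly 650k words would come out at a bit over 5MB.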
Under heavy load the version with the in-lined fragments has 2-5x worse p95 latency and uses ~2x the CPU.
I'm hoping folks can read this explanation and see if my hypothesis makes sense. I'm making some logical jumps that I haven't been able to specifically measure and substantiate yet.
Details & Background
The queries use @skip or @include directives on the fragments.
The only difference
The only inkling of a difference I can find between the two queries is this:
I instrumented the application with the Datadog native continuous profiler and compared two load tests (left "Good", right "Bad"). The time spent doing Elixir work is nearly identical between the two, but on the "Bad" side we see a big blob of time spent in... garbage collection.
When we look at the comparison table it's even more obvious.
For some reason the "Bad" query is causing more churn in memory, so the GC burns more CPU cycles, taking those cycles away from doing actual work and leading to increased CPU usage and p95 latency.
Hypothesis
Why would inline fragments vs. spread fragments make any difference to memory usage? One wouldn't think they would, especially when the fragments being inlined were only used once to begin with (so we're not duplicating them inline in different objects).
My hypothesis, while rather unsatisfactory, is that something about the overall shape of the blueprint crosses a tipping point in how the BEAM manages memory. Other, larger queries with or without fragments might fall into this same pitfall. I suspect it has something to do with the width or depth of maps or the size of lists and how walk traverses and updates them. I suspect that since walk maps over the list of children and re-sets it on the map, even if no data is changing, this causes the BEAM to copy the data and GC the old copies.
Supporting Findings
In trying to figure out why inline vs. spread fragments would have any different behavior here, I 'instrumented' the run_phase function of Absinthe.Pipeline as follows:
The idea here is to measure memory used by each phase by doing the following:
This tells us two things (where MemA is the process memory before the phase runs, MemB is the memory right after it runs, and MemC is the memory after forcing a garbage collection):
MemB - MemA tells us how much the memory grew while running that pipeline step
MemB - MemC tells us how much of that growth is garbage
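A simplified, self-contained sketch of that measurement (the wrapper module and its name here are illustrative, not the actual patch to run_phase):
defmodule PhaseMemory do
  # Runs `fun`, reporting how much the current process's memory grew while it
  # ran (MemB - MemA) and how much of that growth was garbage (MemB - MemC).
  def measure(label, fun) do
    {:memory, mem_a} = :erlang.process_info(self(), :memory)
    result = fun.()
    {:memory, mem_b} = :erlang.process_info(self(), :memory)
    :erlang.garbage_collect()
    {:memory, mem_c} = :erlang.process_info(self(), :memory)
    IO.puts("#{label}: grew #{mem_b - mem_a} bytes, #{mem_b - mem_c} of it garbage")
    result
  end
end
Wrapping each phase invocation in PhaseMemory.measure/2 yields the kind of per-phase numbers discussed below.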
I then ran this against a sample project I created (working on the red tape to publish a public repo for this).
The sample project lets me build, on the fly, schemas for an interface with N implementers and then create queries for those with N fragments, either spread/named or inline.
Given the same number of fragments I see a difference like this:
You can see the inline version on the left has far more allocation and churn than the spread version on the right. The spread version seems to allocate nothing for many of the phases.
What gives?
This is the part I'm having difficulty answering. I dug into some of the more unassuming-looking phases like Phase.Document.Arguments.CoerceLists and saw that they should essentially do nothing given my query, yet processing them seems to allocate memory. This led me to the walk implementation that I highlighted under "Hypothesis".
The final leap I'm making, which I haven't been able to substantiate, is that the inline ("Bad") version's blueprint depth or width crosses some threshold that causes the reduce over the children and the reassignment to the map to copy. I wonder if it's related to how small vs. big maps are handled?
I'm not even sure there are any large maps involved here, though; it could potentially have something to do with updating the lists of children.
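To make that concrete, here is a toy walk over a nested node structure, a deliberately simplified sketch rather than Absinthe's actual traversal. Mapping over the children and re-setting them on the parent rebuilds the children list and copies each node's map on every pass, even when the walk changes nothing, and all of those copies become garbage as soon as the old tree is dropped:
defmodule ToyWalk do
  # Every node is a map with a :children list. walk/2 applies fun to each node
  # bottom-up, rebuilding the children list and re-setting it on the parent,
  # which allocates a new list and a new (small, flat) map per node even when
  # fun returns its input unchanged.
  def walk(%{children: children} = node, fun) do
    fun.(%{node | children: Enum.map(children, &walk(&1, fun))})
  end

  # Builds a tree `depth` levels deep with `width` children per node.
  def tree(0, _width), do: %{name: :leaf, children: []}
  def tree(depth, width) do
    %{name: :node, children: for(_ <- 1..width, do: tree(depth - 1, width))}
  end
end

tree = ToyWalk.tree(4, 8)
:erlang.garbage_collect()
{:memory, before_walk} = :erlang.process_info(self(), :memory)
_ = ToyWalk.walk(tree, fn node -> node end)
{:memory, after_walk} = :erlang.process_info(self(), :memory)
IO.puts("identity walk allocated roughly #{after_walk - before_walk} bytes")
The exact number depends on when minor collections happen, but the point is that a no-op pass still allocates on the order of the tree size: blueprint nodes are structs with far fewer than 32 keys, so they are stored as small flat maps and the update copies each one, while the Enum.map builds a fresh list of the same length. If several pipeline phases each make a pass like this over a multi-megabyte blueprint, every phase produces roughly a blueprint's worth of short-lived garbage even when it logically changes nothing, which would line up with the GC churn measured above. Absinthe's real walk is more sophisticated than this sketch, so take it only as an illustration of the suspected mechanism.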
In conclusion
As you can see, the last bit of my theory depends on a fuzzy and partial understanding of how the BEAM manages memory, particularly when updating maps. If my hypothesis is accurate, I would love to discuss how this could be improved. I've pondered this a bit and quite frankly don't have many good ideas. It doesn't seem feasible that Absinthe could avoid walking the blueprint and reducing/mapping the children while it goes.
Are there any other completely different avenues I should approach this from?
Are there any good techniques for gathering more information about this specific issue?