Lingering performance issues after dealing with a very complex model #1026

Open · okhaliavka opened this issue Oct 16, 2023 · 7 comments

okhaliavka commented Oct 16, 2023

Hi!

First of all, thanks for all the hard work put into this awesome library.

I'm in the process of migrating a service heavily reliant on pydantic from v1 to v2.
For context, this service deals with document content, using complex, deeply and recursively nested models with heavy use of tagged unions. The models represent various "building blocks" like block, text, image, table, etc. Basically, imagine parsing HTML with pydantic and you get a feel for what these models look like.
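
To give a concrete feel for that shape, here is a minimal, purely illustrative sketch (not the actual service models, just something of the same kind) using pydantic v2 discriminated unions:

```python
# Illustrative only: not the actual service models, just the shape described
# above, i.e. recursively nested "building blocks" discriminated by a tag.
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class Text(BaseModel):
    type: Literal["text"] = "text"
    content: str


class Image(BaseModel):
    type: Literal["image"] = "image"
    url: str


class Block(BaseModel):
    type: Literal["block"] = "block"
    # Tagged (discriminated) union, recursive via the forward reference "Block".
    children: list[
        Annotated[Union["Block", Text, Image], Field(discriminator="type")]
    ] = []


Block.model_rebuild()  # resolve the self-referential forward reference

doc = Block.model_validate(
    {"type": "block", "children": [{"type": "text", "content": "hello"}]}
)
```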

I was particularly excited about the performance improvements promised by Pydantic v2. However, after completing the migration, our benchmarks showed unexpected results.

There's a significant performance degradation (compared to v1) and notable variance in benchmarks that serialize and deserialize synthetic data using various models. After looking closer, I noticed that this issue only occurs when running multiple benchmarks sequentially. When executed individually, the performance is much better and more consistent. I also noticed that it depends on the order of benchmarks. Eventually, I came to the conclusion that degradation starts on the first fairly complex benchmark and lingers, affecting subsequent simple benchmarks. I managed to boil down the repro to this (full gist here):

benchmark_simple_model()
benchmark_complex_model()  # with wide and deep synthetic data
benchmark_simple_model()
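
For reference, a rough sketch of what such a harness could look like, assuming the illustrative Text and Block models from the sketch above; the linked gist is the authoritative repro:

```python
# Rough outline of the repro only; the linked gist is the authoritative version.
# Text and Block refer to the illustrative models sketched earlier.
import time


def run_benchmark(name: str, model, payload: dict, iterations: int = 1_000) -> None:
    # Validate and dump the same payload in a tight loop and report elapsed time.
    start = time.perf_counter()
    for _ in range(iterations):
        model.model_validate(payload).model_dump()
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.3f}s for {iterations} iterations")


simple_payload = {"type": "text", "content": "hello"}
# Wide synthetic document: one block with many children.
complex_payload = {"type": "block", "children": [simple_payload] * 10_000}

run_benchmark("simple #1", Text, simple_payload)
run_benchmark("complex", Block, complex_payload, iterations=10)
run_benchmark("simple #2", Text, simple_payload)  # observed to stay slow after the complex run
```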

The first benchmark performs well, outpacing v1 with minimal variance. However, the second and third benchmarks exhibit much worse performance compared to v1, with significant variance. What's most interesting is that the third benchmark's performance is more than an order of magnitude worse than the first one, even though they exercise the same simple model with the same small/simple data.

Removing the complex_model benchmark removes this issue - both simple_model benchmarks perform consistently and well.

It seems that processing large and complex data with a highly intricate model leaves a lingering negative performance impact.

I attempted to profile pydantic-core during these benchmarks. The flamegraph revealed that about 80% of the time in the third benchmark is spent in GILPool::drop.
[flamegraph screenshot: GILPool::drop dominating the third benchmark]
In comparison, the first benchmark only spends 3% of the time in GILPool::drop.


All in all, it feels like the second benchmark leaves behind obscene amounts of garbage, somehow jamming up future deallocations. I also feel like this issue might be related to PyO3/pyo3#1056; this feeling is reinforced by #997 (comment).

At this point, I'm unsure how to proceed and would appreciate your help. Thanks in advance!

okhaliavka commented Oct 16, 2023

FWIW, I tried running the same benchmarks using the pydantic version that runs under mimalloc (pre-#900). The issue is still present but not as severe. Under mimalloc, the third benchmark is only 1.5 times slower than the first one; under the default allocator, it's ~25 times slower.

I'm not sure what exactly this evidence suggests, but thought I'd share it nonetheless.

davidhewitt (Contributor) commented

You're probably right that the issue is related to PyO3/pyo3#1056. One thing we've observed is that after the "GILPool" hits a certain threshold, we get a permanent performance regression, which is probably related to CPU cache sizes; see PyO3/pyo3#3299.

Are you able to share any "minimal" reproductions of this? It would be extremely helpful for future optimization work. (Also, we're actively working on ripping out the GIL pool upstream!)

okhaliavka (Author) commented

@davidhewitt probably not the most minimal, but here's the gist I ended up with.

> Also, we're actively working on ripping out the GIL pool upstream!

Nice! Do you have any rough estimates on how soon that work will land and make its way to pydantic?

davidhewitt (Contributor) commented

It's hard to commit to a timeline for volunteer contributions; however, I'd love to see this land by the end of the year.

davidhewitt (Contributor) commented

Also, thanks for sharing the gist! I get the same reproduction on an M1 Mac, but on my desktop PC there is no slowdown (which again supports the theory that this is related to cache sizes or other platform-specific pains).

samuelcolvin (Member) commented

@davidhewitt what about using

unsafe { py.new_pool() };

at the start of every validation?

davidhewitt (Contributor) commented

> @davidhewitt what about using
>
> unsafe { py.new_pool() };
>
> at the start of every validation?

That won't help; AFAIK the problem is that the pool has been permanently expanded. PyO3 could add a new global API to shrink away the pool's empty capacity.
