perf(expressions): improve various expression building bottlenecks by JonAnCla · Pull Request #11885 · ibis-project/ibis

JonAnCla · 2026-01-31T10:50:10Z

Description of changes

This is a rewrite of changes in #11691, with further changes made beyond scope of that PR to attack a few other bottlenecks I've found from profiling (bottleneck whack-a-mole).

If you prefer I could split these across multiple PRs, I just thought I'd start with one to show their cumulative effect on performance and start discussion :)

Summary of changes:

7e261a4 refactors "Signature binding of Annotable subclasses" to use an approach similar to python dataclasses (i.e. use eval/exec to create an init function for a given signature) instead of doing the init process in generic python code. My initial implementation tried to use dataclasses to do this, but I then found that ibis Annotable subclasses can include annotations intended to capture *args and **kwargs, which dataclasses do not support. This new implementation relies on using the standard library to render a inspect.Signature to string (which can capture all use cases for ibis class annotations), and then using "return locals()" to return the resulting mapping of arg/kwarg names to passed values (relevant code). So almost no code for dynamic code generation has been added (main stuff comes from inbuilt behaviour in inspect.Signature).
- this change yields a ~50% reduction in instantiation time for basic subclasses of Annotable which is used for objects throughout the ibis expression IR code (note that the Coerce/Pattern type checking work done for some Annotable subclasses can be much more expensive than building instances of the classes themselves, so this improvement is not uniform across all Annotables - I measured the improvement on a simple 2 arg class with plain python types for type hints)
- using eval/exec to build a custom init function (instead of trying to use a dataclass) was originally suggested by @kszucs
22c155d Following the above, I noticed that Relation.fields builds a new ordered dict of Field objects on every access to the .fields attribute. Building instances of any object derived ibis.common.grounds.Annotable is i) ~2.5x slower than from a typical python class because AbstractMeta.__call__ is overridden via Annotable.__create__ (and anything that overrides Cls.__new__ or Metaclass.__call__ is ~2.5x slower) and ii) about another 5x slower on top of that because of the type checking work done by Annotable+Coerce+Pattern, so reducing number of repetitive builds of objects is a good way to improve performance in ibis generally.
8ee492a I noticed that another expensive operation is building the Schema of a Relation e.g. following a Project or other operation on a table. When a Schema is instantiated, it re-validates all of the data types in the schema (and this is affected by the performance of Annotable.__create__ on many fields in a Schema), but this is redundant work for Schemas made for Relations as the Relation itself has already validated all of its fields and their datatypes. This commit adds a mechanism to skip validation of the fields passed to a Schema when they are known to be already validated, and uses it on various Relation classes
1523591 Project operations call "rewrite_project_input" on each of their fields to apply a filtered replace operation on them. The same filter is built for every call and because of the Annotable.__create__ pattern building the (same) filter every time results in a lot of redundant overhead. This commit pre-builds the filter at global scope.

I've attached a csv file summarising changes in performance vs a5e7c22 using python 3.12.12 on an Ubuntu-based laptop. Performance improvements for most "wide" table operations are in the 50-90% range. I've since rebased changes onto main and can re-run profiles for each commit/split into separate PRs, also re-run for py3.14 - whatever would help evidence improvements for reviewers. That file was produced by comparing these sets of timings generated from the ibis benchmark suite:
0014_main-a5e7c22.json
0017_with-perf-changes.json

Following these changes, I don't think there much else that's "easy" to change that would provide broad improvements. A key question is whether there is appetite from maintainers for me to submit a later PR to replace the Annotable.__create__ pattern with a plain __init__ approach (but keep the type checking part of Pattern/Coerce).

I'm out of time for today but please let me know how to progress (splitting PRs, more granular performance metrics). Thanks for taking time to read :)

cpcloud

LGTM, just a couple minor suggestions.

Formatting is off too. Running just lint should take of that.

ibis/common/annotations.py

ibis/expr/operations/relations.py

cpcloud · 2026-02-01T15:52:06Z

ibis/expr/schema.py

+    @classmethod
+    def _create_without_validation(cls, fields: FrozenOrderedDict[str, dt.DataType]) -> Self:
+        schema = cls.__new__(cls)
+        if not isinstance(fields, FrozenOrderedDict):


Is this necessary, or can we do the construction at the various call sites where we aren't sure about the type (or have control over it entirely)?

Thanks good thought, I've pushed a change to put the FrozenOrderedDict construction at all call sites

…gested by kszucs

…own to already be validated

cpcloud · 2026-02-01T16:06:00Z

@JonAnCla I pushed a few commits to address my comments. Let me know about the SignatureBuilder commit and whether you had something else in mind for that for a follow-up. I can revert that commit if you want!

JonAnCla · 2026-02-02T08:42:44Z

Nice thanks for the quick turnaround! I'll take a look through your changes and comments over next day or two

…their own FrozenOrderedDict to be passed

JonAnCla · 2026-02-03T20:58:01Z

Thanks @cpcloud I've pushed a change for the callers of Schema._create_without_validation to use FrozenOrderedDict, and run precommit (I couldn't see "lint" command for just, so hope I've got that right)

LMK if you'd like anything else looked at. Thanks!

JonAnCla · 2026-02-23T15:29:44Z

Hey @cpcloud, just wanted to check whether you need anything else from me on this PR ATM?

Appreciate you are probably working on ibis in batches as time allows so no rush, just trying to make sure there's nothing else I can do before you next take a look. Thanks!

JonAnCla · 2026-02-28T09:54:57Z

Actually @cpcloud I need a bit of guidance from you :)

There is a failing unit test that's similar across many backends due to tables not getting removed from a per-backend cache when expected, see e.g. https://github.com/ibis-project/ibis/actions/runs/21647482695/job/62403168853?pr=11885

I've just pushed an example change to one of those tests that shows that forcing GC before the "object should not be in cache now" check makes the test pass

The reason these tests are breaking is that caching the Schema on a Relation (this change) is causing a reference cycle between Schema and Relation instances, so objects are only removed from cache by explicitly running GC. (I think the cycle is Relation -> Schema -> Field -> Original Relation)

So some options that come to mind are:

remove the schema caching change and proceed with other changes only
use a weakref to cache the schema on a relation, though this may mostly defeat the point of caching (I can run the benchmarks to help verify), and it adds a bit of complexity. If I have time this morning I'll follow up with benchmarks and proof if this actually fixes the tests
accept that this new behaviour will cause reference cycles that will only be removed when GC runs, whenever that happens (probably not acceptable)

Do LMK if you have a preference and maybe you have some other ideas. Many thanks :)

JonAnCla · 2026-02-28T11:00:03Z

It looks like using weakref caching of all of "fields" provides no substantial performance improvement

I think its probably better to remove caching "fields" for now...

github-actions bot added the tests Issues or PRs related to tests label Jan 31, 2026

cpcloud reviewed Feb 1, 2026

View reviewed changes

JonAnCla and others added 7 commits February 1, 2026 10:53

- changed implementation to use custom generated bind function as sug…

d750bcd

…gested by kszucs

- cache relation fields on demand to avoid repeatedly building it

a928ca0

- initialise Schemas without validating in places where dtypes are kn…

007936e

…own to already be validated

- precalculate some filters used in rewrites for better performance

91dd8e9

style: fix lints

146b9c5

perf: add slots to ReprableVariableName

3b96b37

perf: reduce getattr overhead

6cf7599

cpcloud force-pushed the expr-build-perf-improvements branch from 1523591 to 6cf7599 Compare February 1, 2026 16:00

cpcloud added 2 commits February 1, 2026 11:04

chore: regen error message

7c62886

refactor: remove SignatureBuilder

5b8d333

JonAnCla added 2 commits February 3, 2026 20:51

- refactor callers of Schema._create_without_validation to construct …

29c9aa0

…their own FrozenOrderedDict to be passed

- precommit

87428f0

JonAnCla mentioned this pull request Feb 24, 2026

perf: swap Signature.bind for internal dataclass within Annotable #11691

Closed

- example showing that running GC does release entries from cache

f342672

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(expressions): improve various expression building bottlenecks#11885

perf(expressions): improve various expression building bottlenecks#11885
JonAnCla wants to merge 12 commits intoibis-project:mainfrom
JonAnCla:expr-build-perf-improvements

JonAnCla commented Jan 31, 2026

Uh oh!

cpcloud left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cpcloud Feb 1, 2026

Uh oh!

JonAnCla Feb 3, 2026

Uh oh!

cpcloud commented Feb 1, 2026

Uh oh!

JonAnCla commented Feb 2, 2026

Uh oh!

JonAnCla commented Feb 3, 2026

Uh oh!

JonAnCla commented Feb 23, 2026

Uh oh!

JonAnCla commented Feb 28, 2026

Uh oh!

JonAnCla commented Feb 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JonAnCla commented Jan 31, 2026

Description of changes

Uh oh!

cpcloud left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cpcloud Feb 1, 2026

Choose a reason for hiding this comment

Uh oh!

JonAnCla Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

cpcloud commented Feb 1, 2026

Uh oh!

JonAnCla commented Feb 2, 2026

Uh oh!

JonAnCla commented Feb 3, 2026

Uh oh!

JonAnCla commented Feb 23, 2026

Uh oh!

JonAnCla commented Feb 28, 2026

Uh oh!

JonAnCla commented Feb 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants