Skip to content

perf(expressions): improve various expression building bottlenecks#11885

Open
JonAnCla wants to merge 12 commits intoibis-project:mainfrom
JonAnCla:expr-build-perf-improvements
Open

perf(expressions): improve various expression building bottlenecks#11885
JonAnCla wants to merge 12 commits intoibis-project:mainfrom
JonAnCla:expr-build-perf-improvements

Conversation

@JonAnCla
Copy link
Contributor

Description of changes

This is a rewrite of changes in #11691, with further changes made beyond scope of that PR to attack a few other bottlenecks I've found from profiling (bottleneck whack-a-mole).

If you prefer I could split these across multiple PRs, I just thought I'd start with one to show their cumulative effect on performance and start discussion :)

Summary of changes:

  • 7e261a4 refactors "Signature binding of Annotable subclasses" to use an approach similar to python dataclasses (i.e. use eval/exec to create an init function for a given signature) instead of doing the init process in generic python code. My initial implementation tried to use dataclasses to do this, but I then found that ibis Annotable subclasses can include annotations intended to capture *args and **kwargs, which dataclasses do not support. This new implementation relies on using the standard library to render a inspect.Signature to string (which can capture all use cases for ibis class annotations), and then using "return locals()" to return the resulting mapping of arg/kwarg names to passed values (relevant code). So almost no code for dynamic code generation has been added (main stuff comes from inbuilt behaviour in inspect.Signature).
    • this change yields a ~50% reduction in instantiation time for basic subclasses of Annotable which is used for objects throughout the ibis expression IR code (note that the Coerce/Pattern type checking work done for some Annotable subclasses can be much more expensive than building instances of the classes themselves, so this improvement is not uniform across all Annotables - I measured the improvement on a simple 2 arg class with plain python types for type hints)
    • using eval/exec to build a custom init function (instead of trying to use a dataclass) was originally suggested by @kszucs
  • 22c155d Following the above, I noticed that Relation.fields builds a new ordered dict of Field objects on every access to the .fields attribute. Building instances of any object derived ibis.common.grounds.Annotable is i) ~2.5x slower than from a typical python class because AbstractMeta.__call__ is overridden via Annotable.__create__ (and anything that overrides Cls.__new__ or Metaclass.__call__ is ~2.5x slower) and ii) about another 5x slower on top of that because of the type checking work done by Annotable+Coerce+Pattern, so reducing number of repetitive builds of objects is a good way to improve performance in ibis generally.
  • 8ee492a I noticed that another expensive operation is building the Schema of a Relation e.g. following a Project or other operation on a table. When a Schema is instantiated, it re-validates all of the data types in the schema (and this is affected by the performance of Annotable.__create__ on many fields in a Schema), but this is redundant work for Schemas made for Relations as the Relation itself has already validated all of its fields and their datatypes. This commit adds a mechanism to skip validation of the fields passed to a Schema when they are known to be already validated, and uses it on various Relation classes
  • 1523591 Project operations call "rewrite_project_input" on each of their fields to apply a filtered replace operation on them. The same filter is built for every call and because of the Annotable.__create__ pattern building the (same) filter every time results in a lot of redundant overhead. This commit pre-builds the filter at global scope.

I've attached a csv file summarising changes in performance vs a5e7c22 using python 3.12.12 on an Ubuntu-based laptop. Performance improvements for most "wide" table operations are in the 50-90% range. I've since rebased changes onto main and can re-run profiles for each commit/split into separate PRs, also re-run for py3.14 - whatever would help evidence improvements for reviewers. That file was produced by comparing these sets of timings generated from the ibis benchmark suite:
0014_main-a5e7c22.json
0017_with-perf-changes.json

Following these changes, I don't think there much else that's "easy" to change that would provide broad improvements. A key question is whether there is appetite from maintainers for me to submit a later PR to replace the Annotable.__create__ pattern with a plain __init__ approach (but keep the type checking part of Pattern/Coerce).

I'm out of time for today but please let me know how to progress (splitting PRs, more granular performance metrics). Thanks for taking time to read :)

@github-actions github-actions bot added the tests Issues or PRs related to tests label Jan 31, 2026
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple minor suggestions.

Formatting is off too. Running just lint should take of that.

@classmethod
def _create_without_validation(cls, fields: FrozenOrderedDict[str, dt.DataType]) -> Self:
schema = cls.__new__(cls)
if not isinstance(fields, FrozenOrderedDict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary, or can we do the construction at the various call sites where we aren't sure about the type (or have control over it entirely)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks good thought, I've pushed a change to put the FrozenOrderedDict construction at all call sites

@cpcloud cpcloud force-pushed the expr-build-perf-improvements branch from 1523591 to 6cf7599 Compare February 1, 2026 16:00
@cpcloud
Copy link
Member

cpcloud commented Feb 1, 2026

@JonAnCla I pushed a few commits to address my comments. Let me know about the SignatureBuilder commit and whether you had something else in mind for that for a follow-up. I can revert that commit if you want!

@JonAnCla
Copy link
Contributor Author

JonAnCla commented Feb 2, 2026

Nice thanks for the quick turnaround! I'll take a look through your changes and comments over next day or two

@JonAnCla
Copy link
Contributor Author

JonAnCla commented Feb 3, 2026

Thanks @cpcloud I've pushed a change for the callers of Schema._create_without_validation to use FrozenOrderedDict, and run precommit (I couldn't see "lint" command for just, so hope I've got that right)

LMK if you'd like anything else looked at. Thanks!

@JonAnCla
Copy link
Contributor Author

Hey @cpcloud, just wanted to check whether you need anything else from me on this PR ATM?

Appreciate you are probably working on ibis in batches as time allows so no rush, just trying to make sure there's nothing else I can do before you next take a look. Thanks!

@JonAnCla
Copy link
Contributor Author

Actually @cpcloud I need a bit of guidance from you :)

There is a failing unit test that's similar across many backends due to tables not getting removed from a per-backend cache when expected, see e.g. https://github.com/ibis-project/ibis/actions/runs/21647482695/job/62403168853?pr=11885

I've just pushed an example change to one of those tests that shows that forcing GC before the "object should not be in cache now" check makes the test pass

The reason these tests are breaking is that caching the Schema on a Relation (this change) is causing a reference cycle between Schema and Relation instances, so objects are only removed from cache by explicitly running GC. (I think the cycle is Relation -> Schema -> Field -> Original Relation)

So some options that come to mind are:

  1. remove the schema caching change and proceed with other changes only
  2. use a weakref to cache the schema on a relation, though this may mostly defeat the point of caching (I can run the benchmarks to help verify), and it adds a bit of complexity. If I have time this morning I'll follow up with benchmarks and proof if this actually fixes the tests
  3. accept that this new behaviour will cause reference cycles that will only be removed when GC runs, whenever that happens (probably not acceptable)

Do LMK if you have a preference and maybe you have some other ideas. Many thanks :)

@JonAnCla
Copy link
Contributor Author

It looks like using weakref caching of all of "fields" provides no substantial performance improvement

I think its probably better to remove caching "fields" for now...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests Issues or PRs related to tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants