perf(expressions): improve various expression building bottlenecks#11885
perf(expressions): improve various expression building bottlenecks#11885JonAnCla wants to merge 12 commits intoibis-project:mainfrom
Conversation
cpcloud
left a comment
There was a problem hiding this comment.
LGTM, just a couple minor suggestions.
Formatting is off too. Running just lint should take of that.
ibis/expr/schema.py
Outdated
| @classmethod | ||
| def _create_without_validation(cls, fields: FrozenOrderedDict[str, dt.DataType]) -> Self: | ||
| schema = cls.__new__(cls) | ||
| if not isinstance(fields, FrozenOrderedDict): |
There was a problem hiding this comment.
Is this necessary, or can we do the construction at the various call sites where we aren't sure about the type (or have control over it entirely)?
There was a problem hiding this comment.
Thanks good thought, I've pushed a change to put the FrozenOrderedDict construction at all call sites
…own to already be validated
1523591 to
6cf7599
Compare
|
@JonAnCla I pushed a few commits to address my comments. Let me know about the |
|
Nice thanks for the quick turnaround! I'll take a look through your changes and comments over next day or two |
…their own FrozenOrderedDict to be passed
|
Thanks @cpcloud I've pushed a change for the callers of Schema._create_without_validation to use FrozenOrderedDict, and run precommit (I couldn't see "lint" command for just, so hope I've got that right) LMK if you'd like anything else looked at. Thanks! |
|
Hey @cpcloud, just wanted to check whether you need anything else from me on this PR ATM? Appreciate you are probably working on ibis in batches as time allows so no rush, just trying to make sure there's nothing else I can do before you next take a look. Thanks! |
|
Actually @cpcloud I need a bit of guidance from you :) There is a failing unit test that's similar across many backends due to tables not getting removed from a per-backend cache when expected, see e.g. https://github.com/ibis-project/ibis/actions/runs/21647482695/job/62403168853?pr=11885 I've just pushed an example change to one of those tests that shows that forcing GC before the "object should not be in cache now" check makes the test pass The reason these tests are breaking is that caching the Schema on a Relation (this change) is causing a reference cycle between Schema and Relation instances, so objects are only removed from cache by explicitly running GC. (I think the cycle is Relation -> Schema -> Field -> Original Relation) So some options that come to mind are:
Do LMK if you have a preference and maybe you have some other ideas. Many thanks :) |
|
It looks like using weakref caching of all of "fields" provides no substantial performance improvement I think its probably better to remove caching "fields" for now... |
Description of changes
This is a rewrite of changes in #11691, with further changes made beyond scope of that PR to attack a few other bottlenecks I've found from profiling (bottleneck whack-a-mole).
If you prefer I could split these across multiple PRs, I just thought I'd start with one to show their cumulative effect on performance and start discussion :)
Summary of changes:
inspect.Signatureto string (which can capture all use cases for ibis class annotations), and then using "return locals()" to return the resulting mapping of arg/kwarg names to passed values (relevant code). So almost no code for dynamic code generation has been added (main stuff comes from inbuilt behaviour in inspect.Signature).Relation.fieldsbuilds a new ordered dict of Field objects on every access to the .fields attribute. Building instances of any object derivedibis.common.grounds.Annotableis i) ~2.5x slower than from a typical python class becauseAbstractMeta.__call__is overridden viaAnnotable.__create__(and anything that overridesCls.__new__orMetaclass.__call__is ~2.5x slower) and ii) about another 5x slower on top of that because of the type checking work done by Annotable+Coerce+Pattern, so reducing number of repetitive builds of objects is a good way to improve performance in ibis generally.Annotable.__create__on many fields in a Schema), but this is redundant work for Schemas made for Relations as the Relation itself has already validated all of its fields and their datatypes. This commit adds a mechanism to skip validation of the fields passed to a Schema when they are known to be already validated, and uses it on various Relation classesAnnotable.__create__pattern building the (same) filter every time results in a lot of redundant overhead. This commit pre-builds the filter at global scope.I've attached a csv file summarising changes in performance vs a5e7c22 using python 3.12.12 on an Ubuntu-based laptop. Performance improvements for most "wide" table operations are in the 50-90% range. I've since rebased changes onto main and can re-run profiles for each commit/split into separate PRs, also re-run for py3.14 - whatever would help evidence improvements for reviewers. That file was produced by comparing these sets of timings generated from the ibis benchmark suite:
0014_main-a5e7c22.json
0017_with-perf-changes.json
Following these changes, I don't think there much else that's "easy" to change that would provide broad improvements. A key question is whether there is appetite from maintainers for me to submit a later PR to replace the
Annotable.__create__pattern with a plain__init__approach (but keep the type checking part of Pattern/Coerce).I'm out of time for today but please let me know how to progress (splitting PRs, more granular performance metrics). Thanks for taking time to read :)