Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define inlined, more-dumb versions of Mod_bounds ops #3605

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

glittershark
Copy link
Member

The mod-bounds ops that used polymorphic functions and first-class modules were
not only a little bit over-abstracted for some peoples' taste, they actually had
noticeably worse performance, due to suboptimal inlining behavior (and
allocation!) of the Accent_lattice FCM. This changes them to be dumber and
more repetitive, but more direct, which also gets us a few percent performance
win.

@glittershark glittershark requested a review from liam923 February 19, 2025 19:54
Copy link
Contributor

@liam923 liam923 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed in person, I disagree with the thesis that this code is "better". I think it is harder to read and maintain. As of now, I think it is also harder to reason about due to some of the axes being backwards (but I think we can fix this). However, it is clearly more performant, and these are hot paths. So I think this is worth it.

@glittershark glittershark force-pushed the aspsmith/with-kind-perf branch from 4e73b25 to 8536286 Compare February 19, 2025 20:14
@glittershark glittershark force-pushed the aspsmith/dumber-mod-bounds-ops branch from 8cfbcc5 to ee44e52 Compare February 19, 2025 20:17
Copy link
Collaborator

@goldfirere goldfirere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really love the hybrid approach in this PR: keeping Mod_bounds as an Axis_collection, but then just inlining a few function calls. Would we get these same performance gains with [@inline]? Then we should do that. Or if the goal is to remove the Axis_collection abstraction, then go all the way and do it -- possibly recouping even more perf gains.

@goldfirere
Copy link
Collaborator

@liam923

As we discussed in person, I disagree with the thesis that this code is "better".

I had a long convesation with @lpw25 about this last week -- because he thinks this new simplified code is better, and I disagreed. I continue to find the old more abstract code easier to reason about than the new code, because I find the abstraction of Axis_collection to be conceptually straightforward, even with its parameterization with Indexed. I can learn that abstraction once, and then each use of a function in it is easy to understand in context.

But: this abstraction has been evolving. I know because I spent a bunch of time adding capabilities to it (e.g. Map2 and Fold). And I believe you recently added Indexed and the mono and poly variants. Folks who come across it for the first time have to spend a not-quite-trivial amount of time understanding what's going on and tracing through the meanings of the various pieces. On the other hand, if we had done the simple stuff to begin with, I think the net time spent would be significantly lower, even accounting for several times spending several minutes routinely updating the fields being affected. (Tuning the abstraction is much more fun than repetitively editing field names. We're not optimizing for fun, sadly.)

Until Leo made the point to me about this time cost, I pushed back about which code was better. I continue to think that, if this code were more at rest, Axis_collection makes it better. I think some of this just comes down to personal taste: at one point I used Option.bind because that just makes more sense to me than a direct match, but Leo found the code more confusing. Perhaps if the majority of the team prefers the simpler code, that really does make it better. In any case, though, in this case, I have to agree that going on without Axis_collection is the right call, just because of the time-cost of its maintenance.

(And then, of course, there's the performance aspect!)

@glittershark glittershark force-pushed the aspsmith/with-kind-perf branch from 8536286 to a0af476 Compare February 20, 2025 16:26
Base automatically changed from aspsmith/with-kind-perf to main February 20, 2025 20:24
@glittershark glittershark force-pushed the aspsmith/dumber-mod-bounds-ops branch 2 times, most recently from 7aac3d7 to 90881e1 Compare February 25, 2025 18:45
@glittershark
Copy link
Member Author

I've pushed a (trivial) rebase followed by another commit, 90881e1, which gets rid of Axis_collection entirely, at @goldfirere's suggestion. perf seems basically equivalent. @goldfirere and @liam923 PTAL

The mod-bounds ops that used polymorphic functions and first-class modules were
not only a little bit over-abstracted for some peoples' taste, they actually had
noticeably worse performance, due to suboptimal inlining behavior (and
allocation!) of the `Accent_lattice` FCM. This changes them to be dumber and
more repetitive, but more direct, which also gets us a few percent performance
win.
Let's inhabit one side of this architectural fence entirely, rather than
straddling it.
@glittershark glittershark force-pushed the aspsmith/dumber-mod-bounds-ops branch from 90881e1 to 4002964 Compare February 25, 2025 18:53
@glittershark
Copy link
Member Author

and another rebase, to fix conflicts; the commit to review is now 4002964

create ~locality ~linearity ~uniqueness ~portability ~contention ~yielding
~externality ~nullability

let less_or_equal t1 t2 =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather the annoying pattern-match here. Nothing stops this from accidentally forgetting an axis otherwise.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I now see why this is hard. I don't have a great idea here. Maybe the best is to point here from the definition of Mod_bounds saying that any addition needs to be reflected? Ditto equal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized I forgot to mention this in the commit message, but the choice to make these records abstract and only expose getters and setters was intentional, to pave the road for having the more efficient representation (bit fields) next.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also concerned about forgetting new axes here. Here's a not-fun suggestion:

val match_axes : t -> (locality:Locality.t -> linearity:Linearity.t -> ... -> 'a) -> 'a

Hopefully this would get inlined the way I'd hope.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the eliminator approach is probably the nicest choice here, if it does in fact get inlined - but I am wary of doing it until after we do the bitfield refactor, because it's only then that we know if it's introduced noticeable pessimization. Do you mind if we wait until after that change to do this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me, yes

@@ axis_less_or_equal ~le:Nullability.le ~axis:(Pack (Nonmodal Nullability))
(nullability t1) (nullability t2)

let equal t1 t2 =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good place for a pattern-match

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see also #3605 (comment)

let value_for_axis (type a) ~(axis : a Axis.t) : a =
if Axis_set.mem relevant_axes axis
then
let (module Bound_ops) = Axis.get axis in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do better here? I thought we identified this as an allocate-y operation which might be slow. And it's in a loop here. It's a bit more verbose, but https://github.com/goldfirere/flambda-backend/blob/rae/dismantle-axis-collection/typing/jkind.ml#L1654 has another way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also make value_for_axis take the join operation as an argument.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't end up showing up on a profile, fwiw, for whatever reason. We could make it worse for consistency's sake, but I'm hesitant to?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My profile shows a decent amount of time in Accent_lattice. I think I will try to improve this.

A.meet base_modifier parsed_modifier.txt)
}
let value_for_axis (type a) ~(axis : a Axis.t) : a =
let (module A) = Axis.get axis in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems analogous to the situation in normalize

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

analogously, this doesn't really show up on a profile, so I'm hesitant to make it more verbose without good reason.

This is actually much faster, and has the same effect for lattice reasons
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants