-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define inlined, more-dumb versions of Mod_bounds ops #3605
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed in person, I disagree with the thesis that this code is "better". I think it is harder to read and maintain. As of now, I think it is also harder to reason about due to some of the axes being backwards (but I think we can fix this). However, it is clearly more performant, and these are hot paths. So I think this is worth it.
4e73b25
to
8536286
Compare
8cfbcc5
to
ee44e52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really love the hybrid approach in this PR: keeping Mod_bounds
as an Axis_collection
, but then just inlining a few function calls. Would we get these same performance gains with [@inline]
? Then we should do that. Or if the goal is to remove the Axis_collection
abstraction, then go all the way and do it -- possibly recouping even more perf gains.
I had a long convesation with @lpw25 about this last week -- because he thinks this new simplified code is better, and I disagreed. I continue to find the old more abstract code easier to reason about than the new code, because I find the abstraction of But: this abstraction has been evolving. I know because I spent a bunch of time adding capabilities to it (e.g. Until Leo made the point to me about this time cost, I pushed back about which code was better. I continue to think that, if this code were more at rest, (And then, of course, there's the performance aspect!) |
8536286
to
a0af476
Compare
7aac3d7
to
90881e1
Compare
I've pushed a (trivial) rebase followed by another commit, 90881e1, which gets rid of |
The mod-bounds ops that used polymorphic functions and first-class modules were not only a little bit over-abstracted for some peoples' taste, they actually had noticeably worse performance, due to suboptimal inlining behavior (and allocation!) of the `Accent_lattice` FCM. This changes them to be dumber and more repetitive, but more direct, which also gets us a few percent performance win.
Let's inhabit one side of this architectural fence entirely, rather than straddling it.
90881e1
to
4002964
Compare
and another rebase, to fix conflicts; the commit to review is now 4002964 |
create ~locality ~linearity ~uniqueness ~portability ~contention ~yielding | ||
~externality ~nullability | ||
|
||
let less_or_equal t1 t2 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rather the annoying pattern-match here. Nothing stops this from accidentally forgetting an axis otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I now see why this is hard. I don't have a great idea here. Maybe the best is to point here from the definition of Mod_bounds
saying that any addition needs to be reflected? Ditto equal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized I forgot to mention this in the commit message, but the choice to make these records abstract and only expose getters and setters was intentional, to pave the road for having the more efficient representation (bit fields) next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also concerned about forgetting new axes here. Here's a not-fun suggestion:
val match_axes : t -> (locality:Locality.t -> linearity:Linearity.t -> ... -> 'a) -> 'a
Hopefully this would get inlined the way I'd hope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the eliminator approach is probably the nicest choice here, if it does in fact get inlined - but I am wary of doing it until after we do the bitfield refactor, because it's only then that we know if it's introduced noticeable pessimization. Do you mind if we wait until after that change to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me, yes
@@ axis_less_or_equal ~le:Nullability.le ~axis:(Pack (Nonmodal Nullability)) | ||
(nullability t1) (nullability t2) | ||
|
||
let equal t1 t2 = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another good place for a pattern-match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see also #3605 (comment)
let value_for_axis (type a) ~(axis : a Axis.t) : a = | ||
if Axis_set.mem relevant_axes axis | ||
then | ||
let (module Bound_ops) = Axis.get axis in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do better here? I thought we identified this as an allocate-y operation which might be slow. And it's in a loop here. It's a bit more verbose, but https://github.com/goldfirere/flambda-backend/blob/rae/dismantle-axis-collection/typing/jkind.ml#L1654 has another way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also make value_for_axis
take the join operation as an argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't end up showing up on a profile, fwiw, for whatever reason. We could make it worse for consistency's sake, but I'm hesitant to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My profile shows a decent amount of time in Accent_lattice
. I think I will try to improve this.
A.meet base_modifier parsed_modifier.txt) | ||
} | ||
let value_for_axis (type a) ~(axis : a Axis.t) : a = | ||
let (module A) = Axis.get axis in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems analogous to the situation in normalize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
analogously, this doesn't really show up on a profile, so I'm hesitant to make it more verbose without good reason.
This is actually much faster, and has the same effect for lattice reasons
The mod-bounds ops that used polymorphic functions and first-class modules were
not only a little bit over-abstracted for some peoples' taste, they actually had
noticeably worse performance, due to suboptimal inlining behavior (and
allocation!) of the
Accent_lattice
FCM. This changes them to be dumber andmore repetitive, but more direct, which also gets us a few percent performance
win.