Replies: 3 comments 3 replies
-
I think we should seriously consider making the allowed definitions much more restrictive. For example, I think that the following rules would account for all useful cases of this I have seen:
These rules have the benefit that they are easy to compile, don't need to mutate any fields that aren't already mutable, and they mean that immutable recursive types are actually least fixed points. Now maybe these are too strict and they miss some important case, but the benefits of such a simplification are pretty large, so I think we should seriously consider something in this vein. I'd be happy to commit some Jane Street time to trying out specifications on our code base to see how much code they break. |
Beta Was this translation helpful? Give feedback.
-
Early opinions:
|
Beta Was this translation helpful? Give feedback.
-
I remembered a small case that is slightly problematic: let glob = ref (ref None)
let escape r = glob := r
let my_ref =
let r = ref None in
escape r;
r
let () = assert (r == !glob) The assertion holds (in a single-domain world).
The issue is mostly independent of my current work in #12596, so I don't plan to address it there. |
Beta Was this translation helpful? Give feedback.
-
Introduction
This is a discussion about the (present and) future of compilation of recursive bindings, in particular the ones involving non-functional values. It is written by me, @lthls, on behalf of a small team at OCamlPro that has been dealing with this issue for a while in the context of the Flambda project.
This is going to be a fairly long post, which should be of interest only to a small number of people. We do discuss removing support for a few features; if you rely on recursive bindings that are not obviously functions, you might want to check the proposal section and see if you are impacted by any of the proposed changes, or read some sections to understand how the compiler deals with these bindings. A binding like
let rec f x = ...
orlet rec f = fun x -> ...
is obviously a function, butlet rec f = let local_def = ... in fun x -> ...
is not and will go through the complicated algorithm described here.Current state
Currently, in addition to the normal typing rules, recursive bindings are checked for a property that looks like well-foundedness during type-checking. This rejects meaningless definitions like
let rec x = x
, but also definitions that might make sense in a lazy language but not in OCaml, such aslet rec x = x + 1
.The objective is to ensure that all definitions that pass this check can be compiled using a simple scheme that pre-allocates values, computes the final values using the pre-allocated placeholders for the recursive variables, and then patch the pre-allocated slots with the contents of the final values.
Details of the checking algorithm
The algorithm has two main passes. The first one computes, for each binding, whether it can be pre-allocated (
Static
) or not (Dynamic
). We will call this the binding's classification. The second one computes a sort of dependency graph, which associates to each binding how it depends on the other bindings. These dependencies are called modes.Classification
Some obvious cases: functions can be pre-allocated (more on that later), blocks with a statically known size can be allocated (tuples, records, non-constant variants, lazy blocks, ...). These are classified as
Static
. Function applications are classified asDynamic
(even when the type only allows blocks of a single size), and so are branching constructions (if cond then expr1 else expr2
) and fields accesses (expr1.label
).Some less obvious, but reasonable cases:
let x = expr1 in expr2
can be pre-allocated ifexpr2
can be pre-allocated, in an environment wherex
has been classified corresponding to its definitionexpr1
.So
let x = (a, b) in x
can be pre-allocated (classified asStatic
), butlet x = a + b in x
cannot. Butlet x = a + b in (x, x)
can be pre-allocated.Sequences are treated as let-bindings without bound variables, so for example
print_endline "Hello"; (a, b)
is classified asStatic
.Finally, the current implementation classifies as
Static
a number of constructions that cannot actually be pre-allocated. The common theme is that these constructions have to return a value that trivially does not depend on any other previous value, recursive or not. This includes constants (numbers, strings, constant variant constructors, and also blocks with constant contents), loops and assignments (which always return()
), and a few other special cases.And for some reason, anything involving objects is also classified as
Static
(fortunately, the later part of the check will actually reject these definitions if they are actually recursive).Modes
The mode computation is rather well documented, both in the implementation and in an associated paper: A Practical Mode System for Recursive Definitions, by Alban Reynaud, Gabriel Scherer and Jeremy Yallop (that is unfortunately not linked to in the source file, but can be found from the authors' websites).
To sum up, each binding will depend on the recursive variables either not at all (
Ignore
in the paper and implementation), or safely for recursive definitions (Guard
orDelay
), or unsafely (Return
orDeference
).We do not propose any changes to the modes and their computation, so we will not describe this in detail.
Final result
The final result combines both analyses: if the binding is classified as
Dynamic
, it is rejected unless it does not depend on the recursive variables at all. If it is classified as static, it is accepted only if it does not depend unsafely on any recursive variable.The actual compilation algorithm
The compilation algorithm (which is duplicated in the bytecode and native compilers, but behaves the same way) does not get any input from the type-checking phase at the moment (except that the binding did, in one of its earlier forms, pass the check). So it first re-computes a classification, then uses it to generate non-recursive code based on runtime pre-allocation and patching primitives.
Special case for functions
A special case is made for recursive bindings that are only syntactic functions. Although those bindings trivially pass the check and could be compiled using the normal pass, it is much better to use the various backends' support for recursive functions to generate the corresponding terms directly. In practice this generates a single closure block, with additional functions stored using infix headers. This is the least problematic case, so we will not discuss it further.
Extended classification
At this point the classification will be used to determine how to pre-allocate values, so it is stricter than the type-checking version.
For functions, the compiler at this point know the exact representation of closures so it will compute the exact size. For blocks, the decision is the same as the type-checking phase except that the size is actually used.
Let-bindings are handled using environments like the type-checking version. Function calls are still considered dynamic (
RHS_nonrec
in the implementations), but branches may have to be handled more carefully (see #12059) because the intermediate compilation passes might have introduced branching..Finally, the constant (and object-related) terms are now considered dynamic, conflicting with the earlier decision. This will introduce a few hacks in the next phase to deal with these inconsistencies.
Compilation scheme
The bindings are now split into two sets: static bindings, with their pre-allocation scheme, and dynamic bindings. Note that like in the typing check, bindings may be static even if they're not actually recursive.
The compilation algorithm then generate three code sequences:
Dynamic
earlier, this could have been done even before the pre-allocations, but for the other ones all of the bound variables are required to be in scope. The dummy constants are guaranteed not to escape thanks to both the mode check that prevents the recursive variables from escaping, even in theStatic
case, and the fact that all expressions in this case are supposed to evaluate to constants, not containing any reference to the recursive values in their result. The new bindings shadow the previous ones (the only deliberate occurrence of duplicate bindings in the compiler that I'm aware of).let rec s = let t = ref 0 in escape t; t
will produce two different reference cells, one bound tos
and one escaped through theescape
function, that can be updated independently.Issues with this approach
There are several issues with the current approach. The main ones can be described thus:
let rec f = let () = () in fun x -> f x
is rejected, butlet rec f = let _ = () in fun x -> f x
is accepted).The first issue is particularly tricky to deal with, as any fix that rejects previously accepted programs could, in theory, break user code. So we have been investigating both breaking solutions (restricting the set of accepted bindings) and non-breaking solutions (adding complex logic to deal correctly with all the accepted cases).
The future: possible modifications being considered
Propagating information from type-checking to compilation
The classification and modes could be stored somewhere and propagated to the compilation pass, to avoid cases where bindings are allowed to refer to recursive variables but cannot be actually pre-allocated.
The compilation pass would still need to explore the bindings to compute the actual sizes (mostly for functions), but then that could be checked against the propagated classification for consistency.
The modes could be used to switch some bindings classified as
Static
toDynamic
when they don't actually depend on any recursive variable, as the generated code is better this way.The main issue is how to handle constants, with two different solutions being considered.
Restrict the current check for constants
This is straight-forward: simply classify constants as
Dynamic
. The only side-effect is that some programs become rejected; in most cases it's harmless (who cares aboutlet rec x = while true do x done
?), but it's possible that someone has a small hack somewhere that happens to rely on that.Introduce a new classification
Constant
We can introduce a
Constant
classification, which can then be used to make the current implementation clearer (constants would be type-checked the same asStatic
bindings but compiled asDynamic
bindings, similar to what is done currently).Even better, we could actually store the constant value in the classification, so that these constants would be initialized at their correct value from the start and their definition would only be evaluated for its side-effects.
By pushing the evaluation part after all pre-allocation and evaluation of constant/dynamic bindings has finished, we could even remove the duplicate binding hack.
Moving the compilation pass earlier
The distance between the type-checking pass and the compilation pass makes it possible that the terms that have been checked and those that are compiled differ significantly. We would like to solve this issue by moving the compilation pass earlier, either on the typedtree or during the translation to Lambda (the latter being the preferred option).
The main issue is that this early in the compiler, functions cannot be pre-allocated as their size is not yet fixed (closure representations can differ in bytecode and native, and the set of free variables may change due to some optimizations). So far we've thought of two different possibilities to solve this issue.
Eta-expansion of bindings classified as functions
We can extend the classification to have a dedicated cases for functions, and when compiling a function which is not a syntactic function, we would first eta-expand it to get back a syntactic function. For example,
let rec f = let x = 1 in fun y -> f (x + y)
would becomelet rec f param = (let x = 1 in fun y -> f (x + y)) param
. These bindings would then be evaluated after all static bindings have been pre-allocated and all dynamic/constant bindings have been computed, but before the static bindings are evaluated.The main drawback is that this could silently break existing code. For instance,
let rec f = init (); fun x -> <...>
would be compiled to a function that callsinit
each time (including for recursive calls). So it is only considered as an opt-in solution (with the default being either a compilation error or one of the more involved solutions described below).Dissection of bindings
This idea consists of exploring the bindings in depth and split out everything that is not a function into either non-recursive bindings, pre-allocated blocks, or effect-only expressions. It has been implemented for a while as a pre-processing pass for Flambda 2, and has been proposed for upstreaming in #8956. However, the current implementation has known issues (for example ocaml-flambda/flambda-backend#1814) and is rather complex.
We hope to be able to find a better solution (less complex, benefiting from propagation of type-checking information), solving the more tricky issues by using an additional layer of indirection. This is still in its very early stages though (no prototype yet).
Roadmap and integration plans
Our current plan is to try to fit some of the simpler parts into the next release (5.2). For instance, the propagation of the information from the type-checker to the compilation pass seems important, and relatively easy to review. The addition of the constant case should also fit well within that time frame, either in combination with the propagation or separately.
The handling of functions will likely take a bit more time. The eta-expansion solution is not very realistic from our point of view, so we will likely not spend time on it unless other developers express their interest in it. We hope however to have a proper dissection pass ready in a few months at most, and from there focus on reviewing and discuss how to integrate it in the compiler (separate pass on lambda, or integrate with the translation from the typedtree).
We hope to be able to integrate that somewhere before the release of 5.3, although if everything goes smoothly it might actually land before the 5.2 freeze.
Acknowledgements
The main people working on this particular project have been myself and @chambart, as part of the Flambda project. We would also like to thank the people behind the current recursion check algorithm, for giving us a model to start from, @AltGr for submitting the initial dissection PR, @aedianys for digging into this issue and extracting incremental solutions, and @gasche for taking the time to look at our proposed solutions.
Beta Was this translation helpful? Give feedback.
All reactions