Reduction #105 (Draft)

Janno wants to merge 7 commits into rocq-prover:master from bluerock-io:janno/reduction.
Commits:

* 6be81c7 WIP reduction document
* 8c7a31a More colors
* ea3fe19 Update text/XXX-reduction.md
* efe5a75 Update text/XXX-reduction.md
* 8a78f6a Update text/XXX-reduction.md
* 6650d21 Update text/XXX-reduction.md
* 7daa1f3 Update text/XXX-reduction.md
# Reduction Strategies

## Summary Table

| Strategy | Performance | Normalization | Arguments | Reduction Flags | Refolding | Effect on Qed |
|---|---|---|---|---|---|---|
| `vm_compute` | ✅✅✅ | Full | ❌ | ❌ | ❌ | ✅✅✅ Bespoke cast honored by Qed |
| `lazy` | ✅✅ | Full / Head | ❌ | ✅✅ | ❌ | ✅✅ Default cast does not store reduction flags |
| `cbv`, `compute` | ✅ | Full / Head | ❌ | ❌ | ❌ | ✅ Default cast coincides but will use kernel reduction/conversion instead of cbv |
| `hnf` | ❔ | Head | ❔ | ❌ | ✅ | ❔ |
| `simpl` | ⛔⛔ | Full / Head | ✅/⛔ | ⛔ (only head) | ✅ | ⛔ Uses default cast |
| `cbn` | ⛔⛔⛔ | Full / Head | ✅✅ | ✅✅ | ✅✅ | ⛔ Uses default cast |
| **Proposals** | | | | | | |
| [kred](#kred) | ✅ | Full / Head | ✅✅ | ✅✅ | ✅ | ⛔ Uses default cast |
| [MSP](#msp) | ✅✅ (= lazy) | Any (quoting) | ❌ (not necessary) | ❌ (not necessary) | ❌ | ✅✅✅ Fully integrated into kernel reduction/conversion |

## Details

* `vm_compute`
  * Fully normalizes Rocq terms
  * Translates Rocq terms to OCaml bytecode for efficiency
  * Generally faster than all strategies listed below
  * Toplevel definitions have their weakly reduced values cached
  * Translation does have a cost, which is offset by precompiling toplevel definitions
  * Does not support `Arguments`
  * Does not support any reduction flags (not even `head`)
  * Does not support fixpoint refolding
  * Has a dedicated cast AST node that is honored by `Qed`
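  A minimal sketch of `vm_compute` in action (standard tactics only, nothing assumed beyond the prelude):

  ```coq
  Goal 2 ^ 10 = 1024.
  Proof.
    (* fully normalizes both sides, leaving 1024 = 1024 *)
    vm_compute.
    reflexivity.
  Qed.
  ```

  Because `vm_compute` records its dedicated cast, `Qed` replays the check with the VM instead of kernel reduction.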
* `lazy`
  * Kernel reduction used in typechecking / conversion
  * The most efficient Rocq reduction engine written in OCaml
  * Full normalization by default
  * Does not support `Arguments`
  * Supports all reduction flags
  * Does not support fixpoint refolding (see [coq/coq#18672: \[Experiment\] Hack a global fixpoint expansion in conversion](https://github.com/coq/coq/pull/18672))
  * Largely coincides with the default cast, except that reduction flags are not recorded in the cast.
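  `lazy` accepts the usual reduction-flag syntax; a small sketch restricting it to a single step:

  ```coq
  Goal (fun x => x + 1) 2 = 3.
  Proof.
    (* only the beta step is performed; 2 + 1 stays unreduced *)
    lazy beta.
    reflexivity.
  Qed.
  ```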
* `cbv`, `compute`
  * ???
  * Eager reduction
  * Does not support `Arguments`
  * Supports all reduction flags
  * Does not support fixpoint refolding
  * Default cast will coincide in terms of the result but will use kernel reduction/conversion to get there.
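  For reference, `compute` is just `cbv` with all flags enabled:

  ```coq
  (* both commands fully normalize the term *)
  Eval cbv in 2 * 3.      (* = 6 : nat *)
  Eval compute in 2 * 3.  (* = 6 : nat *)
  ```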
* `hnf`
  * Head normal form
  * Does not support `Arguments` (❔)
  * Does not support reduction flags (❔)
  * Supports fixpoint refolding
* `simpl`
  * Rocq’s premier simplification engine
  * Does not unfold aliases unless they uncover reduction steps
  * Performance is mixed
    * Less efficient than `lazy` across the board but usually better than `cbn`
    * Issues with primitive projections ([coq/coq#15720: Primitive projections introduce slowdown in cbn and simpl](https://github.com/coq/coq/issues/15720))
    * Issues with nested fixpoints ([coq/coq#13772: simpl is exponential in the number of nested fixpoints](https://github.com/coq/coq/issues/13772))
    * Issues even without nested fixpoints or primitive projections ([coq/coq#16397: simpl scales exponentially even without nested fixpoints](https://github.com/coq/coq/issues/16397))
  * Partial support for `Arguments`
    * `simpl` will ignore `Arguments : simpl never` in `match <HERE> with`
  * Only supports the `head` reduction flag
  * Supports fixpoint refolding
  * No dedicated cast; relies on the default cast and thus kernel reduction/conversion at `Qed` time, which can lead to performance issues
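  The `simpl never` interaction can be sketched as follows (`my_add` is a hypothetical constant, not part of any library):

  ```coq
  (* hypothetical alias for addition *)
  Definition my_add (a b : nat) : nat := a + b.
  Arguments my_add : simpl never.

  Goal my_add 1 1 = 2.
  Proof.
    simpl.            (* leaves my_add 1 1 untouched *)
    unfold my_add.
    simpl.            (* now reduces to 2 = 2 *)
    reflexivity.
  Qed.
  ```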
* `cbn`
  * An alternative to `simpl`
  * Eagerly reduces aliases even when they uncover no reduction steps
  * Performance is bad
    * Slower than any other reduction mechanism, often by orders of magnitude
    * Similar issues as with `simpl`, just worse
  * Full support for `Arguments`
  * Supports fixpoint refolding
    * Also refolds aliases (❔)
  * No dedicated cast; relies on the default cast and thus kernel reduction/conversion at `Qed` time, which can lead to performance issues
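Fixpoint refolding is what keeps goals readable after partial reduction: the `fix` exposed by unfolding a constant is folded back to its named form. A small sketch:

```coq
Goal forall n, S n + 0 = S (n + 0).
Proof.
  intro n.
  (* cbn unfolds the fix behind +, performs one step, and refolds it,
     leaving S (n + 0) rather than an exposed fix term *)
  cbn.
  reflexivity.
Qed.
```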
# Problematic Reduction Examples

## <a id="reif"></a> Reification with User Terms

Iris Proof Mode and BlueRock’s automation use reflection and computation to perform changes to the shallowly embedded Iris goal.
### Requirements

* Must only reduce computation introduced by the automation tactics
  * Can be addressed with whitelist reduction flags, **and**
  * by creating fresh copies of all functions used in the tactics.
  * Both solutions are incredibly tedious and error-prone
* Needs a reasonably performant strategy (`lazy` or better)
* Reduction should ideally be captured by a bespoke cast so that it can be replayed, or coincide with kernel reduction, for `Qed` performance
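The whitelist approach can be sketched with a delta flag restricted to the automation's own functions (`interp` here is a hypothetical reification evaluator, not the actual Iris/BlueRock code):

```coq
(* hypothetical evaluator introduced by the automation *)
Definition interp (b : bool) : nat := if b then 1 else 0.

Goal forall user_term : nat, interp true + user_term = 1 + user_term.
Proof.
  intro user_term.
  (* unfold only interp; user_term and + are left untouched *)
  cbv beta iota delta [interp].
  reflexivity.
Qed.
```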
## <a id="rep"></a> Avoiding Term Repetition in Proof Terms

BlueRock's automation avoids term repetition in the proof term by accumulating intermediate results inside the context of an ever-growing function type. The arguments of the function are newly injected terms that cannot be computed from existing values that have been passed to the function.

This approach works wonders for term size, but Qed performance depends entirely on the ability to force terms across the forall boundary of this function type of dynamic arity. For example, in the following type, `1 + 1` is reduced twice: once in checking the type of `f` and once in checking the return type.

```coq
let n := 1 + 1 in
forall (f : Fin.t n), Fin.t n
```

Unfortunately, not all values can be forced, and types such as telescopes will produce terms of cubic size without any way to share their reduction.
### Requirements

* The solution must integrate into Qed, i.e. the kernel.
* It must support whitelists or otherwise allow specific selection of terms to be reduced (see “Reification with User Terms” above)
  * This eliminates `vm_compute`
## <a id="nice"></a> General “make goal look nice” Strategy

Any kind of longer-running automation that handles user terms probably needs to integrate a generic simplification strategy to perform the same kind of “normalization” that users rely on.

`simpl` and `cbn` are the obvious candidates, but only `cbn` is safe to call on arbitrary terms in arbitrary contexts without violating `simpl never`. Unfortunately, this eliminates the faster of the two strategies and leaves us with occasional catastrophic slowdowns in `cbn`.
### Requirements

* Must honor `simpl never`
* Must be quite fast so that it can be called after every goal change
* Must support `Arguments`
* Must support fixpoint refolding (as goals will otherwise become unreadable)
* Should be somewhat principled so that users can predict its results
# Proposed Reduction Strategies

## <a id="kred"></a> [kred: A faster, somewhat compatible cbn alternative](https://github.com/coq/coq/pull/19309)

`kred` is a half-finished re-implementation of `cbn` based on the kernel’s reduction machine. It is essentially `lazy` with support for `Arguments` and fixpoint refolding. It tries to be as faithful as possible to `cbn` but will sometimes deviate, especially when fixpoint refolding opportunities are too expensive to find.

Another way to look at it is as a slightly less ambitious and slightly less aggressively refolding version of `cbn`, with the benefit of delayed substitution.
### Advantages over cbn

`kred` can be arbitrarily faster than `cbn`, and its performance will often coincide with `lazy` when terms can be fully reduced. It is always faster than `cbn`, even when terms are not fully reduced.

These properties make `kred` a promising candidate for the [General “make goal look nice” Strategy](#nice).
### Disadvantages

As mentioned, `kred` is not as aggressive about refolding and will sometimes generate slightly different goals.

`kred` is unfinished. The current code has at least one bug that introduces incorrect de Bruijn indices in some specific cases.

kred.ml is basically a copy of cClosure.ml. Its refolding mechanism cannot be integrated into the kernel because it is simply not sound ([coq/coq#18672](https://github.com/coq/coq/pull/18672)). Handling `Arguments` requires substantial changes that are unlikely to be ported to cClosure.ml. So `kred` essentially doubles the amount of kernel code, with the added complexity that the copy is not a verbatim copy.
## <a id="msp"></a> [MSP: Multi Stage Programming](https://github.com/coq/coq/pull/17839)

The MSP prototype provides primitives (currently implemented using actual “primitives”, but a next version will use dedicated AST nodes) that freeze or force reduction of terms. These primitives can be nested to perform fine-grained reduction of some terms but not others.

The basic example below force-appends two lists containing user terms without touching those user terms, which are guarded by the `block` primitive.
```coq
run ([block P; block Q] ++ [block R; block S])
```
Blocked terms may contain unblocked sub-terms, which will be reduced up to blocked sub-sub-terms, and so on.

`run` is what triggers MSP to work. `block` and `unblock` have no meaning outside of `run`.
MSP addresses the use cases [Reification with User Terms](#reif) and [Avoiding Term Repetition in Proof Terms](#rep) completely, and drastically improves on all existing reduction strategies in the precision it offers for reducing specific terms.
### Advantages

The current MSP prototype has already been tested on examples representing these use cases, and it is able to overcome existing shortfalls both in complexity of use and in performance.
### Disadvantages

MSP must live in the kernel for it to be useful at Qed time. It might be possible to make it opt-in, but that hardly changes anything.

Automation-relevant theory must be rewritten to incorporate `block`, `unblock`, and `run` in all the necessary places. This cannot be avoided with this approach.

The current prototype introduces a sizeable performance regression for developments that do not use the feature (+7% or so in developments that stress the kernel) because of the way it has to find applications of the new primitives before they are (mis-)handled in the usual way by the rest of the machine.

The next iteration will use dedicated `constr` nodes, which is a far more invasive change but should allow us to avoid any performance impact on developments that do not use the feature.
---
Thanks for looking at the draft!

I just remembered and re-read your comment in rocq-prover/rocq#13881 (comment). I suppose there really is no way to return unreduced terms from `vm_compute`, is there?

In any case, I'll probably try to expand on "incompatible" a little bit. It seems to me that at least some vm casts could be preserved in terms that use MSP with enough care.

---
Currently no, but that does not mean that it is that farfetched.

First, the VM is already able to manipulate and return axioms, existential variables, and free variables (which are fully normalized values, but from an operational point of view, they are fully unreduced terms). So, extending accumulators to support arbitrary Gallina terms is trivial. One has to be a bit careful with open terms so that bound values are properly kept around. But accumulators are already able to accumulate information (hence their name), so no major change here either.

Second, if the reduction is only delayed rather than fully blocked, one has to make sure that the bytecode is still around to restart the reduction. But again, the VM already supports similar patterns, e.g., for the `return` clause of `match` constructs. The kernel even has some mechanisms to ensure that Gallina terms and their corresponding bytecode stay synchronized, because this is needed when caching weakly-reduced toplevel definitions (though this fragile piece of code is probably not exercised as much as it should be).

The hardest part will be to make sure that the Gallina term in the accumulator and the corresponding bytecode agree on what the bound variables are. Currently, there is absolutely no similar mechanism in the bytecode compiler. To give a concrete example, consider `Definition foo x := let y := x in block (x + y).` The blocked Gallina term has two bound variables (`x` and `y`), while the corresponding bytecode has only one bound value `x` (`y` was optimized away by the compiler). Restarting the reduction by pushing two values on the VM stack would quickly lead to memory corruption.

---
Thank you very much for the detailed explanation. Are there already plans/ideas for handling the binder discrepancies?

Somewhat orthogonal: it just occurred to me while trying to explain this to coworkers that both `vm_compute` and `native_compute` are currently trivially compatible with MSP. MSP must not (and does not) change the normal forms of terms, and both `vm_compute` and `native_compute` perform full normalization. The only difference from MSP is in the result of weak head reduction (and even there it only changes the shape of arguments to the head constructor/constant/etc., not the head itself). I should expand on this a bit in this document to avoid confusion.

So until `vm_compute`/`native_compute` support weak head reduction, I don't think they are affected at all. All we need is to treat the primitives as boxing/unboxing operators of some box type.

---
No. In fact, last time I suggested that the bytecode compiler should handle let-in constructs in a slightly more predictable way, the idea did not live long: rocq-prover/rocq#13234

---
If this is the intention, then support in `vm_compute` is Very Easy™ (I don't remember how native compute works, but I don't see why it would be a problem). You simply reserve one extra tag, and otherwise compile `block t` like `fun x => t` (better: compile it as a 0-argument closure), so that `block t` is a weak-head normal form. When it comes to reconstruction, force the closure and evaluate `t` (which is exactly what you do with functions). This should let you control the order of reduction as you need (I say “should” because your document doesn't really specify the semantics of `block`/`unblock`/`run` (it should!)).

You need to use a different tag for `block t` than for `fun x => t` because it's going to behave differently (e.g. the normal form of `fun x => t` is `fun x => t'`, but if I understand correctly, the normal form of `block t` is `t'`; in both cases I take `t'` to be the normal form of `t`).

Supporting weak-head reduction in the VM is essentially the same problem as @silene describes above. Normalisation-by-evaluation really is a surprising sweet spot: you simplify the problem a lot when you're only asking to reconstruct normal-form terms.

---
Sorry for that. I actually opened this PR way too early by accident, and that's why the document is still in pretty rudimentary shape. The actual idea was to pass around a draft (in the form of a PR against a fork of this repo, not this repo directly) in order to collect additional challenging use cases for reduction before we (possibly) resume work on discussing/specifying/implementing MSP (and `kred`). But I somehow ended up targeting the wrong repo, and now I am getting super useful feedback here, so I really want to keep this PR open.

I'll update the document with some more details on MSP next week. For now, the best summary is the old prototype itself. The PR description is probably slightly outdated, but the code was very close to the desired semantics. The primitives are in theories/Force/Force.v, and there are tests in test-suite/success/force.v that should be helpful to illustrate the intended spec.

Your description of supporting MSP in the absence of weak-head reduction sounds correct to me, except that `block t` simply reduces to `block t'`. `@block T t` has type `Blocked T`, which can be thought of as a record containing `T`. Both `unblock` and `run` function as projections in this simplified view. I don't remember the exact reason, but `Blocked T` cannot be convertible to `T`. It probably involves canonicity.

I really wonder if this could end up helping simplify the discussion around bytecode compilation/optimization. I see that rocq-prover/rocq#13234 contains strong arguments for pattern matching being the only viable solution for predictable reduction performance, and that also matches my personal experience (although I can almost never use `vm_compute`, so I don't know much about the performance pitfalls there). MSP would indeed be one possible way to specify the order of reduction even in `vm_compute`. If that could help simplify the bytecode compilation by making some "optimizations" unnecessary, maybe it could also make implementing weak-head reduction easier in the long run by keeping the bytecode very close to the original terms. (Weak-head reduction in `vm_compute` would be incredibly useful independently of anything discussed here.)

---
MSP does not make the above optimization unnecessary. (The only purpose of this optimization is to allow the garbage collector to free memory earlier in some very rare cases. This optimization is not even part of OCaml's bytecode compiler, for good reasons. It is specific to Coq.) And except for peephole ones, there is no other optimization in the bytecode compiler anyway. So, MSP and bytecode optimizations are two completely separate issues.