Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Holmake to rebuild based on theory hashes rather than timestamps #381

Open
xrchz opened this issue Nov 30, 2016 · 8 comments
Open

Holmake to rebuild based on theory hashes rather than timestamps #381

xrchz opened this issue Nov 30, 2016 · 8 comments

Comments

@xrchz
Copy link
Member

xrchz commented Nov 30, 2016

This would for example avoid cascading rebuilds if you just add a comment to a script file (and it produces exactly the same theory).

@xrchz
Copy link
Member Author

xrchz commented Apr 24, 2017

In my vision for this, it would also obviate a rebuild in the following situation. barTheory depends on fooTheory. You add a theorem to fooTheory which is not used in barTheory. Then barTheory does not need to be rebuilt even after fooTheory is rebuilt to include the new theorem.

@mn200
Copy link
Member

mn200 commented Apr 24, 2017

This seems unlikely. Presumably foo's hash has just changed. In that situation, bar has to rebuild. However, its hash won't change (I suppose). If baz only depends on bar, it won't need to be rebuilt. On the other hand, suppose the change to foo has redefined an important constant that bar uses, but the theorems that bar states have the same statements so that when bar is rebuilt it doesn't change its external nature. (Say the tactics proving bar's theorems are robust in the face of whatever change was made.)

Now theory baz is only explicitly a descendent of bar (so the graph records a link form bar to baz, but not from foo to baz), but it does have a theorem in it that depends on foo. Maybe that theorem is now false... So, we actually have to rebuild everything in baz too (though there'd be lots of similar seeming situations when we'd prefer not to).

@marioxcc
Copy link
Contributor

If this is implemented, it would be better to move the old theory and then compare the new and the old file byte for byte, file instead of overwriting it with the new theory file and then comparing hashes. Although probably theory hashes will never collide in practice, this is not provable, and thus if one were to model the build process, including this optimization, one could not prove that it is correct, not even probabilistically (in cryptography it is common to assume that the hash is a function chosen at random; this approach is used under the name of “pseudorandom function model” but note that this assumption does not match reality).

@xrchz
Copy link
Member Author

xrchz commented Jan 6, 2025

I started writing this issue again 🤦‍♀️ ...

This issue is a proposal to replace the timestamps (based on modification time of the script file that built the theory) currently used to identify theories (these are recorded in the Theory.dat files) with (recursive) content hashes. The content to hash is the Theory.dat file content (or the logically relevant subset of it in case anything can be omitted).

The point of doing this is to enable reuse of theories even when their script files are modified as long as the exported logical content is the same. This includes even changing a proof, for example, since Theory.dat files don't record proofs, as long as the change doesn't affect the theorem statement or oracles. This in turn means that Theory.dat files can in principle be shared with others - even those who don't have the script file - once they are built (although Holmake probably needs to be changed for that to actually work).

One of my motivations for wanting this is that I believe it will make module management (as explored in https://gist.github.com/xrchz/11d664a58448b680e3ea0e7364c72814) easier, and may even be a prerequisite for some of the ideas there. Luckily, it can be carved out as as an independent upgrade to HOL4.

The issue that @mn200 mentioned above prevents the thing I asked for above - not having to rebuild a theory when one of its ancestors adds irrelevant logical content - but now I'm not asking for that. With recursive content hashes I think soundness should be straightforward, right? Even if more rebuilding is required than I initially thought, it is still less than status quo.

@mn200
Copy link
Member

mn200 commented Jan 6, 2025

I think we might store the fooTheory.dat file's hash not only in the .dat files that descend from it, but also in the corresponding fooTheory.sml file, with an export in fooTheory.sig (binding val foo_dathash : string perhaps). However, I suspect that you want to store a hash of the ancestors' ancTheory.sml as well (not just the .dat) : there are some places where you can add stuff to those files that is not apparent in the .dat. Given this, I wonder if we are not just reinventing a build system that uses hashes everywhere. If so, what's the received wisdom on doing so?

@xrchz
Copy link
Member Author

xrchz commented Jan 6, 2025

What lives in Theory.sml that is logically relevant and not in Theory.dat? Could everything soundness-critical be moved to the .dat?

@mn200
Copy link
Member

mn200 commented Jan 6, 2025

Currently, users can put SML code of arbitrary flavours into fooTheory.sml (and make entry points visible as well by also changing fooTheory.sig). The entry-points are adjoin_to_... There aren't that many existing uses, and they could probably all be removed. (Perhaps just make that a precondition to this.)

By scanning the output of grep -r I think the only interesting uses of this facility are in src/coretypes/pairScript.sml, where all sorts of changes are made to make pairs usable in places like new_definition. I think this code should move to pairLib, and just assume that that will always be loaded (as bossLib already does).

Finally, there is also the ability to generate code requiring that a certain SML module be loaded when a Theory is loaded (Theory.add_ML_dependency). To date, this facility has mostly been used for ensuring that custom pretty-printing code is loaded when a theory is loaded, but it might get used in important ways.

Luckily, I think it's trivial to make these changes get reflected in fooTheory.dat: simply record the data in the .dat file even if that file does nothing with the information. (Could do this with the general-purpose adjoin_to_ entry points too, but they're so gross (and work-around able) that I'd much prefer not to.)

@konrad-slind
Copy link
Contributor

konrad-slind commented Jan 7, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants