Holmake to rebuild based on theory hashes rather than timestamps #381

xrchz · 2016-11-30T21:21:36Z

This would for example avoid cascading rebuilds if you just add a comment to a script file (and it produces exactly the same theory).

xrchz · 2017-04-24T01:31:13Z

In my vision for this, it would also obviate a rebuild in the following situation. barTheory depends on fooTheory. You add a theorem to fooTheory which is not used in barTheory. Then barTheory does not need to be rebuilt even after fooTheory is rebuilt to include the new theorem.

mn200 · 2017-04-24T05:07:25Z

This seems unlikely. Presumably foo's hash has just changed. In that situation, bar has to rebuild. However, its hash won't change (I suppose). If baz only depends on bar, it won't need to be rebuilt. On the other hand, suppose the change to foo has redefined an important constant that bar uses, but the theorems that bar states have the same statements so that when bar is rebuilt it doesn't change its external nature. (Say the tactics proving bar's theorems are robust in the face of whatever change was made.)

Now theory baz is only explicitly a descendent of bar (so the graph records a link form bar to baz, but not from foo to baz), but it does have a theorem in it that depends on foo. Maybe that theorem is now false... So, we actually have to rebuild everything in baz too (though there'd be lots of similar seeming situations when we'd prefer not to).

marioxcc · 2017-11-14T17:31:58Z

If this is implemented, it would be better to move the old theory and then compare the new and the old file byte for byte, file instead of overwriting it with the new theory file and then comparing hashes. Although probably theory hashes will never collide in practice, this is not provable, and thus if one were to model the build process, including this optimization, one could not prove that it is correct, not even probabilistically (in cryptography it is common to assume that the hash is a function chosen at random; this approach is used under the name of “pseudorandom function model” but note that this assumption does not match reality).

xrchz · 2025-01-06T08:07:26Z

I started writing this issue again 🤦‍♀️ ...

This issue is a proposal to replace the timestamps (based on modification time of the script file that built the theory) currently used to identify theories (these are recorded in the Theory.dat files) with (recursive) content hashes. The content to hash is the Theory.dat file content (or the logically relevant subset of it in case anything can be omitted).

The point of doing this is to enable reuse of theories even when their script files are modified as long as the exported logical content is the same. This includes even changing a proof, for example, since Theory.dat files don't record proofs, as long as the change doesn't affect the theorem statement or oracles. This in turn means that Theory.dat files can in principle be shared with others - even those who don't have the script file - once they are built (although Holmake probably needs to be changed for that to actually work).

One of my motivations for wanting this is that I believe it will make module management (as explored in https://gist.github.com/xrchz/11d664a58448b680e3ea0e7364c72814) easier, and may even be a prerequisite for some of the ideas there. Luckily, it can be carved out as as an independent upgrade to HOL4.

The issue that @mn200 mentioned above prevents the thing I asked for above - not having to rebuild a theory when one of its ancestors adds irrelevant logical content - but now I'm not asking for that. With recursive content hashes I think soundness should be straightforward, right? Even if more rebuilding is required than I initially thought, it is still less than status quo.

mn200 · 2025-01-06T23:01:08Z

I think we might store the fooTheory.dat file's hash not only in the .dat files that descend from it, but also in the corresponding fooTheory.sml file, with an export in fooTheory.sig (binding val foo_dathash : string perhaps). However, I suspect that you want to store a hash of the ancestors' ancTheory.sml as well (not just the .dat) : there are some places where you can add stuff to those files that is not apparent in the .dat. Given this, I wonder if we are not just reinventing a build system that uses hashes everywhere. If so, what's the received wisdom on doing so?

xrchz · 2025-01-06T23:17:17Z

What lives in Theory.sml that is logically relevant and not in Theory.dat? Could everything soundness-critical be moved to the .dat?

mn200 · 2025-01-06T23:47:19Z

Currently, users can put SML code of arbitrary flavours into fooTheory.sml (and make entry points visible as well by also changing fooTheory.sig). The entry-points are adjoin_to_... There aren't that many existing uses, and they could probably all be removed. (Perhaps just make that a precondition to this.)

By scanning the output of grep -r I think the only interesting uses of this facility are in src/coretypes/pairScript.sml, where all sorts of changes are made to make pairs usable in places like new_definition. I think this code should move to pairLib, and just assume that that will always be loaded (as bossLib already does).

Finally, there is also the ability to generate code requiring that a certain SML module be loaded when a Theory is loaded (Theory.add_ML_dependency). To date, this facility has mostly been used for ensuring that custom pretty-printing code is loaded when a theory is loaded, but it might get used in important ways.

Luckily, I think it's trivial to make these changes get reflected in fooTheory.dat: simply record the data in the .dat file even if that file does nothing with the information. (Could do this with the general-purpose adjoin_to_ entry points too, but they're so gross (and work-around able) that I'd much prefer not to.)

konrad-slind · 2025-01-07T02:06:07Z

Yeah, the main impetus for adjoin_to_theory was the theory of pairs because, for example, new_definition had to be re-educated from the bare curried form to handle definitions taking tupled arguments. If that can be moved to pairLib, I am happy to see adjoin_to_theory go away. With that, will theories be "essentially declarative"?

…

On Mon, Jan 6, 2025 at 5:47 PM Michael Norrish ***@***.***> wrote: Currently, users can put SML code of arbitrary flavours into fooTheory.sml (and make entry points visible as well by also changing fooTheory.sig). The entry-points are adjoin_to_... There aren't that many existing uses, and they could probably all be removed. (Perhaps just make that a precondition to this.) By scanning the output of grep -r I think the only interesting uses of this facility are in src/coretypes/pairScript.sml, where all sorts of changes are made to make pairs usable in places like new_definition. I think this code should move to pairLib, and just assume that that will always be loaded (as bossLib already does). Finally, there is also the ability to generate code requiring that a certain SML module be loaded when a Theory is loaded ( Theory.add_ML_dependency). To date, this facility has mostly been used for ensuring that custom pretty-printing code is loaded when a theory is loaded, but it might get used in important ways. Luckily, I think it's trivial to make these changes get reflected in fooTheory.dat: simply record the data in the .dat file even if that file does nothing with the information. (Could do this with the general-purpose adjoin_to_ entry points too, but they're so gross (and work-around able) that I'd much prefer not to.) — Reply to this email directly, view it on GitHub <#381 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAIOAD7G5KBJQ4B5DBCV4PD2JMIZ5AVCNFSM6AAAAABUVDJWLKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNZUGEZDMMJRGY> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

xrchz added Build/Holmake Feature Request labels Nov 30, 2016

xrchz assigned mn200 Nov 30, 2016

xrchz mentioned this issue Jan 7, 2025

Deprecate adjoin_to_theory #1383

Open

dnezam mentioned this issue Feb 1, 2025

HOL Module Manager #1392

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Holmake to rebuild based on theory hashes rather than timestamps #381

Holmake to rebuild based on theory hashes rather than timestamps #381

xrchz commented Nov 30, 2016

xrchz commented Apr 24, 2017

mn200 commented Apr 24, 2017

marioxcc commented Nov 14, 2017

xrchz commented Jan 6, 2025

mn200 commented Jan 6, 2025

xrchz commented Jan 6, 2025

mn200 commented Jan 6, 2025

konrad-slind commented Jan 7, 2025 via email

Holmake to rebuild based on theory hashes rather than timestamps #381

Holmake to rebuild based on theory hashes rather than timestamps #381

Comments

xrchz commented Nov 30, 2016

xrchz commented Apr 24, 2017

mn200 commented Apr 24, 2017

marioxcc commented Nov 14, 2017

xrchz commented Jan 6, 2025

mn200 commented Jan 6, 2025

xrchz commented Jan 6, 2025

mn200 commented Jan 6, 2025

konrad-slind commented Jan 7, 2025 via email