Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove invalid entries from cache-db #4407

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

EMattfolk
Copy link
Contributor

@EMattfolk EMattfolk commented Oct 20, 2022

Description of the change

This is a step in the process of resolving #4133. cache-db.json currently contains references to files that sometimes does not exist which, while not harmful, is a bit confusing.

Fully fixing #4133 would also require deleting the compiled modules for which the source files have been deleted. I do not know the implications of deleting things from the output folder which is why I skipped that for now. Adding it in later should be fairly trivial if that is the route to take. I have had some discussions about this in the past and it seems to be a rather controversial topic. I have also seen many problems arise because old output is left around, so I would like to get some more discussion going.

Note

This change successfully removes modules from cache-db.json if the source file is removed or the module is renamed, so it works fine when:

  • The source file has been renamed
  • The source file has been deleted
  • The module name inside the source file has changed

It does not work when only the module name is changed inside the file. Fixed. Though I think this could be detected by doing the following:

  1. Check if there are two modules with the same source file
  2. Remove the module that is currently not being compiled

Checklist:

  • Added a file to CHANGELOG.d for this PR (see CHANGELOG.d/README.md)
  • Added myself to CONTRIBUTORS.md (if this is my first contribution)
  • Linked any existing issues or proposals that this pull request should close
  • Updated or added relevant documentation
  • Added a test for the contribution (if applicable)

Modules are now removed if they are not being compiled and another module has
the same source.
@EMattfolk EMattfolk changed the title Remove missing files from cache-db Remove invalid entries from cache-db Oct 26, 2022
@i-am-the-slime
Copy link
Contributor

Could this be merged?

@f-f
Copy link
Member

f-f commented Nov 24, 2022

@i-am-the-slime as documented here merging things to the compiler requires two people from core to approve this. I would have a look at this myself but I forgot how this chunk of the compiler works and I'm extremely swamped at the moment. Given that it's the Thanksgiving week a lot of folks might be offline, so it will probably be at least one more week before anyone can look at it.
The change is fortunately quite small and it's tackling something that has been discussed in an issue and on Discord, so I don't see particular hurdles for it to go in soon.

@@ -136,7 +136,7 @@ rebuildModuleWithIndex MakeActions{..} exEnv externs m@(Module _ _ moduleName _
--
-- If timestamps or hashes have not changed, existing externs files can be used to provide upstream modules' types without
-- having to typecheck those modules again.
make :: forall m. (Monad m, MonadBaseControl IO m, MonadError MultipleErrors m, MonadWriter MultipleErrors m)
make :: forall m. (Monad m, MonadBaseControl IO m, MonadError MultipleErrors m, MonadWriter MultipleErrors m, MonadIO m)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this MonadIO constraint added for?

Copy link
Contributor Author

@EMattfolk EMattfolk Jan 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to check which files existed in order to remove them from the cache which meant I had to do IO. I'm kind of new to haskell but my LSP suggested I add that constraint and it just worked. It felt kind of like it shouldn't be there when I wrote it, not sure how I would do it otherwise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I just couldn't see what required this constraint straight away

(\acc1 e -> case e of
-- This pattern will clean up duplicates over time.
-- It will keep the cache in a good state if it was in a good state before.
m1 : m2 : _ ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this just when Set.size e > 1 then add e \\ modules to the accumulator? Then there's no need to wait until multiple compilations reach a fixed point (assuming I understand the comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so? What this means is if there are two or more modules with the same name in the cache, at most one of them is being compiled right now (i.e. it is in the modules set). e \\ modules (assuming \\ is set difference) will be all duplicate modules, but we want to ignore those, not find them. Perhaps e \\ (e \\ modules) is easier to understand?

Anyway, in practice it is possible there are multiple duplicates if you rename and compile several times, but after this change it is only possible to have a single duplicate, so this code is enough. If you think it is better to refactor it I will do that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not sure if I was unclear or I don't understand - I was suggesting that e \\ modules is added to the accumulator that goes into this fold to construct toRemove, as acc2 = S.union (Set.singleton m1 \\ modules) acc1 etc

I don't feel strongly about the code, just trying to follow through and understand "clean up duplicates over time"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand what you mean now. I must have been tired when I read your comment initially :) Yes, just doing the set difference is probably the way to go. I will refactor it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you're in there, I think this might be a bit clearer if you separate out the duplicate pruning and the missing file pruning into separate functions.

Also, general tip: you're doing a lot of bespoke folds here to accumulate and aggregate stuff, and I think you'd be better off using simpler primitives like concat and filter and M.fromListWith.

tests/TestMake.hs Outdated Show resolved Hide resolved
@EMattfolk
Copy link
Contributor Author

EMattfolk commented Jan 28, 2023

I may have accidentally simplified the implementation quite a bit. I believe the behavior is the same.

What do you think about also removing the old compiled output while we're at it? @rhendric @nwolverson Or should this be merged as-is?

Edit: looks like something broke. I think I remember discussing about a usecase where the cache remembers modules that were switched out or similar. Not sure I agree that should be a feature, but I it what it is I guess

@nwolverson
Copy link
Contributor

What do you think about also removing the old compiled output while we're at it

My general feeling was this PR is currently uncontroversial and at this point probably want to get it merged rather than expanding scope.

Apologies for not noticing the question and responding sooner. I don't know what the build problem was - can we kick that (perhaps via bringing up to date with latest) and see where we stand?

@EMattfolk
Copy link
Contributor Author

My general feeling was this PR is currently uncontroversial and at this point probably want to get it merged rather than expanding scope.

Makes sense, this PR wasn't supposed to do that from the beginning anyway.

I don't know what the build problem was - can we kick that (perhaps via bringing up to date with latest) and see where we stand?

Do you mean fixing the code which is failing the test cases that I wrote? Or to relax the tests a bit to allow the current version through? I'm more in favor of the second as that will simplify the implementation (as well as giving more predictable behaviour). Either way I could probably get a fix in quite quickly since I have more time for these kinds of things nowadays 🙂

@nwolverson
Copy link
Contributor

I mean there are no logs for the failed build checks (from what I see) so it's not clear if they were merely "build issues" or actual concrete things to be resolved. I assume from your comment that there was a test failure - I'm happy to comment on a particular case if you can give a pointer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants