Description
Before filing a bug
- I have installed the latest dev version of {precommit} with
remotes::install_github('lorenzwalthert/precommit')
- I have installed the latest hook revisions (update with
precommit::autoupdate()
) - I have installed the latest release of the upstream Python framework pre-comit as described under the update instructions.
Describe the bug
My team sporadically sees the following error when running the style-files
hook (example here):
Error: package or namespace load failed for ‘styler’:
.onLoad failed in loadNamespace() for 'styler', details:
call: mkdirs.default(path, mustWork = TRUE)
error: Failed to create directory (tried 5 times), most likely because of lack of file permissions
(directory '/home/runner/.cache/R/R.cache/styler' exists but nothing beyond):
/home/runner/.cache/R/R.cache/styler/1.10.0
Execution halted
It's difficult to debug since it occurs sporadically, but it happens once every few runs.
I've seen hints of this problem elsewhere, namely in r-lib/styler#1180 (comment) and r-lib/styler#1118 (comment), but I wanted to open a dedicated issue since I had trouble finding any discussion about it.
To Reproduce
It's difficult to reproduce this bug, since it doesn't occur deterministically, but if you clone https://github.com/ccao-data/model-res-avm/ and run pre-commit run style-files --all-files
a few times, you're bound to run into it.
Expected behavior
style-files
should not raise an error related to R.cache
directory permissions.
Additional context
I believe this error is caused by a race condition that occurs when precommit runs multiple styler
processes in parallel on the same machine.
Here's my best guess at a root cause: remove_old_cache_files
gets the version-specific cache path with a R.cache::getCachePath()
call, which will create a directory at that path if it does not exist. Then, the remove_old_cache_files
function deletes that directory if it's empty. I'm not 100% confident in this assessment, since I would need to modify the underlying R.utils
code in order to confirm it, but I suspect that if process A cleans up the directory in between when the mkdirs()
call in process B creates it and then checks to confirm it exists, then process B will raise an error saying that the directory wasn't able to be created. (One remaining mystery that doesn't quite fit this story: I don't totally understand why the fileAccess()
call for the parent directory is returning -1, which causes the "likely because of a lack of file permissions" clause in the error message.)
While I'm not totally sure of this root cause analysis, I do feel confident that the problem is due to some sort of race condition in the getCachePath()
call in remove_old_cache_files
. I've never been able to reproduce the error when setting the require_serial
option in my pre-commit config.
Suggested fix
It looks like a try_fetch
call was added in r-lib/styler@54df59b and released in v1.10.3 in an attempt to suppress the error; I believe that we're still seeing the error in our precommit calls because precommit is pinned to 1.10.2, so one possible solution would be to just bump styler to 1.10.3 in precommit. Another option would be for us to stop running styler in parallel, which we can do by setting the require_serial
option.
That being said, you may also be interested in resolving the underlying race condition instead of suppressing its error. Whatever the precise root cause of the race condition is, it's definitely coming from the mkdirs()
call in getCachePath()
, so I expect it could be resolved pretty easily by switching to an alternative way of constructing that directory path that doesn't create the directory, like file.path(tools::R_user_dir("R.cache", which = "cache"), "styler", styler_version)
.
Ultimately, I think it's up to you whether and how you want to handle this issue, I just wanted to share the results of my investigation in case it's useful!