
feat: implement caching #74

Open
nlf opened this issue Jul 27, 2022 · 1 comment

nlf commented Jul 27, 2022

currently, every operation performed through this module has to maintain its own cache. pacote half-implements some form of caching, but now that we don't keep integrity values for locally built tarballs (i.e. git repositories), pacote will never actually use the cache.

in order to resolve this, i propose that we implement git-specific caching semantics in this module.

these are just some thoughts about how this could work; they may or may not be useful when we implement it.

ideally, we use cacache so that we maintain consistency in our caching locations. this gives us a small challenge, however, in that we would need to cache several distinct entities, each in a different way (a sketch of one possible key layout follows the list below):

  1. package.json (including corgis, so this is really two entities in one key, similar to how make-fetch-happen handles different accept headers and content-types)
  2. npm-shrinkwrap.json, if the package has one present; we need it accessible from the cache
  3. a built tarball, meaning we've cloned the repo, checked out the appropriate reference, installed dependencies and run prepare scripts (if necessary), and run npm pack
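
to make that concrete, here's a minimal sketch of one possible key layout, assuming cacache as the store. the `npm:git:` prefix, the cache path, and the key shape are placeholders for the sketch, not an existing convention:

```js
const cacache = require('cacache')

// placeholder cache location; in practice this would be the configured npm cache
const CACHE = '/path/to/_cacache'

// one key per entity, scoped by repo url and resolved commit hash
const keyFor = (repo, commit, entity) => `npm:git:${repo}#${commit}:${entity}`

// store the full package.json and its corgi form under one key, with the corgi
// stashed in metadata (similar in spirit to how make-fetch-happen keeps separate
// content for different accept headers)
async function cacheManifest (repo, commit, fullDoc, corgiDoc) {
  return cacache.put(CACHE, keyFor(repo, commit, 'package.json'),
    JSON.stringify(fullDoc), { metadata: { corgi: corgiDoc } })
}
```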

where we start to venture into uncharted waters is how we identify and correctly handle stale repositories. to that end, one approach we could take is to also store a tarball of the raw git repository in the cache. that tarball would be used to determine whether the requested reference has changed. the flow would look something like this (a rough code sketch follows the list):

  • request the raw git repository from cacache or clone it
  • extract the raw git repository (if we just cloned it, skip this)
  • update the raw git repository (git fetch, again skip if we just cloned)
  • resolve the requested git ref to a commit hash
  • attempt to read the requested file from the cache, using the commit hash as an etag of sorts
  • if the file does not exist, do what is necessary to retrieve it and store it
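
a rough sketch of that flow, assuming cacache for storage and plain `git` subprocesses; `extractTo()`, `cloneTo()`, and `buildEntity()` are hypothetical helpers standing in for the real extract/clone/pack logic, and re-caching the refreshed repo tarball is left out for brevity:

```js
const cacache = require('cacache')
const { spawnSync } = require('child_process')

async function fetchFromGit (cachePath, repoUrl, ref, entity) {
  const repoKey = `npm:git-repo:${repoUrl}`
  let dir
  try {
    // request the raw repository tarball from cacache, extract it, and refresh it
    const { data } = await cacache.get(cachePath, repoKey)
    dir = await extractTo(data)                        // hypothetical helper
    spawnSync('git', ['fetch', '--all'], { cwd: dir }) // update the existing clone
  } catch {
    dir = await cloneTo(repoUrl)                       // hypothetical: fresh clone, skip extract/fetch
  }

  // resolve the requested ref to a commit hash; this acts as our etag
  const commit = spawnSync('git', ['rev-parse', ref], { cwd: dir })
    .stdout.toString().trim()

  const key = `npm:git:${repoUrl}#${commit}:${entity}`
  try {
    // cache hit for this commit, nothing to rebuild
    return (await cacache.get(cachePath, key)).data
  } catch {
    // cache miss: do whatever is necessary to produce the entity, then store it
    const data = await buildEntity(dir, entity)        // hypothetical: pack tarball, read files, etc.
    await cacache.put(cachePath, key, data)
    return data
  }
}
```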

this would mean that every installation of a git repository will require us to extract a tarball of the repository and do a git fetch before retrieving anything, but that's a considerably better state than what we have today, where we clone the entire repository from scratch every time we need something from it.
