Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

files cache with size #7846

Draft
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

ThomasWaldmann
Copy link
Member

No description provided.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Sep 25, 2023

This is trying to prepare a bigger change in how borg works: giving up with precise refcounting and removing chunk size information from the chunks index/cache (so borg could just work with the set of existing chunks with no further information about them in the chunks index/cache).

A first step into that direction is adding the chunk sizes to the files cache (it currently only has the chunk ids).

If borg create detects a file is unchanged, we need to have that information at hand to create the item's chunk list without reading/chunking the file.

NewCache tries to be similar to AdHocCache, but additionally has the files cache.

@ThomasWaldmann ThomasWaldmann force-pushed the files-cache-with-size branch 3 times, most recently from 527b612 to aef7d7a Compare September 25, 2023 21:40
@codecov-commenter
Copy link

codecov-commenter commented Sep 25, 2023

Codecov Report

Attention: Patch coverage is 63.46154% with 133 lines in your changes are missing coverage. Please review.

Project coverage is 82.83%. Comparing base (54a85bf) to head (1389bd1).
Report is 1 commits behind head on master.

Files Patch % Lines
src/borg/cache.py 61.11% 117 Missing and 9 partials ⚠️
src/borg/archive.py 80.64% 6 Missing ⚠️
src/borg/archiver/rinfo_cmd.py 66.66% 0 Missing and 1 partial ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7846      +/-   ##
==========================================
- Coverage   83.75%   82.83%   -0.93%     
==========================================
  Files          67       67              
  Lines       12061    12166     +105     
  Branches     2189     2197       +8     
==========================================
- Hits        10102    10078      -24     
- Misses       1367     1491     +124     
- Partials      592      597       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

the files cache used to have only the chunk ids,
so it had to rely on the chunks index having the
size information - which is problematic with e.g.
the AdhocCache (has size==0 for all not new chunks) and blocked using the files cache there.
incref: returns (id, size), so it needs the size if it can't
get it from the chunks index. also needed for updating stats.

decref: caller does not always have the chunk size (e.g. for
metadata chunks),
as we consider 0 to be an invalid size, we call with size == 1
in that case. thus, stats might be slightly off.
when given, force using the AdHocCache.
if a chunk is missing in repo, it will also be missing in a ad-hoc
built chunks index.
thus:

- no cache.path
- skip on-disk cache corruption tests for AdHocCache
Also:
- move common code to ChunksMixin
- always use ._txn_active (not .txn_active)
if we use AdHocCache or NewCache, we do not have precise refcounting.
thus, we do not delete repo objects as their refcount does not go to zero.

check --repair will just remove the orphans.
Only LocalCache implements these.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants