
Commit ec385d0

cache key consistency
1 parent 642df6f commit ec385d0

4 files changed: +97 -9 lines changed

docs/source/cache.rst

Lines changed: 87 additions & 0 deletions
@@ -173,6 +173,93 @@ The cache timestamp functionality is fully backward compatible:
 * No changes to Repository or ProjectDirectory APIs
 * All existing code continues to work unchanged
 
+Best Practices
+--------------
+
+Shared Cache Usage
+~~~~~~~~~~~~~~~~~~
+
+.. warning::
+    **Recommendation: Use Separate Cache Instances**
+
+    While it's technically possible to share the same cache object across multiple Repository instances,
+    we **strongly recommend using separate cache instances** for each repository for the following reasons:
+
+**Recommended Approach - Separate Caches:**
+
+.. code-block:: python
+
+    from gitpandas import Repository
+    from gitpandas.cache import DiskCache
+
+    # Create separate cache instances for each repository
+    cache1 = DiskCache(filepath='repo1_cache.gz')
+    cache2 = DiskCache(filepath='repo2_cache.gz')
+
+    repo1 = Repository('/path/to/repo1', cache_backend=cache1)
+    repo2 = Repository('/path/to/repo2', cache_backend=cache2)
+
+**Benefits of Separate Caches:**
+
+* **Complete Isolation**: No risk of cache eviction conflicts between repositories
+* **Predictable Memory Usage**: Each repository has its own memory budget
+* **Easier Debugging**: Cache issues are isolated to specific repositories
+* **Better Performance**: No lock contention in multi-threaded scenarios
+* **Clear Cache Management**: You can clear or manage each repository's cache independently
+
+**If You Must Share Caches:**
+
+If you need to share a cache object across multiple repositories (e.g., for memory constraints),
+the system is designed to handle this safely:
+
+.. code-block:: python
+
+    from gitpandas import Repository
+    from gitpandas.cache import EphemeralCache
+
+    # Shared cache (not recommended but supported)
+    shared_cache = EphemeralCache(max_keys=1000)
+
+    repo1 = Repository('/path/to/repo1', cache_backend=shared_cache)
+    repo2 = Repository('/path/to/repo2', cache_backend=shared_cache)
+
+    # Each repository gets separate cache entries
+    files1 = repo1.list_files() # Creates cache key: list_files||repo1||None
+    files2 = repo2.list_files() # Creates cache key: list_files||repo2||None
+
+**Shared Cache Considerations:**
+
+* Repository names are included in cache keys to prevent collisions
+* Cache eviction affects all repositories sharing the cache
+* Memory usage is shared across all repositories
+* Very active repositories may evict cache entries from less active ones
+
+Cache Size Planning
+~~~~~~~~~~~~~~~~~~~
+
+When planning cache sizes, consider:
+
+* **Repository Size**: Larger repositories generate more cache entries
+* **Operation Types**: Some operations (like ``cumulative_blame``) create many cache entries
+* **Memory Constraints**: Balance cache size with available system memory
+* **Analysis Patterns**: Frequently repeated analyses benefit from larger caches
+
+**Recommended Cache Sizes:**
+
+.. code-block:: python
+
+    # Small repositories (< 1000 commits)
+    cache = EphemeralCache(max_keys=100)
+
+    # Medium repositories (1000-10000 commits)
+    cache = EphemeralCache(max_keys=500)
+
+    # Large repositories (> 10000 commits)
+    cache = EphemeralCache(max_keys=1000)
+
+    # For disk/Redis caches, you can use larger sizes
+    cache = DiskCache(filepath='cache.gz', max_keys=5000)
+
 API Reference
 -------------
 
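
Note on "Very active repositories may evict cache entries from less active ones": the effect can be illustrated with a small stand-in cache. `TinySharedCache` below is hypothetical and is not the gitpandas `EphemeralCache`; it assumes a simple oldest-first eviction policy, which may differ from what the library actually does. The key strings only mimic the `prefix||repo||params` shape described in the added docs.

```python
from collections import OrderedDict


class TinySharedCache:
    """Hypothetical bounded cache used only for illustration (not gitpandas code)."""

    def __init__(self, max_keys):
        self.max_keys = max_keys
        self._store = OrderedDict()

    def set(self, key, value):
        # Re-inserting an existing key moves it to the newest position.
        self._store.pop(key, None)
        self._store[key] = value
        # Evict the oldest entries once the shared budget is exceeded.
        while len(self._store) > self.max_keys:
            self._store.popitem(last=False)

    def keys(self):
        return list(self._store)


shared = TinySharedCache(max_keys=3)

# A quiet repository stores one entry...
shared.set("list_files||repo2||None", "quiet repo result")

# ...then a busy repository fills the shared budget with its own entries.
for rev in ("a", "b", "c"):
    shared.set(f"blame||repo1||{rev}", "busy repo result")

surviving = shared.keys()
print(surviving)
```

Under these assumptions the quiet repository's only entry is gone once the busy one has written three keys, which is the situation separate cache instances avoid.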

gitpandas/cache.py

Lines changed: 2 additions & 1 deletion
@@ -83,8 +83,9 @@ def deco(self, *args, **kwargs):
             force_refresh = is_propagated_force or explicit_force_refresh
 
             # Generate the cache key (ensure force_refresh itself is not part of the key)
+            # Use || as delimiter to avoid conflicts with repository names containing underscores
             key_parts = [str(kwargs.get(k)) for k in key_list]
-            key = f"{key_prefix}_{self.repo_name}_{'_'.join(key_parts)}"
+            key = f"{key_prefix}||{self.repo_name}||{'_'.join(key_parts)}"
             logging.debug(f"Cache key generated for {key_prefix}: {key}")
 
             # Explicitly log force refresh bypass of cache read
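
Note on the delimiter change: the two f-strings in this hunk can be compared side by side to see the ambiguity the new comment refers to. The helper names `old_key` and `new_key` and the sample inputs are made up for this sketch; only the format strings are taken from the diff.

```python
def old_key(prefix, repo_name, key_parts):
    # Pre-commit format: an underscore between every component.
    return f"{prefix}_{repo_name}_{'_'.join(key_parts)}"


def new_key(prefix, repo_name, key_parts):
    # Post-commit format: || on both sides of the repository name.
    return f"{prefix}||{repo_name}||{'_'.join(key_parts)}"


# Hypothetical inputs: different repository names and different argument lists.
a = ("blame", "my_repo", ["HEAD"])
b = ("blame", "my", ["repo", "HEAD"])

old_collides = old_key(*a) == old_key(*b)
new_collides = new_key(*a) == new_key(*b)

print(old_key(*a), old_key(*b), old_collides)
print(new_key(*a), new_key(*b), new_collides)
```

The parameter values are still joined with `_` after this commit, so two argument lists such as `["a_b"]` and `["a", "b"]` would still produce the same key within one repository; the change only separates the repository name from its neighbours.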

tests/test_cache.py

Lines changed: 2 additions & 2 deletions
@@ -571,9 +571,9 @@ def mock_set(k, v):
         assert len(captured_keys) == 1
         key = captured_keys[0]
 
-        # Key should have proper separators
+        # Key should have proper separators (new format uses ||)
         assert key.startswith("test_method_")
-        assert "_test/repo_" in key
+        assert "||test/repo||" in key
 
         # Key should contain parameter values
         assert "val1" in key
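
Note on the new separators: with `||` around the repository name, a key can be split back into its components, which may help when inspecting a shared cache. `split_key` is a hypothetical helper, not part of gitpandas, and the sample key is invented; the sketch assumes that neither the prefix nor the repository name contains `||`.

```python
def split_key(key):
    """Split a key shaped like prefix||repo_name||params into its three parts.

    Assumes the prefix and repository name do not themselves contain "||".
    """
    prefix, repo_name, params = key.split("||", 2)
    return prefix, repo_name, params


# Invented key in the post-commit format.
parts = split_key("test_method||test/repo||val1_val2")
print(parts)
```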

tests/test_cache_key_consistency.py

Lines changed: 6 additions & 6 deletions
@@ -136,8 +136,8 @@ def mock_set(k, v):
 
         # Keys should be different because repo_name is different
         assert key1 != key2
-        assert "/path/to/repo_" in key1
-        assert "/path/to/repo/_" in key2
+        assert "||/path/to/repo||" in key1
+        assert "||/path/to/repo/||" in key2
 
     def test_complex_key_generation(self, temp_cache_path):
         """Test key generation with complex parameters"""
@@ -159,10 +159,10 @@ def mock_set(k, v):
 
         # Check key format
         key = captured_keys[0]
-        assert key.startswith("complex_method_")
-        assert "_value1_" in key
-        assert "_value2_" in key
-        assert "_value3" in key
+        assert key.startswith("complex_method||")
+        assert "value1_" in key
+        assert "value2_" in key
+        assert "value3" in key
 
         # Call again with different order of parameters in the call
         # Python should normalize kwargs, so the key should be the same
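
Note on the last comment in this hunk: Python keeps keyword arguments in call order rather than normalizing them, so the stable key most likely comes from the `key_list` iteration seen in `gitpandas/cache.py`. The sketch below reuses that key expression in a standalone function; `build_key` and the parameter names `param1` to `param3` are hypothetical.

```python
def build_key(key_prefix, repo_name, key_list, **kwargs):
    # Values are read in key_list order, so the caller's keyword order is irrelevant.
    key_parts = [str(kwargs.get(k)) for k in key_list]
    return f"{key_prefix}||{repo_name}||{'_'.join(key_parts)}"


key_list = ["param1", "param2", "param3"]

first = build_key(
    "complex_method", "/path/to/repo", key_list,
    param1="value1", param2="value2", param3="value3",
)
# Same values, different keyword order in the call.
second = build_key(
    "complex_method", "/path/to/repo", key_list,
    param3="value3", param1="value1", param2="value2",
)
# A trailing slash makes a different repo_name and therefore a different key.
trailing = build_key(
    "complex_method", "/path/to/repo/", key_list,
    param1="value1", param2="value2", param3="value3",
)

print(first)
print(first == second, first == trailing)
```

A keyword that is missing from the call shows up as the string `None` in the key, because `kwargs.get(k)` returns `None` before the `str()` conversion.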
