Caching #28

Open
haesleinhuepf opened this issue Dec 4, 2024 · 2 comments · Fixed by #31
Labels: enhancement (New feature or request)

Comments

@haesleinhuepf (Member) commented Dec 4, 2024

Assume we have a list of PDF filenames and a function embed(pdf_filename, slide_no). We would like to establish a persistent cache for the results of this function. For example, if I call the function today with the parameters "test.pdf" and 0 (for the first slide), it returns a vector of 1500 numbers, which takes a while to compute. Hence, if I call the function again tomorrow, the computation should be avoided; instead, the vector should be retrieved from persistent storage, e.g. a file.

git-bob think about strategies to achieve this

github-actions bot commented Dec 4, 2024

This message was generated by git-bob (version: 0.12.0, model: github_models:gpt-4o, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

To establish a persistent cache for the embed function, you can use Python's shelve module, which provides simple persistent key-value storage on disk. Here's a basic implementation strategy:

import shelve

def cache_embed(pdf_filename, slide_no):
    # Build a unique cache key from the function arguments.
    key = f"{pdf_filename}_{slide_no}"
    # shelve persists key-value pairs to a file on disk,
    # so cached results survive across sessions.
    with shelve.open('embed_cache.db') as cache:
        if key in cache:
            # Cache hit: return the previously computed embedding.
            return cache[key]
        else:
            # Cache miss: compute the embedding and store it for next time.
            result = embed(pdf_filename, slide_no)
            cache[key] = result
            return result

With this setup, the cache_embed function will first check if the result is already in the cache. If it is, the cached result is returned. Otherwise, it computes the result and stores it in the cache for future use. This avoids redundant computations and leverages disk storage for persistence across sessions.
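An alternative strategy, not part of the suggestion above, would be joblib's Memory, which hashes the function arguments and stores each result as a file in a cache directory, so no manual key construction is needed. The sketch below is illustrative only: the cache directory name and the wrapper function embed_cached are made-up names, and embed(pdf_filename, slide_no) is assumed to be defined elsewhere.

from joblib import Memory

# Minimal sketch, assuming joblib is installed and embed() exists elsewhere.
# Cached results are written to the 'embed_cache' directory on disk.
memory = Memory("embed_cache", verbose=0)

@memory.cache
def embed_cached(pdf_filename, slide_no):
    return embed(pdf_filename, slide_no)

# The first call computes and stores the vector; later calls, even in a new
# Python session, load it from disk instead of recomputing.
# vector = embed_cached("test.pdf", 0)

Compared to the shelve approach, joblib derives the cache key from the arguments automatically and handles large NumPy arrays efficiently.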

@haesleinhuepf (Member, Author) commented:

CC @lea-33 - I asked gitbob in a more structured way than in #13 ;-)

@lea-33 lea-33 self-assigned this Dec 6, 2024
@lea-33 lea-33 added the enhancement New feature or request label Dec 6, 2024
@lea-33 lea-33 linked a pull request Dec 11, 2024 that will close this issue
@lea-33 lea-33 reopened this Dec 11, 2024