Caching #28

Open
haesleinhuepf opened this issue Dec 4, 2024 · 2 comments · Fixed by #31
Labels: enhancement (New feature or request)

Comments

@haesleinhuepf (Member) commented Dec 4, 2024

Assume we have a list of PDF filenames and a function embed(pdf_filename, slide_no). We would like to establish a persistent cache for the results of this function. For example, if I call the function today with the parameters "test.pdf" and 0 (for the first slide), it returns a vector of 1500 numbers, which takes a while to compute. Hence, if I call the function again tomorrow, the computation should be avoided; instead, the vector should be retrieved from persistent storage, e.g. a file.

git-bob think about strategies to achieve this

github-actions bot commented Dec 4, 2024

This message was generated by git-bob (version: 0.12.0, model: github_models:gpt-4o, log), an experimental AI-based assistant. It can make mistakes and has limitations. Check its messages carefully.

To establish a persistent cache for the embed function, you can use Python's shelve module, which provides simple persistent key-value storage on disk. Here's a basic implementation strategy:

import shelve

def cache_embed(pdf_filename, slide_no):
    # Build a unique cache key from the function arguments.
    key = f"{pdf_filename}_{slide_no}"
    # shelve persists key-value pairs to a file on disk,
    # so cached results survive across sessions.
    with shelve.open('embed_cache.db') as cache:
        if key in cache:
            # Cache hit: return the previously computed embedding.
            return cache[key]
        else:
            # Cache miss: compute the embedding and store it for next time.
            result = embed(pdf_filename, slide_no)
            cache[key] = result
            return result

With this setup, the cache_embed function will first check if the result is already in the cache. If it is, the cached result is returned. Otherwise, it computes the result and stores it in the cache for future use. This avoids redundant computations and leverages disk storage for persistence across sessions.
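An alternative strategy, not part of the suggestion above, would be joblib's Memory, which hashes the function arguments and stores each result as a file in a cache directory, so no manual key construction is needed. The sketch below is illustrative only: the cache directory name and the wrapper function embed_cached are made-up names, and embed(pdf_filename, slide_no) is assumed to be defined elsewhere.

from joblib import Memory

# Minimal sketch, assuming joblib is installed and embed() exists elsewhere.
# Cached results are written to the 'embed_cache' directory on disk.
memory = Memory("embed_cache", verbose=0)

@memory.cache
def embed_cached(pdf_filename, slide_no):
    return embed(pdf_filename, slide_no)

# The first call computes and stores the vector; later calls, even in a new
# Python session, load it from disk instead of recomputing.
# vector = embed_cached("test.pdf", 0)

Compared to the shelve approach, joblib derives the cache key from the arguments automatically and handles large NumPy arrays efficiently.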

@haesleinhuepf (Member, Author) commented:

CC @lea-33 - I asked gitbob in a more structured way than in #13 ;-)

@lea-33 lea-33 self-assigned this Dec 6, 2024
@lea-33 lea-33 added the enhancement New feature or request label Dec 6, 2024
@lea-33 lea-33 linked a pull request Dec 11, 2024 that will close this issue
@lea-33 lea-33 reopened this Dec 11, 2024