Indexing Job Schedule #520

Open
jh0274 opened this issue Oct 27, 2023 · 5 comments

jh0274 commented Oct 27, 2023

Hi!

Thank you for the awesome plugin, it's so good.

Can I ask why you decided to have the indexing job run every 60 minutes in Obsidian? I ask because I wanted to reduce the interval but wondered whether there would be knock-on impacts.

I'm also wondering whether you'd intended for people to use Khoj alongside other Obsidian search plugins that index much more quickly, such as Omnisearch?

Thanks again!

Collaborator

debanjum commented Nov 2, 2023

60 minutes is an arbitrary interval for the indexing job. It was just based on the idea that folks may not need to search or chat about stuff they've just written down. It can be reduced to any interval greater than the time it takes Khoj to index your knowledge base, and doing so shouldn't have any other impact.

Changing the indexing job interval isn't trivial currently, as we don't expose it as a user configuration. So you'll have to modify the source code and build Khoj locally.

The reason Khoj takes longer to index on the first run (than, say, Omnisearch) is that it uses a machine learning/AI model to generate the index. This is more compute intensive than traditional indexing. It shouldn't take too much time to update the index on subsequent runs, though, as it only processes changes and doesn't index everything from scratch every time.
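
To illustrate why later runs are cheap, here is a minimal sketch of change-only indexing. It is not Khoj's actual implementation: `content_hash`, `plan_index_update`, and the use of SHA-256 content hashes to detect changed notes are all assumptions made for this example. Only the notes returned in `to_embed` would need to go through the (expensive) embedding model.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint a note's text so that edits can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_index_update(previous: dict[str, str], notes: dict[str, str]):
    """Decide which notes need work on this indexing run.

    previous: path -> content hash recorded by the last run (hypothetical state)
    notes:    path -> current note text
    Returns (paths to embed, paths to drop from the index, new state).
    """
    current = {path: content_hash(text) for path, text in notes.items()}
    # New or edited notes have no matching hash from the previous run.
    to_embed = sorted(p for p, h in current.items() if previous.get(p) != h)
    # Notes that disappeared should be removed from the index.
    to_delete = sorted(p for p in previous if p not in current)
    return to_embed, to_delete, current


# First run: nothing has been indexed yet, so every note is embedded.
first_notes = {"a.md": "Meet in November", "b.md": "Q3 2023 plans", "old.md": "stale"}
first_embed, first_delete, state = plan_index_update({}, first_notes)

# Later run: only the edited and the new note are embedded again.
second_notes = {"a.md": "Meet in November", "b.md": "Q3 2023 plans, revised", "c.md": "new"}
second_embed, second_delete, state = plan_index_update(state, second_notes)
print(second_embed, second_delete)
```

With this kind of bookkeeping, the cost of a run scales with how much changed since the last run rather than with the size of the whole knowledge base, which is what makes a shorter interval practical.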

Author

jh0274 commented Nov 2, 2023 via email

Author

jh0274 commented Nov 3, 2023

@debanjum so there's nothing wrong with the indexing. I just noticed that the query "who is around and wants to meetup in November" (for chat) returns results where periods of time (Q3 2023, late October, next year, etc.) are ranked higher than the result I was looking for, which specifically mentions 'meeting up in November'. This makes sense given the nature of the search.

I've had a quick look at S-Bert (which I think hosts the underlying models?) for a way of configuring the encoding/embedding to better deal with these types of searches/queries, but haven't found much. Are you aware of any config changes I can make to help with this?

Thanks again!

Collaborator

debanjum commented Nov 3, 2023

Oh, are you using offline chat or OpenAI in Khoj?

The date awareness of the bare SBert search AI models isn't that great. But we work around that by exposing search query filters like the date filter. If you use OpenAI (preferably GPT4) for chat, the model can usually answer such questions better. To reduce response latency, the offline chat doesn't currently use query filters.
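
As a rough illustration of how a date filter helps here, the sketch below narrows the candidates by date first and only then ranks by semantic similarity. It is a simplified stand-in, not Khoj's code: the `Entry` class, the `entry_date` field, `search_with_date_filter`, and the similarity scores are all made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    text: str
    entry_date: date   # assumed: a date associated with the entry
    similarity: float  # stand-in for the embedding similarity to the query


def search_with_date_filter(entries: list[Entry], start: date, end: date) -> list[Entry]:
    """Keep entries dated in [start, end), then rank them by similarity."""
    in_range = [e for e in entries if start <= e.entry_date < end]
    return sorted(in_range, key=lambda e: e.similarity, reverse=True)


# Illustrative scores only: the embedding model alone ranks other time periods
# above the November entry, as described earlier in this thread.
entries = [
    Entry("Plans for Q3 2023", date(2023, 8, 15), 0.71),
    Entry("Catch up in late October", date(2023, 10, 28), 0.69),
    Entry("Meeting up in November", date(2023, 11, 12), 0.64),
    Entry("November reading list", date(2023, 11, 3), 0.30),
]

results = search_with_date_filter(entries, date(2023, 11, 1), date(2023, 12, 1))
for entry in results:
    print(entry.text)
```

Because the date constraint is applied as a hard filter rather than left to the embedding model, entries about other periods never compete with the November ones in the ranking.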

Author

jh0274 commented Nov 4, 2023 via email
