Writing to Shared Memory Index in Multi-processing environment #179

Open
MadhavEsDios opened this issue Mar 18, 2025 · 2 comments

Comments

@MadhavEsDios

Hi developers,
Firstly, thanks a lot for your excellent work on this repo!

I am currently using the NGT Index compiled with the -DNGT_SHARED_MEMORY_ALLOCATOR=ON flag.
NGT version: 2.3.12
Environment: Python 3.9

Context:
I am using NGT in a multi-processing environment (specifically with a library called Celery), where there are, for example, 5 read processes and 1 write process.
The single write process is responsible for adding new objects to the NGT Index, and ensures that this is done in a sequential manner, so that no locking mechanism is required.

Each of these 6 processes has its own instance of the shared-memory NGT Index, and only the write process opens the index with read_only=False.

My use case is that new objects added via the write process should be visible to the other 5 read processes as soon as a write operation has completed (i.e. the index has been rebuilt and saved).

Issue:
After some debugging I have realised that currently it is not possible for the other 5 read processes to be aware of new objects added via the write process to the shared index automatically. For me to be able to access the newly added objects in other processes, I have to close and re-open the index in each of the processes.

I was under the impression that since the index is in shared memory mode, any newly added objects should be automatically accessible to other processes.

Questions:

1. Even in shared memory mode, is there some kind of metadata cache created in RAM for each index instance? Could this be the reason why new objects are not automatically reflected in other processes?

2. Is there a better way than closing and re-opening the index in each process? This seems to be a fairly standard use case. For example, maybe there is a refresh flag that automatically refreshes all other instances when objects are added to one instance of the index.
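
For reference, the close-and-re-open workaround described above could be sketched roughly as follows. This is only an illustration under assumptions, not an NGT feature: the version marker file, `publish_version`, `read_version`, and `ReloadingReader` are all hypothetical names, and `opener` stands in for whatever opens the index in a read process (presumably something like `lambda p: ngtpy.Index(p, read_only=True)`).

```python
import os


def publish_version(marker_path, version):
    """Writer side: atomically record the version of the last saved index."""
    tmp_path = marker_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(version))
    os.replace(tmp_path, marker_path)  # atomic rename on the same filesystem


def read_version(marker_path):
    """Reader side: return the published version, or -1 if none exists yet."""
    try:
        with open(marker_path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return -1


class ReloadingReader:
    """Hypothetical wrapper: re-opens the index when the marker changes."""

    def __init__(self, index_path, marker_path, opener):
        self.index_path = index_path
        self.marker_path = marker_path
        self.opener = opener  # assumed to return an object with close()
        # Read the version before opening, so a write that lands in between
        # causes at most one redundant re-open rather than a missed update.
        self.version = read_version(marker_path)
        self.index = opener(index_path)

    def get(self):
        """Return an index handle, re-opening it if a new version was published."""
        current = read_version(self.marker_path)
        if current != self.version:
            self.index.close()
            self.index = self.opener(self.index_path)
            self.version = current
        return self.index
```

In this sketch the write process would call `publish_version` only after the index has been rebuilt and saved, and each read process would call `get()` before every search, so the re-open cost is paid only when something has actually changed.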

Thanks a lot for your time and I look forward to your response!

@masajiro
Member

Thank you for using NGT.
I don't remember the detailed behavior, but I think updates will be reflected in other processes if it's in shared memory mode.
First, could you check if the index has the following file structure?

grp grpc objpo objpoc prf robjpo robjpoc trei treic trel trelc

In shared memory mode, the file structure should be as shown above.
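
A quick way to run this check from Python, assuming the index is a directory containing exactly the file names listed above (the helper name `missing_shared_memory_files` is made up for this sketch):

```python
import os

# File names listed above for an index in shared memory mode.
EXPECTED_FILES = {
    "grp", "grpc", "objpo", "objpoc", "prf",
    "robjpo", "robjpoc", "trei", "treic", "trel", "trelc",
}


def missing_shared_memory_files(index_dir):
    """Return the expected file names absent from index_dir, sorted."""
    present = set(os.listdir(index_dir))
    return sorted(EXPECTED_FILES - present)
```

An empty list means every expected file is present; any names returned are the ones to look into.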

@MadhavEsDios
Author

MadhavEsDios commented Mar 19, 2025

@masajiro Thanks a lot for your super prompt response!

I can indeed confirm that I have the same file structure for my index:
"grp grpc objpo objpoc prf robjpo robjpoc trei treic trel trelc"

Note: I am also currently debugging some things on my end, so I will get back to you or close the issue if it was indeed a bug.
