-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate Graph Reordering into HNSW #9
Comments
Hello all, Jean, we discussed this idea in our GSearch (https://www.biorxiv.org/content/10.1101/2022.10.21.513218v2) paper but have not implemented it yet. It would be nice to have this idea implemented if @BlaiseMuhirwa is interested. I feel like the cache ordering is useful while the random sampling is more complicated considering the fact that the hierarchy is already there. Thanks, |
I would be interested in adding the graph reordering feature. On a separate note, it would also be useful to include benchmark numbers (recall@k, QPS) in the README on the ANN benchmark datasets. |
Hi,
Possibly https://arxiv.org/abs/1707.00143 is more interesting?
Le ven. 26 mai 2023 à 21:01, Blaise Muhirwa ***@***.***> a
écrit :
… I would be interested in adding the graph reordering feature. On a
separate note, it would also be useful to include benchmark numbers
***@***.***, QPS) in the README on the ANN benchmark datasets.
—
Reply to this email directly, view it on GitHub
<#9 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AEXM7NOIBLW5TSLWHBCNIDTXID4ZDANCNFSM6AAAAAAYPHHVDI>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Coleman et al. showed that reordering the nodes in every layer such that the neighbors of each node are laid out closer in memory improves query time performance by about 40%. The idea is that reordering provides a cache-efficient search mechanism that reduces the search overhead due to random accesses in HNSW.
They also showed that using hierarchy is not strictly necessary in certain settings. They replaced the hierarchy with "a process where we randomly sample 50 nodes and use the closest option as the initialization." They observed no statistically significant difference between the hierarchical search procedure and this random sampling process in terms of recall or query time over 10k items.
I can work on integrating these features.
The text was updated successfully, but these errors were encountered: