Unexpected performance results for jvector on a 100M dataset compared to on-disk k-NN #213

@jinjunzh

Is there a published benchmark comparing the current jvector plugin with the on-disk mode of the k-NN plugin? I ran a comparative test but did not see a clear advantage for jvector in either performance or resource usage. Is this expected?
On the SIFT 1M dataset, jvector shows clear benefits. At the 100M scale, however, its performance is an order of magnitude worse than the on-disk approach, even under memory-constrained conditions with a high rate of page misses.
I also observed that during query stress testing under memory constraints, jvector's disk I/O throughput is an order of magnitude higher than that of the on-disk method, and its JVM heap usage and storage consumption are likewise higher by an order of magnitude.
My understanding is that jvector's advantages should be most pronounced at very large data scales, but the results above do not bear this out. Is there another explanation, or are there any settings or approaches you would suggest?