What is the bug?
There is a significant native thread leak in the k-NN plugin during HNSW indexing. The operating system reports a continuous, linear increase in threads (LWPs) that does not plateau. In our environment, the count exceeded 77,000 threads, eventually causing either the OS to kill the OpenSearch process or the process to hit the max user processes limit (ulimit -u).
How can one reproduce the bug?
- Use OpenSearch 3.3.2 or 3.4.0 with the k-NN plugin installed.
- Configure an HNSW index.
- Ensure knn.algo_param.index_thread_qty is set to a value > 1 (default on most multi-core systems).
- Perform continuous bulk indexing with a standard refresh_interval (e.g., 1s or 60s).
- Monitor the OS thread count for the OpenSearch PID using ps -o nlwp <PID> or btop.
- Observe that the thread count rises steadily and that the threads are never reclaimed, even after indexing stops.
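
To make the monitoring step repeatable, the thread count can also be sampled from a small script. The following is a minimal sketch, assuming a Linux host where /proc/<PID>/status exposes a Threads: field (the same value ps -o nlwp reports). The function names parse_thread_count, read_thread_count, and monitor are hypothetical helpers for this report, not part of OpenSearch or the k-NN plugin.

```python
import re
import time
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' field found in status text")
    return int(match.group(1))


def read_thread_count(pid: int) -> int:
    """Read the current OS-level thread (LWP) count for a PID (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())


def monitor(pid: int, interval_s: float = 10.0, samples: int = 60) -> list[int]:
    """Print and return a fixed number of thread-count samples for the PID."""
    counts = []
    for _ in range(samples):
        count = read_thread_count(pid)
        counts.append(count)
        print(f"{time.strftime('%H:%M:%S')} threads={count}", flush=True)
        time.sleep(interval_s)
    return counts
```

Calling monitor(<PID>) with the OpenSearch PID during bulk indexing, and again after indexing stops, should show whether the count keeps growing or settles.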
What is the expected behavior?
The native thread pool used for HNSW graph construction should properly join and terminate worker threads or reuse a fixed-size pool. The OS-level thread count should remain stable and proportional to the configured thread pool settings.
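
As an illustration of the contract described above (not the plugin's native code, which we have not inspected), here is a minimal Python sketch of a fixed-size pool that is reused across many batches and joined on shutdown. POOL_SIZE is a hypothetical stand-in for knn.algo_param.index_thread_qty, and build_segment is a placeholder for graph construction work. It counts Python-level threads rather than native LWPs, so it only demonstrates the expected shape of the behavior.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 4  # hypothetical stand-in for knn.algo_param.index_thread_qty


def build_segment(segment_id: int) -> int:
    """Placeholder for per-segment graph construction work."""
    time.sleep(0.001)
    return segment_id


def run_batches(pool: ThreadPoolExecutor, batches: int, tasks_per_batch: int) -> int:
    """Submit repeated batches to one shared pool; return the peak thread count seen."""
    peak = threading.active_count()
    for batch in range(batches):
        futures = [
            pool.submit(build_segment, batch * tasks_per_batch + i)
            for i in range(tasks_per_batch)
        ]
        for future in futures:
            future.result()  # wait for the batch, like a refresh completing
        peak = max(peak, threading.active_count())
    return peak


def demo(batches: int = 50, tasks_per_batch: int = 8) -> tuple[int, int, int]:
    """Return (baseline, peak, after) thread counts around the pool's lifetime."""
    baseline = threading.active_count()
    # Leaving the 'with' block shuts the pool down and joins its workers.
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        peak = run_batches(pool, batches, tasks_per_batch)
    after = threading.active_count()
    return baseline, peak, after
```

With a pool like this, the peak stays within POOL_SIZE of the baseline no matter how many batches run, and the count returns to the baseline once the pool is shut down. That is the behavior we expect from the OS-level thread count during and after indexing.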
What is your host/environment?
- OS: Ubuntu 24.04
- Version: 3.3.2 (also confirmed on 3.4.0)
- Plugins: k-NN (HNSW)