The current implementation of get_memory_usage() in BaseANN uses psutil.Process().memory_info().rss to measure memory usage, which may not accurately represent the actual memory consumption when using memory-mapped indexing. We are testing our soon-to-be-released new algorithm, and ann-benchmarks showed the same memory usage for both In-Memory and On-Disk implementations, which prompted further investigation.
The RSS value includes both private and shared memory pages, so with memory-mapped indexing it can report a significantly inflated RAM footprint. However, the system can reclaim the pages backing memory-mapped files, so the process does not truly own this memory even though it is counted in RSS.
One possible fix would be to subtract the shared portion from RSS or, better yet, to use USS (Unique Set Size) from memory_full_info(). This should be more accurate for both in-memory and on-disk indexing.
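To make the two proposed fixes concrete, here is a minimal sketch that reports RSS, RSS minus shared pages, and USS side by side. The helper name `memory_report_kb` and the kilobyte unit are assumptions for illustration, not the existing ann-benchmarks code. The `shared` field of `memory_info()` is only reported on Linux, and how memory-mapped pages are split between these numbers depends on the platform, so the three values are best compared empirically rather than assumed.

```python
def memory_report_kb(process=None):
    """Return RSS, RSS minus shared pages, and USS for a process, in kilobytes.

    `process` is expected to behave like psutil.Process; when omitted,
    the current process is inspected. (Hypothetical helper, for illustration.)
    """
    if process is None:
        import psutil  # third-party; imported lazily so a stand-in object can be passed
        process = psutil.Process()

    basic = process.memory_info()      # cheap call: rss, vms, plus platform-specific fields
    full = process.memory_full_info()  # slower call: additionally provides uss

    # `shared` is only available on Linux; treat it as 0 elsewhere.
    shared = getattr(basic, "shared", 0)

    return {
        "rss_kb": basic.rss / 1024,
        "rss_minus_shared_kb": (basic.rss - shared) / 1024,
        "uss_kb": full.uss / 1024,
    }
```

Calling `memory_report_kb()` before and after loading an index gives a quick way to see which measure separates the in-memory and on-disk cases. Note that `memory_full_info()` is documented as considerably slower than `memory_info()` and may need higher privileges when inspecting other processes.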
"Determining how much memory a process really uses is not an easy matter. RSS (Resident Set Size), which is what most people usually rely on, is misleading because it includes both the memory which is unique to the process and the memory shared with other processes. What would be more interesting in terms of profiling is the memory which would be freed if the process was terminated right now. In the Linux world this is called USS (Unique Set Size), and this is the major feature which was introduced in psutil 4.0.0 (not only for Linux but also for Windows and OSX)."
Thank you for looking into this issue.
psutil documentation for memory_full_info(): https://psutil.readthedocs.io/en/latest/#psutil.Process.memory_full_info
Blog post referenced in the psutil documentation (the source of the passage quoted above): https://gmpy.dev/blog/2016/real-process-memory-and-environ-in-python