Memory usage tracking is possibly not so accurate (especially with memory-mapped files) #581

mesibo · 2025-03-28T06:13:14Z

The current implementation of get_memory_usage() in BaseANN uses psutil.Process().memory_info().rss to measure memory usage, which may not accurately represent the actual memory consumption when using memory-mapped indexing. We are testing our soon-to-be-released new algorithm, and ann-benchmarks showed the same memory usage for both In-Memory and On-Disk implementations, which prompted further investigation.

RSS includes both private and shared memory pages, so with memory-mapped indexing it can report a much larger RAM footprint than the process actually owns. The system can reclaim the pages used by memory-mapped files, so the process does not truly own this memory even though it is counted in RSS.

One possible fix would be to subtract the shared portion from RSS or, better yet, to use USS (Unique Set Size) from memory_full_info(). USS should be more accurate in both cases, in-memory and on-disk indexing.
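A minimal sketch of the USS-based approach, not a patch against the actual BaseANN code. The function name `memory_usage_kib`, the `prefer_uss` flag, and the KiB unit are assumptions made for illustration. It falls back to RSS when psutil reports that USS cannot be read.

```python
import psutil


def memory_usage_kib(proc=None, prefer_uss=True):
    """Return memory usage in KiB, preferring USS over RSS.

    Hypothetical helper: `proc` is a psutil.Process (defaults to the
    current process); `prefer_uss` selects USS when it is available.
    """
    proc = proc if proc is not None else psutil.Process()
    if prefer_uss:
        try:
            # memory_full_info() exposes `uss` on Linux, macOS and Windows.
            return proc.memory_full_info().uss / 1024
        except psutil.AccessDenied:
            # Reading USS may need more privileges; fall back to RSS.
            pass
    return proc.memory_info().rss / 1024


if __name__ == "__main__":
    print(f"USS: {memory_usage_kib():.0f} KiB")
    print(f"RSS: {memory_usage_kib(prefer_uss=False):.0f} KiB")
```

Per the psutil documentation, memory_full_info() is considerably slower than memory_info(), so it is better called once per measurement than in a tight loop. How pages from a memory-mapped file are classified (private or shared) depends on the platform, so it is worth confirming on the target system that the USS figure actually separates the In-Memory and On-Disk cases.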

psutil documentation: https://psutil.readthedocs.io/en/latest/#psutil.Process.memory_full_info

There is a blog post referenced in the psutil documentation: https://gmpy.dev/blog/2016/real-process-memory-and-environ-in-python

From the blog:

"Determining how much memory a process really uses is not an easy matter. RSS (Resident Set Size), which is what most people usually rely on, is misleading because it includes both the memory which is unique to the process and the memory shared with other processes. What would be more interesting in terms of profiling is the memory which would be freed if the process was terminated right now. In the Linux world this is called USS (Unique Set Size), and this is the major feature which was introduced in psutil 4.0.0 (not only for Linux but also for Windows and OSX)."

Thank you for looking into this issue.
