Memory usage tracking is possibly not so accurate (especially with memory-mapped files) #581

Open
mesibo opened this issue Mar 28, 2025 · 0 comments

mesibo commented Mar 28, 2025

The current implementation of get_memory_usage() in BaseANN uses psutil.Process().memory_info().rss to measure memory usage, which may not accurately represent the actual memory consumption when using memory-mapped indexing. We are testing our soon-to-be-released new algorithm, and ann-benchmarks showed the same memory usage for both In-Memory and On-Disk implementations, which prompted further investigation.

The RSS value includes both private and shared memory pages. With memory-mapped indexing, RSS can therefore report a much larger RAM footprint than the process actually owns: resident pages of a memory-mapped file are counted in RSS, yet the system can reclaim them, so the process does not truly own this memory.
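To see how the numbers differ on a given machine, a small helper can print RSS next to the other candidates. This is only a sketch: memory_snapshot is a hypothetical name, not part of ann-benchmarks, and the shared field is reported by psutil on Linux only, so it is returned as None elsewhere. How the kernel classifies memory-mapped file pages that only one process maps can vary, so it is worth measuring with the actual index loaded rather than assuming.

```python
import psutil


def memory_snapshot(pid=None):
    """Return RSS, shared (where reported) and USS in bytes for one process.

    Hypothetical helper for illustration; pid=None means the current process.
    """
    proc = psutil.Process(pid)
    # memory_full_info() extends memory_info() with USS (and more on Linux).
    full = proc.memory_full_info()
    return {
        "rss": full.rss,
        # 'shared' is a Linux-only field; fall back to None on other platforms.
        "shared": getattr(full, "shared", None),
        "uss": full.uss,
    }


if __name__ == "__main__":
    for name, value in memory_snapshot().items():
        print(name, value)
```

Calling it once before and once after loading an index, for both the In-Memory and the On-Disk variant, shows which of the three values actually separates the two cases.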

One possible fix would be to subtract the shared portion from RSS (the shared field of memory_info(), which psutil reports on Linux), or better yet, to use USS (Unique Set Size) from memory_full_info(). This should be more accurate in both cases, in-memory or on-disk indexing.
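The two options could look roughly like this. It is a sketch under assumptions, not a patch against the actual BaseANN code: the function names are hypothetical, and returning kilobytes is an assumption about the unit the benchmark expects.

```python
import psutil


def memory_usage_rss_kb():
    """Current approach described above: resident set size, in kB (assumed unit)."""
    return psutil.Process().memory_info().rss / 1024


def memory_usage_rss_minus_shared_kb():
    """Option 1: RSS minus the shared portion.

    'shared' is only reported on Linux; elsewhere this falls back to plain RSS.
    """
    info = psutil.Process().memory_info()
    shared = getattr(info, "shared", 0)
    return (info.rss - shared) / 1024


def memory_usage_uss_kb():
    """Option 2: unique set size from memory_full_info(), in kB (assumed unit)."""
    return psutil.Process().memory_full_info().uss / 1024
```

One trade-off to keep in mind: the psutil documentation notes that memory_full_info() is considerably slower than memory_info() and may require higher privileges, which matters if the measurement is taken frequently.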

psutil documentation: https://psutil.readthedocs.io/en/latest/#psutil.Process.memory_full_info

The psutil documentation also references this blog post: https://gmpy.dev/blog/2016/real-process-memory-and-environ-in-python

From the blog:

"Determining how much memory a process really uses is not an easy matter. RSS (Resident Set Size), which is what most people usually rely on, is misleading because it includes both the memory which is unique to the process and the memory shared with other processes. What would be more interesting in terms of profiling is the memory which would be freed if the process was terminated right now. In the Linux world this is called USS (Unique Set Size), and this is the major feature which was introduced in psutil 4.0.0 (not only for Linux but also for Windows and OSX)."

Thank you for looking into this issue.
