fix(snapshots): optimise memory consumption for restores #3465

Open · wants to merge 3 commits into master
Conversation

@arouene (Contributor) commented Nov 23, 2023

The code was stacking closures holding metadata for the whole snapshot before starting to process the files.
This PR inverts the logic by processing the directories breadth-first, freeing memory as the files are processed.

Resolves #3460.
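For illustration, a minimal Go sketch of the breadth-first idea described above (the `Dir`/`File` types and `restoreBreadthFirst` are hypothetical stand-ins, not kopia's actual restore code): instead of expanding the whole tree into pending closures up front, each dequeued directory's files are restored immediately and only its subdirectories are re-queued, so at most one frontier of directory metadata is held in memory.

```go
package main

import "fmt"

// Hypothetical, simplified tree model; kopia's real types differ.
type File struct{ Name string }

type Dir struct {
	Name    string
	Files   []File
	Subdirs []*Dir
}

// restoreBreadthFirst walks the tree level by level: the files of the
// directory at the head of the queue are restored (and can be freed)
// before any deeper directory is expanded, so only one "frontier" of
// directory metadata is held in memory at a time.
func restoreBreadthFirst(root *Dir) {
	queue := []*Dir{root}
	for len(queue) > 0 {
		d := queue[0]
		queue = queue[1:]

		for _, f := range d.Files {
			fmt.Printf("restoring %s/%s\n", d.Name, f.Name)
		}
		// Only the subdirectory references stay queued; d's file
		// entries are unreferenced once this iteration ends.
		queue = append(queue, d.Subdirs...)
	}
}

func main() {
	root := &Dir{
		Name:  "DFS",
		Files: []File{{Name: "readme.txt"}},
		Subdirs: []*Dir{
			{Name: "SGF", Files: []File{{Name: "a.dat"}, {Name: "b.dat"}}},
		},
	}
	restoreBreadthFirst(root)
}
```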

@arouene changed the title from "fix(restore): optimise memory consumption for restores" to "fix(snapshots): optimise memory consumption for restores" on Nov 23, 2023
@renardguill

👍


codecov bot commented Dec 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9471d28) 75.83% compared to head (c6df24b) 75.84%.
Report is 7 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3465   +/-   ##
=======================================
  Coverage   75.83%   75.84%           
=======================================
  Files         465      465           
  Lines       37166    37166           
=======================================
+ Hits        28184    28187    +3     
+ Misses       7050     7048    -2     
+ Partials     1932     1931    -1     


@jkowalski (Contributor)

In principle I like this; the only concern is progress reporting - it will likely mess up ETA estimation for restores.

Can you run this change before/after on some large directory and perhaps capture a couple of screenshots while this is running?

@arouene (Contributor, Author) commented Dec 1, 2023

> In principle I like this; the only concern is progress reporting - it will likely mess up ETA estimation for restores.
>
> Can you run this change before/after on some large directory and perhaps capture a couple of screenshots while this is running?

Yes sure, I will try to get more information / stats / screenshots.

For reference, my test snapshots are taken on a 5 TB volume of data (a shared drive of plain files).

@arouene (Contributor, Author) commented Dec 1, 2023

I understand the concern about the ETA; I have no stats on that for now. But on memory the win is clear.

As I said, my playground is about 5 TB:

[Screenshot from 2023-12-01 11-10-58]

Our biggest folder contains 1 478 455 files:

[screenshot]

Memory consumption used to be at least about 10 GB:

[screenshot]

It is now about 1 GB (the screenshots below are from the last restore today, which reports 701 MiB of heap):

[Screenshot from 2023-12-01 11-46-23]

[Screenshot from 2023-12-01 11-45-40]

@Lyndon-Li (Contributor)

> In principle I like this; the only concern is progress reporting - it will likely mess up ETA estimation for restores.
>
> Can you run this change before/after on some large directory and perhaps capture a couple of screenshots while this is running?

With this approach, totalBytes probably grows together with doneBytes, which is not what we want to see.

As an alternative, can we divide the workers into two sets, one consuming from the front and the other from the back? We could also adjust the number of workers in each set to tune the priority.
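A rough sketch of that alternative, assuming a single shared work deque (the `deque` and `task` types here are hypothetical, not kopia's restore queue): one set of workers pops from the front and another from the back, and the relative sizes of the two sets act as the priority knob. For simplicity the example pre-fills the queue and workers exit when it is empty; a real restore would keep producing tasks while processing.

```go
package main

import (
	"fmt"
	"sync"
)

// task stands in for a unit of restore work, e.g. one directory.
type task struct{ path string }

// deque is a minimal mutex-protected double-ended queue.
type deque struct {
	mu    sync.Mutex
	items []task
}

func (q *deque) push(t task) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, t)
}

func (q *deque) popFront() (task, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return task{}, false
	}
	t := q.items[0]
	q.items = q.items[1:]
	return t, true
}

func (q *deque) popBack() (task, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return task{}, false
	}
	t := q.items[len(q.items)-1]
	q.items = q.items[:len(q.items)-1]
	return t, true
}

func main() {
	q := &deque{}
	for i := 0; i < 8; i++ {
		q.push(task{path: fmt.Sprintf("dir-%d", i)})
	}

	// Two worker sets share the same queue: "front" workers drain it
	// breadth-first, "back" workers drain it depth-first. Changing the
	// counts shifts the priority between the two behaviours.
	const frontWorkers, backWorkers = 2, 1

	var wg sync.WaitGroup
	worker := func(name string, pop func() (task, bool)) {
		defer wg.Done()
		for {
			t, ok := pop()
			if !ok {
				return // queue empty; all tasks were pre-filled in this sketch
			}
			fmt.Println(name, "restores", t.path)
		}
	}
	for i := 0; i < frontWorkers; i++ {
		wg.Add(1)
		go worker("front", q.popFront)
	}
	for i := 0; i < backWorkers; i++ {
		wg.Add(1)
		go worker("back", q.popBack)
	}
	wg.Wait()
}
```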

@julio-lopez (Collaborator) commented Dec 7, 2023

@arouene what do the stats shown above correspond to? (~5 TB, ~26 million files, ~18K directories)

Is that the total content in the repository or something else?

Also, what's the makeup of the directory being restored?

  • What's the maximum directory depth? average, etc?
  • What's the largest directory size (in number of entries)? and what's the typical (not average, more akin to the mode) number of entries in a directory?

@julio-lopez requested a review from jkowalski and removed the request for julio-lopez, December 8, 2023 02:44
@arouene (Contributor, Author) commented Dec 8, 2023

Thanks for your interest in this PR! I'm trying to get better stats...

> @arouene what do the stats shown above correspond to? (~5 TB, ~26 million files, ~18K directories)
>
> Is that the total content in the repository or something else?

It's the total content of the directory that is snapshotted; it's also roughly the size of the S3 repository, as the files do not change much or often.

> Also, what's the makeup of the directory being restored?
>
> * What's the maximum directory depth? average, etc?

Number of directories: 18 530
Max directory depth: 7

So the layout is very flat, with one big directory that contains a long list of sub-directories that themselves contain a lot of files.

Like so:

\DFS
    \EXPORT\
    \SGF\ -> 972 direct sub-folders
        \...\ -> about a thousand files
        \...\ -> about a thousand files
        \...\ -> about a thousand files
        ...
    \TEMP\
    \WEBSERVICES\
> * What's the largest directory size (in number of entries)? and what's the typical (not average, more akin to the mode) number of entries in a directory?

The largest directory has 1 149 564 files (at depth 4).

The typical case is about a thousand files per folder (for about a thousand folders), with some subfolders that have many extra files.

Would it be possible to get the number of files/directories to restore, along with the total size, without caching all the metadata? From what I understand, we can only get that information by fetching all the manifest blocks from the remote repository?
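For what it's worth, a small Go sketch of the kind of streaming counting pass being asked about (purely illustrative; the `Dir`/`File` types and `countOnly` are hypothetical, not kopia's API): entries are read once to accumulate file/directory counts and total bytes, then dropped, so the totals needed for progress/ETA could in principle be computed without retaining the whole snapshot's metadata in memory, though the directory metadata would still have to be fetched from the repository to do the counting.

```go
package main

import "fmt"

// Hypothetical types for illustration only; not kopia's API.
type File struct {
	Name string
	Size int64
}

type Dir struct {
	Name    string
	Files   []File
	Subdirs []*Dir
}

type totals struct {
	dirs, files, bytes int64
}

// countOnly streams the tree once, accumulating counts and sizes and
// discarding each directory's entries immediately, so the totals are
// computed without keeping the whole snapshot's metadata in memory.
func countOnly(root *Dir) totals {
	var t totals
	queue := []*Dir{root}
	for len(queue) > 0 {
		d := queue[0]
		queue = queue[1:]
		t.dirs++
		for _, f := range d.Files {
			t.files++
			t.bytes += f.Size
		}
		queue = append(queue, d.Subdirs...)
	}
	return t
}

func main() {
	root := &Dir{
		Name:    "DFS",
		Subdirs: []*Dir{{Name: "SGF", Files: []File{{Name: "a.dat", Size: 100}, {Name: "b.dat", Size: 200}}}},
	}
	fmt.Printf("%+v\n", countOnly(root))
}
```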

Successfully merging this pull request may close these issues.

High memory consumption at restore