fix(snapshots): optimise memory consumption for restores #3465

Open · wants to merge 3 commits into master
Conversation

@arouene (Contributor) commented Nov 23, 2023

The code was stacking closures holding metadata for the whole snapshot before starting to process the files.
This PR inverts the logic by processing the directories breadth-first, freeing memory as the files are processed.

Resolves #3460.
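For illustration, a minimal Go sketch of the breadth-first idea described above (the `Dir`/`File` types and `restoreBreadthFirst` are hypothetical stand-ins, not kopia's actual restore code): instead of expanding the whole tree into pending closures up front, each dequeued directory's files are restored immediately and only its subdirectories are re-queued, so at most one frontier of directory metadata is held in memory.

```go
package main

import "fmt"

// Hypothetical, simplified tree model; kopia's real types differ.
type File struct{ Name string }

type Dir struct {
	Name    string
	Files   []File
	Subdirs []*Dir
}

// restoreBreadthFirst walks the tree level by level: the files of the
// directory at the head of the queue are restored (and can be freed)
// before any deeper directory is expanded, so only one "frontier" of
// directory metadata is held in memory at a time.
func restoreBreadthFirst(root *Dir) {
	queue := []*Dir{root}
	for len(queue) > 0 {
		d := queue[0]
		queue = queue[1:]

		for _, f := range d.Files {
			fmt.Printf("restoring %s/%s\n", d.Name, f.Name)
		}
		// Only the subdirectory references stay queued; d's file
		// entries are unreferenced once this iteration ends.
		queue = append(queue, d.Subdirs...)
	}
}

func main() {
	root := &Dir{
		Name:  "DFS",
		Files: []File{{Name: "readme.txt"}},
		Subdirs: []*Dir{
			{Name: "SGF", Files: []File{{Name: "a.dat"}, {Name: "b.dat"}}},
		},
	}
	restoreBreadthFirst(root)
}
```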

@arouene changed the title from "fix(restore): optimise memory consumption for restores" to "fix(snapshots): optimise memory consumption for restores" on Nov 23, 2023
@renardguill

👍


codecov bot commented Dec 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (9471d28) 75.83% compared to head (c6df24b) 75.84%.
Report is 7 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3465   +/-   ##
=======================================
  Coverage   75.83%   75.84%           
=======================================
  Files         465      465           
  Lines       37166    37166           
=======================================
+ Hits        28184    28187    +3     
+ Misses       7050     7048    -2     
+ Partials     1932     1931    -1     


@jkowalski (Contributor)

In principle I like this; the only concern is progress reporting - it will likely mess up ETA estimation for restores.

Can you run this change before/after on some large directory and perhaps capture a couple of screenshots while this is running?

@arouene (Contributor, Author) commented Dec 1, 2023

> In principle I like this; the only concern is progress reporting - it will likely mess up ETA estimation for restores.
>
> Can you run this change before/after on some large directory and perhaps capture a couple of screenshots while this is running?

Yes sure, I will try to get more information / stats / screenshots.

For reference, my test snapshots are taken on a 5 TB volume of data (a shared drive of plain files).

@arouene (Contributor, Author) commented Dec 1, 2023

I understand the concern about the ETA; I have no stats on that for now. But on memory the win is clear.

As I said, my playground is about 5 TB:

[Screenshot from 2023-12-01 11-10-58]

Our biggest folder contains 1 478 455 files:

[screenshot]

Memory consumption used to be at least about 10 GB:

[screenshot]

It is now about 1 GB (the screenshots below are from the last restore today, which reports 701 MiB of heap):

[Screenshot from 2023-12-01 11-46-23]

[Screenshot from 2023-12-01 11-45-40]

@Lyndon-Li (Contributor)

> In principle I like this; the only concern is progress reporting - it will likely mess up ETA estimation for restores.
>
> Can you run this change before/after on some large directory and perhaps capture a couple of screenshots while this is running?

With this approach, totalBytes probably grows together with doneBytes, which is not what we want to see.

As an alternative, can we divide the workers into two sets, one consuming from the front and the other from the back? We could also adjust the number of workers in each set to tune the priority.
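A rough sketch of that alternative, assuming a single shared work deque (the `deque` and `task` types here are hypothetical, not kopia's restore queue): one set of workers pops from the front and another from the back, and the relative sizes of the two sets act as the priority knob. For simplicity the example pre-fills the queue and workers exit when it is empty; a real restore would keep producing tasks while processing.

```go
package main

import (
	"fmt"
	"sync"
)

// task stands in for a unit of restore work, e.g. one directory.
type task struct{ path string }

// deque is a minimal mutex-protected double-ended queue.
type deque struct {
	mu    sync.Mutex
	items []task
}

func (q *deque) push(t task) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, t)
}

func (q *deque) popFront() (task, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return task{}, false
	}
	t := q.items[0]
	q.items = q.items[1:]
	return t, true
}

func (q *deque) popBack() (task, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return task{}, false
	}
	t := q.items[len(q.items)-1]
	q.items = q.items[:len(q.items)-1]
	return t, true
}

func main() {
	q := &deque{}
	for i := 0; i < 8; i++ {
		q.push(task{path: fmt.Sprintf("dir-%d", i)})
	}

	// Two worker sets share the same queue: "front" workers drain it
	// breadth-first, "back" workers drain it depth-first. Changing the
	// counts shifts the priority between the two behaviours.
	const frontWorkers, backWorkers = 2, 1

	var wg sync.WaitGroup
	worker := func(name string, pop func() (task, bool)) {
		defer wg.Done()
		for {
			t, ok := pop()
			if !ok {
				return // queue empty; all tasks were pre-filled in this sketch
			}
			fmt.Println(name, "restores", t.path)
		}
	}
	for i := 0; i < frontWorkers; i++ {
		wg.Add(1)
		go worker("front", q.popFront)
	}
	for i := 0; i < backWorkers; i++ {
		wg.Add(1)
		go worker("back", q.popBack)
	}
	wg.Wait()
}
```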

@julio-lopez (Collaborator) commented Dec 7, 2023

@arouene what do the stats shown above correspond to? (~5 TB, ~26 million files, ~18K directories)

Is that the total content in the repository or something else?

Also, what's the makeup of the directory being restored?

  • What's the maximum directory depth? average, etc?
  • What's the largest directory size (in number of entries)? and what's the typical (not average, more akin to the mode) number of entries in a directory?

@julio-lopez requested a review from jkowalski and removed the request for julio-lopez, December 8, 2023 02:44
@arouene (Contributor, Author) commented Dec 8, 2023

Thanks for your interest in this PR! I'm trying to get better stats...

> @arouene what do the stats shown above correspond to? (~5 TB, ~26 million files, ~18K directories)
>
> Is that the total content in the repository or something else?

It's the total content of the directory that is snapshotted; it's also roughly the size of the S3 repository, as the files do not change much or often.

> Also, what's the makeup of the directory being restored?
>
> * What's the maximum directory depth? average, etc?

Number of directories: 18 530
Max directory depth: 7

So the layout is very flat, with one big directory that contains a long list of sub-directories that themselves contain a lot of files.

Like so:

\DFS
    \EXPORT\
    \SGF\ -> 972 direct sub-folders
        \...\ -> about a thousand files
        \...\ -> about a thousand files
        \...\ -> about a thousand files
        ...
    \TEMP\
    \WEBSERVICES\
> * What's the largest directory size (in number of entries)? and what's the typical (not average, more akin to the mode) number of entries in a directory?

The largest directory has 1 149 564 files (at depth 4).

The typical case is about a thousand files per folder (for about a thousand folders), with some subfolders that have many extra files.

Would it be possible to get the number of files/directories to restore, along with the total size, without caching all the metadata? From what I understand, we can only get that information by fetching all the manifest blocks from the remote repository?
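For what it's worth, a small Go sketch of the kind of streaming counting pass being asked about (purely illustrative; the `Dir`/`File` types and `countOnly` are hypothetical, not kopia's API): entries are read once to accumulate file/directory counts and total bytes, then dropped, so the totals needed for progress/ETA could in principle be computed without retaining the whole snapshot's metadata in memory, though the directory metadata would still have to be fetched from the repository to do the counting.

```go
package main

import "fmt"

// Hypothetical types for illustration only; not kopia's API.
type File struct {
	Name string
	Size int64
}

type Dir struct {
	Name    string
	Files   []File
	Subdirs []*Dir
}

type totals struct {
	dirs, files, bytes int64
}

// countOnly streams the tree once, accumulating counts and sizes and
// discarding each directory's entries immediately, so the totals are
// computed without keeping the whole snapshot's metadata in memory.
func countOnly(root *Dir) totals {
	var t totals
	queue := []*Dir{root}
	for len(queue) > 0 {
		d := queue[0]
		queue = queue[1:]
		t.dirs++
		for _, f := range d.Files {
			t.files++
			t.bytes += f.Size
		}
		queue = append(queue, d.Subdirs...)
	}
	return t
}

func main() {
	root := &Dir{
		Name:    "DFS",
		Subdirs: []*Dir{{Name: "SGF", Files: []File{{Name: "a.dat", Size: 100}, {Name: "b.dat", Size: 200}}}},
	}
	fmt.Printf("%+v\n", countOnly(root))
}
```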

Successfully merging this pull request may close these issues.

High memory consumption at restore