
rdsquashfs feature suggestion: hardlink duplicate files on extract #73

Open
Zaxim opened this issue Nov 11, 2020 · 1 comment

Zaxim commented Nov 11, 2020

tl;dr: I have a squashfs file containing millions of duplicated files; it would be awesome to be able to extract the image and hardlink (or reflink) the duplicates.

My specific use case is an abuse of the intended functionality of squashfs, but I have been using squashfs as a directory archival tool to consolidate dozens of Apple Time Machine backup folders [1]. Time Machine uses directory hardlinks to snapshot entire filesystems while saving space, but I have Time Machine backups from different drives and systems which don't share those hardlinks yet contain very similar files. mksquashfs has been the only tool that's been able to scale to the number of files and hardlinks that I'm dealing with and properly deduplicate as I append directories to my single squashfs file.

I can always mount the squashfs image and browse to the specific files/folders I want to retrieve, but I was thinking it would be cool to be able to extract the image and use the deduplication table to create the duplicated files on disk as hardlinks, or as reflinks on COW filesystems such as BTRFS. I'm not sure how hard this would be to implement in rdsquashfs.

[1] There are pitfalls with using mksquashfs on Apple Time Machine folders. Namely, squashfs does not support all the crazy xattr stuff that macOS applies to files, so some things don't restore completely, but as a file archive, it works fine.

AgentD (Owner) commented Nov 13, 2020

Only unpacking duplicated files once and creating copy-on-write reflinks sounds like a very interesting idea.

On Linux this would be done with an FICLONE, FICLONERANGE or FIDEDUPERANGE ioctl. On macOS and *BSD I have not found an explicit way to do this yet. I think it can be done implicitly through the fcopyfile function on macOS.
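To illustrate the Linux path, here is a minimal sketch of how an extractor might try the FICLONE ioctl first and fall back to a plain byte copy when the target filesystem has no reflink support. This is not rdsquashfs code; the function name `clone_or_copy` and the file paths are hypothetical, and real code would be driven by the image's duplicate table rather than two arbitrary paths.

```c
/* Sketch only: reflink src to dst via FICLONE, falling back to a
 * byte-for-byte copy where copy-on-write clones are unsupported. */
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h> /* FICLONE */

int clone_or_copy(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    if (ioctl(out, FICLONE, in) == 0) {
        /* Reflink succeeded: dst now shares extents with src
         * (works on btrfs, XFS with reflink=1, etc.). */
        close(in);
        close(out);
        return 0;
    }

    if (errno != EOPNOTSUPP && errno != EINVAL && errno != EXDEV) {
        close(in);
        close(out);
        return -1;
    }

    /* No reflink support (e.g. ext4): fall back to copying bytes. */
    char buf[8192];
    ssize_t n;
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            close(in);
            close(out);
            return -1;
        }
    }

    close(in);
    close(out);
    return n < 0 ? -1 : 0;
}
```

FICLONERANGE would allow cloning only part of a file, and FIDEDUPERANGE deduplicates already-written data in place, so either could also fit here depending on how the extractor is structured.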
