
rdsquashfs feature suggestion: hardlink duplicate files on extract #73

Open
Zaxim opened this issue Nov 11, 2020 · 1 comment

Zaxim commented Nov 11, 2020

tl;dr: I have a squashfs file containing millions of duplicated files; it would be awesome to be able to extract the image and hardlink (or reflink) the duplicates.

My specific use case is an abuse of the intended functionality of squashfs, but I have been using squashfs as a directory archival tool to consolidate dozens of Apple Time Machine backup folders [1]. Time Machine uses directory hardlinks to snapshot entire filesystems while saving space, but I have Time Machine backups from different drives and systems which don't share those hardlinks yet contain very similar files. mksquashfs has been the only tool that's been able to scale to the number of files and hardlinks that I'm dealing with and properly deduplicate as I append directories to my single squashfs file.

I can always mount the squashfs image and browse to the specific files/folders I want to retrieve, but I was thinking it would be cool to be able to extract the image and use the deduplication table to create the duplicated files on disk as hardlinks, or as reflinks on COW filesystems such as BTRFS. I'm not sure how hard this would be to implement in rdsquashfs.

[1] There are pitfalls with using mksquashfs on Apple Time Machine folders. Namely, squashfs does not support all the crazy xattr stuff that macOS applies to files, so some things don't restore completely, but as a file archive, it works fine.

AgentD (Owner) commented Nov 13, 2020

Only unpacking duplicated files once and creating copy-on-write reflinks sounds like a very interesting idea.

On Linux this would be done with an FICLONE, FICLONERANGE or FIDEDUPERANGE ioctl. On macOS and *BSD I have not found an explicit way to do this yet. I think it can be done implicitly through the fcopyfile function on macOS.
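To illustrate the Linux path, here is a minimal sketch of how an extractor might try the FICLONE ioctl first and fall back to a plain byte copy when the target filesystem has no reflink support. This is not rdsquashfs code; the function name `clone_or_copy` and the file paths are hypothetical, and real code would be driven by the image's duplicate table rather than two arbitrary paths.

```c
/* Sketch only: reflink src to dst via FICLONE, falling back to a
 * byte-for-byte copy where copy-on-write clones are unsupported. */
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h> /* FICLONE */

int clone_or_copy(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    if (ioctl(out, FICLONE, in) == 0) {
        /* Reflink succeeded: dst now shares extents with src
         * (works on btrfs, XFS with reflink=1, etc.). */
        close(in);
        close(out);
        return 0;
    }

    if (errno != EOPNOTSUPP && errno != EINVAL && errno != EXDEV) {
        close(in);
        close(out);
        return -1;
    }

    /* No reflink support (e.g. ext4): fall back to copying bytes. */
    char buf[8192];
    ssize_t n;
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        if (write(out, buf, (size_t)n) != n) {
            close(in);
            close(out);
            return -1;
        }
    }

    close(in);
    close(out);
    return n < 0 ? -1 : 0;
}
```

FICLONERANGE would allow cloning only part of a file, and FIDEDUPERANGE deduplicates already-written data in place, so either could also fit here depending on how the extractor is structured.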
