Feature request - duplicate directories #42

Open
trushworth opened this issue Dec 3, 2021 · 0 comments

@trushworth

A directory is a duplicate if everything in it (regular files, hidden files, and sub-directories) is identical to the contents of some other directory. This would be easy to implement by defining the hash of a directory as the hash of the sorted list of the hashes of all of its contents. The list needs to be sorted by hash value, not by name, so that directories still match when their files or sub-directories have different names. The database could simply treat directories as another kind of file, although I expect it would be better to have a way to distinguish them, if only for listing.
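To make the idea concrete, here is a minimal sketch of such a directory hash in Python. It is only an illustration, not the tool's actual code: the function names `file_digest` and `dir_digest`, the choice of SHA-256, the `d:`/`f:` prefixes used to tell directories from files, and the decision to skip symlinks are all assumptions of mine.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a regular file's bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dir_digest(path):
    """Hash a directory as the hash of the sorted hashes of its entries.

    Entry names are deliberately ignored, so two directories match
    whenever their contents match, whatever the files are called.
    """
    child_digests = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Assumed convention: prefix marks the entry as a directory.
                child_digests.append("d:" + dir_digest(entry.path))
            elif entry.is_file(follow_symlinks=False):
                child_digests.append("f:" + file_digest(entry.path))
            # Assumption: symlinks and special files are skipped.
    # Sort by digest so the result does not depend on names or listing order.
    child_digests.sort()
    h = hashlib.sha256()
    for digest in child_digests:
        h.update(digest.encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()
```

One consequence of this definition is that all empty directories get the same hash, so a real implementation would probably want to leave them out of duplicate reports.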

A simple flag like "--directories" could be used to enable this during a scan, and possibly to show only directories in a list command. I haven't really thought about the flags and how best to make them consistent with the existing flags.

This would be useful when things like source code trees, picture databases, or any other fairly large directory tree have been copied. If the user knows that entire directory trees are duplicates, the whole of a duplicate tree can be removed with "rm -r " instead of having to work through the files one by one, and then the empty directories. The workflow might end up something like:

  1. find and remove duplicate directory trees (i.e. groups of files at once)
  2. find and remove individual duplicate files (as we do now)
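For step 1, once every directory has a hash, finding duplicate trees is just a grouping step. A rough Python sketch follows, assuming the scan has already produced a mapping from directory path to hash. The function name `duplicate_groups` and the sample paths and hashes are made up for illustration.

```python
from collections import defaultdict


def duplicate_groups(digests):
    """Group paths by digest and keep only groups with two or more paths."""
    groups = defaultdict(list)
    for path, digest in digests.items():
        groups[digest].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Hypothetical scan result: two copies of a source tree and one unique directory.
scanned = {
    "/home/me/src": "h1",
    "/backup/src": "h1",
    "/home/me/docs": "h2",
}
print(duplicate_groups(scanned))  # [['/backup/src', '/home/me/src']]
```

A listing would probably want to report only the topmost directory of each duplicate tree, since every sub-directory inside two identical trees is itself a duplicate.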

And by the way, thanks for a useful tool! I've stalled out on duplicate removal many times just because the other tools don't manage incremental work all that well.
