dfirsec/dup_file_finder

Duplicate File Finder

This script scans a directory tree and identifies duplicate files with a given file extension. It uses SHA256 hashing to compare the files and outputs the duplicate matches to a CSV file.
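The hash-based comparison described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names `sha256_of` and `find_duplicates` and the chunk size are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(dirpath, ext):
    """Group files under dirpath with the given extension by content hash.

    Hypothetical helper: returns {digest: [paths]} only for digests shared
    by two or more files.
    """
    suffix = "." + ext.lower().lstrip(".")
    by_hash = defaultdict(list)
    for path in Path(dirpath).rglob("*"):
        # Compare extensions case-insensitively so .PDF and .pdf both match.
        if path.is_file() and path.suffix.lower() == suffix:
            by_hash[sha256_of(path)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Two files count as duplicates here only when their full contents hash to the same digest, so files with identical names but different contents are not reported.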

File signatures courtesy of fleep by @ua-nick.

Prerequisites

Python 3.8 or higher

Installation

  1. Clone the repository:
git clone https://github.com/dfirsec/dup_file_finder.git
  2. Navigate to the project directory:
cd dup_file_finder
  3. Install the dependencies using Poetry:
poetry install

Usage

  1. Activate the virtual environment:
poetry shell
  2. Run the script with the following arguments:
python dup_file_finder.py dirpath ext
  • dirpath: The directory path to scan for duplicate files.
  • ext: The file extension to scan for (for example, pdf).

Example

python dup_file_finder.py /path/to/directory pdf

This will scan the specified directory for PDF files and identify duplicate matches. The results will be saved to a CSV file named duplicate_matches.csv in the results directory.
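As a rough illustration of the output step, the sketch below writes groups of matching files to `results/duplicate_matches.csv`. The column layout (`sha256`, `path`) and the helper name `write_matches` are assumptions for the example; the script's real CSV columns may differ.

```python
import csv
from pathlib import Path


def write_matches(duplicates, out_dir="results", filename="duplicate_matches.csv"):
    """Write {digest: [paths]} to a CSV file, one row per duplicate file.

    Hypothetical helper: the two-column layout is an assumption, not the
    script's documented format.
    """
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sha256", "path"])
        for digest, paths in duplicates.items():
            for path in paths:
                writer.writerow([digest, str(path)])
    return out_path
```

Writing one row per file, with the shared digest repeated, keeps the output easy to sort or filter by hash in a spreadsheet.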

Contributing

Contributions are welcome! If you find any issues or have suggestions for improvement, please create an issue or submit a pull request.

License

This project is licensed under the MIT License.
