RonShih/FSC_DedupeRate

Target

Derive the data deduplication rate of a file, implemented in C.

Run this program

  1. Run make, or compile directly with the command below:
gcc program.c -o program -L/usr/local/opt/openssl@1.1/lib -I/usr/local/opt/openssl@1.1/include -lcrypto -lz -lm
  2. Execute the program with a chunk size, a hash table size, and one or more files.
    Modify argv[ ] in the main function to enable multiple-file input.
./program chunksize(KB) hashtablesize file1 file2 ...

Example with a single file. (Note: if the target is a folder, tar it first.)

./dedupe 8 1000000 linux-5.3
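Since the program reads a single file, a directory has to be archived first. A hypothetical session (the `./dedupe` binary name is taken from the example above; the `demo` directory is illustrative):

```shell
# Pack a directory into one file before measuring it.
mkdir -p demo && printf 'hello world\n' > demo/a.txt
tar -cf demo.tar demo/
# Then, once the binary is built:
#   ./dedupe 8 1000000 demo.tar    # 8 KB chunks, 1,000,000-entry hash table
ls demo.tar
```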

Steps:

  1. Build a hash table with hashtablesize entries.
  2. Read the file(s) as a stream of chunksize-byte chunks.
  3. Generate a fingerprint for each chunk with SHA-1.
  4. Take the 32-bit prefix of each 160-bit fingerprint as its hashcode.
  5. Insert the hashcode into the hash table, storing the full fingerprint in a linked list to avoid false positives.
  6. When a collision happens, traverse the list and compare fingerprints.
  7. Derive the dedupe rate:
1 - (unique_chunks / total_chunks)
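The steps above can be sketched in C. This is a minimal, dependency-free illustration, not the repository's actual code: a hypothetical `fingerprint()` built on FNV-1a stands in for OpenSSL's SHA-1 so the sketch compiles on its own, and the names `dedupe_table`, `table_add_chunk`, and `dedupe_rate` are assumptions for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FP_LEN 20  /* SHA-1 fingerprints are 160 bits = 20 bytes */

/* Bucket entry: full fingerprint kept in a linked list (step 5). */
typedef struct node {
    unsigned char fp[FP_LEN];
    struct node *next;
} node;

typedef struct {
    node **buckets;
    size_t nbuckets;
    size_t total_chunks;
    size_t unique_chunks;
} dedupe_table;

/* Stand-in for SHA-1: fills 20 bytes using seeded FNV-1a passes.
   The real program would call SHA1() from -lcrypto instead. */
static void fingerprint(const unsigned char *chunk, size_t len,
                        unsigned char fp[FP_LEN]) {
    for (int i = 0; i < FP_LEN; i += 8) {
        uint64_t h = 1469598103934665603ULL + (uint64_t)i;
        for (size_t j = 0; j < len; j++) {
            h ^= chunk[j];
            h *= 1099511628211ULL;
        }
        for (int b = 0; b < 8 && i + b < FP_LEN; b++)
            fp[i + b] = (unsigned char)(h >> (8 * b));
    }
}

/* Step 1: build a hash table with the requested number of entries. */
dedupe_table *table_new(size_t nbuckets) {
    dedupe_table *t = malloc(sizeof *t);
    t->buckets = calloc(nbuckets, sizeof(node *));
    t->nbuckets = nbuckets;
    t->total_chunks = t->unique_chunks = 0;
    return t;
}

/* Steps 3-6: fingerprint the chunk, hash by the fingerprint's 32-bit
   prefix, and walk the bucket's list to rule out false positives. */
void table_add_chunk(dedupe_table *t, const unsigned char *chunk, size_t len) {
    unsigned char fp[FP_LEN];
    fingerprint(chunk, len, fp);

    uint32_t hashcode;                       /* step 4: 32-bit prefix */
    memcpy(&hashcode, fp, sizeof hashcode);
    size_t idx = hashcode % t->nbuckets;

    t->total_chunks++;
    for (node *n = t->buckets[idx]; n; n = n->next)
        if (memcmp(n->fp, fp, FP_LEN) == 0)
            return;                          /* duplicate chunk */

    node *n = malloc(sizeof *n);             /* first sighting: insert */
    memcpy(n->fp, fp, FP_LEN);
    n->next = t->buckets[idx];
    t->buckets[idx] = n;
    t->unique_chunks++;
}

/* Step 7: dedupe rate = 1 - unique_chunks / total_chunks. */
double dedupe_rate(const dedupe_table *t) {
    if (t->total_chunks == 0) return 0.0;
    return 1.0 - (double)t->unique_chunks / (double)t->total_chunks;
}
```

Keeping the full 160-bit fingerprint in the list is what makes step 6 safe: two chunks can share a 32-bit prefix (a bucket collision) without being identical, so only a full `memcmp` of the fingerprints counts a chunk as a duplicate.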
