Declare gitattributes, reorganize Git history, and open a GitHub maintenance support request #166

Open
samrocketman opened this issue May 26, 2024 · 0 comments


Why

Developers and artists cloning the repo will download a few gigabytes less (2.5GB saved in the tests below).

Suggestion

GitHub will perform git gc only on demand. An org admin will need to open a support request with GitHub explicitly requesting git gc --aggressive on the asset repo.

Before this happens, the history should be rewritten so that a .gitattributes file declaring the binaries is in the initial commit.

Here's the .gitattributes file I used in tests.

* binary
*.cfdg text diff
*.md text diff
*.svg text=auto diff
.git* text diff
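
To confirm how Git resolves these attributes for a given path, git check-attr can be run against a few sample file names. This is a minimal sketch: the scratch repository and the sample paths (scene.xcf, README.md, logo.svg) are assumptions for illustration and are not taken from the asset repo.

```shell
# Sketch: check how Git resolves attributes for a few sample paths.
set -eu
repo="$(mktemp -d)"   # hypothetical scratch repository
git init --quiet "$repo"
cd "$repo"

# Same .gitattributes content as above.
cat > .gitattributes <<'EOF'
* binary
*.cfdg text diff
*.md text diff
*.svg text=auto diff
.git* text diff
EOF

# The paths do not need to exist; check-attr only matches patterns.
attrs="$(git check-attr text diff -- scene.xcf README.md logo.svg .gitattributes)"
printf '%s\n' "$attrs"
```

Later lines override earlier ones, so the catch-all "* binary" applies only to paths that no later pattern matches for a given attribute.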

Background

Git handles binary deltas just fine, but you can improve how it handles binary data by declaring to Git which files are binaries.

Git delta compression on binaries:

  • Original asset repo (no checkout): 8.4GB
  • Without gitattributes (git gc): 7.7GB
  • With gitattributes (git gc): 5.9GB
  • Raw assets size (single checkout without Git): 11GB

11GB of assets are tracked across 216 Git commits with automatic delta compression (8.4GB), packed into separate "blob packs". If you do Git maintenance and run git gc --aggressive, then Git will "repack" all of the commits and recompute their binary deltas (packing is more efficient once you have a long history of commits).
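
Packed sizes like the ones listed above can be compared before and after a repack with git count-objects. The following is a rough sketch on a throwaway repository; the helper name pack_size_kib, the sample file and the commit identity are hypothetical, and this is not necessarily how the numbers in this issue were measured.

```shell
# Sketch: compare packed size before and after an aggressive repack.
set -eu

# Hypothetical helper: print the "size-pack" value (in KiB) for a repository.
pack_size_kib() { git -C "$1" count-objects -v | sed -n 's/^size-pack: //p'; }

repo="$(mktemp -d)"   # throwaway repository for illustration
git init --quiet "$repo"
git -C "$repo" config user.name "Example"
git -C "$repo" config user.email "example@example.invalid"

# One small commit so there is something to pack.
printf 'sample data\n' > "$repo/asset.bin"
git -C "$repo" add asset.bin
git -C "$repo" commit --quiet -m "Add asset"

before="$(pack_size_kib "$repo")"
git -C "$repo" gc --aggressive --quiet
after="$(pack_size_kib "$repo")"

echo "size-pack before gc: ${before} KiB"
echo "size-pack after gc:  ${after} KiB"
```

On a real asset repo, git count-objects -vH prints the same fields in human-readable units.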

If you add a .gitattributes file and reorganize the Git history so that the .gitattributes file is the initial commit, then Git appears to handle binary assets more efficiently than when it relies on its automatic heuristics for binary deltas.
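
One possible way to reorganize history like this is to create a new root commit that contains only .gitattributes and replay the existing commits on top of it. The sketch below does so in a throwaway repository. The branch names (main, newroot), the sample file and the commit identity are assumptions, and this is not necessarily the method used for the tests in this issue. A plain rebase also flattens merge commits, so a real repo may need a different tool.

```shell
# Sketch: make a .gitattributes commit the root of an existing history.
set -eu
repo="$(mktemp -d)"   # throwaway repository for illustration
git init --quiet "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

# Existing history: two commits touching a binary asset.
printf 'a' > asset.bin
git add asset.bin
git commit --quiet -m "Add asset"
printf 'b' > asset.bin
git commit --quiet -am "Update asset"

# New root commit containing only .gitattributes.
git checkout --quiet --orphan newroot
git rm --quiet -rf .
printf '* binary\n' > .gitattributes
git add .gitattributes
git commit --quiet -m "Declare gitattributes"

# Replay every commit of main on top of the new root.
git rebase --quiet --onto newroot --root main

git log --oneline --reverse main
```

Every commit hash changes after a rewrite like this, so existing clones have to be re-cloned or reset to the new history.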

File size by file extension:

File extension   Total size   Type of file
7z               243M         binary
blend            3.6G         binary
cfdg             4.0K         text
jpg              8.8M         binary
kra              516K         binary
md               4.0K         text
odg              32K          binary
png              2.8M         binary
psd              4.0M         binary
svg              3.0M         text/mixed
xcf              6.3G         binary
zip              189M         binary
total            11G

Benchmark

git gc --aggressive benchmark (on the reordered history, with .gitattributes as the initial commit):

70 minutes

$ time git gc --aggressive
Enumerating objects: 4300, done.
Counting objects: 100% (4300/4300), done.
Delta compression using up to 8 threads
Compressing objects:  83% (3586/4296)
Compressing objects: 100% (4296/4296), done.
Writing objects: 100% (4300/4300), done.
Selecting bitmap commits: 163, done.
Building bitmaps: 100% (106/106), done.
Total 4300 (delta 2034), reused 2167 (delta 0), pack-reused 0

real	70m31.830s
user	301m40.752s
sys	0m29.756s

Some source

Size calculation:

# Print "<label>: <size>" from the "total" line of du -c output.
total_size() { grep -o '.*total$' | sed "s/\\([^ \\t]\\+\\).*total/${1}: \\1/";}

# Total size per file extension.
find * -type f | sed 's/^.*\.//' | sort -u | while read -er ext; do find * -type f -name "*.${ext}" -exec du -sch {} + | total_size "${ext}";done

# Total size of all assets.
du -shc * | total_size total

Background Reading
