Support for git partial clones #11463

Open
chadlwilson opened this issue Apr 8, 2023 · 5 comments

@chadlwilson
Member

Issue Type
  • Feature enhancement
Summary

It'd be great to have some degree of support for git partial clones.

Implementation Notes

On the server side:

  • I believe full clones of the repository are unnecessary. On the server side, GoCD needs the complete history of the targeted branch so that it can compute modifications since the last revision and report diffs. I believe it generally does not need blobs, but does need trees.
  • Currently the server side does --no-checkout clones but cannot shallow clone due to the above.
  • Partial clones could possibly be made the default here.
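
As a rough illustration of the server-side idea above, here is a minimal sketch that builds the argument list for a blobless, no-checkout clone. The function name, the example URL and the directory name are hypothetical and not part of GoCD. `--filter=blob:none`, `--no-checkout`, `--single-branch` and `--branch` are existing `git clone` options, but whether `--single-branch` is appropriate for GoCD is an assumption.

```python
from typing import List, Optional


def partial_clone_command(url: str, dest: str, branch: Optional[str] = None,
                          object_filter: str = "blob:none") -> List[str]:
    """Build argv for a no-checkout partial clone (hypothetical helper).

    With the default filter, commits and trees are downloaded but blobs
    are omitted until something actually needs them.
    """
    cmd = ["git", "clone", "--no-checkout", f"--filter={object_filter}"]
    if branch is not None:
        # Assumption: only the configured branch is of interest.
        cmd += ["--single-branch", "--branch", branch]
    cmd += [url, dest]
    return cmd


if __name__ == "__main__":
    # Placeholder URL and directory, for illustration only.
    print(" ".join(partial_clone_command(
        "https://example.com/repo.git", "flyweight", branch="main")))
```

The resulting list can be passed to `subprocess.run`. The remote must support partial clone filters for the filter to take effect.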

On the client/agent side:

  • This is likely more challenging, or comes with more downsides (on-the-fly fetching), so it would likely have to be a configuration option, similar to shallow clones.
Motivations
  • Server: Polling repos can be slow, and a bottleneck on GoCD servers, especially with slow or rate-limited remotes
  • Server: Flyweight clones can take up a lot of space, especially when people have binaries in repos, either in history or by design (even though that is not a good idea)
  • Agent: Clones can also take up a lot of space on static agents, or take a lot of time to do fresh on elastic agents. Shallow clones don't work well with some pipelines, because the revision to build is often a number of commits back in the history, and the repo needs to be unshallowed. Unshallowing repos can be expensive for git remotes.
    • Partial clones may also suffer from some challenges here. However, fetching the missing blobs/trees on the fly to go back n revisions is likely to be a lot less costly than fetching more revisions on a shallow clone, and a lot less expensive than full unshallowing.
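
To make the agent-side comparison concrete, here is a small sketch of the commands an agent might run to reach an older revision under each strategy. It is illustrative only and does not describe what GoCD does today. The function name and the `deepen_by` default are hypothetical. `git fetch --deepen=<n>` and `git checkout <rev>` are existing git commands.

```python
from typing import List


def commands_to_reach_revision(clone_kind: str, revision: str,
                               deepen_by: int = 100) -> List[List[str]]:
    """Return git commands to check out an older revision (hypothetical helper)."""
    if clone_kind == "shallow":
        # The commit may be missing locally, so history is deepened first.
        # This may need repeating if the revision is further back still.
        return [
            ["git", "fetch", f"--deepen={deepen_by}", "origin"],
            ["git", "checkout", revision],
        ]
    if clone_kind == "partial":
        # In a blobless clone the commits and trees are already present;
        # checkout fetches the missing blobs for that revision on demand.
        return [["git", "checkout", revision]]
    raise ValueError(f"unknown clone kind: {clone_kind}")
```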
Possible challenges
  • interactions with arbitrary ref specs for configured "branch"
  • interacting with existing clones
    • should the server re-clone everything?
    • on client/agent side, what if it's a shallow clone?
  • should the existing shallow clone support have been doing --single-branch?
  • old git versions: I believe partial clone is supported in 2.24.0+, and is most performant/reliable in 2.29.0+(?)
  • submodules. Eugh.
    • Possibly should not support partial clones when there are submodules, for ease of implementation?
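
For the old-git-versions concern, a version gate could look something like the sketch below. The helper names are hypothetical, and the 2.24.0 threshold is simply the unverified figure from the list above, not a confirmed minimum.

```python
import re
from typing import Tuple

# Assumption: threshold taken from the discussion above, not verified.
MIN_PARTIAL_CLONE_VERSION = (2, 24, 0)


def parse_git_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from `git version` output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"cannot parse git version from: {output!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return int(major), int(minor), int(patch or 0)


def supports_partial_clone(version_output: str) -> bool:
    """True if the reported git version meets the assumed minimum."""
    return parse_git_version(version_output) >= MIN_PARTIAL_CLONE_VERSION
```

The input would typically come from running `git version` on the server or agent. Vendor suffixes such as `.windows.1` are ignored by the parser.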
@kterui9019

I think this is a great feature proposal!

We are running GoCD on Kubernetes, and our elastic agents start up very slowly because the material is a monorepo.
Any possible solutions until this feature is supported?

@chadlwilson
Member Author

If agent slowness is the main concern (rather than server), shallow clones might help, provided that you don't need the entire commit history on your agents for builds, you are not commonly building historical revisions more than 100 revisions back from HEAD, and HEAD does not contain references to massive binaries that would need to be pulled anyway.

You may also get different performance characteristics on HTTPS vs SSH clones, depending on your repository manager.

Otherwise, not much choice but to reduce the size of your repository and rewrite history, or build off a fork with rewritten history?

@kterui9019

We had enabled shallow clone but never cared whether the cloning method was HTTPS or SSH.
I will try to see if it will be faster. Thank you very much.

@chadlwilson
Member Author

Worth experimenting with. It kind of depends where the slowness is coming from (raw transfer speeds, work the repository manager needs to do etc). There are env vars such as GIT_CURL_VERBOSE=1 GIT_TRACE=1 which one can sometimes set to see (on the agent) where things are taking time within git commands.
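
As a minimal sketch of how those variables could be applied when running a git command by hand on an agent: the helper names below are hypothetical, and only the two variables mentioned above are set. Git writes the trace output to stderr.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def git_trace_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and enable git tracing in the copy."""
    env = dict(os.environ if base is None else base)
    env["GIT_TRACE"] = "1"          # general trace of git's internal commands
    env["GIT_CURL_VERBOSE"] = "1"   # verbose HTTP(S) transport output
    return env


def run_git_traced(args: List[str],
                   cwd: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run `git <args>` with tracing enabled; trace text ends up in .stderr."""
    return subprocess.run(["git", *args], cwd=cwd, env=git_trace_env(),
                          capture_output=True, text=True)
```

For example, `run_git_traced(["fetch", "origin"], cwd="/path/to/clone").stderr` would hold the trace for a fetch, where the path is a placeholder.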

It's also worth noting that shallow clones can be very expensive for the repository manager to deal with and compute if GoCD is having to "unshallow" the clones to fetch the specific revision needed for the build. In some cases and repos that can be slower than not using shallow clones at all. The GitHub blog linked in the description discusses this.

@stale

stale bot commented Aug 12, 2023

This issue has been automatically marked as stale because it has not had activity in the last 90 days.
If you can still reproduce this error on the master branch using local development environment or on the latest GoCD Release, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.

@stale stale bot added the stale label Aug 12, 2023
@chadlwilson chadlwilson added the "no stalebot" label (Don't mark this stale.) and removed the stale label Aug 13, 2023