# Frequently Answered Questions

## Table of Contents

* [Why did `git-filter-repo` rewrite commit hashes?](#why-did-git-filter-repo-rewrite-commit-hashes)
* [Why did `git-filter-repo` rewrite more commit hashes than I expected?](#why-did-git-filter-repo-rewrite-more-commit-hashes-than-i-expected)
* [Why did `git-filter-repo` rewrite other branches too?](#why-did-git-filter-repo-rewrite-other-branches-too)
* [Help! Can I recover or undo the filtering?](#help-can-i-recover-or-undo-the-filtering)
* [Can you change `git-filter-repo` to allow future folks to recover from `--force`'d rewrites?](#can-you-change-git-filter-repo-to-allow-future-folks-to-recover-from---forced-rewrites)
* [Can I use `git-filter-repo` to fix a repository with corruption?](#can-i-use-git-filter-repo-to-fix-a-repository-with-corruption)
* [What kinds of problems does `git-filter-repo` not try to solve?](#what-kinds-of-problems-does-git-filter-repo-not-try-to-solve)

## Why did `git-filter-repo` rewrite commit hashes?

This is fundamental to how Git operates. In more detail...

Each commit in Git is identified by a hash of its contents. Those
contents include the commit message, the author (name, email, and time
authored), the committer (name, email, and time committed), the toplevel
tree hash, and the parent(s) of the commit. This means that if any of the
commit fields change, including the tree hash or the hash of the
parent(s) of the commit, then the hash for the commit will change.

(The same is true for files ("blobs") and trees stored in Git as well;
each is identified by a hash of its contents, so if literally anything
changes, the commit hash will change.)

If you attempt to write commit (or tree or blob) objects with an
incorrect hash, Git will reject them as corrupt.

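To make this concrete, here is a minimal Python sketch of how Git derives an object ID: it takes the SHA-1 of a `<type> <size>\0` header followed by the raw object contents. It assumes the classic SHA-1 object format, not SHA-256 repositories. The tree and parent IDs, author, and timestamps in `commit_body` are made-up placeholders. `git_object_id` and `commit_body` are hypothetical helpers, not part of git-filter-repo.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID Git computes for an object.

    Git hashes a header of the form b"<type> <size>\\0" followed by the raw
    object body, so any change to the body changes the ID.
    """
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, parent: str, message: str) -> bytes:
    # Hypothetical commit: tree, parent, author, committer, blank line, message.
    # The author/committer line and timestamps are placeholders.
    return (
        f"tree {tree}\n"
        f"parent {parent}\n"
        "author A U Thor <author@example.com> 1700000000 +0000\n"
        "committer A U Thor <author@example.com> 1700000000 +0000\n"
        "\n"
        f"{message}\n"
    ).encode()


# Blobs: the well-known ID of an empty file.
empty_blob = git_object_id("blob", b"")

# Same commit, then the same commit with only the tree or only the parent changed.
original = git_object_id("commit", commit_body("a" * 40, "b" * 40, "Add feature"))
new_tree = git_object_id("commit", commit_body("c" * 40, "b" * 40, "Add feature"))
new_parent = git_object_id("commit", commit_body("a" * 40, "d" * 40, "Add feature"))

print(empty_blob)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(original != new_tree, original != new_parent)  # True True
```

Changing only the tree line or only the parent line is enough to produce a different commit ID. That is why no filter that alters a commit's contents or ancestry can keep its hash.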
## Why did `git-filter-repo` rewrite more commit hashes than I expected?

There are two aspects to this, or two possible underlying questions users
might be asking here:

* Why did commits newer than the ones I expected have their hash change?
* Why did commits older than the ones I expected have their hash change?

For the first question, see [why filter-repo rewrites commit
hashes](#why-did-git-filter-repo-rewrite-commit-hashes), and note that
if you modify some old commit, perhaps to remove a file, then obviously
that commit's hash must change. Further, since that commit will have a
new hash, any other commit with that commit as a parent will need a new
hash too, and so on, all the way to the most recent commits in history.
This is fundamental to Git and there is nothing you can do to change
this.

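The chain reaction can be sketched with a toy linear history in Python. This is an illustrative model only. It assumes a single branch, fixed author/committer data, and made-up tree IDs. `commit_id` and `build_history` are hypothetical helpers. Giving commit 2 a different tree, as removing a file would, leaves commits 0 and 1 alone but gives commits 2, 3, and 4 new IDs.

```python
import hashlib
from typing import List, Optional


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Compute a Git-style SHA-1 commit ID with a fixed author/committer."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [
        "author A U Thor <author@example.com> 1700000000 +0000",
        "committer A U Thor <author@example.com> 1700000000 +0000",
        "",
        message,
        "",
    ]
    body = "\n".join(lines).encode()
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()


def build_history(trees: List[str]) -> List[str]:
    """Return commit IDs for a linear history, oldest first."""
    ids: List[str] = []
    parent: Optional[str] = None
    for i, tree in enumerate(trees):
        # Each commit records the ID of the commit before it.
        parent = commit_id(tree, parent, f"commit {i}")
        ids.append(parent)
    return ids


# Hypothetical tree IDs for a five-commit history: "000...0", "111...1", ...
trees = [str(n) * 40 for n in range(5)]
before = build_history(trees)

# "Filter" commit 2, e.g. by removing a file, which gives it a new tree ID.
filtered = list(trees)
filtered[2] = "f" * 40
after = build_history(filtered)

for i, (old, new) in enumerate(zip(before, after)):
    print(i, "unchanged" if old == new else "rewritten")
# 0 unchanged
# 1 unchanged
# 2 rewritten
# 3 rewritten
# 4 rewritten
```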
For the second question, there are two possible causes: (1) the filter
you specified applies to the older commits too, or (2) git-fast-export
and git-fast-import (both of which git-filter-repo uses) canonicalize
history in various ways. The second cause means that even if you have
no filter, these tools sometimes change commit hashes. This can happen
in any of these cases:

* If you have signed commits, the signatures will be stripped.
* If you have commits with extended headers, the extended headers will
  be stripped (signed commits are actually a special case of this).
* If you have commits in an encoding other than UTF-8, they will by
  default be re-encoded into UTF-8.
* If you have a commit without an author, one will be added that
  matches the committer.
* If you have trees that are not canonical (e.g. incorrect sorting
  order), they will be canonicalized.

If this affects you and you really only want to rewrite newer commits in
history, you can use the `--refs` argument to git-filter-repo to specify
a range of history that you want rewritten.

(For those attempting to be clever and use `--refs` for the first
question: note that if you attempt to rewrite only a few old commits,
all you'll succeed in doing is adding new commits that won't be part of
any branch and will be subject to garbage collection. The branches will
still hold on to the unrewritten versions of the commits. Thus, you
have to rewrite all the way to the branch tip for the rewrite to be
meaningful. Said another way, the `--refs` trick is only useful for
restricting the rewrite to newer commits, never for restricting the
rewrite to older commits.)

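The difference between restricting a rewrite to newer commits and restricting it to older ones can be illustrated with the same kind of toy model. This is a hedged sketch, not how git-filter-repo implements `--refs`. `rewrite_range` and `fake_filter` are hypothetical. The filter simply derives a different tree ID for every commit, standing in for something like removing a file. The branch is modeled as the linear list of commits reachable from its tip.

```python
import hashlib
from typing import Callable, List, Optional


def commit_id(tree: str, parent: Optional[str], message: str) -> str:
    """Compute a Git-style SHA-1 commit ID with a fixed author/committer."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += [
        "author A U Thor <author@example.com> 1700000000 +0000",
        "committer A U Thor <author@example.com> 1700000000 +0000",
        "",
        message,
        "",
    ]
    body = "\n".join(lines).encode()
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()


def history_ids(trees: List[str]) -> List[str]:
    """Commit IDs of a linear history, oldest first (the last one is the tip)."""
    ids: List[str] = []
    parent: Optional[str] = None
    for i, tree in enumerate(trees):
        parent = commit_id(tree, parent, f"commit {i}")
        ids.append(parent)
    return ids


def rewrite_range(trees: List[str], filt: Callable[[str], str],
                  first: int, last: int) -> List[str]:
    """Rewrite commits first..last (inclusive) and return their new IDs.

    The first rewritten commit keeps the original commit before it as its
    parent. Commits after `last` are not rewritten, so they keep pointing
    at their original parents.
    """
    original = history_ids(trees)
    parent = original[first - 1] if first > 0 else None
    new_ids: List[str] = []
    for i in range(first, last + 1):
        parent = commit_id(filt(trees[i]), parent, f"commit {i}")
        new_ids.append(parent)
    return new_ids


def fake_filter(tree: str) -> str:
    # Hypothetical stand-in for a real filter: always yields a different tree ID.
    return hashlib.sha1(tree.encode()).hexdigest()


trees = [str(n) * 40 for n in range(5)]
original = history_ids(trees)  # the branch tip is original[-1]

# Rewriting everything changes all five commit IDs.
full = rewrite_range(trees, fake_filter, 0, 4)
print(sum(a != b for a, b in zip(original, full)))  # 5

# Restricting the rewrite to the two newest commits (through the tip) leaves
# the three older commits untouched; the branch moves to newer[-1].
newer = rewrite_range(trees, fake_filter, 3, 4)
print(sum(a != b for a, b in zip(original, original[:3] + newer)))  # 2

# Restricting it to the two oldest commits creates new commits, but commit 4
# still descends from the originals, so the branch never references them.
older = rewrite_range(trees, fake_filter, 0, 1)
print(any(c in original for c in older))  # False
```

The last case is the one the parenthetical warns about: the rewritten old commits exist, but nothing points at them, so they would eventually be garbage collected.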
## Why did `git-filter-repo` rewrite other branches too?

git-filter-repo's name is git-filter-**_repo_**. Obviously it is going
to rewrite all branches by default.

`git-filter-repo` can restrict its rewriting to a subset of history,
such as a single branch, using the `--refs` option. However, doing so
comes with the risk that one branch ends up with a different version of
some commits than other branches do; usually, when you rewrite history,
you want all branches that depend on what you are rewriting to be updated.

## Help! Can I recover or undo the filtering?

Sure, _if_ you followed the instructions. The instructions told you to
make a fresh clone before running git-filter-repo. If you did that, you
can just throw away your clone with the flubbed rewrite and make a new
clone.

If you didn't make a fresh clone, and you didn't run with `--force`, you
would have seen the following warning:
```
Aborting: Refusing to destructively overwrite repo history since
this does not look like a fresh clone.
[...]
Please operate on a fresh clone instead. If you want to proceed
anyway, use --force.
```
If you then added `--force`, well, you were warned.

If you didn't make a fresh clone, and you started with `--force`, and you
didn't think to read the description of the `--force` option:
```
Ignore fresh clone checks and rewrite history (an irreversible
operation, especially since it by default ends with an
immediate pruning of reflogs and old objects).
```
and you didn't read even the beginning of the manual:
```
git-filter-repo destructively rewrites history
```
and you think it's okay to run a command with `--force` in it on something
you don't have a backup of, then now is the time to reassess your life
choices. `--force` should be a pretty clear warning sign. (If someone
on the internet suggested `--force`, you should complain to them very
loudly, especially if it was on Stack Overflow or some similar site. And
you should also learn to carefully vet commands suggested by others on the
internet.)

See also the next question.

## Can you change `git-filter-repo` to allow future folks to recover from `--force`'d rewrites?

This will never be supported.

* Providing an alternate method to restore would require storing both
  the original history and the new history, meaning that those who are
  trying to shrink their repository size instead see it grow and have to
  figure out extra steps to expunge the old history to see the actual
  size savings. Experience with other tools showed that this was
  frustrating and difficult to figure out for many users. Providing an
  alternate method to restore would also mean that users who are trying
  to purge sensitive data from their repository still find the sensitive
  data after the rewrite because it hasn't actually been purged. In
  order to actually purge it, they have to take extra steps, which again
  has made things difficult for users in the past with other tools.

* Providing an alternate method to restore would also mean trying to
  figure out what should be backed up and how. The obvious choices used
  by previous tools only actually provided partial backups (reflogs
  would be ignored, for example, as would uncommitted changes, whether
  staged or not). The only reasonable full backup mechanism is making a
  separate clone, which is both expensive and something the user can and
  should understand how to do on their own.

* Providing an alternate method to restore would also mean providing
  documentation on how to restore. Past methods by other tools in the
  history rewriting space suggested that it was rather difficult for
  users to figure out. Difficult enough, in fact, that users simply
  didn't ever use them. They instead made a separate clone before
  rewriting history, and if they didn't like the rewrite, they just
  blew it away and made a new clone to work with. Since that was
  observed to be the easy restoration method, I simply enforced it with
  this tool, requiring users who look like they might not be operating
  on a fresh clone to use the `--force` flag.

But more than all that, if there were an alternate method to restore,
why would you have needed to specify the `--force` flag? Doesn't its
existence (and the wording of its documentation) make it pretty clear on
its own that there isn't going to be a way to restore?

## Can I use `git-filter-repo` to fix a repository with corruption?

Some kinds of corruption can be fixed, in conjunction with
`git replace`. If `git fsck` reports warnings/errors for certain objects,
you can often [replace them and rewrite
history](examples-from-user-filed-issues.md#Handling-repository-corruption).

## What kinds of problems does `git-filter-repo` not try to solve?

This question is often asked in the form of "How do I..." or even
written as a statement such as "I found a bug with `git-filter-repo`;
the behavior I got was different from what I expected..." But if you're
trying to do one of the things below, then `git-filter-repo` is behaving
as designed, and the way to solve your problem is to use a different
tool.

### Filtering history but magically keeping the same commit IDs

This is impossible. If you modify commits, or the files contained in
them, then you change their commit IDs; this is [fundamental to
Git](#why-did-git-filter-repo-rewrite-commit-hashes).

However, _if_ you don't need to modify commits, but just don't want to
download everything, then look into one of the following:

* [partial clones](https://git-scm.com/docs/partial-clone)
* the ugly hack known as [shallow clones](https://git-scm.com/docs/shallow)
* a massive hack like [cheap fake
  clones](https://github.com/newren/sequester-old-big-blobs) that at
  least lets you put your evil overlord laugh to use

### Bidirectional development between filtered and unfiltered repositories (josh)

Some folks want to extract a subset of a repository, do development work
on it, then bring those changes back to the original repository, and
send further changes in both directions. Such a tool can be written
using fast-export and fast-import, but would need to make very different
design decisions than `git-filter-repo` did. Such a tool would be
capable of supporting this kind of development, but would lose the
ability ["to write arbitrary filters using a scripting
language"](https://josh-project.github.io/josh/#concept) among other
features that `git-filter-repo` has.

Such a tool exists; it's called [Josh](https://github.com/josh-project/josh).

```
To guarantee filters are reversible we have to restrict the kind of
filter that can be used; It is not possible to write arbitrary filters
using a scripting language like is allowed in other tools
```

### Filtering based on the difference (a.k.a. patch or change) between commits (rebase)

### Conversion between different version control systems (reposurgeon)

### Having two people filter their clone of the repository (with the same filtering command) and getting the same new commit IDs

<!--
## How do I see what was removed?

* Give answer in terms of `git rev-list --objects --all` in both a
  separate fresh clone from before the rewrite and in the repo where
  the rewrite was done. Then find the objects that exist in the old
  but not the new.
-->
