Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Option to reset committer but keep author #379

Open
sm-Fifteen opened this issue Jul 6, 2022 · 6 comments
Open

Feature request: Option to reset committer but keep author #379

sm-Fifteen opened this issue Jul 6, 2022 · 6 comments

Comments

@sm-Fifteen
Copy link

Usually, tools that alter commits and rewrite history like git rebase or git commit --amend will overwrite the "Commiter" and "Commit Date" fields to match the user who created the new commits, such that "Author" still reflects the person who wrote the changes, while making it clear that someone else reorganized the commit history and pushed these commits.

Commit ownership is something some servers like BitBucket may check when new commits are pushed, so correctly setting the committer's identity is important. This is currently possible with a commit callback, if somewhat clumsy:

git filter-repo --path src/ --to-subdirectory-filter my-module --tag-rename '':'my-module-' --commit-callback '
    if commit.committer_email != b"[email protected]":
        commit.committer_name = b"Me"
        commit.committer_email = b"[email protected]"
'

Could this be made into a standard option flag? Something like --reset-committer, maybe?

@joneff
Copy link

joneff commented Jul 12, 2022

I ran in the same issue when doing extensive linting (that requires commit amending and rebasing). It's quite easy:

# run after all the other scripts
git filter-repo --commit-callback '
  commit.committer_name = commit.author_name
  commit.committer_email = commit.author_email
  commit.committer_date = commit.author_date 
'

This resets all the committer data to the author data. Works like a charm.

@sm-Fifteen
Copy link
Author

@joneff: That isn't the use case I was talking about, though. Some git servers will refuse your pushes if you're trying to push new commits where you are not the committer. This issue is about introducing a quick and simple way to ensure that you are the committer on all created commits, regardless of who the original author is (which is the reason why author and committer are separate fields to begin with). Your callback instead resets the committer on all commits to match the author, which is not what I would want/need in this case.

@newren
Copy link
Owner

newren commented Aug 1, 2024

@sm-Fifteen: You've made the invocation unnecessarily complex. The --path, --to-subdirectory-filter, and --tag-rename options you provided have nothing to do with your stated objective, so the command line is just:

git filter-repo --commit-callback '
    if commit.committer_email != b"[email protected]":
        commit.committer_name = b"Me"
        commit.committer_email = b"[email protected]"
'
although the `if`-statement ends up being somewhat meaningless; you could just assign unconditionally since it wouldn't hurt for the commits where you are already the committer, which further simplifies the command to:

git filter-repo --commit-callback '
commit.committer_name = b"Me"
commit.committer_email = b"[email protected]"
'

I'm a little surprised that you say you want to rewrite all the commits to be owned by you, though. Don't you mean all "new" commits, since some point in history that have already been pushed? Or is this a fork of an open source repository, or a copy of a repository that used to be tracked in some other system that you are trying to push to BitBucket? If the latter, the hook is stupid and should be disabled for your initial uploading (if not just disabled entirely, though I never won that battle at my previous workplace where they had a similar stupid hook; but they were totally willing to disable it for initial imports).

If you do mean to just rewrite all the new commits, then the --reset-committer option doesn't really make sense because the interesting bit is specifying the range of which commits to rewrite, which cannot be deduced by the tool. Further, it's not obvious how the tool will deduce the values of b"Me" and b"[email protected]" either? Maybe your answer is consulting the git config values of user.name and user.email, which may work for your niche usecase, but what about those doing filtering on behalf of someone else? Besides, regardless of the values of user.name, user.email, and the refs to operate on, this command line is already quite simple at just a few lines, so I don't see why it needs a special option for it.

Kind of going on a sidenote, but if you really do mean to rewrite ALL commits, then I think the usecase is broken and you should just disable the hook for the initial import. Trying to use filter-repo for that usecase is like trying to use a jackhammer for a problem where tweezers is the right tool; you might have forcibly fit things together in the end, but you've made a colossal mess of everything to force it into place. But that's a sidenote...

Now, everything above said, I may have misunderstood something about the usecase or the request, which might invalidate part of what I wrote above. Did I miss something?

@sm-Fifteen
Copy link
Author

@sm-Fifteen: You've made the invocation unnecessarily complex. The --path, --to-subdirectory-filter, and --tag-rename options you provided have nothing to do with your stated objective

@newren: The command in my OP was just a bit of added context to give an idea of the problem I was trying to use this for. I'm aware all arguments besides --commit-callback are not specifically related to the issue at hand.

although the if-statement ends up being somewhat meaningless; you could just assign unconditionally since it wouldn't hurt for the commits where you are already the committer, which further simplifies the command to:

git filter-repo --commit-callback ' commit.committer_name = b"Me" commit.committer_email = b"[[email protected]](mailto:[email protected])" '

I'm a little surprised that you say you want to rewrite all the commits to be owned by you, though. Don't you mean all "new" commits, since some point in history that have already been pushed? Or is this a fork of an open source repository, or a copy of a repository that used to be tracked in some other system that you are trying to push to BitBucket? If the latter, the hook is stupid and should be disabled for your initial uploading (if not just disabled entirely, though I never won that battle at my previous workplace where they had a similar stupid hook; but they were totally willing to disable it for initial imports).

I was trying to merge 5 repositories that used to be maintained separately into 5 subdirectories inside a single repo while preserving git history on all files. Almost all commits on these had been created by me, but there were a handful (15 or so out of 200, and restricted to 2 of those repos) that were authored by someone else, so that the Bitbucket instance I was working with refused to take them.

The condition is probably superfluous, you're right. I can't recall if I added that check just out of habit, or if it didn't have weird side effects to just to blindly set it everywhere. My notes from the time don't mention it.

If you do mean to just rewrite all the new commits, then the --reset-committer option doesn't really make sense because the interesting bit is specifying the range of which commits to rewrite, which cannot be deduced by the tool. Further, it's not obvious how the tool will deduce the values of b"Me" and b"[email protected]" either? Maybe your answer is consulting the git config values of user.name and user.email, which may work for your niche usecase, but what about those doing filtering on behalf of someone else? Besides, regardless of the values of user.name, user.email, and the refs to operate on, this command line is already quite simple at just a few lines, so I don't see why it needs a special option for it.

Well, like I said in my original post, the use case here is to deal with git servers that would forbid you from pushing any commit where you (the user logged in to update the remote) are not at least the listed committer (even without this, I just think it's more honest to point out that the commits did get modified by someone other than the original author). Stock git commands like amend and rebase will always set you as the committer (while preserving the original author), so a mean of automatically doing the same with filter-repo would have been conveinient. I can't speak for how niche or not this use-case is, as this is the first time I've had to use either filter-branch or filter-repo. Having to write custom Python code did the trick as an escape hatch, but I'm not sure if that I would want it to be a reccomended way of using the tool.

Kind of going on a sidenote, but if you really do mean to rewrite ALL commits, then I think the usecase is broken and you should just disable the hook for the initial import. Trying to use filter-repo for that usecase is like trying to use a jackhammer for a problem where tweezers is the right tool; you might have forcibly fit things together in the end, but you've made a colossal mess of everything to force it into place. But that's a sidenote...

I believe this was some kind of server-wide setting; I don't think I could have disabled it. I also disagree that the use case was "broken", it's just about creating a new repo with an empty origin commit, and then merging 5 separate trees with their own origin commits into subdirectories of that repo one by one. I wouldn't call it a jackhammer for what should have needed tweezers, the end result was extremely clean (and both tig and the Bitbucket commit visualizer handle this 6-origins-octopus-repo just fine). The comitter reset wouldn't have been needed had all the commits on all repos been initially authored by me, and I didn't want to pretend they were, as the whole point of the procedure was to preserve the git history of these previous repos.

Now, everything above said, I may have misunderstood something about the usecase or the request, which might invalidate part of what I wrote above. Did I miss something?

Well, just for clarity, I'll restate it plainly. I had a bunch of repos that I had created and maintained over the previous year and now needed to regroup under one repo, as directories, while preserving their respective histories. Fetching all the head commits and merging them wouldn't have worked, as there would have been conflicts, and moving the files just before merging would have basically lost the file history because most git frontends don't keep track of changes across moved files.

I needed to do the following for each repo:

  • Retroactively move all the files of each repo to a subdirectory as if they had always been there (--to-subdirectory-filter)
  • Add a prefix to each commit to make it clear what subdirectory they were targeting at the time (--message-callback)
  • Set myself as the comitter on all 15-20 commits not made by me so Bitbucket wouldn't think I'm trying to impersonate someone with the 200 commits I'm introducing to a repo that so far only has an initial commit (no clean way of doing this besides --commit-callback)
  • Merge the result with the new master using git merge --allow-unrelated-histories

I don't think it's such an outlandish use case, especially since adding an option like --reset-committer would merely mimic what amend and rebase already do automatically. Again, though, I'd never used repo-filter before (and I haven't needed to since), so I can't say how unusual that is. I don't do multi-repo grafting surgery on a regular basis.

@newren
Copy link
Owner

newren commented Aug 2, 2024

Ah, thanks for the extra explanations. I had a much different idea in mind (taking an existing repo and just rewriting committers so you can push), and combined with some extreme annoyance and overuse of similar hooks I saw elsewhere, I was just a bit triggered. Sorry for misunderstanding.

Having to write custom Python code did the trick as an escape hatch, but I'm not sure if that I would want it to be a reccomended way of using the tool.

I wouldn't call that an escape hatch. The tool originally started as nothing more than a library, meaning that not only was having users write some python code intended, it was the only possible way to use the tool. git-filter-repo has certainly changed dramatically since then as it has quite a few command line arguments and many users can get everything they need without writing any python, but the fact that you can do more generic things with callbacks (or even via a script that imports git_filter_repo.py) is very much a high priority usecase for the tool. That kind of flexibility was one of the things that users liked about filter-branch (though it used shell rather than python), and I wanted to have similar levels of flexibility in what I was offering as a replacement. In fact, I've been recently going through all the old issues collecting various cases where users had interesting callbacks or I suggested interesting use of callbacks, with the idea of making a document of some sort highlighting these additional usecases and how callbacks solve them.

While there may be some exceptions, in general I think that things that can be solved with a couple lines of python are probably bad candidates for a feature flag (but good candidates for this new document I want to write up).

@sm-Fifteen
Copy link
Author

Ah, thanks for the extra explanations. I had a much different idea in mind (taking an existing repo and just rewriting committers so you can push), and combined with some extreme annoyance and overuse of similar hooks I saw elsewhere, I was just a bit triggered. Sorry for misunderstanding.

No worries.

Having to write custom Python code did the trick as an escape hatch, but I'm not sure if that I would want it to be a reccomended way of using the tool.

I wouldn't call that an escape hatch. The tool originally started as nothing more than a library, meaning that not only was having users write some python code intended, it was the only possible way to use the tool. git-filter-repo has certainly changed dramatically since then as it has quite a few command line arguments and many users can get everything they need without writing any python, but the fact that you can do more generic things with callbacks (or even via a script that imports git_filter_repo.py) is very much a high priority usecase for the tool.

Oh, I'm not arguing that it should be removed or anything, just that it looks like automatically marking commits as having been created by someone other than the author on a tool designed to rewrite git history would be a good candidat for having its own option flag, even if it's just an extra line or two of Python code otherwise.

That kind of flexibility was one of the things that users liked about filter-branch (though it used shell rather than python), and I wanted to have similar levels of flexibility in what I was offering as a replacement. In fact, I've been recently going through all the old issues collecting various cases where users had interesting callbacks or I suggested interesting use of callbacks, with the idea of making a document of some sort highlighting these additional usecases and how callbacks solve them.

While there may be some exceptions, in general I think that things that can be solved with a couple lines of python are probably bad candidates for a feature flag (but good candidates for this new document I want to write up).

I guess that's fair. I figured it would be a "core-enough" scenario to warrant having its own option flag, but I won't claim to know the tool and its users better than you do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants