-
Notifications
You must be signed in to change notification settings - Fork 26.4k
keep track of unresolved value of symbolic-ref in ref iterators #1712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0ce74be
to
59acda5
Compare
e573f45
to
3f599da
Compare
3f599da
to
fe591ce
Compare
/preview |
Preview email sent as [email protected] |
There are issues in commit 9a89508: |
9a89508
to
6d5b1b6
Compare
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
add-interactive.c
Outdated
@@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
ADMINISTRIVIA. Check the address you place on the CC: line. What
we can see for this message at
https://lore.kernel.org/git/011c10f488610b0a795a843bff66723477783761.1717694801.git.gitgitgadget@gmail.com/raw
looks like this.
Cc: "Phillip Wood [ ]" <[email protected]>,
Kristoffer Haugsbakk <[[email protected]]>,
"Jeff King [ ]" <[email protected]>,
"Patrick Steinhardt [ ]" <[email protected]>,
"=?UTF-8?Q?Jean-No=C3=ABl?= Avila [ ]" <[email protected]>,
John Cai <[email protected]>,
John Cai <[email protected]>
I fixed them manually, but it wasn't pleasant. I think we saw a
similar breakage earlier coming via GGG, but I do not recall the
details of how to cause such breakages (iow, what to avoid repeating
this).
Anyway.
"John Cai via GitGitGadget" <[email protected]> writes:
> 29 files changed, 64 insertions(+), 52 deletions(-)
Wow, the blast radius of this thing is rather big. Among these
existing callers of refs_resolve_ref_unsafe(), how many of them
will benefit from being able to pass a non NULL parameter at the end
of the series, and more importantly, in the future to take advantage
of the new feature possibly with a separate series?
I am assuming that this will benefit only a selected few and the
callers that would want to take advantage of the new feature will
remain low. Have you considered renaming refs_resolve_ref_unsafe()
to a new name (say, refs_resolve_ref_unsafe_with_referent()) and
implement the new feature (which is only triggered when the new
parameter gets a non NULL value), make refs_resolve_ref_unsafe() a
very thin wrapper that passes NULL to the new thing?
That way, you do not have to touch those existing callers that will
never benefit from the new capability in the future. You won't risk
conflicting with in flight topics semantically, either.
Or will they also benefit from the new feature in the future?
Offhand, I do not know how a caller like this ...
> diff --git a/add-interactive.c b/add-interactive.c
> index b5d6cd689a1..041d30cf2b3 100644
> --- a/add-interactive.c
> +++ b/add-interactive.c
> @@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r,
> {
> struct object_id head_oid;
> int is_initial = !refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
> - "HEAD", RESOLVE_REF_READING,
> + "HEAD", NULL, RESOLVE_REF_READING,
> &head_oid, NULL);
> struct collection_status s = { 0 };
> int i;
... would be helped.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Junio C Hamano <[email protected]> writes:
> "John Cai via GitGitGadget" <[email protected]> writes:
>
>> 29 files changed, 64 insertions(+), 52 deletions(-)
>
> Wow, the blast radius of this thing is rather big. Among these
> existing callers of refs_resolve_ref_unsafe(), how many of them
> will benefit from being able to pass a non NULL parameter at the end
> of the series, and more importantly, in the future to take advantage
> of the new feature possibly with a separate series?
> ...
> That way, you do not have to touch those existing callers that will
> never benefit from the new capability in the future. You won't risk
> conflicting with in flight topics semantically, either.
The same comment applies to [3/4], but I do not want to fix the Cc: header
again, so I am replying to this message.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, John Cai wrote (reply to this):
Hi Junio,
On 6 Jun 2024, at 14:21, Junio C Hamano wrote:
> ADMINISTRIVIA. Check the address you place on the CC: line. What
> we can see for this message at
>
> https://lore.kernel.org/git/011c10f488610b0a795a843bff66723477783761.1717694801.git.gitgitgadget@gmail.com/raw
>
> looks like this.
>
> Cc: "Phillip Wood [ ]" <[email protected]>,
> Kristoffer Haugsbakk <[[email protected]]>,
> "Jeff King [ ]" <[email protected]>,
> "Patrick Steinhardt [ ]" <[email protected]>,
> "=?UTF-8?Q?Jean-No=C3=ABl?= Avila [ ]" <[email protected]>,
> John Cai <[email protected]>,
> John Cai <[email protected]>
>
> I fixed them manually, but it wasn't pleasant. I think we saw a
> similar breakage earlier coming via GGG, but I do not recall the
> details of how to cause such breakages (iow, what to avoid repeating
> this).
oof, apologies. Didn't notice that. I'll be more mindful about the cc line.
>
> Anyway.
>
> "John Cai via GitGitGadget" <[email protected]> writes:
>
>> 29 files changed, 64 insertions(+), 52 deletions(-)
>
> Wow, the blast radius of this thing is rather big. Among these
> existing callers of refs_resolve_ref_unsafe(), how many of them
> will benefit from being able to pass a non NULL parameter at the end
> of the series, and more importantly, in the future to take advantage
> of the new feature possibly with a separate series?
>
> I am assuming that this will benefit only a selected few and the
> callers that would want to take advantage of the new feature will
> remain low. Have you considered renaming refs_resolve_ref_unsafe()
> to a new name (say, refs_resolve_ref_unsafe_with_referent()) and
> implement the new feature (which is only triggered when the new
> parameter gets a non NULL value), make refs_resolve_ref_unsafe() a
> very thin wrapper that passes NULL to the new thing?
>
> That way, you do not have to touch those existing callers that will
> never benefit from the new capability in the future. You won't risk
> conflicting with in flight topics semantically, either.
>
> Or will they also benefit from the new feature in the future?
>
> Offhand, I do not know how a caller like this ...
>
>> diff --git a/add-interactive.c b/add-interactive.c
>> index b5d6cd689a1..041d30cf2b3 100644
>> --- a/add-interactive.c
>> +++ b/add-interactive.c
>> @@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r,
>> {
>> struct object_id head_oid;
>> int is_initial = !refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
>> - "HEAD", RESOLVE_REF_READING,
>> + "HEAD", NULL, RESOLVE_REF_READING,
>> &head_oid, NULL);
>> struct collection_status s = { 0 };
>> int i;
>
> ... would be helped.
Good point. I was sensing the same thing as I made all the callsite changes so
this feedback makes it clear what we should do.
>
> Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
John Cai <[email protected]> writes:
> On 6 Jun 2024, at 14:21, Junio C Hamano wrote:
>
>> ADMINISTRIVIA. Check the address you place on the CC: line. What
>> we can see for this message at
>> ...
>> I fixed them manually, but it wasn't pleasant. I think we saw a
>> similar breakage earlier coming via GGG, but I do not recall the
>> details of how to cause such breakages (iow, what to avoid repeating
>> this).
>
> oof, apologies. Didn't notice that. I'll be more mindful about the cc line.
I found the previous occurrences of the same problem:
https://lore.kernel.org/git/[email protected]/
https://lore.kernel.org/git/[email protected]/
The last message in the thread
https://lore.kernel.org/git/CAMbn=[email protected]/
says that the original user of GGG found what was wrong in the way
the user was using GGG to send and fixed it, but unfortunately we
didn't hear exactly *what* the breakage was and *how* it was fixed.
Aryan, do you remember what the problem was and more importantly
what the fix was?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, John Cai wrote (reply to this):
Hi Junio,
On 6 Jun 2024, at 14:23, Junio C Hamano wrote:
> Junio C Hamano <[email protected]> writes:
>
>> "John Cai via GitGitGadget" <[email protected]> writes:
>>
>>> 29 files changed, 64 insertions(+), 52 deletions(-)
>>
>> Wow, the blast radius of this thing is rather big. Among these
>> existing callers of refs_resolve_ref_unsafe(), how many of them
>> will benefit from being able to pass a non NULL parameter at the end
>> of the series, and more importantly, in the future to take advantage
>> of the new feature possibly with a separate series?
>> ...
>> That way, you do not have to touch those existing callers that will
>> never benefit from the new capability in the future. You won't risk
>> conflicting with in flight topics semantically, either.
>
> The same comment applies to [3/4], but I do not want to fix the Cc: header
> again, so I am replying to this message.
Yes, so for 3/4 I was exploring doing the same thing. However, each_repo_fn goes
pretty deep in the callstack and to provide an alternate set of functions that
use something like each_repo_referent_fn would still lead to a relatively large
blast radius, eg, something like:
diff --git a/ref-filter.c b/ref-filter.c
index f7fb0c7e0e..770d3715c2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2631,12 +2631,12 @@ static int filter_exclude_match(struct ref_filter *filter, const char *refname)
* pattern match, so the callback still has to match each ref individually.
*/
static int for_each_fullref_in_pattern(struct ref_filter *filter,
- each_ref_fn cb,
+ each_ref_with_referent_fn cb,
void *cb_data)
{
if (filter->kind & FILTER_REFS_ROOT_REFS) {
/* In this case, we want to print all refs including root refs. */
- return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
+ return refs_for_each_include_root_refs_with_referent(get_main_ref_store(the_repository),
cb, cb_data);
}
@@ -2646,7 +2646,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
* prefixes like "refs/heads/" etc. are stripped off,
* so we have to look at everything:
*/
- return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+ return refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
"", NULL, cb, cb_data);
}
@@ -2656,17 +2656,17 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
* so just return everything and let the caller
* sort it out.
*/
- return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+ return refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
"", NULL, cb, cb_data);
}
if (!filter->name_patterns[0]) {
/* no patterns; we have to look at everything */
- return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+ return refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
"", filter->exclude.v, cb, cb_data);
}
- return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
+ return refs_for_each_fullref_in_prefixes_with_referent(get_main_ref_store(the_repository),
NULL, filter->name_patterns,
filter->exclude.v,
cb, cb_data);
@@ -2781,7 +2781,7 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
return ref_kind_from_refname(refname);
}
-static struct ref_array_item *apply_ref_filter(const char *refname, const struct object_id *oid,
+static struct ref_array_item *apply_ref_filter(const char *refname, const char *referent, const struct object_id *oid,
int flag, struct ref_filter *filter)
{
struct ref_array_item *ref;
@@ -2850,6 +2850,7 @@ static struct ref_array_item *apply_ref_filter(const char *refname, const struct
ref->commit = commit;
ref->flag = flag;
ref->kind = kind;
+ ref->symref = referent;
return ref;
}
@@ -2863,12 +2864,12 @@ struct ref_filter_cbdata {
* A call-back given to for_each_ref(). Filter refs and keep them for
* later object processing.
*/
-static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+static int filter_one(const char *refname, const char *referent, const struct object_id *oid, int flag, void *cb_data)
{
struct ref_filter_cbdata *ref_cbdata = cb_data;
struct ref_array_item *ref;
- ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+ ref = apply_ref_filter(refname, referent, oid, flag, ref_cbdata->filter);
if (ref)
ref_array_append(ref_cbdata->array, ref);
@@ -2898,13 +2899,13 @@ struct ref_filter_and_format_cbdata {
} internal;
};
-static int filter_and_format_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+static int filter_and_format_one(const char *refname, const char *referent, const struct object_id *oid, int flag, void *cb_data)
{
struct ref_filter_and_format_cbdata *ref_cbdata = cb_data;
struct ref_array_item *ref;
struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;
- ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+ ref = apply_ref_filter(refname, referent, oid, flag, ref_cbdata->filter);
if (!ref)
return 0;
@@ -3050,7 +3051,7 @@ void filter_ahead_behind(struct repository *r,
free(commits);
}
-static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
+static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_with_referent_fn fn, void *cb_data)
{
int ret = 0;
@@ -3070,15 +3071,15 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
* of filter_ref_kind().
*/
if (filter->kind == FILTER_REFS_BRANCHES)
- ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+ ret = refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
"refs/heads/", NULL,
fn, cb_data);
else if (filter->kind == FILTER_REFS_REMOTES)
- ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+ ret = refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
"refs/remotes/", NULL,
fn, cb_data);
else if (filter->kind == FILTER_REFS_TAGS)
- ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+ ret = refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
"refs/tags/", NULL, fn,
cb_data);
else if (filter->kind & FILTER_REFS_REGULAR)
@@ -3090,7 +3091,7 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
*/
if (!ret && !(filter->kind & FILTER_REFS_ROOT_REFS) &&
(filter->kind & FILTER_REFS_DETACHED_HEAD))
- refs_head_ref(get_main_ref_store(the_repository), fn,
+ refs_head_ref_referent(get_main_ref_store(the_repository), fn,
cb_data);
}
diff --git a/refs.c b/refs.c
index 6dcb0288cb..4366f35586 100644
--- a/refs.c
+++ b/refs.c
@@ -1529,6 +1529,19 @@ int refs_head_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
return 0;
}
+int refs_head_ref_referent(struct ref_store *refs, each_ref_with_referent_fn fn, void *cb_data)
+{
+ struct object_id oid;
+ int flag;
+ const char *referent;
+
+ if (refs_resolve_ref_unsafe_with_referent(refs, "HEAD", referent, RESOLVE_REF_READING,
+ &oid, &flag))
+ return fn("HEAD", referent, &oid, flag, cb_data);
+
+ return 0;
+}
+
struct ref_iterator *refs_ref_iterator_begin(
struct ref_store *refs,
const char *prefix,
@@ -1560,6 +1573,22 @@ struct ref_iterator *refs_ref_iterator_begin(
return iter;
}
+static int do_for_each_ref_with_referent(struct ref_store *refs, const char *prefix,
+ const char **exclude_patterns,
+ each_ref_with_referent_fn fn, int trim,
+ enum do_for_each_ref_flags flags, void *cb_data)
+{
+ struct ref_iterator *iter;
+
+ if (!refs)
+ return 0;
+
+ iter = refs_ref_iterator_begin(refs, prefix, exclude_patterns, trim,
+ flags);
+
+ return do_for_each_ref_iterator_with_referent(iter, fn, cb_data);
+}
+
static int do_for_each_ref(struct ref_store *refs, const char *prefix,
const char **exclude_patterns,
each_ref_fn fn, int trim,
@@ -1594,6 +1623,13 @@ int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
return do_for_each_ref(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
}
+int refs_for_each_fullref_in_with_referent(struct ref_store *refs, const char *prefix,
+ const char **exclude_patterns,
+ each_ref_with_referent_fn fn, void *cb_data)
+{
+ return do_for_each_ref_with_referent(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
+}
+
int refs_for_each_replace_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
const char *git_replace_ref_base = ref_namespace[NAMESPACE_REPLACE].ref;
@@ -1627,6 +1663,13 @@ int refs_for_each_include_root_refs(struct ref_store *refs, each_ref_fn fn,
DO_FOR_EACH_INCLUDE_ROOT_REFS, cb_data);
}
+int refs_for_each_include_root_refs_with_referent(struct ref_store *refs, each_ref_with_referent_fn fn,
+ void *cb_data)
+{
+ return do_for_each_ref_with_referent(refs, "", NULL, fn, 0,
+ DO_FOR_EACH_INCLUDE_ROOT_REFS, cb_data);
+}
+
static int qsort_strcmp(const void *va, const void *vb)
{
const char *a = *(const char **)va;
@@ -1716,6 +1759,37 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
return ret;
}
+int refs_for_each_fullref_in_prefixes_with_referent(struct ref_store *ref_store,
+ const char *namespace,
+ const char **patterns,
+ const char **exclude_patterns,
+ each_ref_with_referent_fn fn, void *cb_data)
+{
+ struct string_list prefixes = STRING_LIST_INIT_DUP;
+ struct string_list_item *prefix;
+ struct strbuf buf = STRBUF_INIT;
+ int ret = 0, namespace_len;
+
+ find_longest_prefixes(&prefixes, patterns);
+
+ if (namespace)
+ strbuf_addstr(&buf, namespace);
+ namespace_len = buf.len;
+
+ for_each_string_list_item(prefix, &prefixes) {
+ strbuf_addstr(&buf, prefix->string);
+ ret = refs_for_each_fullref_in_with_referent(ref_store, buf.buf,
+ exclude_patterns, fn, cb_data);
+ if (ret)
+ break;
+ strbuf_setlen(&buf, namespace_len);
+ }
+
+ string_list_clear(&prefixes, 0);
+ strbuf_release(&buf);
+ return ret;
+}
+
static int refs_read_special_head(struct ref_store *ref_store,
const char *refname, struct object_id *oid,
struct strbuf *referent, unsigned int *type,
diff --git a/refs.h b/refs.h
index ba6d0e0753..d8387c6296 100644
--- a/refs.h
+++ b/refs.h
@@ -304,6 +304,9 @@ struct ref_transaction;
typedef int each_ref_fn(const char *refname,
const struct object_id *oid, int flags, void *cb_data);
+typedef int each_ref_with_referent_fn(const char *refname, const char *referent,
+ const struct object_id *oid, int flags, void *cb_data);
+
/*
* The following functions invoke the specified callback function for
* each reference indicated. If the function ever returns a nonzero
@@ -315,6 +318,8 @@ typedef int each_ref_fn(const char *refname,
*/
int refs_head_ref(struct ref_store *refs,
each_ref_fn fn, void *cb_data);
+int refs_head_ref_referent(struct ref_store *refs,
+ each_ref_with_referent_fn fn, void *cb_data);
int refs_for_each_ref(struct ref_store *refs,
each_ref_fn fn, void *cb_data);
int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
@@ -351,6 +356,17 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *refs,
const char **exclude_patterns,
each_ref_fn fn, void *cb_data);
+int refs_for_each_fullref_in_prefixes_with_referent(struct ref_store *refs,
+ const char *namespace,
+ const char **patterns,
+ const char **exclude_patterns,
+ each_ref_with_referent_fn fn, void *cb_data);
+
+int refs_for_each_fullref_in_with_referent(struct ref_store *refs, const char *prefix,
+ const char **exclude_patterns,
+ each_ref_with_referent_fn fn, void *cb_data);
+
+
/* iterates all refs that match the specified glob pattern. */
int refs_for_each_glob_ref(struct ref_store *refs, each_ref_fn fn,
const char *pattern, void *cb_data);
@@ -377,6 +393,10 @@ int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data);
int refs_for_each_include_root_refs(struct ref_store *refs, each_ref_fn fn,
void *cb_data);
+int refs_for_each_include_root_refs_with_referent(struct ref_store *refs, each_ref_with_referent_fn fn,
+ void *cb_data);
+
+
/*
* Normalizes partial refs to their fully qualified form.
* Will prepend <prefix> to the <pattern> if it doesn't start with 'refs/'.
diff --git a/refs/iterator.c b/refs/iterator.c
index 26acb6bd64..26c38ec6de 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -469,3 +469,30 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
return -1;
return retval;
}
+
+int do_for_each_ref_iterator_with_referent(struct ref_iterator *iter,
+ each_ref_with_referent_fn fn, void *cb_data)
+{
+ int retval = 0, ok;
+ struct ref_iterator *old_ref_iter = current_ref_iter;
+
+ current_ref_iter = iter;
+ while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+ retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
+ if (retval) {
+ /*
+ * If ref_iterator_abort() returns ITER_ERROR,
+ * we ignore that error in deference to the
+ * callback function's return value.
+ */
+ ref_iterator_abort(iter);
+ goto out;
+ }
+ }
+
+out:
+ current_ref_iter = old_ref_iter;
+ if (ok == ITER_ERROR)
+ return -1;
+ return retval;
+}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 0775451435..ebec407468 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -486,6 +486,9 @@ extern struct ref_iterator *current_ref_iter;
*/
int do_for_each_ref_iterator(struct ref_iterator *iter,
each_ref_fn fn, void *cb_data);
+int do_for_each_ref_iterator_with_referent(struct ref_iterator *iter,
+ each_ref_with_referent_fn fn, void *cb_data);
+
struct ref_store;
Which is a little bit unseemly in my view...so between the two options I would be
inclined to go with modifying each_repo_fn than create a whole subset set of
helpers. wdyt?
thanks
John
>
> Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Patrick Steinhardt <[email protected]> writes:
> I wonder whether we can future-proof the code a bit more by introducing
> a struct that contains everything we want to pass to the callback
> function.
That would hopefully make a change with a large blast a one-time
event. But at the same time, it may end up making it too opaque and
hard to verify if all the API functions are using/updating/verifying
all the members of the struct as they should. Compared to that,
unused parameters are easier to verify mechanically by compilers.
> This would also allow us to get rid of this awful `peel_iterated_oid()`
> function that relies on global state in many places, as we can put the
> peeled OID into that structure, as well.
Yes, such a benefit may justify a one-time "affect many callers"
event. Or the underlying for_each_*() friend of functions can be
updated to use a single struct and then the current "only selected
parameters that are used ar passed" API can be made into a set of
thin wrappers around it, and then callers can be converted one step
at a time, in a multi-step series, perhaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, "Kristoffer Haugsbakk" wrote (reply to this):
On Fri, Jun 7, 2024, at 00:44, Junio C Hamano wrote:
> I found the previous occurrences of the same problem:
>
> https://lore.kernel.org/git/[email protected]/
> https://lore.kernel.org/git/[email protected]/
>
> The last message in the thread
>
>
> https://lore.kernel.org/git/CAMbn=B7J4ODf9ybJQpL1bZZ7qdWSDGaLEyTmVv+ZBiSeC9T+yw@mail.gmail.com/
>
> says that the original user of GGG found what was wrong in the way
> the user was using GGG to send and fixed it, but unfortunately we
> didn't hear exactly *what* the breakage was and *how* it was fixed.
>
> Aryan, do you remember what the problem was and more importantly
> what the fix was?
>
> Thanks.
Yes, Aryan didn’t explain what the issue was. But Linus Arver (+CC) in
that first thread/link figured out what was happening in his case:[1]
> I realize now that it's because I copy/pasted the "Cc: ..." lines in the PR
> description from
> https://github.com/gitgitgadget/git/pull/1632#issue-2068188239, such
> that when I pasted those in for the PR description for this series at
> https://github.com/gitgitgadget/git/pull/1694#issue-2187804953, it
> carried over the email addresses as Markdown-formatted hyperlinks.
> Currently it reads
>
> Cc: Christian Couder [[email protected]](mailto:[email protected])
> Cc: Junio C Hamano [[email protected]](mailto:[email protected])
> Cc: Emily Shaffer [[email protected]](mailto:[email protected])
> cc: Josh Steadmon [[email protected]](mailto:[email protected])
> cc: Randall S. Becker [[email protected]](mailto:[email protected])
> cc: Christian Couder [[email protected]](mailto:[email protected])
> cc: "Kristoffer Haugsbakk" [[email protected]](mailto:[email protected])
> cc: "Kristoffer Haugsbakk" <[email protected]>
>
> when I click on "edit", where the last line must be from your manual
> fix which GGG picked up. I've cleaned up the PR description manually
> now, and for this message I'm also attempting to clean up those square
> brackets.
When I read that I assumed that Aryan had made the same mistake.
🔗 1: https://lore.kernel.org/git/[email protected]/
--
Kristoffer Haugsbakk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Kristoffer Haugsbakk" <[email protected]> writes:
>> ...
>> https://github.com/gitgitgadget/git/pull/1694#issue-2187804953, it
>> carried over the email addresses as Markdown-formatted hyperlinks.
>> Currently it reads
>>
>> Cc: Christian Couder [[email protected]](mailto:[email protected])
>>...
>> cc: "Kristoffer Haugsbakk" <[email protected]>
>>
>> when I click on "edit", where the last line must be from your manual
>> fix which GGG picked up. I've cleaned up the PR description manually
>> now, and for this message I'm also attempting to clean up those square
>> brackets.
>
> When I read that I assumed that Aryan had made the same mistake.
Ah, I see.
I wonder if there are things we can do to help reduce the pain in
future users of GGG here.
- Is there some documentation update we can make on our end,
perhaps in Documentation/MyFirstContribution (SubmittingPatches
does not talk about GGG other than refering readers to
MyFirstContribution)?
- Or the welcome message GGG adds to the pull-request that begins
with "Thanks for taking the time to contribute" can mention
something about this?
- Or possibly the handling of Cc: addresses done by GGG can be
tweaked not to cause this?
Anything else?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Linus Arver wrote (reply to this):
Junio C Hamano <[email protected]> writes:
> "Kristoffer Haugsbakk" <[email protected]> writes:
>
>>> ...
>>> https://github.com/gitgitgadget/git/pull/1694#issue-2187804953, it
>>> carried over the email addresses as Markdown-formatted hyperlinks.
>>> Currently it reads
>>>
>>> Cc: Christian Couder [[email protected]](mailto:[email protected])
>>>...
>>> cc: "Kristoffer Haugsbakk" <[email protected]>
>>>
>>> when I click on "edit", where the last line must be from your manual
>>> fix which GGG picked up. I've cleaned up the PR description manually
>>> now, and for this message I'm also attempting to clean up those square
>>> brackets.
>>
>> When I read that I assumed that Aryan had made the same mistake.
>
> Ah, I see.
>
> I wonder if there are things we can do to help reduce the pain in
> future users of GGG here.
>
> [...]
>
> - Or the welcome message GGG adds to the pull-request that begins
> with "Thanks for taking the time to contribute" can mention
> something about this?
I've created https://github.com/gitgitgadget/gitgitgadget/pull/1644 to
address this for now.
>
> - Or possibly the handling of Cc: addresses done by GGG can be
> tweaked not to cause this?
I wanted to do this first but was deterred by my lack of familiarity
with TypeScript.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Linus Arver <[email protected]> writes:
>> - Or the welcome message GGG adds to the pull-request that begins
>> with "Thanks for taking the time to contribute" can mention
>> something about this?
>
> I've created https://github.com/gitgitgadget/gitgitgadget/pull/1644 to
> address this for now.
Thanks.
add-interactive.c
Outdated
@@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, Jun 06, 2024 at 05:26:37PM +0000, John Cai via GitGitGadget wrote:
> From: John Cai <[email protected]>
>
> refs_resolve_ref_unsafe retrieves the referent, the unresolved value of
> a reference. Add a parameter to allow refs_resolve_ref_unsafe to pass up
> the value of referent to the caller so it can save this value in ref
> iterators for more efficient access.
This commit message left me with a lot of questions.
For one, it wasn't immediately obvious to me what a "referent" is. ;) I
think an example could help. If I understand, you mean that if you have
a situation like:
- refs/heads/one is a symref pointing to refs/heads/two
- refs/heads/two is a regular ref
and we resolve "one", then "two" is the referent? And the caller might
want to know that?
But I think we already pass that out as the return value from
refs_resolve_ref_unsafe(). That is how something like "rev-parse
--symbolic-full-name" works now.
But there are some subtleties. In a chain of symbolic refs (say, "two"
is a symbolic ref to "three"), we return only the final name ("three").
And you might want to know about "two".
You can pass RESOLVE_REF_NO_RECURSE to inhibit this, and get back just
"two". You can see that now with "git symbolic-ref --no-recurse". The
downside is that we never look at the referent at all, so you get only
the symref value (and no information about the actual oid, or if the
referent even exists). You would still get an oid for any non-symrefs
you examine.
So reading between the lines, you have a caller in mind which wants to
know the immediate referent in addition to the final recursive oid?
Looking at the rest of your series, I guess that caller is the one in
loose_fill_ref_dir_regular_file(), so that it can get passed to the
for-each-ref callback. But why is it right thing for it to record and
pass along the immediate referent there, and not the final one? For that
matter, would a caller ever want to see the whole chain of
one/two/three?
> @@ -1761,6 +1761,7 @@ int refs_read_symbolic_ref(struct ref_store *ref_store, const char *refname,
>
> const char *refs_resolve_ref_unsafe(struct ref_store *refs,
> const char *refname,
> + const char *referent,
> int resolve_flags,
> struct object_id *oid,
> int *flags)
Unless I am misunderstanding the purpose of your patch completely, this
"referent" is meant to be an out-parameter, right? In which case,
shouldn't it be "const char **referent"?
As the code is now:
> @@ -1822,6 +1823,9 @@ const char *refs_resolve_ref_unsafe(struct ref_store *refs,
> }
>
> *flags |= read_flags;
> + if (referent && (read_flags & REF_ISSYMREF) &&
> + sb_refname.len > 0)
> + referent = sb_refname.buf;
>
> if (!(read_flags & REF_ISSYMREF)) {
> if (*flags & REF_BAD_NAME) {
...we'd assign the local "referent" pointer to our refname buf, but
the caller would never see that. Plus doing so would not help you
anyway, since sb_refname will be used again as we recurse. So at best,
you end up with the final name in the chain anyway. Or at worst,
sb_refname gets reallocated and "referent" is left as a dangling
pointer.
-Peff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, John Cai wrote (reply to this):
Hey Peff,
On 11 Jun 2024, at 4:50, Jeff King wrote:
> On Thu, Jun 06, 2024 at 05:26:37PM +0000, John Cai via GitGitGadget wrote:
>
>> From: John Cai <[email protected]>
>>
>> refs_resolve_ref_unsafe retrieves the referent, the unresolved value of
>> a reference. Add a parameter to allow refs_resolve_ref_unsafe to pass up
>> the value of referent to the caller so it can save this value in ref
>> iterators for more efficient access.
>
> This commit message left me with a lot of questions.
>
> For one, it wasn't immediately obvious to me what a "referent" is. ;) I
> think an example could help. If I understand, you mean that if you have
> a situation like:
>
> - refs/heads/one is a symref pointing to refs/heads/two
> - refs/heads/two is a regular ref
>
> and we resolve "one", then "two" is the referent? And the caller might
> want to know that?
>
> But I think we already pass that out as the return value from
> refs_resolve_ref_unsafe(). That is how something like "rev-parse
> --symbolic-full-name" works now.
Yes, exactly. I think you're right that it'd be preferable to just use the
output of refs_resolve_ref_unsafe() to get the value of the referent.
>
> But there are some subtleties. In a chain of symbolic refs (say, "two"
> is a symbolic ref to "three"), we return only the final name ("three").
> And you might want to know about "two".
>
> You can pass RESOLVE_REF_NO_RECURSE to inhibit this, and get back just
> "two". You can see that now with "git symbolic-ref --no-recurse". The
> downside is that we never look at the referent at all, so you get only
> the symref value (and no information about the actual oid, or if the
> referent even exists). You would still get an oid for any non-symrefs
> you examine.
>
> So reading between the lines, you have a caller in mind which wants to
> know the immediate referent in addition to the final recursive oid?
The goal is to keep track of the value that %(symref) would need in the
iterator so that a separate call doesn't need to be made.
>
> Looking at the rest of your series, I guess that caller is the one in
> loose_fill_ref_dir_regular_file(), so that it can get passed to the
> for-each-ref callback. But why is it right thing for it to record and
> pass along the immediate referent there, and not the final one? For that
> matter, would a caller ever want to see the whole chain of
> one/two/three?
Right, the final referent is the right one to pass down.
>
>> @@ -1761,6 +1761,7 @@ int refs_read_symbolic_ref(struct ref_store *ref_store, const char *refname,
>>
>> const char *refs_resolve_ref_unsafe(struct ref_store *refs,
>> const char *refname,
>> + const char *referent,
>> int resolve_flags,
>> struct object_id *oid,
>> int *flags)
>
> Unless I am misunderstanding the purpose of your patch completely, this
> "referent" is meant to be an out-parameter, right? In which case,
> shouldn't it be "const char **referent"?
>
> As the code is now:
>
>> @@ -1822,6 +1823,9 @@ const char *refs_resolve_ref_unsafe(struct ref_store *refs,
>> }
>>
>> *flags |= read_flags;
>> + if (referent && (read_flags & REF_ISSYMREF) &&
>> + sb_refname.len > 0)
>> + referent = sb_refname.buf;
>>
>> if (!(read_flags & REF_ISSYMREF)) {
>> if (*flags & REF_BAD_NAME) {
>
> ...we'd assign the local "referent" pointer to our refname buf, but
> the caller would never see that. Plus doing so would not help you
> anyway, since sb_refname will be used again as we recurse. So at best,
> you end up with the final name in the chain anyway. Or at worst,
> sb_refname gets reallocated and "referent" is left as a dangling
> pointer.
Going to include changes to remove the out-parameter which will simplify things
quite a bit.
>
> -Peff
thanks for the review!
John
@@ -243,8 +243,9 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):
On Thu, Jun 06, 2024 at 05:26:38PM +0000, John Cai via GitGitGadget wrote:
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index bf2ffe062ea..a963d796a29 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -243,8 +243,9 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
> {
> struct object_id oid;
> int flag;
> + const char* referent = NULL;
>
> - if (!refs_resolve_ref_unsafe(&refs->base, refname, NULL, RESOLVE_REF_READING,
> + if (!refs_resolve_ref_unsafe(&refs->base, refname, referent, RESOLVE_REF_READING,
> &oid, &flag)) {
> oidclr(&oid);
> flag |= REF_ISBROKEN;
Here we pass in NULL, so the code in refs_resolve_ref_unsafe() won't do
anything. And our copy of "referent" here will remain NULL, so the rest
of this patch also does nothing. Again, I think that the function should
take a "char **", and you'd pass in &referent here?
Though if we are OK with surfacing just the final value in a
multi-element chain, then you could just use the existing return value,
like:
referent = refs_resolve_ref_unsafe(&refs->base, refname,
RESOLVE_REF_REAEDING, &oid, &flags);
if (!referent) {
oidclr(&oid);
flag |= REF_ISBROKEN;
}
and then later pass "referent" to create_ref_entry() if flags contains
REF_ISSYMREF (or since we pass it the flags, it could do that check
itself).
-Peff
User |
User |
adb273b
to
3e147e7
Compare
/preview |
Preview email sent as [email protected] |
/preview |
Preview email sent as [email protected] |
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
@@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"John Cai via GitGitGadget" <[email protected]> writes:
> @@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
> {
> struct object_id oid;
> int flag;
> -
> - if (!refs_resolve_ref_unsafe(&refs->base, refname, RESOLVE_REF_READING,
> - &oid, &flag)) {
> + const char* referent = refs_resolve_ref_unsafe(&refs->base,
Style. The asterisk sticks to the pointer variable, not the base type.
> + refname,
> + RESOLVE_REF_READING,
> + &oid, &flag);
> + if (!referent) {
> oidclr(&oid, the_repository->hash_algo);
> flag |= REF_ISBROKEN;
> } else if (is_null_oid(&oid)) {
> @@ -268,7 +270,8 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
> oidclr(&oid, the_repository->hash_algo);
> flag |= REF_BAD_NAME | REF_ISBROKEN;
> }
> - add_entry_to_dir(dir, create_ref_entry(refname, &oid, flag));
> +
> + add_entry_to_dir(dir, create_ref_entry(refname, referent, &oid, flag));
> }
This is a very straight-forward change, given the matching change to
the ref-entry, which now has a referent member.
> @@ -886,6 +889,9 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
> iter->base.refname = iter->iter0->refname;
> iter->base.oid = iter->iter0->oid;
> iter->base.flags = iter->iter0->flags;
> + if (iter->iter0->flags & REF_ISSYMREF)
> + iter->base.referent = iter->iter0->referent;
Presumably base.referent is initialized to NULL so this "if"
statement does not need an else clause?
I am primarily wondering why this needs to be conditional. We are
making verbatim copy of the flags word from *iter->iter0 to
iter->base; if iter0 is symref we want to mark base also as symref,
If iter0 is not a symref, then we want to mark base also not a
symref, presumably. So wouldn't it make sense to just say
iter->base.referent = iter->iter0->referent;
here, regardless of what iter->iter0->flags say about symref-ness of
the thing? Because ...
> diff --git a/refs/iterator.c b/refs/iterator.c
> index d355ebf0d59..26acb6bd640 100644
> --- a/refs/iterator.c
> +++ b/refs/iterator.c
> @@ -7,6 +7,7 @@
> #include "refs.h"
> #include "refs/refs-internal.h"
> #include "iterator.h"
> +#include "strbuf.h"
>
> int ref_iterator_advance(struct ref_iterator *ref_iterator)
> {
> @@ -29,6 +30,7 @@ void base_ref_iterator_init(struct ref_iterator *iter,
> {
> iter->vtable = vtable;
> iter->refname = NULL;
> + iter->referent = NULL;
> iter->oid = NULL;
> iter->flags = 0;
> }
> @@ -199,6 +201,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
> }
>
> if (selection & ITER_YIELD_CURRENT) {
> + iter->base.referent = (*iter->current)->referent;
> iter->base.refname = (*iter->current)->refname;
> iter->base.oid = (*iter->current)->oid;
> iter->base.flags = (*iter->current)->flags;
... other parts of the API seem to follow that philosophy.
In other words, the invariant of this code is that .referent is
non-NULL if and only if the ref is a symref, and that invariant is
trusted without checking with .flags member. But the earlier hunk
that copied iter0 to base did not seem to be using that invariant,
which made it look inconsistent.
> struct ref_entry *create_ref_entry(const char *refname,
> + const char *referent,
> const struct object_id *oid, int flag)
> {
> struct ref_entry *ref;
> @@ -41,6 +43,10 @@ struct ref_entry *create_ref_entry(const char *refname,
> FLEX_ALLOC_STR(ref, name, refname);
> oidcpy(&ref->u.value.oid, oid);
> ref->flag = flag;
> +
> + if (flag & REF_ISSYMREF)
> + ref->u.value.referent = xstrdup_or_null(referent);
OK. ref_value now has referent next to the existing oid, but that
member only makes sense when flag says it is a symref.
Curiously, that information is missing from ref_value struct, so by
looking at a ref_value alone, we cannot tell if we should trust the
value in the .referent member?
Makes me wonder if we should follow the same "ignore what the flag
says when filling the .referent member; if the ref is not a symref,
the referent variable is NULL, and if it is, referent is never NULL"
pattern? Then ref->u.value.referent is _always_ defined---the
current code says "the u.value.referent member is undefined for ref
that is not a symref", but with the suggested change, it will be
"the u.value.referent member is NULL for ref that is not a symref,
and for a symref, it is the value of the symref".
> return ref;
> }
>
> @@ -66,6 +72,7 @@ static void free_ref_entry(struct ref_entry *entry)
> */
> clear_ref_dir(&entry->u.subdir);
> }
> + free(entry->u.value.referent);
And that would match what is done here. We do not say "entry->flag
says it is not a symref, so do not bother freeing u.value.referent".
> @@ -431,6 +438,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
> level->index = -1;
> } else {
> iter->base.refname = entry->name;
> + iter->base.referent = entry->u.value.referent;
> iter->base.oid = &entry->u.value.oid;
> iter->base.flags = entry->flag;
> return ITER_OK;
> diff --git a/refs/ref-cache.h b/refs/ref-cache.h
> index 31ebe24f6cf..5f04e518c37 100644
> --- a/refs/ref-cache.h
> +++ b/refs/ref-cache.h
> @@ -42,6 +42,7 @@ struct ref_value {
> * referred to by the last reference in the symlink chain.
> */
> struct object_id oid;
> + char *referent;
> };
>
> /*
> @@ -173,6 +174,7 @@ struct ref_entry *create_dir_entry(struct ref_cache *cache,
> const char *dirname, size_t len);
>
> struct ref_entry *create_ref_entry(const char *refname,
> + const char *referent,
> const struct object_id *oid, int flag);
>
> /*
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index fa975d69aaa..117ec233848 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -299,6 +299,7 @@ enum do_for_each_ref_flags {
> struct ref_iterator {
> struct ref_iterator_vtable *vtable;
> const char *refname;
> + const char *referent;
> const struct object_id *oid;
> unsigned int flags;
> };
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index fbe74c239d3..9f724c3d632 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -494,8 +494,12 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
> the_repository->hash_algo);
> break;
> case REFTABLE_REF_SYMREF:
> - if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->ref.refname,
> - RESOLVE_REF_READING, &iter->oid, &flags))
> + iter->base.referent = refs_resolve_ref_unsafe(&iter->refs->base,
> + iter->ref.refname,
> + RESOLVE_REF_READING,
> + &iter->oid,
> + &flags);
> + if (!iter->base.referent)
> oidclr(&iter->oid, the_repository->hash_algo);
> break;
> default:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Thu, Aug 01, 2024 at 09:41:03AM -0700, Junio C Hamano wrote:
> "John Cai via GitGitGadget" <[email protected]> writes:
> > @@ -886,6 +889,9 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
> > iter->base.refname = iter->iter0->refname;
> > iter->base.oid = iter->iter0->oid;
> > iter->base.flags = iter->iter0->flags;
> > + if (iter->iter0->flags & REF_ISSYMREF)
> > + iter->base.referent = iter->iter0->referent;
>
> Presumably base.referent is initialized to NULL so this "if"
> statement does not need an else clause?
This function typically ends up being called in a loop though. So
without the else clause, wouldn't we potentially leak the value of a
preceding ref into subsequent iterations like this?
> I am primarily wondering why this needs to be conditional. We are
> making verbatim copy of the flags word from *iter->iter0 to
> iter->base; if iter0 is symref we want to mark base also as symref,
> If iter0 is not a symref, then we want to mark base also not a
> symref, presumably. So wouldn't it make sense to just say
>
> iter->base.referent = iter->iter0->referent;
>
> here, regardless of what iter->iter0->flags say about symref-ness of
> the thing? Because ...
> > diff --git a/refs/iterator.c b/refs/iterator.c
> > index d355ebf0d59..26acb6bd640 100644
> > --- a/refs/iterator.c
> > +++ b/refs/iterator.c
> > @@ -7,6 +7,7 @@
> > #include "refs.h"
> > #include "refs/refs-internal.h"
> > #include "iterator.h"
> > +#include "strbuf.h"
> >
> > int ref_iterator_advance(struct ref_iterator *ref_iterator)
> > {
> > @@ -29,6 +30,7 @@ void base_ref_iterator_init(struct ref_iterator *iter,
> > {
> > iter->vtable = vtable;
> > iter->refname = NULL;
> > + iter->referent = NULL;
> > iter->oid = NULL;
> > iter->flags = 0;
> > }
> > @@ -199,6 +201,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
> > }
> >
> > if (selection & ITER_YIELD_CURRENT) {
> > + iter->base.referent = (*iter->current)->referent;
> > iter->base.refname = (*iter->current)->refname;
> > iter->base.oid = (*iter->current)->oid;
> > iter->base.flags = (*iter->current)->flags;
>
> ... other parts of the API seem to follow that philosophy.
>
> In other words, the invariant of this code is that .referent is
> non-NULL if and only if the ref is a symref, and that invariant is
> trusted without checking with .flags member. But the earlier hunk
> that copied iter0 to base did not seem to be using that invariant,
> which made it look inconsistent.
Agreed.
> > struct ref_entry *create_ref_entry(const char *refname,
> > + const char *referent,
> > const struct object_id *oid, int flag)
> > {
> > struct ref_entry *ref;
> > @@ -41,6 +43,10 @@ struct ref_entry *create_ref_entry(const char *refname,
> > FLEX_ALLOC_STR(ref, name, refname);
> > oidcpy(&ref->u.value.oid, oid);
> > ref->flag = flag;
> > +
> > + if (flag & REF_ISSYMREF)
> > + ref->u.value.referent = xstrdup_or_null(referent);
>
> OK. ref_value now has referent next to the existing oid, but that
> member only makes sense when flag says it is a symref.
>
> Curiously, that information is missing from ref_value struct, so by
> looking at a ref_value alone, we cannot tell if we should trust the
> value in the .referent member?
>
> Makes me wonder if we should follow the same "ignore what the flag
> says when filling the .referent member; if the ref is not a symref,
> the referent variable is NULL, and if it is, referent is never NULL"
> pattern? Then ref->u.value.referent is _always_ defined---the
> current code says "the u.value.referent member is undefined for ref
> that is not a symref", but with the suggested change, it will be
> "the u.value.referent member is NULL for ref that is not a symref,
> and for a symref, it is the value of the symref".
Yeah, I think that would be preferable indeed.
Patrick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Patrick Steinhardt <[email protected]> writes:
> On Thu, Aug 01, 2024 at 09:41:03AM -0700, Junio C Hamano wrote:
>> "John Cai via GitGitGadget" <[email protected]> writes:
>> > @@ -886,6 +889,9 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>> > iter->base.refname = iter->iter0->refname;
>> > iter->base.oid = iter->iter0->oid;
>> > iter->base.flags = iter->iter0->flags;
>> > + if (iter->iter0->flags & REF_ISSYMREF)
>> > + iter->base.referent = iter->iter0->referent;
>>
>> Presumably base.referent is initialized to NULL so this "if"
>> statement does not need an else clause?
>
> This function typically ends up being called in a loop though. So
> without the else clause, wouldn't we potentially leak the value of a
> preceding ref into subsequent iterations like this?
OK, so this does need to clear it when we tell the caller we have
non SYMREF, as we do want to show NULL as base.referent to the
caller in such a case. Thanks.
It does reinforce my larger point, which was:
>> Makes me wonder if we should follow the same "ignore what the flag
>> says when filling the .referent member; if the ref is not a symref,
>> the referent variable is NULL, and if it is, referent is never NULL"
>> pattern? Then ref->u.value.referent is _always_ defined---the
>> current code says "the u.value.referent member is undefined for ref
>> that is not a symref", but with the suggested change, it will be
>> "the u.value.referent member is NULL for ref that is not a symref,
>> and for a symref, it is the value of the symref".
>
> Yeah, I think that would be preferable indeed.
In other words, with .referent member introduced, checking for
(.flags & REF_ISSYMREF) becomes a redundant&duplicated bit of
information, as the bit should exactly match the non-NULL ness of
the .referent member.
Thanks.
@@ -2783,7 +2783,7 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname) | |||
return ref_kind_from_refname(refname); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"John Cai via GitGitGadget" <[email protected]> writes:
> @@ -2852,6 +2852,8 @@ static struct ref_array_item *apply_ref_filter(const char *refname, const struct
> ref->commit = commit;
> ref->flag = flag;
> ref->kind = kind;
> + if (flag & REF_ISSYMREF)
> + ref->symref = xstrdup_or_null(referent);
The same reaction as [1/3]. Doesn't the null-ness of referent
convey the same information as the ISSYMREF bit in flag? IOW,
can't we do this assignment unconditionally?
@@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"John Cai via GitGitGadget" <[email protected]> writes:
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index aa52d9be7c7..5ed69c23f74 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
> {
> struct object_id oid;
> int flag;
> -
> - if (!refs_resolve_ref_unsafe(&refs->base, refname, RESOLVE_REF_READING,
Here, we had a nice blank line that separated the decls and the
first statement.
> - &oid, &flag)) {
> + const char *referent = refs_resolve_ref_unsafe(&refs->base,
> + refname,
> + RESOLVE_REF_READING,
> + &oid, &flag);
> + if (!referent) {
We lost it here, though.
> oidclr(&oid, the_repository->hash_algo);
> flag |= REF_ISBROKEN;
> } else if (is_null_oid(&oid)) {
> @@ -268,7 +270,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
> oidclr(&oid, the_repository->hash_algo);
> flag |= REF_BAD_NAME | REF_ISBROKEN;
> }
> - add_entry_to_dir(dir, create_ref_entry(refname, &oid, flag));
> +
> + if (!(flag & REF_ISSYMREF))
> + referent = NULL;
OK, this is new in this round. The idea is that everybody else can
rely on the invariant that the referent being NULL is equivalent to
REF_ISSYMREF bit in flag word being off from here on.
> + add_entry_to_dir(dir, create_ref_entry(refname, referent, &oid, flag));
> }
>
> /*
> @@ -886,6 +892,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
> iter->base.refname = iter->iter0->refname;
> iter->base.oid = iter->iter0->oid;
> iter->base.flags = iter->iter0->flags;
> + if (iter->iter0->flags & REF_ISSYMREF)
> + iter->base.referent = iter->iter0->referent;
> + else
> + iter->base.referent = NULL;
> return ITER_OK;
> }
Hmph, why not an unconditional
iter->base.referent = iter->iter0->referent;
instead? This code is making sure (iter->base.flags & REF_ISSYMREF)
is directly linked to non-NULL-ness or iter->base.referent, and we
want to make everybody take it as invariant. Shouldn't this code
also rely on the same invariant? If iter-iter0->referent is NULL,
iter->iter0->flag has REF_ISSYMREF bit off, and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, John Cai wrote (reply to this):
Hi Junio,
On 7 Aug 2024, at 17:40, Junio C Hamano wrote:
> "John Cai via GitGitGadget" <[email protected]> writes:
>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index aa52d9be7c7..5ed69c23f74 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
>> {
>> struct object_id oid;
>> int flag;
>> -
>> - if (!refs_resolve_ref_unsafe(&refs->base, refname, RESOLVE_REF_READING,
>
> Here, we had a nice blank line that separated the decls and the
> first statement.
>
>> - &oid, &flag)) {
>> + const char *referent = refs_resolve_ref_unsafe(&refs->base,
>> + refname,
>> + RESOLVE_REF_READING,
>> + &oid, &flag);
>> + if (!referent) {
>
> We lost it here, though.
Ah will restore the blank line.
>
>> oidclr(&oid, the_repository->hash_algo);
>> flag |= REF_ISBROKEN;
>> } else if (is_null_oid(&oid)) {
>> @@ -268,7 +270,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
>> oidclr(&oid, the_repository->hash_algo);
>> flag |= REF_BAD_NAME | REF_ISBROKEN;
>> }
>> - add_entry_to_dir(dir, create_ref_entry(refname, &oid, flag));
>> +
>> + if (!(flag & REF_ISSYMREF))
>> + referent = NULL;
>
> OK, this is new in this round. The idea is that everybody else can
> rely on the invariant that the referent being NULL is equivalent to
> REF_ISSYMREF bit in flag word being off from here on.
>
>> + add_entry_to_dir(dir, create_ref_entry(refname, referent, &oid, flag));
>> }
>>
>> /*
>> @@ -886,6 +892,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>> iter->base.refname = iter->iter0->refname;
>> iter->base.oid = iter->iter0->oid;
>> iter->base.flags = iter->iter0->flags;
>> + if (iter->iter0->flags & REF_ISSYMREF)
>> + iter->base.referent = iter->iter0->referent;
>> + else
>> + iter->base.referent = NULL;
>> return ITER_OK;
>> }
>
> Hmph, why not an unconditional
>
> iter->base.referent = iter->iter0->referent;
>
> instead? This code is making sure (iter->base.flags & REF_ISSYMREF)
> is directly linked to non-NULL-ness or iter->base.referent, and we
> want to make everybody take it as invariant. Shouldn't this code
> also rely on the same invariant? If iter-iter0->referent is NULL,
> iter->iter0->flag has REF_ISSYMREF bit off, and vice versa.
That's a good point. Even though this code will be used in a loop, if we just
set it every time unconditionally then we won't end up with leftover values. Wil
l adjust in the next version.
This patch series was integrated into seen via 1d3d5ed. |
This branch is now known as |
This patch series was integrated into seen via 1c7ed05. |
There was a status update in the "Cooking" section about the branch The refs API has been taught to give symref target information to the users of ref iterators, allowing for-each-ref and friends to avoid an extra ref_resolve_* API call per a symbolic ref. Waiting for review response. source: <[email protected]> |
4617dc2
to
d919007
Compare
Since ref iterators do not hold onto the direct value of a reference without resolving it, the only way to get ahold of a direct value of a symbolic ref is to make a separate call to refs_read_symbolic_ref. To make accessing the direct value of a symbolic ref more efficient, let's save the direct value of the ref in the iterators for both the files backend and the reftable backend. Signed-off-by: John Cai <[email protected]>
Add a parameter to each_ref_fn so that callers to the ref APIs that use this function as a callback can have acess to the unresolved value of a symbolic ref. Signed-off-by: John Cai <[email protected]>
With a previous commit, the reference the symbolic ref points to is saved in the ref iterator records. Instead of making a separate call to resolve_refdup() each time, we can just populate the ref_array_item with the value from the iterator. Signed-off-by: John Cai <[email protected]>
d919007
to
9f609f4
Compare
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Junio C Hamano wrote (reply to this): "John Cai via GitGitGadget" <[email protected]> writes:
> John Cai (3):
> refs: keep track of unresolved reference value in iterators
> refs: add referent to each_ref_fn
> ref-filter: populate symref from iterator
Looks good. Let me mark the topic for 'next'. |
This patch series was integrated into seen via 2678c13. |
This patch series was integrated into next via 3183f3d. |
This patch series was integrated into seen via c715a26. |
There was a status update in the "Cooking" section about the branch The refs API has been taught to give symref target information to the users of ref iterators, allowing for-each-ref and friends to avoid an extra ref_resolve_* API call per a symbolic ref. Will merge to 'master'. source: <[email protected]> |
There was a status update in the "Cooking" section about the branch The refs API has been taught to give symref target information to the users of ref iterators, allowing for-each-ref and friends to avoid an extra ref_resolve_* API call per a symbolic ref. Will merge to 'master'. source: <[email protected]> |
There was a status update in the "Graduated to 'master'" section about the branch The refs API has been taught to give symref target information to the users of ref iterators, allowing for-each-ref and friends to avoid an extra ref_resolve_* API call per a symbolic ref. source: <[email protected]> |
This patch series was integrated into seen via e7f86cb. |
This patch series was integrated into master via e7f86cb. |
This patch series was integrated into next via e7f86cb. |
Closed via e7f86cb. |
@@ -245,9 +245,12 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs, | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, shejialuo wrote (reply to this):
On Fri, Aug 09, 2024 at 03:37:49PM +0000, John Cai via GitGitGadget wrote:
[snip]
> @@ -66,6 +69,7 @@ static void free_ref_entry(struct ref_entry *entry)
> */
> clear_ref_dir(&entry->u.subdir);
> }
> + free(entry->u.value.referent);
> free(entry);
> }
>
Today, I am learning the source code of the "ref-cache.[ch]". I feel
rather confused here. And I think this usage is wrong.
"free_ref_entry" will do the following things:
1. If "entry" is a directory, it will call "clear_ref_dir" which will
call "free_ref_entry" for every loose ref.
2. If "entry" is a loose ref, it will call `free(entry->u.value.referent)`
and `free(entry)`.
The problem is if "entry" is a directory, we will also execute the
following statement:
free(entry->u.value.referent);
This does not make sense. We should never access the "entry->u.value" if
"entry" is a directory. So, I think the correct usage should be:
if (entry->flag & REF_DIR) {
...
clear_ref_dir(...);
} else {
free(entry->u.value.referent);
}
Thanks,
Jialuo
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
User |
For reftable development, it's useful to be able to print out the direct value of a symbolic reference before resolution. This is currently possible with git-for-each-ref, but since the iterators do not keep track of the value of the symbolic ref, a separate call needs to be made each time making it inefficient.
Address this inefficiency by keeping track of the value of the symbolic reference in the ref iterator. This patch series also ends with a commit to use this value in the iterator through callbacks in ref-filter.c.
This series started with [1] but I decided to send a separate patch series since it is substantially different.
Benchmarking shows that with these changes, we experience a speedup in git-for-each-ref(1) on a repository with many symbolic refs:
$ hyperfine --warmup 5 "git for-each-ref --format='%(refname) %(objectname) %(symref)'" "~/Projects/git/git for-each-ref --format='%(refname) %(objectname) %(symref)'"
Benchmark 1: git for-each-ref --format='%(refname) %(objectname) %(symref)'
Time (mean ± σ): 905.1 ms ± 13.2 ms [User: 56.3 ms, System: 628.6 ms]
Range (min … max): 893.4 ms … 936.9 ms 10 runs
Benchmark 2: ~/Projects/git/git for-each-ref --format='%(refname) %(objectname) %(symref)'
Time (mean ± σ): 482.2 ms ± 26.4 ms [User: 34.7 ms, System: 410.6 ms]
Range (min … max): 459.4 ms … 541.8 ms 10 runs
Summary
~/Projects/git/git for-each-ref --format='%(refname) %(objectname) %(symref)' ran
1.88 ± 0.11 times faster than git for-each-ref --format='%(refname) %(objectname) %(symref)'
Changes since V3:
Changes since V2:
Changes since V1
cc: Phillip Wood [email protected]
cc: Kristoffer Haugsbakk [email protected]
cc: Jeff King [email protected]
cc: Patrick Steinhardt [email protected]
cc: Jean-Noël Avila [email protected]
cc: Linus Arver [email protected]
cc: shejialuo [email protected]