Skip to content

keep track of unresolved value of symbolic-ref in ref iterators #1712

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

john-cai
Copy link
Contributor

@john-cai john-cai commented Apr 21, 2024

For reftable development, it's useful to be able to print out the direct value of a symbolic reference before resolution. This is currently possible with git-for-each-ref, but since the iterators do not keep track of the value of the symbolic ref, a separate call needs to be made each time making it inefficient.

Address this inefficiency by keeping track of the value of the symbolic reference in the ref iterator. This patch series also ends with a commit to use this value in the iterator through callbacks in ref-filter.c.

This series started with [1] but I decided to send a separate patch series since it is substantially different.

Benchmarking shows that with these changes, we experience a speedup in git-for-each-ref(1) on a repository with many symbolic refs:

$ hyperfine --warmup 5 "git for-each-ref --format='%(refname) %(objectname) %(symref)'" "~/Projects/git/git for-each-ref --format='%(refname) %(objectname) %(symref)'"
Benchmark 1: git for-each-ref --format='%(refname) %(objectname) %(symref)'
Time (mean ± σ): 905.1 ms ± 13.2 ms [User: 56.3 ms, System: 628.6 ms]
Range (min … max): 893.4 ms … 936.9 ms 10 runs

Benchmark 2: ~/Projects/git/git for-each-ref --format='%(refname) %(objectname) %(symref)'
Time (mean ± σ): 482.2 ms ± 26.4 ms [User: 34.7 ms, System: 410.6 ms]
Range (min … max): 459.4 ms … 541.8 ms 10 runs

Summary
~/Projects/git/git for-each-ref --format='%(refname) %(objectname) %(symref)' ran
1.88 ± 0.11 times faster than git for-each-ref --format='%(refname) %(objectname) %(symref)'

Changes since V3:

  • Added blank line to separate declarations from statements
  • unconditionally set referent member in base iterator in files backend

Changes since V2:

  • Style tweaks
  • Added a NEEDSWORK comment around fallback code
  • Stay consistent in the meaning of a non-NULL referent member in the iterator

Changes since V1

  • Use the return value from refs_resolve_ref_unsafe instead of using an out parameter
  1. https://lore.kernel.org/git/[email protected]/

cc: Phillip Wood [email protected]
cc: Kristoffer Haugsbakk [email protected]
cc: Jeff King [email protected]
cc: Patrick Steinhardt [email protected]
cc: Jean-Noël Avila [email protected]
cc: Linus Arver [email protected]
cc: shejialuo [email protected]

@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch from 0ce74be to 59acda5 Compare May 24, 2024 14:11
@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch 5 times, most recently from e573f45 to 3f599da Compare June 6, 2024 12:53
@john-cai john-cai changed the title keep track of unresolved value of symbolic-ref in iterator keep track of unresolved value of symbolic-ref in ref iterators Jun 6, 2024
@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch from 3f599da to fe591ce Compare June 6, 2024 15:00
@john-cai
Copy link
Contributor Author

john-cai commented Jun 6, 2024

/preview

Copy link

Preview email sent as [email protected]

Copy link

There are issues in commit 9a89508:
use const referent in ref cache
Commit checks stopped - the message is too short
Commit not signed off

@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch from 9a89508 to 6d5b1b6 Compare June 6, 2024 16:04
@john-cai
Copy link
Contributor Author

john-cai commented Jun 6, 2024

/submit

Copy link

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v1

To fetch this version to local tag pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v1

@@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r,
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

ADMINISTRIVIA.  Check the address you place on the CC: line.  What
we can see for this message at

https://lore.kernel.org/git/011c10f488610b0a795a843bff66723477783761.1717694801.git.gitgitgadget@gmail.com/raw

looks like this.

    Cc: "Phillip Wood [ ]" <[email protected]>,
        Kristoffer Haugsbakk <[[email protected]]>,
        "Jeff King [ ]" <[email protected]>,
        "Patrick Steinhardt [ ]" <[email protected]>,
        "=?UTF-8?Q?Jean-No=C3=ABl?= Avila [ ]" <[email protected]>,
        John Cai <[email protected]>,
        John Cai <[email protected]>

I fixed them manually, but it wasn't pleasant.  I think we saw a
similar breakage earlier coming via GGG, but I do not recall the
details of how to cause such breakages (iow, what to avoid repeating
this).

Anyway.

"John Cai via GitGitGadget" <[email protected]> writes:

>  29 files changed, 64 insertions(+), 52 deletions(-)

Wow, the blast radius of this thing is rather big.  Among these
existing callers of refs_resolve_ref_unsafe(), how many of them
will benefit from being able to pass a non NULL parameter at the end
of the series, and more importantly, in the future to take advantage
of the new feature possibly with a separate series?

I am assuming that this will benefit only a selected few and the
callers that would want to take advantage of the new feature will
remain low.  Have you considered renaming refs_resolve_ref_unsafe()
to a new name (say, refs_resolve_ref_unsafe_with_referent()) and
implement the new feature (which is only triggered when the new
parameter gets a non NULL value), make refs_resolve_ref_unsafe() a
very thin wrapper that passes NULL to the new thing?

That way, you do not have to touch those existing callers that will
never benefit from the new capability in the future.  You won't risk
conflicting with in flight topics semantically, either.

Or will they also benefit from the new feature in the future?

Offhand, I do not know how a caller like this ...

> diff --git a/add-interactive.c b/add-interactive.c
> index b5d6cd689a1..041d30cf2b3 100644
> --- a/add-interactive.c
> +++ b/add-interactive.c
> @@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r,
>  {
>  	struct object_id head_oid;
>  	int is_initial = !refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
> -						  "HEAD", RESOLVE_REF_READING,
> +						  "HEAD", NULL, RESOLVE_REF_READING,
>  						  &head_oid, NULL);
>  	struct collection_status s = { 0 };
>  	int i;

... would be helped.

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <[email protected]> writes:

> "John Cai via GitGitGadget" <[email protected]> writes:
>
>>  29 files changed, 64 insertions(+), 52 deletions(-)
>
> Wow, the blast radius of this thing is rather big.  Among these
> existing callers of refs_resolve_ref_unsafe(), how many of them
> will benefit from being able to pass a non NULL parameter at the end
> of the series, and more importantly, in the future to take advantage
> of the new feature possibly with a separate series?
> ...
> That way, you do not have to touch those existing callers that will
> never benefit from the new capability in the future.  You won't risk
> conflicting with in flight topics semantically, either.

The same comment applies to [3/4], but I do not want to fix the Cc: header
again, so I am replying to this message.

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, John Cai wrote (reply to this):

Hi Junio,

On 6 Jun 2024, at 14:21, Junio C Hamano wrote:

> ADMINISTRIVIA.  Check the address you place on the CC: line.  What
> we can see for this message at
>
> https://lore.kernel.org/git/011c10f488610b0a795a843bff66723477783761.1717694801.git.gitgitgadget@gmail.com/raw
>
> looks like this.
>
>     Cc: "Phillip Wood [ ]" <[email protected]>,
>         Kristoffer Haugsbakk <[[email protected]]>,
>         "Jeff King [ ]" <[email protected]>,
>         "Patrick Steinhardt [ ]" <[email protected]>,
>         "=?UTF-8?Q?Jean-No=C3=ABl?= Avila [ ]" <[email protected]>,
>         John Cai <[email protected]>,
>         John Cai <[email protected]>
>
> I fixed them manually, but it wasn't pleasant.  I think we saw a
> similar breakage earlier coming via GGG, but I do not recall the
> details of how to cause such breakages (iow, what to avoid repeating
> this).

oof, apologies. Didn't notice that. I'll be more mindful about the cc line.

>
> Anyway.
>
> "John Cai via GitGitGadget" <[email protected]> writes:
>
>>  29 files changed, 64 insertions(+), 52 deletions(-)
>
> Wow, the blast radius of this thing is rather big.  Among these
> existing callers of refs_resolve_ref_unsafe(), how many of them
> will benefit from being able to pass a non NULL parameter at the end
> of the series, and more importantly, in the future to take advantage
> of the new feature possibly with a separate series?
>
> I am assuming that this will benefit only a selected few and the
> callers that would want to take advantage of the new feature will
> remain low.  Have you considered renaming refs_resolve_ref_unsafe()
> to a new name (say, refs_resolve_ref_unsafe_with_referent()) and
> implement the new feature (which is only triggered when the new
> parameter gets a non NULL value), make refs_resolve_ref_unsafe() a
> very thin wrapper that passes NULL to the new thing?
>
> That way, you do not have to touch those existing callers that will
> never benefit from the new capability in the future.  You won't risk
> conflicting with in flight topics semantically, either.
>
> Or will they also benefit from the new feature in the future?
>
> Offhand, I do not know how a caller like this ...
>
>> diff --git a/add-interactive.c b/add-interactive.c
>> index b5d6cd689a1..041d30cf2b3 100644
>> --- a/add-interactive.c
>> +++ b/add-interactive.c
>> @@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r,
>>  {
>>  	struct object_id head_oid;
>>  	int is_initial = !refs_resolve_ref_unsafe(get_main_ref_store(the_repository),
>> -						  "HEAD", RESOLVE_REF_READING,
>> +						  "HEAD", NULL, RESOLVE_REF_READING,
>>  						  &head_oid, NULL);
>>  	struct collection_status s = { 0 };
>>  	int i;
>
> ... would be helped.

Good point. I was sensing the same thing as I made all the callsite changes so
this feedback makes it clear what we should do.

>
> Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

John Cai <[email protected]> writes:

> On 6 Jun 2024, at 14:21, Junio C Hamano wrote:
>
>> ADMINISTRIVIA.  Check the address you place on the CC: line.  What
>> we can see for this message at
>> ...
>> I fixed them manually, but it wasn't pleasant.  I think we saw a
>> similar breakage earlier coming via GGG, but I do not recall the
>> details of how to cause such breakages (iow, what to avoid repeating
>> this).
>
> oof, apologies. Didn't notice that. I'll be more mindful about the cc line.

I found the previous occurrences of the same problem:

  https://lore.kernel.org/git/[email protected]/
  https://lore.kernel.org/git/[email protected]/

The last message in the thread

  https://lore.kernel.org/git/CAMbn=[email protected]/

says that the original user of GGG found what was wrong in the way
the user was using GGG to send and fixed it, but unfortunately we
didn't hear exactly *what* the breakage was and *how* it was fixed.

Aryan, do you remember what the problem was and more importantly
what the fix was?

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, John Cai wrote (reply to this):

Hi Junio,

On 6 Jun 2024, at 14:23, Junio C Hamano wrote:

> Junio C Hamano <[email protected]> writes:
>
>> "John Cai via GitGitGadget" <[email protected]> writes:
>>
>>>  29 files changed, 64 insertions(+), 52 deletions(-)
>>
>> Wow, the blast radius of this thing is rather big.  Among these
>> existing callers of refs_resolve_ref_unsafe(), how many of them
>> will benefit from being able to pass a non NULL parameter at the end
>> of the series, and more importantly, in the future to take advantage
>> of the new feature possibly with a separate series?
>> ...
>> That way, you do not have to touch those existing callers that will
>> never benefit from the new capability in the future.  You won't risk
>> conflicting with in flight topics semantically, either.
>
> The same comment applies to [3/4], but I do not want to fix the Cc: header
> again, so I am replying to this message.

Yes, so for 3/4 I was exploring doing the same thing. However, each_repo_fn goes
pretty deep in the callstack and to provide an alternate set of functions that
use something like each_repo_referent_fn would still lead to a relatively large
blast radius, eg, something like:


diff --git a/ref-filter.c b/ref-filter.c
index f7fb0c7e0e..770d3715c2 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2631,12 +2631,12 @@ static int filter_exclude_match(struct ref_filter *filter, const char *refname)
  * pattern match, so the callback still has to match each ref individually.
  */
 static int for_each_fullref_in_pattern(struct ref_filter *filter,
-				       each_ref_fn cb,
+				       each_ref_with_referent_fn cb,
 				       void *cb_data)
 {
 	if (filter->kind & FILTER_REFS_ROOT_REFS) {
 		/* In this case, we want to print all refs including root refs. */
-		return refs_for_each_include_root_refs(get_main_ref_store(the_repository),
+		return refs_for_each_include_root_refs_with_referent(get_main_ref_store(the_repository),
 						       cb, cb_data);
 	}

@@ -2646,7 +2646,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * prefixes like "refs/heads/" etc. are stripped off,
 		 * so we have to look at everything:
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+		return refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						"", NULL, cb, cb_data);
 	}

@@ -2656,17 +2656,17 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * so just return everything and let the caller
 		 * sort it out.
 		 */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+		return refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						"", NULL, cb, cb_data);
 	}

 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return refs_for_each_fullref_in(get_main_ref_store(the_repository),
+		return refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						 "", filter->exclude.v, cb, cb_data);
 	}

-	return refs_for_each_fullref_in_prefixes(get_main_ref_store(the_repository),
+	return refs_for_each_fullref_in_prefixes_with_referent(get_main_ref_store(the_repository),
 						 NULL, filter->name_patterns,
 						 filter->exclude.v,
 						 cb, cb_data);
@@ -2781,7 +2781,7 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
 	return ref_kind_from_refname(refname);
 }

-static struct ref_array_item *apply_ref_filter(const char *refname, const struct object_id *oid,
+static struct ref_array_item *apply_ref_filter(const char *refname, const char *referent, const struct object_id *oid,
 			    int flag, struct ref_filter *filter)
 {
 	struct ref_array_item *ref;
@@ -2850,6 +2850,7 @@ static struct ref_array_item *apply_ref_filter(const char *refname, const struct
 	ref->commit = commit;
 	ref->flag = flag;
 	ref->kind = kind;
+	ref->symref = referent;

 	return ref;
 }
@@ -2863,12 +2864,12 @@ struct ref_filter_cbdata {
  * A call-back given to for_each_ref().  Filter refs and keep them for
  * later object processing.
  */
-static int filter_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+static int filter_one(const char *refname, const char *referent, const struct object_id *oid, int flag, void *cb_data)
 {
 	struct ref_filter_cbdata *ref_cbdata = cb_data;
 	struct ref_array_item *ref;

-	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+	ref = apply_ref_filter(refname, referent, oid, flag, ref_cbdata->filter);
 	if (ref)
 		ref_array_append(ref_cbdata->array, ref);

@@ -2898,13 +2899,13 @@ struct ref_filter_and_format_cbdata {
 	} internal;
 };

-static int filter_and_format_one(const char *refname, const struct object_id *oid, int flag, void *cb_data)
+static int filter_and_format_one(const char *refname, const char *referent, const struct object_id *oid, int flag, void *cb_data)
 {
 	struct ref_filter_and_format_cbdata *ref_cbdata = cb_data;
 	struct ref_array_item *ref;
 	struct strbuf output = STRBUF_INIT, err = STRBUF_INIT;

-	ref = apply_ref_filter(refname, oid, flag, ref_cbdata->filter);
+	ref = apply_ref_filter(refname, referent, oid, flag, ref_cbdata->filter);
 	if (!ref)
 		return 0;

@@ -3050,7 +3051,7 @@ void filter_ahead_behind(struct repository *r,
 	free(commits);
 }

-static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_fn fn, void *cb_data)
+static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref_with_referent_fn fn, void *cb_data)
 {
 	int ret = 0;

@@ -3070,15 +3071,15 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 		 * of filter_ref_kind().
 		 */
 		if (filter->kind == FILTER_REFS_BRANCHES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+			ret = refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						       "refs/heads/", NULL,
 						       fn, cb_data);
 		else if (filter->kind == FILTER_REFS_REMOTES)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+			ret = refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						       "refs/remotes/", NULL,
 						       fn, cb_data);
 		else if (filter->kind == FILTER_REFS_TAGS)
-			ret = refs_for_each_fullref_in(get_main_ref_store(the_repository),
+			ret = refs_for_each_fullref_in_with_referent(get_main_ref_store(the_repository),
 						       "refs/tags/", NULL, fn,
 						       cb_data);
 		else if (filter->kind & FILTER_REFS_REGULAR)
@@ -3090,7 +3091,7 @@ static int do_filter_refs(struct ref_filter *filter, unsigned int type, each_ref
 		 */
 		if (!ret && !(filter->kind & FILTER_REFS_ROOT_REFS) &&
 		    (filter->kind & FILTER_REFS_DETACHED_HEAD))
-			refs_head_ref(get_main_ref_store(the_repository), fn,
+			refs_head_ref_referent(get_main_ref_store(the_repository), fn,
 				      cb_data);
 	}

diff --git a/refs.c b/refs.c
index 6dcb0288cb..4366f35586 100644
--- a/refs.c
+++ b/refs.c
@@ -1529,6 +1529,19 @@ int refs_head_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 	return 0;
 }

+int refs_head_ref_referent(struct ref_store *refs, each_ref_with_referent_fn fn, void *cb_data)
+{
+	struct object_id oid;
+	int flag;
+	const char *referent;
+
+	if (refs_resolve_ref_unsafe_with_referent(refs, "HEAD", referent, RESOLVE_REF_READING,
+				    &oid, &flag))
+		return fn("HEAD", referent, &oid, flag, cb_data);
+
+	return 0;
+}
+
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
 		const char *prefix,
@@ -1560,6 +1573,22 @@ struct ref_iterator *refs_ref_iterator_begin(
 	return iter;
 }

+static int do_for_each_ref_with_referent(struct ref_store *refs, const char *prefix,
+			   const char **exclude_patterns,
+			   each_ref_with_referent_fn fn, int trim,
+			   enum do_for_each_ref_flags flags, void *cb_data)
+{
+	struct ref_iterator *iter;
+
+	if (!refs)
+		return 0;
+
+	iter = refs_ref_iterator_begin(refs, prefix, exclude_patterns, trim,
+				       flags);
+
+	return do_for_each_ref_iterator_with_referent(iter, fn, cb_data);
+}
+
 static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 			   const char **exclude_patterns,
 			   each_ref_fn fn, int trim,
@@ -1594,6 +1623,13 @@ int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
 	return do_for_each_ref(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
 }

+int refs_for_each_fullref_in_with_referent(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
+			     each_ref_with_referent_fn fn, void *cb_data)
+{
+	return do_for_each_ref_with_referent(refs, prefix, exclude_patterns, fn, 0, 0, cb_data);
+}
+
 int refs_for_each_replace_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
 	const char *git_replace_ref_base = ref_namespace[NAMESPACE_REPLACE].ref;
@@ -1627,6 +1663,13 @@ int refs_for_each_include_root_refs(struct ref_store *refs, each_ref_fn fn,
 			       DO_FOR_EACH_INCLUDE_ROOT_REFS, cb_data);
 }

+int refs_for_each_include_root_refs_with_referent(struct ref_store *refs, each_ref_with_referent_fn fn,
+				    void *cb_data)
+{
+	return do_for_each_ref_with_referent(refs, "", NULL, fn, 0,
+			       DO_FOR_EACH_INCLUDE_ROOT_REFS, cb_data);
+}
+
 static int qsort_strcmp(const void *va, const void *vb)
 {
 	const char *a = *(const char **)va;
@@ -1716,6 +1759,37 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
 	return ret;
 }

+int refs_for_each_fullref_in_prefixes_with_referent(struct ref_store *ref_store,
+				      const char *namespace,
+				      const char **patterns,
+				      const char **exclude_patterns,
+				      each_ref_with_referent_fn fn, void *cb_data)
+{
+	struct string_list prefixes = STRING_LIST_INIT_DUP;
+	struct string_list_item *prefix;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = 0, namespace_len;
+
+	find_longest_prefixes(&prefixes, patterns);
+
+	if (namespace)
+		strbuf_addstr(&buf, namespace);
+	namespace_len = buf.len;
+
+	for_each_string_list_item(prefix, &prefixes) {
+		strbuf_addstr(&buf, prefix->string);
+		ret = refs_for_each_fullref_in_with_referent(ref_store, buf.buf,
+					       exclude_patterns, fn, cb_data);
+		if (ret)
+			break;
+		strbuf_setlen(&buf, namespace_len);
+	}
+
+	string_list_clear(&prefixes, 0);
+	strbuf_release(&buf);
+	return ret;
+}
+
 static int refs_read_special_head(struct ref_store *ref_store,
 				  const char *refname, struct object_id *oid,
 				  struct strbuf *referent, unsigned int *type,
diff --git a/refs.h b/refs.h
index ba6d0e0753..d8387c6296 100644
--- a/refs.h
+++ b/refs.h
@@ -304,6 +304,9 @@ struct ref_transaction;
 typedef int each_ref_fn(const char *refname,
 			const struct object_id *oid, int flags, void *cb_data);

+typedef int each_ref_with_referent_fn(const char *refname, const char *referent,
+			const struct object_id *oid, int flags, void *cb_data);
+
 /*
  * The following functions invoke the specified callback function for
  * each reference indicated.  If the function ever returns a nonzero
@@ -315,6 +318,8 @@ typedef int each_ref_fn(const char *refname,
  */
 int refs_head_ref(struct ref_store *refs,
 		  each_ref_fn fn, void *cb_data);
+int refs_head_ref_referent(struct ref_store *refs,
+		  each_ref_with_referent_fn fn, void *cb_data);
 int refs_for_each_ref(struct ref_store *refs,
 		      each_ref_fn fn, void *cb_data);
 int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
@@ -351,6 +356,17 @@ int refs_for_each_fullref_in_prefixes(struct ref_store *refs,
 				      const char **exclude_patterns,
 				      each_ref_fn fn, void *cb_data);

+int refs_for_each_fullref_in_prefixes_with_referent(struct ref_store *refs,
+				      const char *namespace,
+				      const char **patterns,
+				      const char **exclude_patterns,
+				      each_ref_with_referent_fn fn, void *cb_data);
+
+int refs_for_each_fullref_in_with_referent(struct ref_store *refs, const char *prefix,
+			     const char **exclude_patterns,
+			     each_ref_with_referent_fn fn, void *cb_data);
+
+
 /* iterates all refs that match the specified glob pattern. */
 int refs_for_each_glob_ref(struct ref_store *refs, each_ref_fn fn,
 			   const char *pattern, void *cb_data);
@@ -377,6 +393,10 @@ int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data);
 int refs_for_each_include_root_refs(struct ref_store *refs, each_ref_fn fn,
 				    void *cb_data);

+int refs_for_each_include_root_refs_with_referent(struct ref_store *refs, each_ref_with_referent_fn fn,
+				    void *cb_data);
+
+
 /*
  * Normalizes partial refs to their fully qualified form.
  * Will prepend <prefix> to the <pattern> if it doesn't start with 'refs/'.
diff --git a/refs/iterator.c b/refs/iterator.c
index 26acb6bd64..26c38ec6de 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -469,3 +469,30 @@ int do_for_each_ref_iterator(struct ref_iterator *iter,
 		return -1;
 	return retval;
 }
+
+int do_for_each_ref_iterator_with_referent(struct ref_iterator *iter,
+			     each_ref_with_referent_fn fn, void *cb_data)
+{
+	int retval = 0, ok;
+	struct ref_iterator *old_ref_iter = current_ref_iter;
+
+	current_ref_iter = iter;
+	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+		retval = fn(iter->refname, iter->referent, iter->oid, iter->flags, cb_data);
+		if (retval) {
+			/*
+			 * If ref_iterator_abort() returns ITER_ERROR,
+			 * we ignore that error in deference to the
+			 * callback function's return value.
+			 */
+			ref_iterator_abort(iter);
+			goto out;
+		}
+	}
+
+out:
+	current_ref_iter = old_ref_iter;
+	if (ok == ITER_ERROR)
+		return -1;
+	return retval;
+}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 0775451435..ebec407468 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -486,6 +486,9 @@ extern struct ref_iterator *current_ref_iter;
  */
 int do_for_each_ref_iterator(struct ref_iterator *iter,
 			     each_ref_fn fn, void *cb_data);
+int do_for_each_ref_iterator_with_referent(struct ref_iterator *iter,
+			     each_ref_with_referent_fn fn, void *cb_data);
+

 struct ref_store;


Which is a little bit unseemly in my view...so between the two options I would be
inclined to go with modifying each_repo_fn than create a whole subset set of
helpers. wdyt?

thanks
John

>
> Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Patrick Steinhardt <[email protected]> writes:

> I wonder whether we can future-proof the code a bit more by introducing
> a struct that contains everything we want to pass to the callback
> function.

That would hopefully make a change with a large blast a one-time
event.  But at the same time, it may end up making it too opaque and
hard to verify if all the API functions are using/updating/verifying
all the members of the struct as they should.  Compared to that,
unused parameters are easier to verify mechanically by compilers.

> This would also allow us to get rid of this awful `peel_iterated_oid()`
> function that relies on global state in many places, as we can put the
> peeled OID into that structure, as well.

Yes, such a benefit may justify a one-time "affect many callers"
event.  Or the underlying for_each_*() friend of functions can be
updated to use a single struct and then the current "only selected
parameters that are used ar passed" API can be made into a set of
thin wrappers around it, and then callers can be converted one step
at a time, in a multi-step series, perhaps.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, "Kristoffer Haugsbakk" wrote (reply to this):

On Fri, Jun 7, 2024, at 00:44, Junio C Hamano wrote:
> I found the previous occurrences of the same problem:
>
>   https://lore.kernel.org/git/[email protected]/
>   https://lore.kernel.org/git/[email protected]/
>
> The last message in the thread
>
>
> https://lore.kernel.org/git/CAMbn=B7J4ODf9ybJQpL1bZZ7qdWSDGaLEyTmVv+ZBiSeC9T+yw@mail.gmail.com/
>
> says that the original user of GGG found what was wrong in the way
> the user was using GGG to send and fixed it, but unfortunately we
> didn't hear exactly *what* the breakage was and *how* it was fixed.
>
> Aryan, do you remember what the problem was and more importantly
> what the fix was?
>
> Thanks.

Yes, Aryan didn’t explain what the issue was. But Linus Arver (+CC) in
that first thread/link figured out what was happening in his case:[1]

> I realize now that it's because I copy/pasted the "Cc: ..." lines in the PR
> description from
> https://github.com/gitgitgadget/git/pull/1632#issue-2068188239, such
> that when I pasted those in for the PR description for this series at
> https://github.com/gitgitgadget/git/pull/1694#issue-2187804953, it
> carried over the email addresses as Markdown-formatted hyperlinks.
> Currently it reads
>
>    Cc: Christian Couder [[email protected]](mailto:[email protected])
>    Cc: Junio C Hamano [[email protected]](mailto:[email protected])
>    Cc: Emily Shaffer [[email protected]](mailto:[email protected])
>    cc: Josh Steadmon [[email protected]](mailto:[email protected])
>    cc: Randall S. Becker [[email protected]](mailto:[email protected])
>    cc: Christian Couder [[email protected]](mailto:[email protected])
>    cc: "Kristoffer Haugsbakk" [[email protected]](mailto:[email protected])
>    cc: "Kristoffer Haugsbakk" <[email protected]>
>
> when I click on "edit", where the last line must be from your manual
> fix which GGG picked up. I've cleaned up the PR description manually
> now, and for this message I'm also attempting to clean up those square
> brackets.

When I read that I assumed that Aryan had made the same mistake.

🔗 1: https://lore.kernel.org/git/[email protected]/

-- 
Kristoffer Haugsbakk

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Kristoffer Haugsbakk" <[email protected]> writes:

>> ...
>> https://github.com/gitgitgadget/git/pull/1694#issue-2187804953, it
>> carried over the email addresses as Markdown-formatted hyperlinks.
>> Currently it reads
>>
>>    Cc: Christian Couder [[email protected]](mailto:[email protected])
>>...
>>    cc: "Kristoffer Haugsbakk" <[email protected]>
>>
>> when I click on "edit", where the last line must be from your manual
>> fix which GGG picked up. I've cleaned up the PR description manually
>> now, and for this message I'm also attempting to clean up those square
>> brackets.
>
> When I read that I assumed that Aryan had made the same mistake.

Ah, I see.

I wonder if there are things we can do to help reduce the pain in
future users of GGG here.

 - Is there some documentation update we can make on our end,
   perhaps in Documentation/MyFirstContribution (SubmittingPatches
   does not talk about GGG other than refering readers to
   MyFirstContribution)?

 - Or the welcome message GGG adds to the pull-request that begins
   with "Thanks for taking the time to contribute" can mention
   something about this?

 - Or possibly the handling of Cc: addresses done by GGG can be
   tweaked not to cause this?

Anything else?

Thanks.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Linus Arver wrote (reply to this):

Junio C Hamano <[email protected]> writes:

> "Kristoffer Haugsbakk" <[email protected]> writes:
>
>>> ...
>>> https://github.com/gitgitgadget/git/pull/1694#issue-2187804953, it
>>> carried over the email addresses as Markdown-formatted hyperlinks.
>>> Currently it reads
>>>
>>>    Cc: Christian Couder [[email protected]](mailto:[email protected])
>>>...
>>>    cc: "Kristoffer Haugsbakk" <[email protected]>
>>>
>>> when I click on "edit", where the last line must be from your manual
>>> fix which GGG picked up. I've cleaned up the PR description manually
>>> now, and for this message I'm also attempting to clean up those square
>>> brackets.
>>
>> When I read that I assumed that Aryan had made the same mistake.
>
> Ah, I see.
>
> I wonder if there are things we can do to help reduce the pain in
> future users of GGG here.
>
>  [...]
>
>  - Or the welcome message GGG adds to the pull-request that begins
>    with "Thanks for taking the time to contribute" can mention
>    something about this?

I've created https://github.com/gitgitgadget/gitgitgadget/pull/1644 to
address this for now.
>
>  - Or possibly the handling of Cc: addresses done by GGG can be
>    tweaked not to cause this?

I wanted to do this first but was deterred by my lack of familiarity
with TypeScript.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Linus Arver <[email protected]> writes:

>>  - Or the welcome message GGG adds to the pull-request that begins
>>    with "Thanks for taking the time to contribute" can mention
>>    something about this?
>
> I've created https://github.com/gitgitgadget/gitgitgadget/pull/1644 to
> address this for now.

Thanks.

@@ -533,7 +533,7 @@ static int get_modified_files(struct repository *r,
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Jun 06, 2024 at 05:26:37PM +0000, John Cai via GitGitGadget wrote:

> From: John Cai <[email protected]>
> 
> refs_resolve_ref_unsafe retrieves the referent, the unresolved value of
> a reference. Add a parameter to allow refs_resolve_ref_unsafe to pass up
> the value of referent to the caller so it can save this value in ref
> iterators for more efficient access.

This commit message left me with a lot of questions.

For one, it wasn't immediately obvious to me what a "referent" is. ;) I
think an example could help. If I understand, you mean that if you have
a situation like:

  - refs/heads/one is a symref pointing to refs/heads/two
  - refs/heads/two is a regular ref

and we resolve "one", then "two" is the referent? And the caller might
want to know that?

But I think we already pass that out as the return value from
refs_resolve_ref_unsafe(). That is how something like "rev-parse
--symbolic-full-name" works now.

But there are some subtleties. In a chain of symbolic refs (say, "two"
is a symbolic ref to "three"), we return only the final name ("three").
And you might want to know about "two".

You can pass RESOLVE_REF_NO_RECURSE to inhibit this, and get back just
"two". You can see that now with "git symbolic-ref --no-recurse". The
downside is that we never look at the referent at all, so you get only
the symref value (and no information about the actual oid, or if the
referent even exists). You would still get an oid for any non-symrefs
you examine.

So reading between the lines, you have a caller in mind which wants to
know the immediate referent in addition to the final recursive oid?

Looking at the rest of your series, I guess that caller is the one in
loose_fill_ref_dir_regular_file(), so that it can get passed to the
for-each-ref callback. But why is it right thing for it to record and
pass along the immediate referent there, and not the final one? For that
matter, would a caller ever want to see the whole chain of
one/two/three?

> @@ -1761,6 +1761,7 @@ int refs_read_symbolic_ref(struct ref_store *ref_store, const char *refname,
>  
>  const char *refs_resolve_ref_unsafe(struct ref_store *refs,
>  				    const char *refname,
> +				    const char *referent,
>  				    int resolve_flags,
>  				    struct object_id *oid,
>  				    int *flags)

Unless I am misunderstanding the purpose of your patch completely, this
"referent" is meant to be an out-parameter, right? In which case,
shouldn't it be "const char **referent"?

As the code is now:

> @@ -1822,6 +1823,9 @@ const char *refs_resolve_ref_unsafe(struct ref_store *refs,
>  		}
>  
>  		*flags |= read_flags;
> +		if (referent && (read_flags & REF_ISSYMREF) &&
> +		    sb_refname.len > 0)
> +			referent = sb_refname.buf;
>  
>  		if (!(read_flags & REF_ISSYMREF)) {
>  			if (*flags & REF_BAD_NAME) {

...we'd assign the local "referent" pointer to our refname buf, but
the caller would never see that. Plus doing so would not help you
anyway, since sb_refname will be used again as we recurse. So at best,
you end up with the final name in the chain anyway. Or at worst,
sb_refname gets reallocated and "referent" is left as a dangling
pointer.

-Peff

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, John Cai wrote (reply to this):

Hey Peff,

On 11 Jun 2024, at 4:50, Jeff King wrote:

> On Thu, Jun 06, 2024 at 05:26:37PM +0000, John Cai via GitGitGadget wrote:
>
>> From: John Cai <[email protected]>
>>
>> refs_resolve_ref_unsafe retrieves the referent, the unresolved value of
>> a reference. Add a parameter to allow refs_resolve_ref_unsafe to pass up
>> the value of referent to the caller so it can save this value in ref
>> iterators for more efficient access.
>
> This commit message left me with a lot of questions.
>
> For one, it wasn't immediately obvious to me what a "referent" is. ;) I
> think an example could help. If I understand, you mean that if you have
> a situation like:
>
>   - refs/heads/one is a symref pointing to refs/heads/two
>   - refs/heads/two is a regular ref
>
> and we resolve "one", then "two" is the referent? And the caller might
> want to know that?
>
> But I think we already pass that out as the return value from
> refs_resolve_ref_unsafe(). That is how something like "rev-parse
> --symbolic-full-name" works now.

Yes, exactly. I think you're right that it'd be preferable to just use the
output of refs_resolve_ref_unsafe() to get the value of the referent.
>
> But there are some subtleties. In a chain of symbolic refs (say, "two"
> is a symbolic ref to "three"), we return only the final name ("three").
> And you might want to know about "two".
>
> You can pass RESOLVE_REF_NO_RECURSE to inhibit this, and get back just
> "two". You can see that now with "git symbolic-ref --no-recurse". The
> downside is that we never look at the referent at all, so you get only
> the symref value (and no information about the actual oid, or if the
> referent even exists). You would still get an oid for any non-symrefs
> you examine.
>
> So reading between the lines, you have a caller in mind which wants to
> know the immediate referent in addition to the final recursive oid?

The goal is to keep track of the value that %(symref) would need in the
iterator so that a separate call doesn't need to be made.

>
> Looking at the rest of your series, I guess that caller is the one in
> loose_fill_ref_dir_regular_file(), so that it can get passed to the
> for-each-ref callback. But why is it right thing for it to record and
> pass along the immediate referent there, and not the final one? For that
> matter, would a caller ever want to see the whole chain of
> one/two/three?

Right, the final referent is the right one to pass down.

>
>> @@ -1761,6 +1761,7 @@ int refs_read_symbolic_ref(struct ref_store *ref_store, const char *refname,
>>
>>  const char *refs_resolve_ref_unsafe(struct ref_store *refs,
>>  				    const char *refname,
>> +				    const char *referent,
>>  				    int resolve_flags,
>>  				    struct object_id *oid,
>>  				    int *flags)
>
> Unless I am misunderstanding the purpose of your patch completely, this
> "referent" is meant to be an out-parameter, right? In which case,
> shouldn't it be "const char **referent"?
>
> As the code is now:
>
>> @@ -1822,6 +1823,9 @@ const char *refs_resolve_ref_unsafe(struct ref_store *refs,
>>  		}
>>
>>  		*flags |= read_flags;
>> +		if (referent && (read_flags & REF_ISSYMREF) &&
>> +		    sb_refname.len > 0)
>> +			referent = sb_refname.buf;
>>
>>  		if (!(read_flags & REF_ISSYMREF)) {
>>  			if (*flags & REF_BAD_NAME) {
>
> ...we'd assign the local "referent" pointer to our refname buf, but
> the caller would never see that. Plus doing so would not help you
> anyway, since sb_refname will be used again as we recurse. So at best,
> you end up with the final name in the chain anyway. Or at worst,
> sb_refname gets reallocated and "referent" is left as a dangling
> pointer.

Going to include changes to remove the out-parameter which will simplify things
quite a bit.

>
> -Peff

thanks for the review!
John

@@ -243,8 +243,9 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Jun 06, 2024 at 05:26:38PM +0000, John Cai via GitGitGadget wrote:

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index bf2ffe062ea..a963d796a29 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -243,8 +243,9 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
>  {
>  	struct object_id oid;
>  	int flag;
> +	const char* referent = NULL;
>  
> -	if (!refs_resolve_ref_unsafe(&refs->base, refname, NULL, RESOLVE_REF_READING,
> +	if (!refs_resolve_ref_unsafe(&refs->base, refname, referent, RESOLVE_REF_READING,
>  				     &oid, &flag)) {
>  		oidclr(&oid);
>  		flag |= REF_ISBROKEN;

Here we pass in NULL, so the code in refs_resolve_ref_unsafe() won't do
anything. And our copy of "referent" here will remain NULL, so the rest
of this patch also does nothing. Again, I think that the function should
take a "char **", and you'd pass in &referent here?

Though if we are OK with surfacing just the final value in a
multi-element chain, then you could just use the existing return value,
like:

  referent = refs_resolve_ref_unsafe(&refs->base, refname,
				     RESOLVE_REF_REAEDING, &oid, &flags);
  if (!referent) {
          oidclr(&oid);
          flag |= REF_ISBROKEN;
  }

and then later pass "referent" to create_ref_entry() if flags contains
REF_ISSYMREF (or since we pass it the flags, it could do that check
itself).

-Peff

Copy link

User "Kristoffer Haugsbakk" <[email protected]> has been added to the cc: list.

Copy link

User Linus Arver <[email protected]> has been added to the cc: list.

@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch 3 times, most recently from adb273b to 3e147e7 Compare July 30, 2024 19:02
@john-cai
Copy link
Contributor Author

john-cai commented Aug 1, 2024

/preview

Copy link

Preview email sent as [email protected]

@john-cai
Copy link
Contributor Author

john-cai commented Aug 1, 2024

/preview

Copy link

Preview email sent as [email protected]

@john-cai
Copy link
Contributor Author

john-cai commented Aug 1, 2024

/submit

Copy link

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v2

To fetch this version to local tag pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v2

@@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <[email protected]> writes:

> @@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
>  {
>  	struct object_id oid;
>  	int flag;
> -
> -	if (!refs_resolve_ref_unsafe(&refs->base, refname, RESOLVE_REF_READING,
> -				     &oid, &flag)) {
> +	const char* referent = refs_resolve_ref_unsafe(&refs->base,

Style.  The asterisk sticks to the pointer variable, not the base type.

> +						       refname,
> +						       RESOLVE_REF_READING,
> +						       &oid, &flag);
> +	if (!referent) {
>  		oidclr(&oid, the_repository->hash_algo);
>  		flag |= REF_ISBROKEN;
>  	} else if (is_null_oid(&oid)) {
> @@ -268,7 +270,8 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
>  		oidclr(&oid, the_repository->hash_algo);
>  		flag |= REF_BAD_NAME | REF_ISBROKEN;
>  	}
> -	add_entry_to_dir(dir, create_ref_entry(refname, &oid, flag));
> +
> +	add_entry_to_dir(dir, create_ref_entry(refname, referent, &oid, flag));
>  }

This is a very straight-forward change, given the matching change to
the ref-entry, which now has a referent member.

> @@ -886,6 +889,9 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		iter->base.refname = iter->iter0->refname;
>  		iter->base.oid = iter->iter0->oid;
>  		iter->base.flags = iter->iter0->flags;
> +		if (iter->iter0->flags & REF_ISSYMREF)
> +			iter->base.referent = iter->iter0->referent;

Presumably base.referent is initialized to NULL so this "if"
statement does not need an else clause?  

I am primarily wondering why this needs to be conditional.  We are
making verbatim copy of the flags word from *iter->iter0 to
iter->base; if iter0 is symref we want to mark base also as symref,
If iter0 is not a symref, then we want to mark base also not a
symref, presumably.  So wouldn't it make sense to just say

		iter->base.referent = iter->iter0->referent;

here, regardless of what iter->iter0->flags say about symref-ness of
the thing?  Because ...

> diff --git a/refs/iterator.c b/refs/iterator.c
> index d355ebf0d59..26acb6bd640 100644
> --- a/refs/iterator.c
> +++ b/refs/iterator.c
> @@ -7,6 +7,7 @@
>  #include "refs.h"
>  #include "refs/refs-internal.h"
>  #include "iterator.h"
> +#include "strbuf.h"
>  
>  int ref_iterator_advance(struct ref_iterator *ref_iterator)
>  {
> @@ -29,6 +30,7 @@ void base_ref_iterator_init(struct ref_iterator *iter,
>  {
>  	iter->vtable = vtable;
>  	iter->refname = NULL;
> +	iter->referent = NULL;
>  	iter->oid = NULL;
>  	iter->flags = 0;
>  }
> @@ -199,6 +201,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		}
>  
>  		if (selection & ITER_YIELD_CURRENT) {
> +			iter->base.referent = (*iter->current)->referent;
>  			iter->base.refname = (*iter->current)->refname;
>  			iter->base.oid = (*iter->current)->oid;
>  			iter->base.flags = (*iter->current)->flags;

... other parts of the API seem to follow that philosophy.

In other words, the invariant of this code is that .referent is
non-NULL if and only if the ref is a symref, and that invariant is
trusted without checking with .flags member.  But the earlier hunk
that copied iter0 to base did not seem to be using that invariant,
which made it look inconsistent.

>  struct ref_entry *create_ref_entry(const char *refname,
> +				   const char *referent,
>  				   const struct object_id *oid, int flag)
>  {
>  	struct ref_entry *ref;
> @@ -41,6 +43,10 @@ struct ref_entry *create_ref_entry(const char *refname,
>  	FLEX_ALLOC_STR(ref, name, refname);
>  	oidcpy(&ref->u.value.oid, oid);
>  	ref->flag = flag;
> +
> +	if (flag & REF_ISSYMREF)
> +		ref->u.value.referent = xstrdup_or_null(referent);

OK.  ref_value now has referent next to the existing oid, but that
member only makes sense when flag says it is a symref.

Curiously, that information is missing from ref_value struct, so by
looking at a ref_value alone, we cannot tell if we should trust the
value in the .referent member?

Makes me wonder if we should follow the same "ignore what the flag
says when filling the .referent member; if the ref is not a symref,
the referent variable is NULL, and if it is, referent is never NULL"
pattern?  Then ref->u.value.referent is _always_ defined---the
current code says "the u.value.referent member is undefined for ref
that is not a symref", but with the suggested change, it will be
"the u.value.referent member is NULL for ref that is not a symref,
and for a symref, it is the value of the symref".

>  	return ref;
>  }
>  
> @@ -66,6 +72,7 @@ static void free_ref_entry(struct ref_entry *entry)
>  		 */
>  		clear_ref_dir(&entry->u.subdir);
>  	}
> +	free(entry->u.value.referent);

And that would match what is done here.  We do not say "entry->flag
says it is not a symref, so do not bother freeing u.value.referent".

> @@ -431,6 +438,7 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  			level->index = -1;
>  		} else {
>  			iter->base.refname = entry->name;
> +			iter->base.referent = entry->u.value.referent;
>  			iter->base.oid = &entry->u.value.oid;
>  			iter->base.flags = entry->flag;
>  			return ITER_OK;
> diff --git a/refs/ref-cache.h b/refs/ref-cache.h
> index 31ebe24f6cf..5f04e518c37 100644
> --- a/refs/ref-cache.h
> +++ b/refs/ref-cache.h
> @@ -42,6 +42,7 @@ struct ref_value {
>  	 * referred to by the last reference in the symlink chain.
>  	 */
>  	struct object_id oid;
> +	char *referent;
>  };
>  
>  /*
> @@ -173,6 +174,7 @@ struct ref_entry *create_dir_entry(struct ref_cache *cache,
>  				   const char *dirname, size_t len);
>  
>  struct ref_entry *create_ref_entry(const char *refname,
> +				   const char *referent,
>  				   const struct object_id *oid, int flag);
>  
>  /*
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index fa975d69aaa..117ec233848 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -299,6 +299,7 @@ enum do_for_each_ref_flags {
>  struct ref_iterator {
>  	struct ref_iterator_vtable *vtable;
>  	const char *refname;
> +	const char *referent;
>  	const struct object_id *oid;
>  	unsigned int flags;
>  };
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index fbe74c239d3..9f724c3d632 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -494,8 +494,12 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  				the_repository->hash_algo);
>  			break;
>  		case REFTABLE_REF_SYMREF:
> -			if (!refs_resolve_ref_unsafe(&iter->refs->base, iter->ref.refname,
> -						     RESOLVE_REF_READING, &iter->oid, &flags))
> +			iter->base.referent = refs_resolve_ref_unsafe(&iter->refs->base,
> +								      iter->ref.refname,
> +								      RESOLVE_REF_READING,
> +								      &iter->oid,
> +								      &flags);
> +			if (!iter->base.referent)
>  				oidclr(&iter->oid, the_repository->hash_algo);
>  			break;
>  		default:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Thu, Aug 01, 2024 at 09:41:03AM -0700, Junio C Hamano wrote:
> "John Cai via GitGitGadget" <[email protected]> writes:
> > @@ -886,6 +889,9 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  		iter->base.refname = iter->iter0->refname;
> >  		iter->base.oid = iter->iter0->oid;
> >  		iter->base.flags = iter->iter0->flags;
> > +		if (iter->iter0->flags & REF_ISSYMREF)
> > +			iter->base.referent = iter->iter0->referent;
> 
> Presumably base.referent is initialized to NULL so this "if"
> statement does not need an else clause?

This function typically ends up being called in a loop though. So
without the else clause, wouldn't we potentially leak the value of a
preceding ref into subsequent iterations like this?

> I am primarily wondering why this needs to be conditional.  We are
> making verbatim copy of the flags word from *iter->iter0 to
> iter->base; if iter0 is symref we want to mark base also as symref,
> If iter0 is not a symref, then we want to mark base also not a
> symref, presumably.  So wouldn't it make sense to just say
> 
> 		iter->base.referent = iter->iter0->referent;
> 
> here, regardless of what iter->iter0->flags say about symref-ness of
> the thing?  Because ...

> > diff --git a/refs/iterator.c b/refs/iterator.c
> > index d355ebf0d59..26acb6bd640 100644
> > --- a/refs/iterator.c
> > +++ b/refs/iterator.c
> > @@ -7,6 +7,7 @@
> >  #include "refs.h"
> >  #include "refs/refs-internal.h"
> >  #include "iterator.h"
> > +#include "strbuf.h"
> >  
> >  int ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  {
> > @@ -29,6 +30,7 @@ void base_ref_iterator_init(struct ref_iterator *iter,
> >  {
> >  	iter->vtable = vtable;
> >  	iter->refname = NULL;
> > +	iter->referent = NULL;
> >  	iter->oid = NULL;
> >  	iter->flags = 0;
> >  }
> > @@ -199,6 +201,7 @@ static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
> >  		}
> >  
> >  		if (selection & ITER_YIELD_CURRENT) {
> > +			iter->base.referent = (*iter->current)->referent;
> >  			iter->base.refname = (*iter->current)->refname;
> >  			iter->base.oid = (*iter->current)->oid;
> >  			iter->base.flags = (*iter->current)->flags;
> 
> ... other parts of the API seem to follow that philosophy.
> 
> In other words, the invariant of this code is that .referent is
> non-NULL if and only if the ref is a symref, and that invariant is
> trusted without checking with .flags member.  But the earlier hunk
> that copied iter0 to base did not seem to be using that invariant,
> which made it look inconsistent.

Agreed.

> >  struct ref_entry *create_ref_entry(const char *refname,
> > +				   const char *referent,
> >  				   const struct object_id *oid, int flag)
> >  {
> >  	struct ref_entry *ref;
> > @@ -41,6 +43,10 @@ struct ref_entry *create_ref_entry(const char *refname,
> >  	FLEX_ALLOC_STR(ref, name, refname);
> >  	oidcpy(&ref->u.value.oid, oid);
> >  	ref->flag = flag;
> > +
> > +	if (flag & REF_ISSYMREF)
> > +		ref->u.value.referent = xstrdup_or_null(referent);
> 
> OK.  ref_value now has referent next to the existing oid, but that
> member only makes sense when flag says it is a symref.
> 
> Curiously, that information is missing from ref_value struct, so by
> looking at a ref_value alone, we cannot tell if we should trust the
> value in the .referent member?
> 
> Makes me wonder if we should follow the same "ignore what the flag
> says when filling the .referent member; if the ref is not a symref,
> the referent variable is NULL, and if it is, referent is never NULL"
> pattern?  Then ref->u.value.referent is _always_ defined---the
> current code says "the u.value.referent member is undefined for ref
> that is not a symref", but with the suggested change, it will be
> "the u.value.referent member is NULL for ref that is not a symref,
> and for a symref, it is the value of the symref".

Yeah, I think that would be preferable indeed.

Patrick

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Patrick Steinhardt <[email protected]> writes:

> On Thu, Aug 01, 2024 at 09:41:03AM -0700, Junio C Hamano wrote:
>> "John Cai via GitGitGadget" <[email protected]> writes:
>> > @@ -886,6 +889,9 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>> >  		iter->base.refname = iter->iter0->refname;
>> >  		iter->base.oid = iter->iter0->oid;
>> >  		iter->base.flags = iter->iter0->flags;
>> > +		if (iter->iter0->flags & REF_ISSYMREF)
>> > +			iter->base.referent = iter->iter0->referent;
>> 
>> Presumably base.referent is initialized to NULL so this "if"
>> statement does not need an else clause?
>
> This function typically ends up being called in a loop though. So
> without the else clause, wouldn't we potentially leak the value of a
> preceding ref into subsequent iterations like this?

OK, so this does need to clear it when we tell the caller we have
non SYMREF, as we do want to show NULL as base.referent to the
caller in such a case.  Thanks.

It does reinforce my larger point, which was:

>> Makes me wonder if we should follow the same "ignore what the flag
>> says when filling the .referent member; if the ref is not a symref,
>> the referent variable is NULL, and if it is, referent is never NULL"
>> pattern?  Then ref->u.value.referent is _always_ defined---the
>> current code says "the u.value.referent member is undefined for ref
>> that is not a symref", but with the suggested change, it will be
>> "the u.value.referent member is NULL for ref that is not a symref,
>> and for a symref, it is the value of the symref".
>
> Yeah, I think that would be preferable indeed.

In other words, with .referent member introduced, checking for
(.flags & REF_ISSYMREF) becomes a redundant&duplicated bit of
information, as the bit should exactly match the non-NULL ness of
the .referent member.

Thanks.

@@ -2783,7 +2783,7 @@ static int filter_ref_kind(struct ref_filter *filter, const char *refname)
return ref_kind_from_refname(refname);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <[email protected]> writes:

> @@ -2852,6 +2852,8 @@ static struct ref_array_item *apply_ref_filter(const char *refname, const struct
>  	ref->commit = commit;
>  	ref->flag = flag;
>  	ref->kind = kind;
> +	if (flag & REF_ISSYMREF)
> +		ref->symref = xstrdup_or_null(referent);

The same reaction as [1/3].  Doesn't the null-ness of referent
convey the same information as the ISSYMREF bit in flag?  IOW,
can't we do this assignment unconditionally?

@@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <[email protected]> writes:

> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index aa52d9be7c7..5ed69c23f74 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
>  {
>  	struct object_id oid;
>  	int flag;
> -
> -	if (!refs_resolve_ref_unsafe(&refs->base, refname, RESOLVE_REF_READING,

Here, we had a nice blank line that separated the decls and the
first statement.

> -				     &oid, &flag)) {
> +	const char *referent = refs_resolve_ref_unsafe(&refs->base,
> +						       refname,
> +						       RESOLVE_REF_READING,
> +						       &oid, &flag);
> +	if (!referent) {

We lost it here, though.

>  		oidclr(&oid, the_repository->hash_algo);
>  		flag |= REF_ISBROKEN;
>  	} else if (is_null_oid(&oid)) {
> @@ -268,7 +270,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
>  		oidclr(&oid, the_repository->hash_algo);
>  		flag |= REF_BAD_NAME | REF_ISBROKEN;
>  	}
> -	add_entry_to_dir(dir, create_ref_entry(refname, &oid, flag));
> +
> +	if (!(flag & REF_ISSYMREF))
> +		referent = NULL;

OK, this is new in this round.  The idea is that everybody else can
rely on the invariant that the referent being NULL is equivalent to
REF_ISSYMREF bit in flag word being off from here on.

> +	add_entry_to_dir(dir, create_ref_entry(refname, referent, &oid, flag));
>  }
>  
>  /*
> @@ -886,6 +892,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  		iter->base.refname = iter->iter0->refname;
>  		iter->base.oid = iter->iter0->oid;
>  		iter->base.flags = iter->iter0->flags;
> +		if (iter->iter0->flags & REF_ISSYMREF)
> +			iter->base.referent = iter->iter0->referent;
> +		else
> +			iter->base.referent = NULL;
>  		return ITER_OK;
>  	}

Hmph, why not an unconditional

		iter->base.referent = iter->iter0->referent;

instead?  This code is making sure (iter->base.flags & REF_ISSYMREF)
is directly linked to non-NULL-ness or iter->base.referent, and we
want to make everybody take it as invariant.  Shouldn't this code
also rely on the same invariant?  If iter-iter0->referent is NULL,
iter->iter0->flag has REF_ISSYMREF bit off, and vice versa.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, John Cai wrote (reply to this):

Hi Junio,

On 7 Aug 2024, at 17:40, Junio C Hamano wrote:

> "John Cai via GitGitGadget" <[email protected]> writes:
>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index aa52d9be7c7..5ed69c23f74 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -245,9 +245,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
>>  {
>>  	struct object_id oid;
>>  	int flag;
>> -
>> -	if (!refs_resolve_ref_unsafe(&refs->base, refname, RESOLVE_REF_READING,
>
> Here, we had a nice blank line that separated the decls and the
> first statement.
>
>> -				     &oid, &flag)) {
>> +	const char *referent = refs_resolve_ref_unsafe(&refs->base,
>> +						       refname,
>> +						       RESOLVE_REF_READING,
>> +						       &oid, &flag);
>> +	if (!referent) {
>
> We lost it here, though.

Ah will restore the blank line.

>
>>  		oidclr(&oid, the_repository->hash_algo);
>>  		flag |= REF_ISBROKEN;
>>  	} else if (is_null_oid(&oid)) {
>> @@ -268,7 +270,11 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
>>  		oidclr(&oid, the_repository->hash_algo);
>>  		flag |= REF_BAD_NAME | REF_ISBROKEN;
>>  	}
>> -	add_entry_to_dir(dir, create_ref_entry(refname, &oid, flag));
>> +
>> +	if (!(flag & REF_ISSYMREF))
>> +		referent = NULL;
>
> OK, this is new in this round.  The idea is that everybody else can
> rely on the invariant that the referent being NULL is equivalent to
> REF_ISSYMREF bit in flag word being off from here on.
>
>> +	add_entry_to_dir(dir, create_ref_entry(refname, referent, &oid, flag));
>>  }
>>
>>  /*
>> @@ -886,6 +892,11 @@ static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
>>  		iter->base.refname = iter->iter0->refname;
>>  		iter->base.oid = iter->iter0->oid;
>>  		iter->base.flags = iter->iter0->flags;
>> +		if (iter->iter0->flags & REF_ISSYMREF)
>> +			iter->base.referent = iter->iter0->referent;
>> +		else
>> +			iter->base.referent = NULL;
>>  		return ITER_OK;
>>  	}
>
> Hmph, why not an unconditional
>
> 		iter->base.referent = iter->iter0->referent;
>
> instead?  This code is making sure (iter->base.flags & REF_ISSYMREF)
> is directly linked to non-NULL-ness or iter->base.referent, and we
> want to make everybody take it as invariant.  Shouldn't this code
> also rely on the same invariant?  If iter-iter0->referent is NULL,
> iter->iter0->flag has REF_ISSYMREF bit off, and vice versa.

That's a good point. Even though this code will be used in a loop, if we just
set it every time unconditionally then we won't end up with leftover values. Wil
l adjust in the next version.

Copy link

This patch series was integrated into seen via 1d3d5ed.

Copy link

This branch is now known as jc/refs-symref-referent.

Copy link

This patch series was integrated into seen via 1c7ed05.

Copy link

There was a status update in the "Cooking" section about the branch jc/refs-symref-referent on the Git mailing list:

The refs API has been taught to give symref target information to
the users of ref iterators, allowing for-each-ref and friends to
avoid an extra ref_resolve_* API call per a symbolic ref.

Waiting for review response.
source: <[email protected]>

@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch 2 times, most recently from 4617dc2 to d919007 Compare August 9, 2024 14:49
Since ref iterators do not hold onto the direct value of a reference
without resolving it, the only way to get ahold of a direct value of a
symbolic ref is to make a separate call to refs_read_symbolic_ref.

To make accessing the direct value of a symbolic ref more efficient,
let's save the direct value of the ref in the iterators for both the
files backend and the reftable backend.

Signed-off-by: John Cai <[email protected]>
Add a parameter to each_ref_fn so that callers to the ref APIs
that use this function as a callback can have acess to the
unresolved value of a symbolic ref.

Signed-off-by: John Cai <[email protected]>
With a previous commit, the reference the symbolic ref points to is saved
in the ref iterator records. Instead of making a separate call to
resolve_refdup() each time, we can just populate the ref_array_item with
the value from the iterator.

Signed-off-by: John Cai <[email protected]>
@john-cai john-cai force-pushed the jc/symbolic-ref-in-iterator branch from d919007 to 9f609f4 Compare August 9, 2024 15:31
@john-cai
Copy link
Contributor Author

john-cai commented Aug 9, 2024

/submit

Copy link

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v4

To fetch this version to local tag pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v4:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1712/john-cai/jc/symbolic-ref-in-iterator-v4

Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

"John Cai via GitGitGadget" <[email protected]> writes:

> John Cai (3):
>   refs: keep track of unresolved reference value in iterators
>   refs: add referent to each_ref_fn
>   ref-filter: populate symref from iterator

Looks good.  Let me mark the topic for 'next'.

Copy link

This patch series was integrated into seen via 2678c13.

Copy link

This patch series was integrated into next via 3183f3d.

@gitgitgadget-git gitgitgadget-git bot added the next label Aug 9, 2024
Copy link

This patch series was integrated into seen via c715a26.

Copy link

There was a status update in the "Cooking" section about the branch jc/refs-symref-referent on the Git mailing list:

The refs API has been taught to give symref target information to
the users of ref iterators, allowing for-each-ref and friends to
avoid an extra ref_resolve_* API call per a symbolic ref.

Will merge to 'master'.
source: <[email protected]>

Copy link

There was a status update in the "Cooking" section about the branch jc/refs-symref-referent on the Git mailing list:

The refs API has been taught to give symref target information to
the users of ref iterators, allowing for-each-ref and friends to
avoid an extra ref_resolve_* API call per a symbolic ref.

Will merge to 'master'.
source: <[email protected]>

Copy link

There was a status update in the "Graduated to 'master'" section about the branch jc/refs-symref-referent on the Git mailing list:

The refs API has been taught to give symref target information to
the users of ref iterators, allowing for-each-ref and friends to
avoid an extra ref_resolve_* API call per a symbolic ref.
source: <[email protected]>

Copy link

This patch series was integrated into seen via e7f86cb.

Copy link

This patch series was integrated into master via e7f86cb.

Copy link

This patch series was integrated into next via e7f86cb.

Copy link

Closed via e7f86cb.

@@ -245,9 +245,12 @@ static void loose_fill_ref_dir_regular_file(struct files_ref_store *refs,
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, shejialuo wrote (reply to this):

On Fri, Aug 09, 2024 at 03:37:49PM +0000, John Cai via GitGitGadget wrote:

[snip]

> @@ -66,6 +69,7 @@ static void free_ref_entry(struct ref_entry *entry)
>  		 */
>  		clear_ref_dir(&entry->u.subdir);
>  	}
> +	free(entry->u.value.referent);
>  	free(entry);
>  }
>  

Today, I am learning the source code of the "ref-cache.[ch]". I feel
rather confused here. And I think this usage is wrong.

"free_ref_entry" will do the following things:

1. If "entry" is a directory, it will call "clear_ref_dir" which will
call "free_ref_entry" for every loose ref.
2. If "entry" is a loose ref, it will call `free(entry->u.value.referent)`
and `free(entry)`.

The problem is if "entry" is a directory, we will also execute the
following statement:

    free(entry->u.value.referent);

This does not make sense. We should never access the "entry->u.value" if
"entry" is a directory. So, I think the correct usage should be:

    if (entry->flag & REF_DIR) {
        ...
        clear_ref_dir(...);
    } else {
        free(entry->u.value.referent);
    }

Thanks,
Jialuo

This comment was marked as spam.

Copy link

User shejialuo <[email protected]> has been added to the cc: list.

@MXzander

This comment was marked as spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants