Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PATH WALK IV: Add 'git survey' command #1821

Open
wants to merge 16 commits into
base: api-upstream
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,7 @@
/git-submodule
/git-submodule--helper
/git-subtree
/git-survey
/git-svn
/git-switch
/git-symbolic-ref
Expand Down
2 changes: 2 additions & 0 deletions Documentation/config.txt
Original file line number Diff line number Diff line change
Expand Up @@ -534,6 +534,8 @@ include::config/status.txt[]

include::config/submodule.txt[]

include::config/survey.txt[]

include::config/tag.txt[]

include::config/tar.txt[]
Expand Down
14 changes: 14 additions & 0 deletions Documentation/config/survey.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
survey.*::
These variables adjust the default behavior of the `git survey`
command. The intention is that this command could be run in the
background with these options.
+
--
verbose::
This boolean value implies the `--[no-]verbose` option.
progress::
This boolean value implies the `--[no-]progress` option.
top::
This integer value implies `--top=<N>`, specifying the
number of entries in the detail tables.
--
83 changes: 83 additions & 0 deletions Documentation/git-survey.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
git-survey(1)
=============

NAME
----
git-survey - EXPERIMENTAL: Measure various repository dimensions of scale

SYNOPSIS
--------
[verse]
(EXPERIMENTAL!) `git survey` <options>

DESCRIPTION
-----------

Survey the repository and measure various dimensions of scale.

As repositories grow to "monorepo" size, certain data shapes can cause
performance problems. `git-survey` attempts to measure and report on
known problem areas.

Ref Selection and Reachable Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this first analysis phase, `git survey` will iterate over the set of
requested branches, tags, and other refs and treewalk over all of the
reachable commits, trees, and blobs and generate various statistics.

OPTIONS
-------

--progress::
Show progress. This is automatically enabled when interactive.

Ref Selection
~~~~~~~~~~~~~

The following options control the set of refs that `git survey` will examine.
By default, `git survey` will look at tags, local branches, and remote refs.
If any of the following options are given, the default set is cleared and
only refs for the given options are added.

--all-refs::
Use all refs. This includes local branches, tags, remote refs,
notes, and stashes. This option overrides all of the following.

--branches::
Add local branches (`refs/heads/`) to the set.

--tags::
Add tags (`refs/tags/`) to the set.

--remotes::
Add remote branches (`refs/remote/`) to the set.

--detached::
Add HEAD to the set.

--other::
Add notes (`refs/notes/`) and stashes (`refs/stash/`) to the set.

OUTPUT
------

By default, `git survey` will print information about the repository in a
human-readable format that includes overviews and tables.

References Summary
~~~~~~~~~~~~~~~~~~

The references summary includes a count of each kind of reference,
including branches, remote refs, and tags (split by "all" and
"annotated").

Reachable Object Summary
~~~~~~~~~~~~~~~~~~~~~~~~

The reachable object summary shows the total number of each kind of Git
object, including tags, commits, trees, and blobs.

GIT
---
Part of the linkgit:git[1] suite
64 changes: 64 additions & 0 deletions Documentation/technical/api-path-walk.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
Path-Walk API
=============

The path-walk API is used to walk reachable objects, but to visit objects
in batches based on a common path they appear in, or by type.

For example, all reachable commits are visited in a group. All tags are
visited in a group. Then, all root trees are visited. At some point, all
blobs reachable via a path `my/dir/to/A` are visited. When there are
multiple paths possible to reach the same object, then only one of those
paths is used to visit the object.

Basics
------

To use the path-walk API, include `path-walk.h` and call
`walk_objects_by_path()` with a customized `path_walk_info` struct. The
struct is used to set all of the options for how the walk should proceed.
Let's dig into the different options and their use.

`path_fn` and `path_fn_data`::
The most important option is the `path_fn` option, which is a
function pointer to the callback that can execute logic on the
object IDs for objects grouped by type and path. This function
also receives a `data` value that corresponds to the
`path_fn_data` member, for providing custom data structures to
this callback function.

`revs`::
To configure the exact details of the reachable set of objects,
use the `revs` member and initialize it using the revision
machinery in `revision.h`. Initialize `revs` using calls such as
`setup_revisions()` or `parse_revision_opt()`. Do not call
`prepare_revision_walk()`, as that will be called within
`walk_objects_by_path()`.
+
It is also important that you do not specify the `--objects` flag for the
`revs` struct. The revision walk should only be used to walk commits, and
the objects will be walked in a separate way based on those starting
commits.

`commits`, `blobs`, `trees`, `tags`::
By default, these members are enabled and signal that the path-walk
API should call the `path_fn` on objects of these types. Specialized
applications could disable some options to make it simpler to walk
the objects or to have fewer calls to `path_fn`.
+
While it is possible to walk only commits in this way, consumers would be
better off using the revision walk API instead.

`prune_all_uninteresting`::
By default, all reachable paths are emitted by the path-walk API.
This option allows consumers to declare that they are not
interested in paths where all included objects are marked with the
`UNINTERESTING` flag. This requires using the `boundary` option in
the revision walk so that the walk emits commits marked with the
`UNINTERESTING` flag.

Examples
--------

See example usages in:
`t/helper/test-path-walk.c`,
`builtin/survey.c`
3 changes: 3 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -818,6 +818,7 @@ TEST_BUILTINS_OBJS += test-parse-options.o
TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
TEST_BUILTINS_OBJS += test-partial-clone.o
TEST_BUILTINS_OBJS += test-path-utils.o
TEST_BUILTINS_OBJS += test-path-walk.o
TEST_BUILTINS_OBJS += test-pcre2-config.o
TEST_BUILTINS_OBJS += test-pkt-line.o
TEST_BUILTINS_OBJS += test-proc-receive.o
Expand Down Expand Up @@ -1094,6 +1095,7 @@ LIB_OBJS += parse-options.o
LIB_OBJS += patch-delta.o
LIB_OBJS += patch-ids.o
LIB_OBJS += path.o
LIB_OBJS += path-walk.o
LIB_OBJS += pathspec.o
LIB_OBJS += pkt-line.o
LIB_OBJS += preload-index.o
Expand Down Expand Up @@ -1305,6 +1307,7 @@ BUILTIN_OBJS += builtin/sparse-checkout.o
BUILTIN_OBJS += builtin/stash.o
BUILTIN_OBJS += builtin/stripspace.o
BUILTIN_OBJS += builtin/submodule--helper.o
BUILTIN_OBJS += builtin/survey.o
BUILTIN_OBJS += builtin/symbolic-ref.o
BUILTIN_OBJS += builtin/tag.o
BUILTIN_OBJS += builtin/unpack-file.o
Expand Down
1 change: 1 addition & 0 deletions builtin.h
Original file line number Diff line number Diff line change
Expand Up @@ -231,6 +231,7 @@ int cmd_status(int argc, const char **argv, const char *prefix, struct repositor
int cmd_stash(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_stripspace(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_submodule__helper(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_survey(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_switch(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_symbolic_ref(int argc, const char **argv, const char *prefix, struct repository *repo);
int cmd_tag(int argc, const char **argv, const char *prefix, struct repository *repo);
Expand Down
Loading