diff --git a/.gitignore b/.gitignore index 6687bd6db4c0a6..8a775af2635fe7 100644 --- a/.gitignore +++ b/.gitignore @@ -165,6 +165,7 @@ /git-submodule /git-submodule--helper /git-subtree +/git-survey /git-svn /git-switch /git-symbolic-ref diff --git a/Documentation/config.txt b/Documentation/config.txt index 8c0b3ed8075214..4ee8b693022ef7 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -534,6 +534,8 @@ include::config/status.txt[] include::config/submodule.txt[] +include::config/survey.txt[] + include::config/tag.txt[] include::config/tar.txt[] diff --git a/Documentation/config/survey.txt b/Documentation/config/survey.txt new file mode 100644 index 00000000000000..9e594a2092f225 --- /dev/null +++ b/Documentation/config/survey.txt @@ -0,0 +1,14 @@ +survey.*:: + These variables adjust the default behavior of the `git survey` + command. The intention is that this command could be run in the + background with these options. ++ +-- + verbose:: + This boolean value implies the `--[no-]verbose` option. + progress:: + This boolean value implies the `--[no-]progress` option. + top:: + This integer value implies `--top=`, specifying the + number of entries in the detail tables. +-- diff --git a/Documentation/git-survey.txt b/Documentation/git-survey.txt new file mode 100644 index 00000000000000..894c7be3053eb9 --- /dev/null +++ b/Documentation/git-survey.txt @@ -0,0 +1,83 @@ +git-survey(1) +============= + +NAME +---- +git-survey - EXPERIMENTAL: Measure various repository dimensions of scale + +SYNOPSIS +-------- +[verse] +(EXPERIMENTAL!) `git survey` + +DESCRIPTION +----------- + +Survey the repository and measure various dimensions of scale. + +As repositories grow to "monorepo" size, certain data shapes can cause +performance problems. `git-survey` attempts to measure and report on +known problem areas. + +Ref Selection and Reachable Objects +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +In this first analysis phase, `git survey` will iterate over the set of +requested branches, tags, and other refs and treewalk over all of the +reachable commits, trees, and blobs and generate various statistics. + +OPTIONS +------- + +--progress:: + Show progress. This is automatically enabled when interactive. + +Ref Selection +~~~~~~~~~~~~~ + +The following options control the set of refs that `git survey` will examine. +By default, `git survey` will look at tags, local branches, and remote refs. +If any of the following options are given, the default set is cleared and +only refs for the given options are added. + +--all-refs:: + Use all refs. This includes local branches, tags, remote refs, + notes, and stashes. This option overrides all of the following. + +--branches:: + Add local branches (`refs/heads/`) to the set. + +--tags:: + Add tags (`refs/tags/`) to the set. + +--remotes:: + Add remote branches (`refs/remote/`) to the set. + +--detached:: + Add HEAD to the set. + +--other:: + Add notes (`refs/notes/`) and stashes (`refs/stash/`) to the set. + +OUTPUT +------ + +By default, `git survey` will print information about the repository in a +human-readable format that includes overviews and tables. + +References Summary +~~~~~~~~~~~~~~~~~~ + +The references summary includes a count of each kind of reference, +including branches, remote refs, and tags (split by "all" and +"annotated"). + +Reachable Object Summary +~~~~~~~~~~~~~~~~~~~~~~~~ + +The reachable object summary shows the total number of each kind of Git +object, including tags, commits, trees, and blobs. + +GIT +--- +Part of the linkgit:git[1] suite diff --git a/Documentation/technical/api-path-walk.txt b/Documentation/technical/api-path-walk.txt new file mode 100644 index 00000000000000..21011048d7c7f6 --- /dev/null +++ b/Documentation/technical/api-path-walk.txt @@ -0,0 +1,64 @@ +Path-Walk API +============= + +The path-walk API is used to walk reachable objects, but to visit objects +in batches based on a common path they appear in, or by type. + +For example, all reachable commits are visited in a group. All tags are +visited in a group. Then, all root trees are visited. At some point, all +blobs reachable via a path `my/dir/to/A` are visited. When there are +multiple paths possible to reach the same object, then only one of those +paths is used to visit the object. + +Basics +------ + +To use the path-walk API, include `path-walk.h` and call +`walk_objects_by_path()` with a customized `path_walk_info` struct. The +struct is used to set all of the options for how the walk should proceed. +Let's dig into the different options and their use. + +`path_fn` and `path_fn_data`:: + The most important option is the `path_fn` option, which is a + function pointer to the callback that can execute logic on the + object IDs for objects grouped by type and path. This function + also receives a `data` value that corresponds to the + `path_fn_data` member, for providing custom data structures to + this callback function. + +`revs`:: + To configure the exact details of the reachable set of objects, + use the `revs` member and initialize it using the revision + machinery in `revision.h`. Initialize `revs` using calls such as + `setup_revisions()` or `parse_revision_opt()`. Do not call + `prepare_revision_walk()`, as that will be called within + `walk_objects_by_path()`. ++ +It is also important that you do not specify the `--objects` flag for the +`revs` struct. The revision walk should only be used to walk commits, and +the objects will be walked in a separate way based on those starting +commits. + +`commits`, `blobs`, `trees`, `tags`:: + By default, these members are enabled and signal that the path-walk + API should call the `path_fn` on objects of these types. Specialized + applications could disable some options to make it simpler to walk + the objects or to have fewer calls to `path_fn`. ++ +While it is possible to walk only commits in this way, consumers would be +better off using the revision walk API instead. + +`prune_all_uninteresting`:: + By default, all reachable paths are emitted by the path-walk API. + This option allows consumers to declare that they are not + interested in paths where all included objects are marked with the + `UNINTERESTING` flag. This requires using the `boundary` option in + the revision walk so that the walk emits commits marked with the + `UNINTERESTING` flag. + +Examples +-------- + +See example usages in: + `t/helper/test-path-walk.c`, + `builtin/survey.c` diff --git a/Makefile b/Makefile index 7344a7f7257af6..176d9aa14ebf6c 100644 --- a/Makefile +++ b/Makefile @@ -818,6 +818,7 @@ TEST_BUILTINS_OBJS += test-parse-options.o TEST_BUILTINS_OBJS += test-parse-pathspec-file.o TEST_BUILTINS_OBJS += test-partial-clone.o TEST_BUILTINS_OBJS += test-path-utils.o +TEST_BUILTINS_OBJS += test-path-walk.o TEST_BUILTINS_OBJS += test-pcre2-config.o TEST_BUILTINS_OBJS += test-pkt-line.o TEST_BUILTINS_OBJS += test-proc-receive.o @@ -1094,6 +1095,7 @@ LIB_OBJS += parse-options.o LIB_OBJS += patch-delta.o LIB_OBJS += patch-ids.o LIB_OBJS += path.o +LIB_OBJS += path-walk.o LIB_OBJS += pathspec.o LIB_OBJS += pkt-line.o LIB_OBJS += preload-index.o @@ -1305,6 +1307,7 @@ BUILTIN_OBJS += builtin/sparse-checkout.o BUILTIN_OBJS += builtin/stash.o BUILTIN_OBJS += builtin/stripspace.o BUILTIN_OBJS += builtin/submodule--helper.o +BUILTIN_OBJS += builtin/survey.o BUILTIN_OBJS += builtin/symbolic-ref.o BUILTIN_OBJS += builtin/tag.o BUILTIN_OBJS += builtin/unpack-file.o diff --git a/builtin.h b/builtin.h index f7b166b33484d3..0739d969ad26b5 100644 --- a/builtin.h +++ b/builtin.h @@ -231,6 +231,7 @@ int cmd_status(int argc, const char **argv, const char *prefix, struct repositor int cmd_stash(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_stripspace(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_submodule__helper(int argc, const char **argv, const char *prefix, struct repository *repo); +int cmd_survey(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_switch(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_symbolic_ref(int argc, const char **argv, const char *prefix, struct repository *repo); int cmd_tag(int argc, const char **argv, const char *prefix, struct repository *repo); diff --git a/builtin/survey.c b/builtin/survey.c new file mode 100644 index 00000000000000..eee0f8d7abb65b --- /dev/null +++ b/builtin/survey.c @@ -0,0 +1,935 @@ +/* + * This macro is necessary to access 'default_abbrev' + * within REV_INFO_INIT. + */ +#define USE_THE_REPOSITORY_VARIABLE + +#include "builtin.h" +#include "config.h" +#include "environment.h" +#include "hex.h" +#include "object.h" +#include "object-name.h" +#include "object-store-ll.h" +#include "parse-options.h" +#include "path-walk.h" +#include "progress.h" +#include "ref-filter.h" +#include "refs.h" +#include "revision.h" +#include "strbuf.h" +#include "strvec.h" +#include "tag.h" +#include "trace2.h" +#include "color.h" + +static const char * const survey_usage[] = { + N_("(EXPERIMENTAL!) git survey "), + NULL, +}; + +struct survey_refs_wanted { + int want_all_refs; /* special override */ + + int want_branches; + int want_tags; + int want_remotes; + int want_detached; + int want_other; /* see FILTER_REFS_OTHERS -- refs/notes/, refs/stash/ */ +}; + +static struct survey_refs_wanted default_ref_options = { + .want_all_refs = 1, +}; + +struct survey_opts { + int verbose; + int show_progress; + int top_nr; + struct survey_refs_wanted refs; +}; + +struct survey_report_ref_summary { + size_t refs_nr; + size_t branches_nr; + size_t remote_refs_nr; + size_t tags_nr; + size_t tags_annotated_nr; + size_t others_nr; + size_t unknown_nr; +}; + +struct survey_report_object_summary { + size_t commits_nr; + size_t tags_nr; + size_t trees_nr; + size_t blobs_nr; +}; + +/** + * For some category given by 'label', count the number of objects + * that match that label along with the on-disk size and the size + * after decompressing (both with delta bases and zlib). + */ +struct survey_report_object_size_summary { + char *label; + size_t nr; + size_t disk_size; + size_t inflated_size; + size_t num_missing; +}; + +typedef int (*survey_top_cmp)(void *v1, void *v2); + +static int cmp_by_nr(void *v1, void *v2) +{ + struct survey_report_object_size_summary *s1 = v1; + struct survey_report_object_size_summary *s2 = v2; + + if (s1->nr < s2->nr) + return -1; + if (s1->nr > s2->nr) + return 1; + return 0; +} + +static int cmp_by_disk_size(void *v1, void *v2) +{ + struct survey_report_object_size_summary *s1 = v1; + struct survey_report_object_size_summary *s2 = v2; + + if (s1->disk_size < s2->disk_size) + return -1; + if (s1->disk_size > s2->disk_size) + return 1; + return 0; +} + +static int cmp_by_inflated_size(void *v1, void *v2) +{ + struct survey_report_object_size_summary *s1 = v1; + struct survey_report_object_size_summary *s2 = v2; + + if (s1->inflated_size < s2->inflated_size) + return -1; + if (s1->inflated_size > s2->inflated_size) + return 1; + return 0; +} + +/** + * Store a list of "top" categories by some sorting function. When + * inserting a new category, reorder the list and free the one that + * got ejected (if any). + */ +struct survey_report_top_table { + const char *name; + survey_top_cmp cmp_fn; + size_t nr; + size_t alloc; + + /** + * 'data' stores an array of structs and must be cast into + * the proper array type before evaluating an index. + */ + void *data; +}; + +static void init_top_sizes(struct survey_report_top_table *top, + size_t limit, const char *name, + survey_top_cmp cmp) +{ + struct survey_report_object_size_summary *sz_array; + + top->name = name; + top->cmp_fn = cmp; + top->alloc = limit; + top->nr = 0; + + CALLOC_ARRAY(sz_array, limit); + top->data = sz_array; +} + +MAYBE_UNUSED +static void clear_top_sizes(struct survey_report_top_table *top) +{ + struct survey_report_object_size_summary *sz_array = top->data; + + for (size_t i = 0; i < top->nr; i++) + free(sz_array[i].label); + free(sz_array); +} + +static void maybe_insert_into_top_size(struct survey_report_top_table *top, + struct survey_report_object_size_summary *summary) +{ + struct survey_report_object_size_summary *sz_array = top->data; + size_t pos = top->nr; + + /* Compare against list from the bottom. */ + while (pos > 0 && top->cmp_fn(&sz_array[pos - 1], summary) < 0) + pos--; + + /* Not big enough! */ + if (pos >= top->alloc) + return; + + /* We need to shift the data. */ + if (top->nr == top->alloc) + free(sz_array[top->nr - 1].label); + else + top->nr++; + + for (size_t i = top->nr - 1; i > pos; i--) + memcpy(&sz_array[i], &sz_array[i - 1], sizeof(*sz_array)); + + memcpy(&sz_array[pos], summary, sizeof(*summary)); + sz_array[pos].label = xstrdup(summary->label); +} + +/** + * This struct contains all of the information that needs to be printed + * at the end of the exploration of the repository and its references. + */ +struct survey_report { + struct survey_report_ref_summary refs; + struct survey_report_object_summary reachable_objects; + + struct survey_report_object_size_summary *by_type; + + struct survey_report_top_table *top_paths_by_count; + struct survey_report_top_table *top_paths_by_disk; + struct survey_report_top_table *top_paths_by_inflate; +}; + +#define REPORT_TYPE_COMMIT 0 +#define REPORT_TYPE_TREE 1 +#define REPORT_TYPE_BLOB 2 +#define REPORT_TYPE_TAG 3 +#define REPORT_TYPE_COUNT 4 + +struct survey_context { + struct repository *repo; + + /* Options that control what is done. */ + struct survey_opts opts; + + /* Info for output only. */ + struct survey_report report; + + /* + * The rest of the members are about enabling the activity + * of the 'git survey' command, including ref listings, object + * pointers, and progress. + */ + + struct progress *progress; + size_t progress_nr; + size_t progress_total; + + struct strvec refs; + struct ref_array ref_array; +}; + +static void clear_survey_context(struct survey_context *ctx) +{ + ref_array_clear(&ctx->ref_array); + strvec_clear(&ctx->refs); +} + +struct survey_table { + const char *table_name; + struct strvec header; + struct strvec *rows; + size_t rows_nr; + size_t rows_alloc; +}; + +#define SURVEY_TABLE_INIT { \ + .header = STRVEC_INIT, \ +} + +static void clear_table(struct survey_table *table) +{ + strvec_clear(&table->header); + for (size_t i = 0; i < table->rows_nr; i++) + strvec_clear(&table->rows[i]); + free(table->rows); +} + +static void insert_table_rowv(struct survey_table *table, ...) +{ + va_list ap; + char *arg; + ALLOC_GROW(table->rows, table->rows_nr + 1, table->rows_alloc); + + memset(&table->rows[table->rows_nr], 0, sizeof(struct strvec)); + + va_start(ap, table); + while ((arg = va_arg(ap, char *))) + strvec_push(&table->rows[table->rows_nr], arg); + va_end(ap); + + table->rows_nr++; +} + +#define SECTION_SEGMENT "========================================" +#define SECTION_SEGMENT_LEN 40 +static const char *section_line = SECTION_SEGMENT + SECTION_SEGMENT + SECTION_SEGMENT + SECTION_SEGMENT; +static const size_t section_len = 4 * SECTION_SEGMENT_LEN; + +static void print_table_title(const char *name, size_t *widths, size_t nr) +{ + size_t width = 3 * (nr - 1); + size_t min_width = strlen(name); + + for (size_t i = 0; i < nr; i++) + width += widths[i]; + + if (width < min_width) + width = min_width; + + if (width > section_len) + width = section_len; + + printf("\n%s\n%.*s\n", name, (int)width, section_line); +} + +static void print_row_plaintext(struct strvec *row, size_t *widths) +{ + static struct strbuf line = STRBUF_INIT; + strbuf_setlen(&line, 0); + + for (size_t i = 0; i < row->nr; i++) { + const char *str = row->v[i]; + size_t len = strlen(str); + if (i) + strbuf_add(&line, " | ", 3); + strbuf_addchars(&line, ' ', widths[i] - len); + strbuf_add(&line, str, len); + } + printf("%s\n", line.buf); +} + +static void print_divider_plaintext(size_t *widths, size_t nr) +{ + static struct strbuf line = STRBUF_INIT; + strbuf_setlen(&line, 0); + + for (size_t i = 0; i < nr; i++) { + if (i) + strbuf_add(&line, "-+-", 3); + strbuf_addchars(&line, '-', widths[i]); + } + printf("%s\n", line.buf); +} + +static void print_table_plaintext(struct survey_table *table) +{ + size_t *column_widths; + size_t columns_nr = table->header.nr; + CALLOC_ARRAY(column_widths, columns_nr); + + for (size_t i = 0; i < columns_nr; i++) { + column_widths[i] = strlen(table->header.v[i]); + + for (size_t j = 0; j < table->rows_nr; j++) { + size_t rowlen = strlen(table->rows[j].v[i]); + if (column_widths[i] < rowlen) + column_widths[i] = rowlen; + } + } + + print_table_title(table->table_name, column_widths, columns_nr); + print_row_plaintext(&table->header, column_widths); + print_divider_plaintext(column_widths, columns_nr); + + for (size_t j = 0; j < table->rows_nr; j++) + print_row_plaintext(&table->rows[j], column_widths); + + free(column_widths); +} + +static void survey_report_plaintext_refs(struct survey_context *ctx) +{ + struct survey_report_ref_summary *refs = &ctx->report.refs; + struct survey_table table = SURVEY_TABLE_INIT; + + table.table_name = _("REFERENCES SUMMARY"); + + strvec_push(&table.header, _("Ref Type")); + strvec_push(&table.header, _("Count")); + + if (ctx->opts.refs.want_all_refs || ctx->opts.refs.want_branches) { + char *fmt = xstrfmt("%"PRIuMAX"", (uintmax_t)refs->branches_nr); + insert_table_rowv(&table, _("Branches"), fmt, NULL); + free(fmt); + } + + if (ctx->opts.refs.want_all_refs || ctx->opts.refs.want_remotes) { + char *fmt = xstrfmt("%"PRIuMAX"", (uintmax_t)refs->remote_refs_nr); + insert_table_rowv(&table, _("Remote refs"), fmt, NULL); + free(fmt); + } + + if (ctx->opts.refs.want_all_refs || ctx->opts.refs.want_tags) { + char *fmt = xstrfmt("%"PRIuMAX"", (uintmax_t)refs->tags_nr); + insert_table_rowv(&table, _("Tags (all)"), fmt, NULL); + free(fmt); + fmt = xstrfmt("%"PRIuMAX"", (uintmax_t)refs->tags_annotated_nr); + insert_table_rowv(&table, _("Tags (annotated)"), fmt, NULL); + free(fmt); + } + + print_table_plaintext(&table); + clear_table(&table); +} + +static void survey_report_plaintext_reachable_object_summary(struct survey_context *ctx) +{ + struct survey_report_object_summary *objs = &ctx->report.reachable_objects; + struct survey_table table = SURVEY_TABLE_INIT; + char *fmt; + + table.table_name = _("REACHABLE OBJECT SUMMARY"); + + strvec_push(&table.header, _("Object Type")); + strvec_push(&table.header, _("Count")); + + fmt = xstrfmt("%"PRIuMAX"", (uintmax_t)objs->tags_nr); + insert_table_rowv(&table, _("Tags"), fmt, NULL); + free(fmt); + + fmt = xstrfmt("%"PRIuMAX"", (uintmax_t)objs->commits_nr); + insert_table_rowv(&table, _("Commits"), fmt, NULL); + free(fmt); + + fmt = xstrfmt("%"PRIuMAX"", (uintmax_t)objs->trees_nr); + insert_table_rowv(&table, _("Trees"), fmt, NULL); + free(fmt); + + fmt = xstrfmt("%"PRIuMAX"", (uintmax_t)objs->blobs_nr); + insert_table_rowv(&table, _("Blobs"), fmt, NULL); + free(fmt); + + print_table_plaintext(&table); + clear_table(&table); +} + +static void survey_report_object_sizes(const char *title, + const char *categories, + struct survey_report_object_size_summary *summary, + size_t summary_nr) +{ + struct survey_table table = SURVEY_TABLE_INIT; + table.table_name = title; + + strvec_push(&table.header, categories); + strvec_push(&table.header, _("Count")); + strvec_push(&table.header, _("Disk Size")); + strvec_push(&table.header, _("Inflated Size")); + + for (size_t i = 0; i < summary_nr; i++) { + char *label_str = xstrdup(summary[i].label); + char *nr_str = xstrfmt("%"PRIuMAX, (uintmax_t)summary[i].nr); + char *disk_str = xstrfmt("%"PRIuMAX, (uintmax_t)summary[i].disk_size); + char *inflate_str = xstrfmt("%"PRIuMAX, (uintmax_t)summary[i].inflated_size); + + insert_table_rowv(&table, label_str, nr_str, + disk_str, inflate_str, NULL); + + free(label_str); + free(nr_str); + free(disk_str); + free(inflate_str); + } + + print_table_plaintext(&table); + clear_table(&table); +} + +static void survey_report_plaintext_sorted_size( + struct survey_report_top_table *top) +{ + survey_report_object_sizes(top->name, _("Path"), + top->data, top->nr); +} + +static void survey_report_plaintext(struct survey_context *ctx) +{ + printf("GIT SURVEY for \"%s\"\n", ctx->repo->worktree); + printf("-----------------------------------------------------\n"); + survey_report_plaintext_refs(ctx); + survey_report_plaintext_reachable_object_summary(ctx); + survey_report_object_sizes(_("TOTAL OBJECT SIZES BY TYPE"), + _("Object Type"), + ctx->report.by_type, + REPORT_TYPE_COUNT); + + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_count[REPORT_TYPE_TREE]); + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_count[REPORT_TYPE_BLOB]); + + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_disk[REPORT_TYPE_TREE]); + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_disk[REPORT_TYPE_BLOB]); + + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_inflate[REPORT_TYPE_TREE]); + survey_report_plaintext_sorted_size( + &ctx->report.top_paths_by_inflate[REPORT_TYPE_BLOB]); +} + +/* + * After parsing the command line arguments, figure out which refs we + * should scan. + * + * If ANY were given in positive sense, then we ONLY include them and + * do not use the builtin values. + */ +static void fixup_refs_wanted(struct survey_context *ctx) +{ + struct survey_refs_wanted *rw = &ctx->opts.refs; + + /* + * `--all-refs` overrides and enables everything. + */ + if (rw->want_all_refs == 1) { + rw->want_branches = 1; + rw->want_tags = 1; + rw->want_remotes = 1; + rw->want_detached = 1; + rw->want_other = 1; + return; + } + + /* + * If none of the `--` were given, we assume all + * of the builtin unspecified values. + */ + if (rw->want_branches == -1 && + rw->want_tags == -1 && + rw->want_remotes == -1 && + rw->want_detached == -1 && + rw->want_other == -1) { + *rw = default_ref_options; + return; + } + + /* + * Since we only allow positive boolean values on the command + * line, we will only have true values where they specified + * a `--`. + * + * So anything that still has an unspecified value should be + * set to false. + */ + if (rw->want_branches == -1) + rw->want_branches = 0; + if (rw->want_tags == -1) + rw->want_tags = 0; + if (rw->want_remotes == -1) + rw->want_remotes = 0; + if (rw->want_detached == -1) + rw->want_detached = 0; + if (rw->want_other == -1) + rw->want_other = 0; +} + +static int survey_load_config_cb(const char *var, const char *value, + const struct config_context *cctx, void *pvoid) +{ + struct survey_context *ctx = pvoid; + + if (!strcmp(var, "survey.verbose")) { + ctx->opts.verbose = git_config_bool(var, value); + return 0; + } + if (!strcmp(var, "survey.progress")) { + ctx->opts.show_progress = git_config_bool(var, value); + return 0; + } + if (!strcmp(var, "survey.top")) { + ctx->opts.top_nr = git_config_bool(var, value); + return 0; + } + + return git_default_config(var, value, cctx, pvoid); +} + +static void survey_load_config(struct survey_context *ctx) +{ + repo_config(ctx->repo, survey_load_config_cb, ctx); +} + +static void do_load_refs(struct survey_context *ctx, + struct ref_array *ref_array) +{ + struct ref_filter filter = REF_FILTER_INIT; + struct ref_sorting *sorting; + struct string_list sorting_options = STRING_LIST_INIT_DUP; + + string_list_append(&sorting_options, "objectname"); + sorting = ref_sorting_options(&sorting_options); + + if (ctx->opts.refs.want_detached) + strvec_push(&ctx->refs, "HEAD"); + + if (ctx->opts.refs.want_all_refs) { + strvec_push(&ctx->refs, "refs/"); + } else { + if (ctx->opts.refs.want_branches) + strvec_push(&ctx->refs, "refs/heads/"); + if (ctx->opts.refs.want_tags) + strvec_push(&ctx->refs, "refs/tags/"); + if (ctx->opts.refs.want_remotes) + strvec_push(&ctx->refs, "refs/remotes/"); + if (ctx->opts.refs.want_other) { + strvec_push(&ctx->refs, "refs/notes/"); + strvec_push(&ctx->refs, "refs/stash/"); + } + } + + filter.name_patterns = ctx->refs.v; + filter.ignore_case = 0; + filter.match_as_path = 1; + + if (ctx->opts.show_progress) { + ctx->progress_total = 0; + ctx->progress = start_progress(_("Scanning refs..."), 0); + } + + filter_refs(ref_array, &filter, FILTER_REFS_KIND_MASK); + + if (ctx->opts.show_progress) { + ctx->progress_total = ref_array->nr; + display_progress(ctx->progress, ctx->progress_total); + } + + ref_array_sort(sorting, ref_array); + + stop_progress(&ctx->progress); + ref_filter_clear(&filter); + ref_sorting_release(sorting); +} + +/* + * The REFS phase: + * + * Load the set of requested refs and assess them for scalablity problems. + * Use that set to start a treewalk to all reachable objects and assess + * them. + * + * This data will give us insights into the repository itself (the number + * of refs, the size and shape of the DAG, the number and size of the + * objects). + * + * Theoretically, this data is independent of the on-disk representation + * (e.g. independent of packing concerns). + */ +static void survey_phase_refs(struct survey_context *ctx) +{ + trace2_region_enter("survey", "phase/refs", ctx->repo); + do_load_refs(ctx, &ctx->ref_array); + + ctx->report.refs.refs_nr = ctx->ref_array.nr; + for (size_t i = 0; i < ctx->ref_array.nr; i++) { + unsigned long size; + struct ref_array_item *item = ctx->ref_array.items[i]; + + switch (item->kind) { + case FILTER_REFS_TAGS: + ctx->report.refs.tags_nr++; + if (oid_object_info(ctx->repo, + &item->objectname, + &size) == OBJ_TAG) + ctx->report.refs.tags_annotated_nr++; + break; + + case FILTER_REFS_BRANCHES: + ctx->report.refs.branches_nr++; + break; + + case FILTER_REFS_REMOTES: + ctx->report.refs.remote_refs_nr++; + break; + + case FILTER_REFS_OTHERS: + ctx->report.refs.others_nr++; + break; + + default: + ctx->report.refs.unknown_nr++; + break; + } + } + + trace2_region_leave("survey", "phase/refs", ctx->repo); +} + +static void increment_object_counts( + struct survey_report_object_summary *summary, + enum object_type type, + size_t nr) +{ + switch (type) { + case OBJ_COMMIT: + summary->commits_nr += nr; + break; + + case OBJ_TREE: + summary->trees_nr += nr; + break; + + case OBJ_BLOB: + summary->blobs_nr += nr; + break; + + case OBJ_TAG: + summary->tags_nr += nr; + break; + + default: + break; + } +} + +static void increment_totals(struct survey_context *ctx, + struct oid_array *oids, + struct survey_report_object_size_summary *summary) +{ + for (size_t i = 0; i < oids->nr; i++) { + struct object_info oi = OBJECT_INFO_INIT; + unsigned oi_flags = OBJECT_INFO_FOR_PREFETCH; + unsigned long object_length = 0; + off_t disk_sizep = 0; + enum object_type type; + + oi.typep = &type; + oi.sizep = &object_length; + oi.disk_sizep = &disk_sizep; + + if (oid_object_info_extended(ctx->repo, &oids->oid[i], + &oi, oi_flags) < 0) { + summary->num_missing++; + } else { + summary->nr++; + summary->disk_size += disk_sizep; + summary->inflated_size += object_length; + } + } +} + +static void increment_object_totals(struct survey_context *ctx, + struct oid_array *oids, + enum object_type type, + const char *path) +{ + struct survey_report_object_size_summary *total; + struct survey_report_object_size_summary summary = { 0 }; + + increment_totals(ctx, oids, &summary); + + switch (type) { + case OBJ_COMMIT: + total = &ctx->report.by_type[REPORT_TYPE_COMMIT]; + break; + + case OBJ_TREE: + total = &ctx->report.by_type[REPORT_TYPE_TREE]; + break; + + case OBJ_BLOB: + total = &ctx->report.by_type[REPORT_TYPE_BLOB]; + break; + + case OBJ_TAG: + total = &ctx->report.by_type[REPORT_TYPE_TAG]; + break; + + default: + BUG("No other type allowed"); + } + + total->nr += summary.nr; + total->disk_size += summary.disk_size; + total->inflated_size += summary.inflated_size; + total->num_missing += summary.num_missing; + + if (type == OBJ_TREE || type == OBJ_BLOB) { + int index = type == OBJ_TREE ? + REPORT_TYPE_TREE : REPORT_TYPE_BLOB; + struct survey_report_top_table *top; + + /* + * Temporarily store (const char *) here, but it will + * be duped if inserted and will not be freed. + */ + summary.label = (char *)path; + + top = ctx->report.top_paths_by_count; + maybe_insert_into_top_size(&top[index], &summary); + + top = ctx->report.top_paths_by_disk; + maybe_insert_into_top_size(&top[index], &summary); + + top = ctx->report.top_paths_by_inflate; + maybe_insert_into_top_size(&top[index], &summary); + } +} + +static int survey_objects_path_walk_fn(const char *path, + struct oid_array *oids, + enum object_type type, + void *data) +{ + struct survey_context *ctx = data; + + increment_object_counts(&ctx->report.reachable_objects, + type, oids->nr); + increment_object_totals(ctx, oids, type, path); + + ctx->progress_nr += oids->nr; + display_progress(ctx->progress, ctx->progress_nr); + + return 0; +} + +static void initialize_report(struct survey_context *ctx) +{ + CALLOC_ARRAY(ctx->report.by_type, REPORT_TYPE_COUNT); + ctx->report.by_type[REPORT_TYPE_COMMIT].label = xstrdup(_("Commits")); + ctx->report.by_type[REPORT_TYPE_TREE].label = xstrdup(_("Trees")); + ctx->report.by_type[REPORT_TYPE_BLOB].label = xstrdup(_("Blobs")); + ctx->report.by_type[REPORT_TYPE_TAG].label = xstrdup(_("Tags")); + + CALLOC_ARRAY(ctx->report.top_paths_by_count, REPORT_TYPE_COUNT); + init_top_sizes(&ctx->report.top_paths_by_count[REPORT_TYPE_TREE], + ctx->opts.top_nr, _("TOP DIRECTORIES BY COUNT"), cmp_by_nr); + init_top_sizes(&ctx->report.top_paths_by_count[REPORT_TYPE_BLOB], + ctx->opts.top_nr, _("TOP FILES BY COUNT"), cmp_by_nr); + + CALLOC_ARRAY(ctx->report.top_paths_by_disk, REPORT_TYPE_COUNT); + init_top_sizes(&ctx->report.top_paths_by_disk[REPORT_TYPE_TREE], + ctx->opts.top_nr, _("TOP DIRECTORIES BY DISK SIZE"), cmp_by_disk_size); + init_top_sizes(&ctx->report.top_paths_by_disk[REPORT_TYPE_BLOB], + ctx->opts.top_nr, _("TOP FILES BY DISK SIZE"), cmp_by_disk_size); + + CALLOC_ARRAY(ctx->report.top_paths_by_inflate, REPORT_TYPE_COUNT); + init_top_sizes(&ctx->report.top_paths_by_inflate[REPORT_TYPE_TREE], + ctx->opts.top_nr, _("TOP DIRECTORIES BY INFLATED SIZE"), cmp_by_inflated_size); + init_top_sizes(&ctx->report.top_paths_by_inflate[REPORT_TYPE_BLOB], + ctx->opts.top_nr, _("TOP FILES BY INFLATED SIZE"), cmp_by_inflated_size); +} + +static void survey_phase_objects(struct survey_context *ctx) +{ + struct rev_info revs = REV_INFO_INIT; + struct path_walk_info info = PATH_WALK_INFO_INIT; + unsigned int add_flags = 0; + + trace2_region_enter("survey", "phase/objects", ctx->repo); + + info.revs = &revs; + info.path_fn = survey_objects_path_walk_fn; + info.path_fn_data = ctx; + + initialize_report(ctx); + + repo_init_revisions(ctx->repo, &revs, ""); + revs.tag_objects = 1; + + ctx->progress_nr = 0; + ctx->progress_total = ctx->ref_array.nr; + if (ctx->opts.show_progress) + ctx->progress = start_progress(_("Preparing object walk"), + ctx->progress_total); + for (size_t i = 0; i < ctx->ref_array.nr; i++) { + struct ref_array_item *item = ctx->ref_array.items[i]; + add_pending_oid(&revs, NULL, &item->objectname, add_flags); + display_progress(ctx->progress, ++(ctx->progress_nr)); + } + stop_progress(&ctx->progress); + + ctx->progress_nr = 0; + ctx->progress_total = 0; + if (ctx->opts.show_progress) + ctx->progress = start_progress(_("Walking objects"), 0); + walk_objects_by_path(&info); + stop_progress(&ctx->progress); + + release_revisions(&revs); + trace2_region_leave("survey", "phase/objects", ctx->repo); +} + +int cmd_survey(int argc, const char **argv, const char *prefix, + struct repository *repo) +{ + static struct survey_context ctx = { + .opts = { + .verbose = 0, + .show_progress = -1, /* defaults to isatty(2) */ + .top_nr = 100, + + .refs.want_all_refs = -1, + + .refs.want_branches = -1, /* default these to undefined */ + .refs.want_tags = -1, + .refs.want_remotes = -1, + .refs.want_detached = -1, + .refs.want_other = -1, + }, + .refs = STRVEC_INIT, + }; + + static struct option survey_options[] = { + OPT__VERBOSE(&ctx.opts.verbose, N_("verbose output")), + OPT_BOOL(0, "progress", &ctx.opts.show_progress, N_("show progress")), + OPT_INTEGER('n', "top", &ctx.opts.top_nr, + N_("number of entries to include in detail tables")), + + OPT_BOOL_F(0, "all-refs", &ctx.opts.refs.want_all_refs, N_("include all refs"), PARSE_OPT_NONEG), + + OPT_BOOL_F(0, "branches", &ctx.opts.refs.want_branches, N_("include branches"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "tags", &ctx.opts.refs.want_tags, N_("include tags"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "remotes", &ctx.opts.refs.want_remotes, N_("include all remotes refs"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "detached", &ctx.opts.refs.want_detached, N_("include detached HEAD"), PARSE_OPT_NONEG), + OPT_BOOL_F(0, "other", &ctx.opts.refs.want_other, N_("include notes and stashes"), PARSE_OPT_NONEG), + + OPT_END(), + }; + + if (argc == 2 && !strcmp(argv[1], "-h")) + usage_with_options(survey_usage, survey_options); + + if (isatty(2)) + color_fprintf_ln(stderr, + want_color_fd(2, GIT_COLOR_AUTO) ? GIT_COLOR_YELLOW : "", + "(THIS IS EXPERIMENTAL, EXPECT THE OUTPUT FORMAT TO CHANGE!)"); + + ctx.repo = repo; + + prepare_repo_settings(ctx.repo); + survey_load_config(&ctx); + + argc = parse_options(argc, argv, prefix, survey_options, survey_usage, 0); + + if (ctx.opts.show_progress < 0) + ctx.opts.show_progress = isatty(2); + + fixup_refs_wanted(&ctx); + + survey_phase_refs(&ctx); + + survey_phase_objects(&ctx); + + survey_report_plaintext(&ctx); + + clear_survey_context(&ctx); + return 0; +} diff --git a/command-list.txt b/command-list.txt index e0bb87b3b5c278..d389561a5f1161 100644 --- a/command-list.txt +++ b/command-list.txt @@ -186,6 +186,7 @@ git-stash mainporcelain git-status mainporcelain info git-stripspace purehelpers git-submodule mainporcelain +git-survey mainporcelain git-svn foreignscminterface git-switch mainporcelain history git-symbolic-ref plumbingmanipulators diff --git a/git.c b/git.c index 2fbea24ec921e0..18c35342cdb2cd 100644 --- a/git.c +++ b/git.c @@ -629,6 +629,7 @@ static struct cmd_struct commands[] = { { "status", cmd_status, RUN_SETUP | NEED_WORK_TREE }, { "stripspace", cmd_stripspace }, { "submodule--helper", cmd_submodule__helper, RUN_SETUP }, + { "survey", cmd_survey, RUN_SETUP }, { "switch", cmd_switch, RUN_SETUP | NEED_WORK_TREE }, { "symbolic-ref", cmd_symbolic_ref, RUN_SETUP }, { "tag", cmd_tag, RUN_SETUP | DELAY_PAGER_CONFIG }, diff --git a/path-walk.c b/path-walk.c new file mode 100644 index 00000000000000..896ec0c4779101 --- /dev/null +++ b/path-walk.c @@ -0,0 +1,531 @@ +/* + * path-walk.c: implementation for path-based walks of the object graph. + */ +#include "git-compat-util.h" +#include "path-walk.h" +#include "blob.h" +#include "commit.h" +#include "dir.h" +#include "hashmap.h" +#include "hex.h" +#include "list-objects.h" +#include "object.h" +#include "oid-array.h" +#include "revision.h" +#include "string-list.h" +#include "strmap.h" +#include "tag.h" +#include "trace2.h" +#include "tree.h" +#include "tree-walk.h" + +static const char *root_path = ""; + +struct type_and_oid_list +{ + enum object_type type; + struct oid_array oids; + int maybe_interesting; +}; + +#define TYPE_AND_OID_LIST_INIT { \ + .type = OBJ_NONE, \ + .oids = OID_ARRAY_INIT \ +} + +struct path_walk_context { + /** + * Repeats of data in 'struct path_walk_info' for + * access with fewer characters. + */ + struct repository *repo; + struct rev_info *revs; + struct path_walk_info *info; + + /** + * Map a path to a 'struct type_and_oid_list' + * containing the objects discovered at that + * path. + */ + struct strmap paths_to_lists; + + /** + * Store the current list of paths in a stack, to + * facilitate depth-first-search without recursion. + * + * Use path_stack_pushed to indicate whether a path + * was previously added to path_stack. + */ + struct string_list path_stack; + struct strset path_stack_pushed; +}; + +static void push_to_stack(struct path_walk_context *ctx, + const char *path) +{ + if (strset_contains(&ctx->path_stack_pushed, path)) + return; + + strset_add(&ctx->path_stack_pushed, path); + string_list_append(&ctx->path_stack, path); +} + +static int add_children(struct path_walk_context *ctx, + const char *base_path, + struct object_id *oid) +{ + struct tree_desc desc; + struct name_entry entry; + struct strbuf path = STRBUF_INIT; + size_t base_len; + struct tree *tree = lookup_tree(ctx->repo, oid); + + if (!tree) { + error(_("failed to walk children of tree %s: not found"), + oid_to_hex(oid)); + return -1; + } else if (parse_tree_gently(tree, 1)) { + die("bad tree object %s", oid_to_hex(oid)); + } + + strbuf_addstr(&path, base_path); + base_len = path.len; + + parse_tree(tree); + init_tree_desc(&desc, &tree->object.oid, tree->buffer, tree->size); + while (tree_entry(&desc, &entry)) { + struct type_and_oid_list *list; + struct object *o; + /* Not actually true, but we will ignore submodules later. */ + enum object_type type = S_ISDIR(entry.mode) ? OBJ_TREE : OBJ_BLOB; + + /* Skip submodules. */ + if (S_ISGITLINK(entry.mode)) + continue; + + /* If the caller doesn't want blobs, then don't bother. */ + if (!ctx->info->blobs && type == OBJ_BLOB) + continue; + + if (type == OBJ_TREE) { + struct tree *child = lookup_tree(ctx->repo, &entry.oid); + o = child ? &child->object : NULL; + } else if (type == OBJ_BLOB) { + struct blob *child = lookup_blob(ctx->repo, &entry.oid); + o = child ? &child->object : NULL; + } else { + /* Wrong type? */ + continue; + } + + if (!o) /* report error?*/ + continue; + + strbuf_setlen(&path, base_len); + strbuf_add(&path, entry.path, entry.pathlen); + + /* + * Trees will end with "/" for concatenation and distinction + * from blobs at the same path. + */ + if (type == OBJ_TREE) + strbuf_addch(&path, '/'); + + if (!(list = strmap_get(&ctx->paths_to_lists, path.buf))) { + CALLOC_ARRAY(list, 1); + list->type = type; + strmap_put(&ctx->paths_to_lists, path.buf, list); + } + push_to_stack(ctx, path.buf); + + /* Skip this object if already seen. */ + if (o->flags & SEEN) + continue; + o->flags |= SEEN; + + if (!(o->flags & UNINTERESTING)) + list->maybe_interesting = 1; + oid_array_append(&list->oids, &entry.oid); + } + + free_tree_buffer(tree); + strbuf_release(&path); + return 0; +} + +/* + * For each path in paths_to_explore, walk the trees another level + * and add any found blobs to the batch (but only if they exist and + * haven't been added yet). + */ +static int walk_path(struct path_walk_context *ctx, + const char *path) +{ + struct type_and_oid_list *list; + int ret = 0; + + list = strmap_get(&ctx->paths_to_lists, path); + + if (!list) + BUG("provided path '%s' that had no associated list", path); + + if (!list->oids.nr) + return 0; + + if (ctx->info->prune_all_uninteresting) { + /* + * This is true if all objects were UNINTERESTING + * when added to the list. + */ + if (!list->maybe_interesting) + return 0; + + /* + * But it's still possible that the objects were set + * as UNINTERESTING after being added. Do a quick check. + */ + list->maybe_interesting = 0; + for (size_t i = 0; + !list->maybe_interesting && i < list->oids.nr; + i++) { + if (list->type == OBJ_TREE) { + struct tree *t = lookup_tree(ctx->repo, + &list->oids.oid[i]); + if (t && !(t->object.flags & UNINTERESTING)) + list->maybe_interesting = 1; + } else if (list->type == OBJ_BLOB) { + struct blob *b = lookup_blob(ctx->repo, + &list->oids.oid[i]); + if (b && !(b->object.flags & UNINTERESTING)) + list->maybe_interesting = 1; + } else { + /* Tags are always interesting if visited. */ + list->maybe_interesting = 1; + } + } + + /* We have confirmed that all objects are UNINTERESTING. */ + if (!list->maybe_interesting) + return 0; + } + + /* Evaluate function pointer on this data, if requested. */ + if ((list->type == OBJ_TREE && ctx->info->trees) || + (list->type == OBJ_BLOB && ctx->info->blobs) || + (list->type == OBJ_TAG && ctx->info->tags)) + ret = ctx->info->path_fn(path, &list->oids, list->type, + ctx->info->path_fn_data); + + /* Expand data for children. */ + if (list->type == OBJ_TREE) { + for (size_t i = 0; i < list->oids.nr; i++) { + ret |= add_children(ctx, + path, + &list->oids.oid[i]); + } + } + + oid_array_clear(&list->oids); + strmap_remove(&ctx->paths_to_lists, path, 1); + return ret; +} + +static void clear_strmap(struct strmap *map) +{ + struct hashmap_iter iter; + struct strmap_entry *e; + + hashmap_for_each_entry(&map->map, &iter, e, ent) { + struct type_and_oid_list *list = e->value; + oid_array_clear(&list->oids); + } + strmap_clear(map, 1); + strmap_init(map); +} + +static struct repository *edge_repo; +static struct type_and_oid_list *edge_tree_list; + +static void show_edge(struct commit *commit) +{ + struct tree *t = repo_get_commit_tree(edge_repo, commit); + + if (!t) + return; + + if (commit->object.flags & UNINTERESTING) + t->object.flags |= UNINTERESTING; + + if (t->object.flags & SEEN) + return; + t->object.flags |= SEEN; + + oid_array_append(&edge_tree_list->oids, &t->object.oid); +} + +static void setup_pending_objects(struct path_walk_info *info, + struct path_walk_context *ctx) +{ + struct type_and_oid_list *tags = NULL; + struct type_and_oid_list *tagged_blobs = NULL; + struct type_and_oid_list *root_tree_list = NULL; + + if (info->tags) + CALLOC_ARRAY(tags, 1); + if (info->blobs) + CALLOC_ARRAY(tagged_blobs, 1); + if (info->trees) + root_tree_list = strmap_get(&ctx->paths_to_lists, root_path); + + /* + * Pending objects include: + * * Commits at branch tips. + * * Annotated tags at tag tips. + * * Any kind of object at lightweight tag tips. + * * Trees and blobs in the index (with an associated path). + */ + for (size_t i = 0; i < info->revs->pending.nr; i++) { + struct object_array_entry *pending = info->revs->pending.objects + i; + struct object *obj = pending->item; + + /* Commits will be picked up by revision walk. */ + if (obj->type == OBJ_COMMIT) + continue; + + /* Navigate annotated tag object chains. */ + while (obj->type == OBJ_TAG) { + struct tag *tag = lookup_tag(info->revs->repo, &obj->oid); + if (!tag) + break; + if (tag->object.flags & SEEN) + break; + tag->object.flags |= SEEN; + + if (tags) + oid_array_append(&tags->oids, &obj->oid); + obj = tag->tagged; + } + + if (obj->type == OBJ_TAG) + continue; + + /* We are now at a non-tag object. */ + if (obj->flags & SEEN) + continue; + obj->flags |= SEEN; + + switch (obj->type) { + case OBJ_TREE: + if (!info->trees) + continue; + if (pending->path) { + struct type_and_oid_list *list; + char *path = *pending->path ? xstrfmt("%s/", pending->path) + : xstrdup(""); + if (!(list = strmap_get(&ctx->paths_to_lists, path))) { + CALLOC_ARRAY(list, 1); + list->type = OBJ_TREE; + strmap_put(&ctx->paths_to_lists, path, list); + } + oid_array_append(&list->oids, &obj->oid); + free(path); + } else { + /* assume a root tree, such as a lightweight tag. */ + oid_array_append(&root_tree_list->oids, &obj->oid); + } + break; + + case OBJ_BLOB: + if (!info->blobs) + continue; + if (pending->path) { + struct type_and_oid_list *list; + char *path = pending->path; + if (!(list = strmap_get(&ctx->paths_to_lists, path))) { + CALLOC_ARRAY(list, 1); + list->type = OBJ_BLOB; + strmap_put(&ctx->paths_to_lists, path, list); + } + oid_array_append(&list->oids, &obj->oid); + } else { + /* assume a root tree, such as a lightweight tag. */ + oid_array_append(&tagged_blobs->oids, &obj->oid); + } + break; + + case OBJ_COMMIT: + /* Make sure it is in the object walk */ + if (obj != pending->item) + add_pending_object(info->revs, obj, ""); + break; + + default: + BUG("should not see any other type here"); + } + } + + /* + * Add tag objects and tagged blobs if they exist. + */ + if (tagged_blobs) { + if (tagged_blobs->oids.nr) { + const char *tagged_blob_path = "/tagged-blobs"; + tagged_blobs->type = OBJ_BLOB; + tagged_blobs->maybe_interesting = 1; + push_to_stack(ctx, tagged_blob_path); + strmap_put(&ctx->paths_to_lists, tagged_blob_path, tagged_blobs); + } else { + oid_array_clear(&tagged_blobs->oids); + free(tagged_blobs); + } + } + if (tags) { + if (tags->oids.nr) { + const char *tag_path = "/tags"; + tags->type = OBJ_TAG; + tags->maybe_interesting = 1; + push_to_stack(ctx, tag_path); + strmap_put(&ctx->paths_to_lists, tag_path, tags); + } else { + oid_array_clear(&tags->oids); + free(tags); + } + } +} + +/** + * Given the configuration of 'info', walk the commits based on 'info->revs' and + * call 'info->path_fn' on each discovered path. + * + * Returns nonzero on an error. + */ +int walk_objects_by_path(struct path_walk_info *info) +{ + int ret = 0; + size_t commits_nr = 0, paths_nr = 0; + struct commit *c; + struct type_and_oid_list *root_tree_list; + struct type_and_oid_list *commit_list; + struct path_walk_context ctx = { + .repo = info->revs->repo, + .revs = info->revs, + .info = info, + .path_stack = STRING_LIST_INIT_DUP, + .path_stack_pushed = STRSET_INIT, + .paths_to_lists = STRMAP_INIT + }; + + trace2_region_enter("path-walk", "commit-walk", info->revs->repo); + + CALLOC_ARRAY(commit_list, 1); + commit_list->type = OBJ_COMMIT; + + if (info->tags) + info->revs->tag_objects = 1; + + /* Insert a single list for the root tree into the paths. */ + CALLOC_ARRAY(root_tree_list, 1); + root_tree_list->type = OBJ_TREE; + root_tree_list->maybe_interesting = 1; + strmap_put(&ctx.paths_to_lists, root_path, root_tree_list); + push_to_stack(&ctx, root_path); + + /* + * Set these values before preparing the walk to catch + * lightweight tags pointing to non-commits and indexed objects. + */ + info->revs->blob_objects = info->blobs; + info->revs->tree_objects = info->trees; + + if (prepare_revision_walk(info->revs)) + die(_("failed to setup revision walk")); + + /* Walk trees to mark them as UNINTERESTING. */ + edge_repo = info->revs->repo; + edge_tree_list = root_tree_list; + mark_edges_uninteresting(info->revs, show_edge, + info->prune_all_uninteresting); + edge_repo = NULL; + edge_tree_list = NULL; + + info->revs->blob_objects = info->revs->tree_objects = 0; + + trace2_region_enter("path-walk", "pending-walk", info->revs->repo); + setup_pending_objects(info, &ctx); + trace2_region_leave("path-walk", "pending-walk", info->revs->repo); + + while ((c = get_revision(info->revs))) { + struct object_id *oid; + struct tree *t; + commits_nr++; + + if (info->commits) + oid_array_append(&commit_list->oids, + &c->object.oid); + + /* If we only care about commits, then skip trees. */ + if (!info->trees && !info->blobs) + continue; + + oid = get_commit_tree_oid(c); + t = lookup_tree(info->revs->repo, oid); + + if (!t) { + warning("could not find tree %s", oid_to_hex(oid)); + continue; + } + + if (t->object.flags & SEEN) + continue; + t->object.flags |= SEEN; + oid_array_append(&root_tree_list->oids, oid); + } + + trace2_data_intmax("path-walk", ctx.repo, "commits", commits_nr); + trace2_region_leave("path-walk", "commit-walk", info->revs->repo); + + /* Track all commits. */ + if (info->commits && commit_list->oids.nr) + ret = info->path_fn("", &commit_list->oids, OBJ_COMMIT, + info->path_fn_data); + oid_array_clear(&commit_list->oids); + free(commit_list); + + trace2_region_enter("path-walk", "path-walk", info->revs->repo); + while (!ret && ctx.path_stack.nr) { + char *path = ctx.path_stack.items[ctx.path_stack.nr - 1].string; + ctx.path_stack.nr--; + paths_nr++; + + ret = walk_path(&ctx, path); + + free(path); + } + + /* Are there paths remaining? Likely they are from indexed objects. */ + if (!strmap_empty(&ctx.paths_to_lists)) { + struct hashmap_iter iter; + struct strmap_entry *entry; + + strmap_for_each_entry(&ctx.paths_to_lists, &iter, entry) + push_to_stack(&ctx, entry->key); + + while (!ret && ctx.path_stack.nr) { + char *path = ctx.path_stack.items[ctx.path_stack.nr - 1].string; + ctx.path_stack.nr--; + paths_nr++; + + ret = walk_path(&ctx, path); + + free(path); + } + } + + trace2_data_intmax("path-walk", ctx.repo, "paths", paths_nr); + trace2_region_leave("path-walk", "path-walk", info->revs->repo); + + clear_strmap(&ctx.paths_to_lists); + strset_clear(&ctx.path_stack_pushed); + string_list_clear(&ctx.path_stack, 0); + return ret; +} diff --git a/path-walk.h b/path-walk.h new file mode 100644 index 00000000000000..de0db007dc9a5f --- /dev/null +++ b/path-walk.h @@ -0,0 +1,65 @@ +/* + * path-walk.h : Methods and structures for walking the object graph in batches + * by the paths that can reach those objects. + */ +#include "object.h" /* Required for 'enum object_type'. */ + +struct rev_info; +struct oid_array; + +/** + * The type of a function pointer for the method that is called on a list of + * objects reachable at a given path. + */ +typedef int (*path_fn)(const char *path, + struct oid_array *oids, + enum object_type type, + void *data); + +struct path_walk_info { + /** + * revs provides the definitions for the commit walk, including + * which commits are UNINTERESTING or not. + */ + struct rev_info *revs; + + /** + * The caller wishes to execute custom logic on objects reachable at a + * given path. Every reachable object will be visited exactly once, and + * the first path to see an object wins. This may not be a stable choice. + */ + path_fn path_fn; + void *path_fn_data; + + /** + * Initialize which object types the path_fn should be called on. This + * could also limit the walk to skip blobs if not set. + */ + int commits; + int trees; + int blobs; + int tags; + + /** + * When 'prune_all_uninteresting' is set and a path has all objects + * marked as UNINTERESTING, then the path-walk will not visit those + * objects. It will not call path_fn on those objects and will not + * walk the children of such trees. + */ + int prune_all_uninteresting; +}; + +#define PATH_WALK_INFO_INIT { \ + .blobs = 1, \ + .trees = 1, \ + .commits = 1, \ + .tags = 1, \ +} + +/** + * Given the configuration of 'info', walk the commits based on 'info->revs' and + * call 'info->path_fn' on each discovered path. + * + * Returns nonzero on an error. + */ +int walk_objects_by_path(struct path_walk_info *info); diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c new file mode 100644 index 00000000000000..7e791cfaf979b7 --- /dev/null +++ b/t/helper/test-path-walk.c @@ -0,0 +1,124 @@ +#define USE_THE_REPOSITORY_VARIABLE + +#include "test-tool.h" +#include "environment.h" +#include "hex.h" +#include "object-name.h" +#include "object.h" +#include "pretty.h" +#include "revision.h" +#include "setup.h" +#include "parse-options.h" +#include "path-walk.h" +#include "oid-array.h" + +static const char * const path_walk_usage[] = { + N_("test-tool path-walk -- "), + NULL +}; + +struct path_walk_test_data { + uintmax_t batch_nr; + + uintmax_t commit_nr; + uintmax_t tree_nr; + uintmax_t blob_nr; + uintmax_t tag_nr; +}; + +static int emit_block(const char *path, struct oid_array *oids, + enum object_type type, void *data) +{ + struct path_walk_test_data *tdata = data; + const char *typestr; + + switch (type) { + case OBJ_COMMIT: + typestr = "COMMIT"; + tdata->commit_nr += oids->nr; + break; + + case OBJ_TREE: + typestr = "TREE"; + tdata->tree_nr += oids->nr; + break; + + case OBJ_BLOB: + typestr = "BLOB"; + tdata->blob_nr += oids->nr; + break; + + case OBJ_TAG: + typestr = "TAG"; + tdata->tag_nr += oids->nr; + break; + + default: + BUG("we do not understand this type"); + } + + /* This should never be output during tests. */ + if (!oids->nr) + printf("%"PRIuMAX":%s:%s:EMPTY\n", + tdata->batch_nr, typestr, path); + + for (size_t i = 0; i < oids->nr; i++) { + struct object *o = lookup_unknown_object(the_repository, + &oids->oid[i]); + printf("%"PRIuMAX":%s:%s:%s%s\n", + tdata->batch_nr, typestr, path, + oid_to_hex(&oids->oid[i]), + o->flags & UNINTERESTING ? ":UNINTERESTING" : ""); + } + + tdata->batch_nr++; + return 0; +} + +int cmd__path_walk(int argc, const char **argv) +{ + int res; + struct rev_info revs = REV_INFO_INIT; + struct path_walk_info info = PATH_WALK_INFO_INIT; + struct path_walk_test_data data = { 0 }; + struct option options[] = { + OPT_BOOL(0, "blobs", &info.blobs, + N_("toggle inclusion of blob objects")), + OPT_BOOL(0, "commits", &info.commits, + N_("toggle inclusion of commit objects")), + OPT_BOOL(0, "tags", &info.tags, + N_("toggle inclusion of tag objects")), + OPT_BOOL(0, "trees", &info.trees, + N_("toggle inclusion of tree objects")), + OPT_BOOL(0, "prune", &info.prune_all_uninteresting, + N_("toggle pruning of uninteresting paths")), + OPT_END(), + }; + + setup_git_directory(); + revs.repo = the_repository; + + argc = parse_options(argc, argv, NULL, + options, path_walk_usage, + PARSE_OPT_KEEP_UNKNOWN_OPT | PARSE_OPT_KEEP_ARGV0); + + if (argc > 1) + setup_revisions(argc, argv, &revs, NULL); + else + usage(path_walk_usage[0]); + + info.revs = &revs; + info.path_fn = emit_block; + info.path_fn_data = &data; + + res = walk_objects_by_path(&info); + + printf("commits:%" PRIuMAX "\n" + "trees:%" PRIuMAX "\n" + "blobs:%" PRIuMAX "\n" + "tags:%" PRIuMAX "\n", + data.commit_nr, data.tree_nr, data.blob_nr, data.tag_nr); + + release_revisions(&revs); + return res; +} diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c index 1ebb69a5dc4c17..43676e7b93a43f 100644 --- a/t/helper/test-tool.c +++ b/t/helper/test-tool.c @@ -52,6 +52,7 @@ static struct test_cmd cmds[] = { { "parse-subcommand", cmd__parse_subcommand }, { "partial-clone", cmd__partial_clone }, { "path-utils", cmd__path_utils }, + { "path-walk", cmd__path_walk }, { "pcre2-config", cmd__pcre2_config }, { "pkt-line", cmd__pkt_line }, { "proc-receive", cmd__proc_receive }, diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h index 21802ac27da37f..9cfc5da6e57b00 100644 --- a/t/helper/test-tool.h +++ b/t/helper/test-tool.h @@ -45,6 +45,7 @@ int cmd__parse_pathspec_file(int argc, const char** argv); int cmd__parse_subcommand(int argc, const char **argv); int cmd__partial_clone(int argc, const char **argv); int cmd__path_utils(int argc, const char **argv); +int cmd__path_walk(int argc, const char **argv); int cmd__pcre2_config(int argc, const char **argv); int cmd__pkt_line(int argc, const char **argv); int cmd__proc_receive(int argc, const char **argv); diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh new file mode 100755 index 00000000000000..d3c0015319a354 --- /dev/null +++ b/t/t6601-path-walk.sh @@ -0,0 +1,368 @@ +#!/bin/sh + +TEST_PASSES_SANITIZE_LEAK=true + +test_description='direct path-walk API tests' + +. ./test-lib.sh + +test_expect_success 'setup test repository' ' + git checkout -b base && + + # Make some objects that will only be reachable + # via non-commit tags. + mkdir child && + echo file >child/file && + git add child && + git commit -m "will abandon" && + git tag -a -m "tree" tree-tag HEAD^{tree} && + echo file2 >file2 && + git add file2 && + git commit --amend -m "will abandon" && + git tag tree-tag2 HEAD^{tree} && + + echo blob >file && + blob_oid=$(git hash-object -t blob -w --stdin file2 && + blob2_oid=$(git hash-object -t blob -w --stdin a && + echo b >left/b && + echo c >right/c && + git add . && + git commit --amend -m "first" && + git tag -m "first" first HEAD && + + echo d >right/d && + git add right && + git commit -m "second" && + git tag -a -m "second (under)" second.1 HEAD && + git tag -a -m "second (top)" second.2 second.1 && + + # Set up file/dir collision in history. + rm a && + mkdir a && + echo a >a/a && + echo bb >left/b && + git add a left && + git commit -m "third" && + git tag -a -m "third" third && + + git checkout -b topic HEAD~1 && + echo cc >right/c && + git commit -a -m "topic" && + git tag -a -m "fourth" fourth +' + +test_expect_success 'all' ' + test-tool path-walk -- --all >out && + + cat >expect <<-EOF && + 0:COMMIT::$(git rev-parse topic) + 0:COMMIT::$(git rev-parse base) + 0:COMMIT::$(git rev-parse base~1) + 0:COMMIT::$(git rev-parse base~2) + 1:TAG:/tags:$(git rev-parse refs/tags/first) + 1:TAG:/tags:$(git rev-parse refs/tags/second.1) + 1:TAG:/tags:$(git rev-parse refs/tags/second.2) + 1:TAG:/tags:$(git rev-parse refs/tags/third) + 1:TAG:/tags:$(git rev-parse refs/tags/fourth) + 1:TAG:/tags:$(git rev-parse refs/tags/tree-tag) + 1:TAG:/tags:$(git rev-parse refs/tags/blob-tag) + 2:BLOB:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{}) + 2:BLOB:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{}) + 3:TREE::$(git rev-parse topic^{tree}) + 3:TREE::$(git rev-parse base^{tree}) + 3:TREE::$(git rev-parse base~1^{tree}) + 3:TREE::$(git rev-parse base~2^{tree}) + 3:TREE::$(git rev-parse refs/tags/tree-tag^{}) + 3:TREE::$(git rev-parse refs/tags/tree-tag2^{}) + 4:BLOB:a:$(git rev-parse base~2:a) + 5:TREE:right/:$(git rev-parse topic:right) + 5:TREE:right/:$(git rev-parse base~1:right) + 5:TREE:right/:$(git rev-parse base~2:right) + 6:BLOB:right/d:$(git rev-parse base~1:right/d) + 7:BLOB:right/c:$(git rev-parse base~2:right/c) + 7:BLOB:right/c:$(git rev-parse topic:right/c) + 8:TREE:left/:$(git rev-parse base:left) + 8:TREE:left/:$(git rev-parse base~2:left) + 9:BLOB:left/b:$(git rev-parse base~2:left/b) + 9:BLOB:left/b:$(git rev-parse base:left/b) + 10:TREE:a/:$(git rev-parse base:a) + 11:BLOB:file2:$(git rev-parse refs/tags/tree-tag2^{}:file2) + 12:TREE:child/:$(git rev-parse refs/tags/tree-tag:child) + 13:BLOB:child/file:$(git rev-parse refs/tags/tree-tag:child/file) + blobs:10 + commits:4 + tags:7 + trees:13 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'indexed objects' ' + test_when_finished git reset --hard && + + # stage change into index, adding a blob but + # also invalidating the cache-tree for the root + # and the "left" directory. + echo bogus >left/c && + git add left && + + test-tool path-walk -- --indexed-objects >out && + + cat >expect <<-EOF && + 0:BLOB:a:$(git rev-parse HEAD:a) + 1:BLOB:left/b:$(git rev-parse HEAD:left/b) + 2:BLOB:left/c:$(git rev-parse :left/c) + 3:BLOB:right/c:$(git rev-parse HEAD:right/c) + 4:BLOB:right/d:$(git rev-parse HEAD:right/d) + 5:TREE:right/:$(git rev-parse topic:right) + blobs:5 + commits:0 + tags:0 + trees:1 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'branches and indexed objects mix well' ' + test_when_finished git reset --hard && + + # stage change into index, adding a blob but + # also invalidating the cache-tree for the root + # and the "right" directory. + echo fake >right/d && + git add right && + + test-tool path-walk -- --indexed-objects --branches >out && + + cat >expect <<-EOF && + 0:COMMIT::$(git rev-parse topic) + 0:COMMIT::$(git rev-parse base) + 0:COMMIT::$(git rev-parse base~1) + 0:COMMIT::$(git rev-parse base~2) + 1:TREE::$(git rev-parse topic^{tree}) + 1:TREE::$(git rev-parse base^{tree}) + 1:TREE::$(git rev-parse base~1^{tree}) + 1:TREE::$(git rev-parse base~2^{tree}) + 2:BLOB:a:$(git rev-parse base~2:a) + 3:TREE:right/:$(git rev-parse topic:right) + 3:TREE:right/:$(git rev-parse base~1:right) + 3:TREE:right/:$(git rev-parse base~2:right) + 4:BLOB:right/d:$(git rev-parse base~1:right/d) + 4:BLOB:right/d:$(git rev-parse :right/d) + 5:BLOB:right/c:$(git rev-parse base~2:right/c) + 5:BLOB:right/c:$(git rev-parse topic:right/c) + 6:TREE:left/:$(git rev-parse base:left) + 6:TREE:left/:$(git rev-parse base~2:left) + 7:BLOB:left/b:$(git rev-parse base:left/b) + 7:BLOB:left/b:$(git rev-parse base~2:left/b) + 8:TREE:a/:$(git rev-parse refs/tags/third:a) + blobs:7 + commits:4 + tags:0 + trees:10 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'topic only' ' + test-tool path-walk -- topic >out && + + cat >expect <<-EOF && + 0:COMMIT::$(git rev-parse topic) + 0:COMMIT::$(git rev-parse base~1) + 0:COMMIT::$(git rev-parse base~2) + 1:TREE::$(git rev-parse topic^{tree}) + 1:TREE::$(git rev-parse base~1^{tree}) + 1:TREE::$(git rev-parse base~2^{tree}) + 2:TREE:right/:$(git rev-parse topic:right) + 2:TREE:right/:$(git rev-parse base~1:right) + 2:TREE:right/:$(git rev-parse base~2:right) + 3:BLOB:right/d:$(git rev-parse base~1:right/d) + 4:BLOB:right/c:$(git rev-parse base~2:right/c) + 4:BLOB:right/c:$(git rev-parse topic:right/c) + 5:TREE:left/:$(git rev-parse base~2:left) + 6:BLOB:left/b:$(git rev-parse base~2:left/b) + 7:BLOB:a:$(git rev-parse base~2:a) + blobs:5 + commits:3 + tags:0 + trees:7 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'topic, not base' ' + test-tool path-walk -- topic --not base >out && + + cat >expect <<-EOF && + 0:COMMIT::$(git rev-parse topic) + 1:TREE::$(git rev-parse topic^{tree}) + 2:TREE:right/:$(git rev-parse topic:right) + 3:BLOB:right/d:$(git rev-parse topic:right/d):UNINTERESTING + 4:BLOB:right/c:$(git rev-parse topic:right/c) + 5:TREE:left/:$(git rev-parse topic:left):UNINTERESTING + 6:BLOB:left/b:$(git rev-parse topic:left/b):UNINTERESTING + 7:BLOB:a:$(git rev-parse topic:a):UNINTERESTING + blobs:4 + commits:1 + tags:0 + trees:3 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'fourth, blob-tag2, not base' ' + test-tool path-walk -- fourth blob-tag2 --not base >out && + + cat >expect <<-EOF && + 0:COMMIT::$(git rev-parse topic) + 1:TAG:/tags:$(git rev-parse fourth) + 2:BLOB:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{}) + 3:TREE::$(git rev-parse topic^{tree}) + 4:TREE:right/:$(git rev-parse topic:right) + 5:BLOB:right/d:$(git rev-parse base~1:right/d):UNINTERESTING + 6:BLOB:right/c:$(git rev-parse topic:right/c) + 7:TREE:left/:$(git rev-parse base~1:left):UNINTERESTING + 8:BLOB:left/b:$(git rev-parse base~1:left/b):UNINTERESTING + 9:BLOB:a:$(git rev-parse base~1:a):UNINTERESTING + blobs:5 + commits:1 + tags:1 + trees:3 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'topic, not base, only blobs' ' + test-tool path-walk --no-trees --no-commits \ + -- topic --not base >out && + + cat >expect <<-EOF && + 0:BLOB:right/d:$(git rev-parse topic:right/d):UNINTERESTING + 1:BLOB:right/c:$(git rev-parse topic:right/c) + 2:BLOB:left/b:$(git rev-parse topic:left/b):UNINTERESTING + 3:BLOB:a:$(git rev-parse topic:a):UNINTERESTING + blobs:4 + commits:0 + tags:0 + trees:0 + EOF + + test_cmp_sorted expect out +' + +# No, this doesn't make a lot of sense for the path-walk API, +# but it is possible to do. +test_expect_success 'topic, not base, only commits' ' + test-tool path-walk --no-blobs --no-trees \ + -- topic --not base >out && + + cat >expect <<-EOF && + 0:COMMIT::$(git rev-parse topic) + commits:1 + blobs:0 + tags:0 + trees:0 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'topic, not base, only trees' ' + test-tool path-walk --no-blobs --no-commits \ + -- topic --not base >out && + + cat >expect <<-EOF && + 0:TREE::$(git rev-parse topic^{tree}) + 1:TREE:right/:$(git rev-parse topic:right) + 2:TREE:left/:$(git rev-parse topic:left):UNINTERESTING + commits:0 + blobs:0 + tags:0 + trees:3 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'topic, not base, boundary' ' + test-tool path-walk -- --boundary topic --not base >out && + + cat >expect <<-EOF && + 0:COMMIT::$(git rev-parse topic) + 0:COMMIT::$(git rev-parse base~1):UNINTERESTING + 1:TREE::$(git rev-parse topic^{tree}) + 1:TREE::$(git rev-parse base~1^{tree}):UNINTERESTING + 2:TREE:right/:$(git rev-parse topic:right) + 2:TREE:right/:$(git rev-parse base~1:right):UNINTERESTING + 3:BLOB:right/d:$(git rev-parse base~1:right/d):UNINTERESTING + 4:BLOB:right/c:$(git rev-parse base~1:right/c):UNINTERESTING + 4:BLOB:right/c:$(git rev-parse topic:right/c) + 5:TREE:left/:$(git rev-parse base~1:left):UNINTERESTING + 6:BLOB:left/b:$(git rev-parse base~1:left/b):UNINTERESTING + 7:BLOB:a:$(git rev-parse base~1:a):UNINTERESTING + blobs:5 + commits:2 + tags:0 + trees:5 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'topic, not base, boundary with pruning' ' + test-tool path-walk --prune -- --boundary topic --not base >out && + + cat >expect <<-EOF && + 0:COMMIT::$(git rev-parse topic) + 0:COMMIT::$(git rev-parse base~1):UNINTERESTING + 1:TREE::$(git rev-parse topic^{tree}) + 1:TREE::$(git rev-parse base~1^{tree}):UNINTERESTING + 2:TREE:right/:$(git rev-parse topic:right) + 2:TREE:right/:$(git rev-parse base~1:right):UNINTERESTING + 3:BLOB:right/c:$(git rev-parse base~1:right/c):UNINTERESTING + 3:BLOB:right/c:$(git rev-parse topic:right/c) + blobs:2 + commits:2 + tags:0 + trees:4 + EOF + + test_cmp_sorted expect out +' + +test_expect_success 'trees are reported exactly once' ' + test_when_finished "rm -rf unique-trees" && + test_create_repo unique-trees && + ( + cd unique-trees && + mkdir initial && + test_commit initial/file && + git switch -c move-to-top && + git mv initial/file.t ./ && + test_tick && + git commit -m moved && + git update-ref refs/heads/other HEAD + ) && + test-tool -C unique-trees path-walk -- --all >out && + tree=$(git -C unique-trees rev-parse HEAD:) && + grep "$tree" out >out-filtered && + test_line_count = 1 out-filtered +' + +test_done diff --git a/t/t8100-git-survey.sh b/t/t8100-git-survey.sh new file mode 100755 index 00000000000000..8c6edfcae0c6c2 --- /dev/null +++ b/t/t8100-git-survey.sh @@ -0,0 +1,102 @@ +#!/bin/sh + +test_description='git survey' + +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME + +TEST_PASSES_SANITIZE_LEAK=0 +export TEST_PASSES_SANITIZE_LEAK + +. ./test-lib.sh + +test_expect_success 'git survey -h shows experimental warning' ' + test_expect_code 129 git survey -h 2>usage && + grep "EXPERIMENTAL!" usage +' + +test_expect_success 'create a semi-interesting repo' ' + test_commit_bulk 10 && + git tag -a -m one one HEAD~5 && + git tag -a -m two two HEAD~3 && + git tag -a -m three three two && + git tag -a -m four four three && + git update-ref -d refs/tags/three && + git update-ref -d refs/tags/two +' + +test_expect_success 'git survey --progress' ' + GIT_PROGRESS_DELAY=0 git survey --all-refs --progress >out 2>err && + grep "Preparing object walk" err +' + +test_expect_success 'git survey (default)' ' + git survey --all-refs >out 2>err && + test_line_count = 0 err && + + test_oid_cache <<-EOF && + commits_size_on_disk sha1: 1523 + commits_size_on_disk sha256: 1811 + + commits_size sha1: 2153 + commits_size sha256: 2609 + + trees_size_on_disk sha1: 495 + trees_size_on_disk sha256: 635 + + trees_size sha1: 1706 + trees_size sha256: 2366 + + tags_size sha1: 528 + tags_size sha256: 624 + + tags_size_on_disk sha1: 510 + tags_size_on_disk sha256: 569 + EOF + + tr , " " >expect <<-EOF && + GIT SURVEY for "$(pwd)" + ----------------------------------------------------- + + REFERENCES SUMMARY + ======================== + , Ref Type | Count + -----------------+------ + , Branches | 1 + Remote refs | 0 + Tags (all) | 2 + Tags (annotated) | 2 + + REACHABLE OBJECT SUMMARY + ======================== + Object Type | Count + ------------+------ + Tags | 4 + Commits | 10 + Trees | 10 + Blobs | 10 + + TOTAL OBJECT SIZES BY TYPE + =============================================== + Object Type | Count | Disk Size | Inflated Size + ------------+-------+-----------+-------------- + Commits | 10 | $(test_oid commits_size_on_disk) | $(test_oid commits_size) + Trees | 10 | $(test_oid trees_size_on_disk) | $(test_oid trees_size) + Blobs | 10 | 191 | 101 + Tags | 4 | $(test_oid tags_size_on_disk) | $(test_oid tags_size) + EOF + + lines=$(wc -l out-trimmed && + test_cmp expect out-trimmed && + + for type in "DIRECTORIES" "FILES" + do + for metric in "COUNT" "DISK SIZE" "INFLATED SIZE" + do + grep "TOP $type BY $metric" out || return 1 + done || return 1 + done +' + +test_done diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index fde9bf54fc35fc..16b70aebd60749 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -1267,6 +1267,16 @@ test_cmp () { eval "$GIT_TEST_CMP" '"$@"' } +# test_cmp_sorted runs test_cmp on sorted versions of the two +# input files. Uses "$1.sorted" and "$2.sorted" as temp files. + +test_cmp_sorted () { + sort <"$1" >"$1.sorted" && + sort <"$2" >"$2.sorted" && + test_cmp "$1.sorted" "$2.sorted" && + rm "$1.sorted" "$2.sorted" +} + # Check that the given config key has the expected value. # # test_cmp_config [-C ]