From 0fcbae0d2ebe37727e29be58963c41886d39be8f Mon Sep 17 00:00:00 2001 From: Jud Dagnall Date: Mon, 23 Sep 2024 14:39:19 -0700 Subject: [PATCH 1/8] Add a FrequencyTable Guide --- visidata/guides/FrequencyTable.md | 117 ++++++++++++++++++++++++++++++ 1 file changed, 117 insertions(+) create mode 100644 visidata/guides/FrequencyTable.md diff --git a/visidata/guides/FrequencyTable.md b/visidata/guides/FrequencyTable.md new file mode 100644 index 000000000..c925cf1d0 --- /dev/null +++ b/visidata/guides/FrequencyTable.md @@ -0,0 +1,117 @@ +# Frequency Tables are how you GROUP BY + +## About Frequency Tables + +A VisiData frequency table groups data into bins using one or more columns, and creates basic summary aggregates of the data. The default aggregates are as follows. However, if user-defined aggregate columns are present, they will also be included. + +- count - (the count of rows in each group) +- percent - (the percentage of the total rows in each group) +- histogram - (a visual representation of the percentage) + +Here's a tiny dataset of individual hat purchases. Each purchase is for between 1-4 hats. 
+ +``` ++------------+-------+-------+-------+-------+ +| date | color | count | price | total | ++------------+-------+-------+-------+-------+ +| 2024-09-01 | Red | 1 | 30 | 30 | +| 2024-09-02 | Blue | 1 | 28 | 28 | +| 2024-09-02 | Green | 2 | 32 | 64 | +| 2024-09-03 | Red | 4 | 25 | 100 | +| 2024-09-03 | Blue | 1 | 33 | 33 | +| 2024-09-03 | Blue | 3 | 33 | 99 | ++------------+-------+-------+-------+-------+ +``` + +A Frequency Table for the `color` column looks like: + +``` ++-------+-------+---------+----------------------------------------+ +| color | count | percent | histogram | ++-------+-------+---------+----------------------------------------+ +| Blue | 3 | 50 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ | +| Red | 2 | 33.33 | ■■■■■■■■■■■■■■■■■■■■■■■■■ | +| Green | 1 | 16.67 | ■■■■■■■■■■■■ | ++-------+-------+---------+----------------------------------------+ +``` + +Given data like this, use a frequency table to answer questions like: + +- How many purchases (rows) per day? +- What unique colors were sold? +- How many purchases (rows) per day and color? +- What color generated the most in total sales? +- What is the first purchase (row) for each day? + +## Group by a single column + +To group by date, navigate to the `date` column and press {help.commands.freq_col} + +Note that in addition to getting the counts, the frequency table also provides a list of unique items in the selected column. + +For example, navigating to the `color` column and pressing [:keys]Shift+F[/] is a quick way to see the unique colors. + +## Group by multiple columns + +To group by multiple columns, the grouping columns must be set as key columns. + +- use {help.commands.key_col} + +In this example, date and color could be set as key columns. + +After setting one or more key columns, group by the key columns to create a frequency table: + +- {help.commands.freq_keys} + +This table can answer how many invoices were created per date and color. 
+ +## User-defined aggregates + +Note that in the examples above, only the row count was aggregated. Frequently, you want to add additional aggregations (like min, max, sum and/or average). These aggregations must be added before generating the frequency table, and they will show up next to built-in aggregations. + +In this example, you can add a `sum` aggregation to the `total` column, and then group by the `color` column: + +- Navigate to the `total` column: [:keys]gShift+L[/] +- Add a `sum` aggregate [:keys]+[/] then enter [:keys]sum[/] +- Navigate to the `color` column +- Create the frequency table by pressing [:keys]Shift+F[/] + +## Quick Summary + +To quickly compare the frequency and aggregates of selected rows compared to the total dataset, use + +- {help.commands.freq_summary} + +## Exploring the data + +From the `FrequencySheet`, you can explore the underlying data for a single group using the following commands: + +- {help.commands.open_row} + +To explore multiple rows within the frequency table, select multiple rows, for example with [:keys]t[/] (stoggle-row) + +- {help.commands.dive_selected} + +In addition to seeing all of the individual rows, sometimes it is helpful to just see the first item from each group. Perhaps you want to see when something first occurred, or to see a representative sample of each group. + +- {help.commands.select_first} + +The rows are selected in the original sheet, so you must leave the frequency table and return to to the source sheet. One way is with: +- {help.commands.jump_prev} + + +## Sorting the data + +Like with other sheets, sort Frequency Table data by navigating to the desired column and using the sort keys: + +- {help.commands.sort_asc} +- {help.commands.sort_desc} + +## Using Split Panes with Frequency Tables + +There are a few functions that combine Split Panes (see the `SplitplanesGuide`) and Frequency Tables. 
+ +If you have an open split [:keys]Shift+Z[/], and create a Frequency Table, it will automatically open in the other split. + +Similarly, if you are in a Frequency Table and open a new split, when you explore the group(s) with [:keys]Enter[/] or [:keys]gEnter[/], the detail view will open in the other pane. + From 8039ad9bd79d2a09f2e0b11b5ad98a16bdca0960 Mon Sep 17 00:00:00 2001 From: Jud Dagnall Date: Mon, 23 Sep 2024 15:06:10 -0700 Subject: [PATCH 2/8] rename count column in sample to avoid ambiguity - it's a bit confusing that the input sheet has `count` and the frequency table also has `count`, but they refer to different things. --- visidata/guides/FrequencyTable.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/visidata/guides/FrequencyTable.md b/visidata/guides/FrequencyTable.md index c925cf1d0..9b857df15 100644 --- a/visidata/guides/FrequencyTable.md +++ b/visidata/guides/FrequencyTable.md @@ -8,19 +8,19 @@ A VisiData frequency table groups data into bins using one or more columns, and - percent - (the percentage of the total rows in each group) - histogram - (a visual representation of the percentage) -Here's a tiny dataset of individual hat purchases. Each purchase is for between 1-4 hats. +Here's a tiny dataset of hat purchases. Each purchase may include multiple hats. 
``` -+------------+-------+-------+-------+-------+ -| date | color | count | price | total | -+------------+-------+-------+-------+-------+ -| 2024-09-01 | Red | 1 | 30 | 30 | -| 2024-09-02 | Blue | 1 | 28 | 28 | -| 2024-09-02 | Green | 2 | 32 | 64 | -| 2024-09-03 | Red | 4 | 25 | 100 | -| 2024-09-03 | Blue | 1 | 33 | 33 | -| 2024-09-03 | Blue | 3 | 33 | 99 | -+------------+-------+-------+-------+-------+ ++------------+-------+----------+-------+-------+ +| year | color | hatCount | price | total | ++------------+-------+----------+-------+-------+ +| 2024-09-01 | Red | 1 | 30 | 30 | +| 2024-09-02 | Blue | 1 | 28 | 28 | +| 2024-09-02 | Green | 2 | 32 | 64 | +| 2024-09-03 | Red | 4 | 25 | 100 | +| 2024-09-03 | Blue | 1 | 33 | 33 | +| 2024-09-03 | Blue | 3 | 33 | 99 | ++------------+-------+----------+-------+-------+ ``` A Frequency Table for the `color` column looks like: From 606bc001ba7916ed712adfda4746e88e30b9b8dc Mon Sep 17 00:00:00 2001 From: Jud Dagnall Date: Mon, 23 Sep 2024 15:12:59 -0700 Subject: [PATCH 3/8] Fix column title typo --- visidata/guides/FrequencyTable.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/visidata/guides/FrequencyTable.md b/visidata/guides/FrequencyTable.md index 9b857df15..847a2f919 100644 --- a/visidata/guides/FrequencyTable.md +++ b/visidata/guides/FrequencyTable.md @@ -12,7 +12,7 @@ Here's a tiny dataset of hat purchases. Each purchase may include multiple hats. 
``` +------------+-------+----------+-------+-------+ -| year | color | hatCount | price | total | +| date | color | hatCount | price | total | +------------+-------+----------+-------+-------+ | 2024-09-01 | Red | 1 | 30 | 30 | | 2024-09-02 | Blue | 1 | 28 | 28 | From 57098a751dd8341b0ee8e1987ad1db1ecbbab61f Mon Sep 17 00:00:00 2001 From: Jud Dagnall Date: Sun, 29 Sep 2024 17:38:48 -0700 Subject: [PATCH 4/8] Clean up the FrequencyTable guide - remove examples, as it is already quite long - reword many sections to simplify the language - trim and condense explanatory text --- visidata/guides/FrequencyTable.md | 135 +++++++++++++----------------- 1 file changed, 59 insertions(+), 76 deletions(-) diff --git a/visidata/guides/FrequencyTable.md b/visidata/guides/FrequencyTable.md index 847a2f919..e7ba59a21 100644 --- a/visidata/guides/FrequencyTable.md +++ b/visidata/guides/FrequencyTable.md @@ -1,117 +1,100 @@ # Frequency Tables are how you GROUP BY -## About Frequency Tables - -A VisiData frequency table groups data into bins using one or more columns, and creates basic summary aggregates of the data. The default aggregates are as follows. However, if user-defined aggregate columns are present, they will also be included. - -- count - (the count of rows in each group) -- percent - (the percentage of the total rows in each group) -- histogram - (a visual representation of the percentage) - -Here's a tiny dataset of hat purchases. Each purchase may include multiple hats. 
- -``` -+------------+-------+----------+-------+-------+ -| date | color | hatCount | price | total | -+------------+-------+----------+-------+-------+ -| 2024-09-01 | Red | 1 | 30 | 30 | -| 2024-09-02 | Blue | 1 | 28 | 28 | -| 2024-09-02 | Green | 2 | 32 | 64 | -| 2024-09-03 | Red | 4 | 25 | 100 | -| 2024-09-03 | Blue | 1 | 33 | 33 | -| 2024-09-03 | Blue | 3 | 33 | 99 | -+------------+-------+----------+-------+-------+ -``` - -A Frequency Table for the `color` column looks like: - -``` -+-------+-------+---------+----------------------------------------+ -| color | count | percent | histogram | -+-------+-------+---------+----------------------------------------+ -| Blue | 3 | 50 | ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ | -| Red | 2 | 33.33 | ■■■■■■■■■■■■■■■■■■■■■■■■■ | -| Green | 1 | 16.67 | ■■■■■■■■■■■■ | -+-------+-------+---------+----------------------------------------+ -``` - -Given data like this, use a frequency table to answer questions like: - -- How many purchases (rows) per day? -- What unique colors were sold? -- How many purchases (rows) per day and color? -- What color generated the most in total sales? -- What is the first purchase (row) for each day? +- Group data into bins using one or more columns. +- Count the number of items in each group. +- Also perform custom aggregations for each group. -## Group by a single column +- Similar to SQL: -To group by date, navigate to the `date` column and press {help.commands.freq_col} + SELECT column_name, COUNT(*) + FROM sheet_name + GROUP BY column_name + ORDER BY COUNT(*) DESC -Note that in addition to getting the counts, the frequency table also provides a list of unique items in the selected column. +## Group by a single column -For example, navigating to the `color` column and pressing [:keys]Shift+F[/] is a quick way to see the unique colors. +1. 
Navigate to the target column +- {help.commands.freq_col} ## Group by multiple columns -To group by multiple columns, the grouping columns must be set as key columns. - +1. Set one or more columns as key columns: - use {help.commands.key_col} -In this example, date and color could be set as key columns. +2. group by the key columns to create a frequency table: +- {help.commands.freq_keys} -After setting one or more key columns, group by the key columns to create a frequency table: +## Aggregators -- {help.commands.freq_keys} +Add aggregators to one or or more columns BEFORE creating a frequency table. +- Aggregators include min, max, sum, distinct count, and list. -This table can answer how many invoices were created per date and color. +1. Navigate to a column. +2. Add aggregators (like min, max, sum, list, and distinct count). +- {help.commands.aggregate-col} -## User-defined aggregates +3. Add more aggregators to the same or different columns. +4. Generate the Frequency Table. [:keys]Shift+F[/] or [:keys]gShift+F[/] -Note that in the examples above, only the row count was aggregated. Frequently, you want to add additional aggregations (like min, max, sum and/or average). These aggregations must be added before generating the frequency table, and they will show up next to built-in aggregations. +## Explore the frequency data -In this example, you can add a `sum` aggregation to the `total` column, and then group by the `color` column: +Dive into a group to see the underlying row(s) using the **Frequency Table**: -- Navigate to the `total` column: [:keys]gShift+L[/] -- Add a `sum` aggregate [:keys]+[/] then enter [:keys]sum[/] -- Navigate to the `color` column -- Create the frequency table by pressing [:keys]Shift+F[/] +1. Navigate to the target row. 
+- {help.commands.open_row} -## Quick Summary +Dive into multiple groups: -To quickly compare the frequency and aggregates of selected rows compared to the total dataset, use +- Select multiple rows, for example with [:keys]t[/] (stoggle-row) +- {help.commands.dive_selected} -- {help.commands.freq_summary} +Return to the frequency table: +- {help.commands.jump_prev} -## Exploring the data +Select the first row of each group, and then return to the source data. For example, identify a sample from each each group: -From the `FrequencySheet`, you can explore the underlying data for a single group using the following commands: +- (`select-first`) - select first source row in each bin +- {help.commands.jump_prev} -- {help.commands.open_row} +## Find Unique Values -To explore multiple rows within the frequency table, select multiple rows, for example with [:keys]t[/] (stoggle-row) +The bins of a frequeny table are the unique items. +- The total unique count appears in the bottom right of the window, for example "14 bins". -- {help.commands.dive_selected} +Copy the unique item list to a new sheet: -In addition to seeing all of the individual rows, sometimes it is helpful to just see the first item from each group. Perhaps you want to see when something first occurred, or to see a representative sample of each group. +1. Generate a frequency table (one or more columns). +2. Hide unwanted columns. [:keys]-[/] or [:keys]Shift+C[/] +3. Copy unique values to a new sheet +- {help.commands.freeze-sheet} -- {help.commands.select_first} +## Count only selected rows -The rows are selected in the original sheet, so you must leave the frequency table and return to to the source sheet. One way is with: -- {help.commands.jump_prev} +Create an ad-hoc frequeny table that compares the selected rows in the current columns to all rows: +- {help.commands.freq_summary} + +The frequency summary can use aggregators as well. 
-## Sorting the data +## Sort the table -Like with other sheets, sort Frequency Table data by navigating to the desired column and using the sort keys: +Navigate to the target column and sort: - {help.commands.sort_asc} - {help.commands.sort_desc} ## Using Split Panes with Frequency Tables -There are a few functions that combine Split Panes (see the `SplitplanesGuide`) and Frequency Tables. +Open a split [:keys]Shift+Z[/], and create a Frequency Table [:keys]Shift+F[/]. +- The table will automatically open in the other split. + +1. Open a Frequency Table +2. open a new split, when you explore the group(s) with [:keys]Enter[/] or [:keys]gEnter[/], the detail view will open in the other pane. + +See alo the `SplitplanesGuide`. -If you have an open split [:keys]Shift+Z[/], and create a Frequency Table, it will automatically open in the other split. +## Table Options -Similarly, if you are in a Frequency Table and open a new split, when you explore the group(s) with [:keys]Enter[/] or [:keys]gEnter[/], the detail view will open in the other pane. +- {help.options.disp_histogram} +- {help.options.disp_histlen} From 3d509e037dfb05f233dd49da2e5e377423ad40e3 Mon Sep 17 00:00:00 2001 From: Jud Dagnall Date: Sun, 29 Sep 2024 18:10:10 -0700 Subject: [PATCH 5/8] More frequency table cleanup and rewording --- visidata/guides/FrequencyTable.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/visidata/guides/FrequencyTable.md b/visidata/guides/FrequencyTable.md index e7ba59a21..7345f00f5 100644 --- a/visidata/guides/FrequencyTable.md +++ b/visidata/guides/FrequencyTable.md @@ -28,9 +28,10 @@ Add aggregators to one or or more columns BEFORE creating a frequency table. - Aggregators include min, max, sum, distinct count, and list. +- Set an appropriate column type for the aggregator target, for example float: [:key]%[/] 1. Navigate to a column. -2. Add aggregators (like min, max, sum, list, and distinct count). +2. 
Add an aggregator (like min, max, sum, list, and distinct count). - {help.commands.aggregate-col} 3. Add more aggregators to the same or different columns. @@ -51,7 +52,9 @@ Dive into multiple groups: Return to the frequency table: - {help.commands.jump_prev} -Select the first row of each group, and then return to the source data. For example, identify a sample from each each group: +Select the first row of each group: + +See the selections in the source sheet. For example, select a sample from each each group: - (`select-first`) - select first source row in each bin - {help.commands.jump_prev} @@ -72,10 +75,9 @@ Copy the unique item list to a new sheet: Create an ad-hoc frequeny table that compares the selected rows in the current columns to all rows: +- add aggregators if needed [:keys]+[/] - {help.commands.freq_summary} -The frequency summary can use aggregators as well. - ## Sort the table Navigate to the target column and sort: @@ -91,7 +93,7 @@ Open a split [:keys]Shift+Z[/], and create a Frequency Table [:keys]Shift+F[/]. 1. Open a Frequency Table 2. open a new split, when you explore the group(s) with [:keys]Enter[/] or [:keys]gEnter[/], the detail view will open in the other pane. -See alo the `SplitplanesGuide`. +Also see the `SplitplanesGuide`. ## Table Options From d0b79de5b0a07abf853354e1669e582cbc2105e9 Mon Sep 17 00:00:00 2001 From: Jud Dagnall Date: Sun, 29 Sep 2024 18:16:44 -0700 Subject: [PATCH 6/8] Correct the histogram bin option --- visidata/guides/FrequencyTable.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/visidata/guides/FrequencyTable.md b/visidata/guides/FrequencyTable.md index 7345f00f5..c634a7a62 100644 --- a/visidata/guides/FrequencyTable.md +++ b/visidata/guides/FrequencyTable.md @@ -98,5 +98,5 @@ Also see the `SplitplanesGuide`. 
## Table Options - {help.options.disp_histogram} -- {help.options.disp_histlen} +- {help.options.histogram_bins} From 5a2cfcfe4e22c207d074006ba10a547ac13057b0 Mon Sep 17 00:00:00 2001 From: anjakefala Date: Thu, 3 Oct 2024 23:09:23 -0700 Subject: [PATCH 7/8] Clean up Frequency Table guide --- visidata/aggregators.py | 2 +- visidata/guides/FrequencyTable.md | 87 +++++-------------------------- 2 files changed, 13 insertions(+), 76 deletions(-) diff --git a/visidata/aggregators.py b/visidata/aggregators.py index 88b77c4ab..750c87336 100644 --- a/visidata/aggregators.py +++ b/visidata/aggregators.py @@ -273,7 +273,7 @@ def _fmt_aggr_summary(match, row, trigger_key): vd.warning(f'aggregator does not exist: {aggr}') return aggrs -Sheet.addCommand('+', 'aggregate-col', 'addAggregators([cursorCol], chooseAggregators())', 'Add aggregator to current column') +Sheet.addCommand('+', 'aggregate-col', 'addAggregators([cursorCol], chooseAggregators())', 'add aggregator to current column') Sheet.addCommand('z+', 'memo-aggregate', 'cursorCol.memo_aggregate(chooseAggregators(), selectedRows or rows)', 'memo result of aggregator over values in selected rows for current column') ColumnsSheet.addCommand('g+', 'aggregate-cols', 'addAggregators(selectedRows or source[0].nonKeyVisibleCols, chooseAggregators())', 'add aggregators to selected source columns') diff --git a/visidata/guides/FrequencyTable.md b/visidata/guides/FrequencyTable.md index c634a7a62..b82c7e4a0 100644 --- a/visidata/guides/FrequencyTable.md +++ b/visidata/guides/FrequencyTable.md @@ -1,102 +1,39 @@ # Frequency Tables are how you GROUP BY -- Group data into bins using one or more columns. -- Count the number of items in each group. -- Also perform custom aggregations for each group. +Frequency Tables group rows into bins by column value, and includes summary columns for source columns with aggregators. 
-- Similar to SQL: - - SELECT column_name, COUNT(*) - FROM sheet_name - GROUP BY column_name - ORDER BY COUNT(*) DESC - -## Group by a single column - -1. Navigate to the target column - {help.commands.freq_col} -## Group by multiple columns - -1. Set one or more columns as key columns: -- use {help.commands.key_col} - -2. group by the key columns to create a frequency table: - {help.commands.freq_keys} +- {help.commands.freq_summary} + ## Aggregators -Add aggregators to one or or more columns BEFORE creating a frequency table. -- Aggregators include min, max, sum, distinct count, and list. -- Set an appropriate column type for the aggregator target, for example float: [:key]%[/] +A **Frequency Table** contains a summary columns for each aggregator added to a source column. +These aggregators need to be added before creating the Frequency Table. +Examples of aggregators include min, max, sum, distinct, count, and list. -1. Navigate to a column. -2. Add an aggregator (like min, max, sum, list, and distinct count). - {help.commands.aggregate-col} -3. Add more aggregators to the same or different columns. -4. Generate the Frequency Table. [:keys]Shift+F[/] or [:keys]gShift+F[/] +Note: set an appropriate type for the aggregator target column, for example {help.commands.type_float}. -## Explore the frequency data +## Explore the data Dive into a group to see the underlying row(s) using the **Frequency Table**: -1. Navigate to the target row. - {help.commands.open_row} - -Dive into multiple groups: - -- Select multiple rows, for example with [:keys]t[/] (stoggle-row) - {help.commands.dive_selected} -Return to the frequency table: -- {help.commands.jump_prev} - -Select the first row of each group: - -See the selections in the source sheet. For example, select a sample from each each group: - -- (`select-first`) - select first source row in each bin -- {help.commands.jump_prev} - -## Find Unique Values - -The bins of a frequeny table are the unique items. 
-- The total unique count appears in the bottom right of the window, for example "14 bins". - -Copy the unique item list to a new sheet: - -1. Generate a frequency table (one or more columns). -2. Hide unwanted columns. [:keys]-[/] or [:keys]Shift+C[/] -3. Copy unique values to a new sheet -- {help.commands.freeze-sheet} - -## Count only selected rows - -Create an ad-hoc frequeny table that compares the selected rows in the current columns to all rows: - -- add aggregators if needed [:keys]+[/] -- {help.commands.freq_summary} - -## Sort the table - -Navigate to the target column and sort: - -- {help.commands.sort_asc} -- {help.commands.sort_desc} +Select a group to select all of its underlying rows in the source sheet. ## Using Split Panes with Frequency Tables -Open a split [:keys]Shift+Z[/], and create a Frequency Table [:keys]Shift+F[/]. -- The table will automatically open in the other split. - -1. Open a Frequency Table -2. open a new split, when you explore the group(s) with [:keys]Enter[/] or [:keys]gEnter[/], the detail view will open in the other pane. +Press `Shift+Z` to open a split pane, and then `Shift+F` to create a **Frequency Table**. The **Frequency Table** will automatically open in the other pane. -Also see the `SplitplanesGuide`. +See the `SplitpanesGuide` for more. -## Table Options +#### Options - {help.options.disp_histogram} - {help.options.histogram_bins} - From 11ab614dcceff69be48976693e912a6896b80cf8 Mon Sep 17 00:00:00 2001 From: anjakefala Date: Sun, 6 Oct 2024 18:05:09 -0700 Subject: [PATCH 8/8] Add numeric binning --- visidata/guides/FrequencyTable.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/visidata/guides/FrequencyTable.md b/visidata/guides/FrequencyTable.md index b82c7e4a0..4b321833b 100644 --- a/visidata/guides/FrequencyTable.md +++ b/visidata/guides/FrequencyTable.md @@ -2,6 +2,8 @@ Frequency Tables group rows into bins by column value, and includes summary columns for source columns with aggregators. 
+Set `--numeric-binning` to bin numeric rows into ranges instead of discrete values.
+
 - {help.commands.freq_col}
 
 - {help.commands.freq_keys}
@@ -37,3 +39,4 @@ See the `SplitpanesGuide` for more.
 
 - {help.options.disp_histogram}
 - {help.options.histogram_bins}
+- {help.options.numeric_binning}
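
Reviewer note: the grouping that this guide describes can be sketched in a few lines of Python, using the hat-purchase sample from PATCH 1/8 (only the `color` and `total` columns). This is an illustration of the idea, not VisiData's implementation. The helper name `frequency_table`, the 40-character histogram width, and the two-decimal rounding are assumptions made for the example, so the bars will not match the lengths shown in the sample output.

```python
from collections import Counter

# (color, total) pairs copied from the sample hat-purchase table in PATCH 1/8.
purchases = [
    ("Red", 30),
    ("Blue", 28),
    ("Green", 64),
    ("Red", 100),
    ("Blue", 33),
    ("Blue", 99),
]


def frequency_table(rows, width=40):
    """Group rows by their first field and summarize each bin.

    Hypothetical helper for illustration only; `width` is an assumed
    histogram length, not a VisiData option value.
    Returns a list of (key, count, percent, sum, bar) tuples.
    """
    counts = Counter(key for key, _ in rows)
    sums = Counter()
    for key, value in rows:
        sums[key] += value  # plays the role of a `sum` aggregator on `total`

    total_rows = len(rows)
    table = []
    # most_common() yields bins by descending count,
    # comparable to ORDER BY COUNT(*) DESC in the SQL analogy.
    for key, count in counts.most_common():
        percent = round(100 * count / total_rows, 2)
        bar = "■" * int(width * count / total_rows)
        table.append((key, count, percent, sums[key], bar))
    return table


for key, count, percent, total, bar in frequency_table(purchases):
    print(f"{key:<6}{count:>3}{percent:>7}{total:>5}  {bar}")
```

Sorting the result by the sum field instead of the count would answer the "which color generated the most in total sales" question from the first draft of the guide (Blue, with 160 in this sample).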