[#158] Rename the "Comparison Types" page to "Comparison Features"

ipums · Oct 30, 2024 · 63a49d8
1 parent 319c60b
commit 63a49d8

Showing 4 changed files with 38 additions and 26 deletions.
diff --git a/sphinx-docs/comparison_types.md b/sphinx-docs/comparison_features.md
@@ -1,30 +1,42 @@
-# Comparison types, transform add-ons, aggregate features, and household aggregate features
+# Comparison Features
 
-This page has information on the different comparison types available for the `[[comparison_features]]`
-section, along with some attributes available to all of the comparison types and some aggregate features
-that are not configurable.
+During matching, hlink computes comparison features on each record pair which
+it considers a potential match. These comparison features can be passed as
+features to machine-learning algorithms or used to define
+[comparisons](comparisons) which filter the `potential_matches` table.
 
-## Comparison types
-Each header below represents a comparison type.  Transforms are used in the context of `comparison_features`.
+Each comparison feature must have a comparison type, which tells hlink how to
+compute the comparison feature. This page has information on the available
+comparison types and how to configure them. It also lists some attributes
+available to all comparison types and some predefined aggregate features which
+do not need to be explicitly configured.
 
-```
-[[comparison_features]]
-alias = "relatematch"
-column_name = "relate_div_100"
-comparison_type = "equals"
-categorical = true
-```
+## Comparison Types
+
+Each section below describes a comparison type. Each type represents a
+different operation, computation, or transformation that hlink can perform on
+one or more input columns. Some comparison types expect their own attributes
+for additional configuration. These attributes are listed in each section,
+along with an example.
 
 ### maximum_jaro_winkler
-Finds the greatest Jaro-Winkler value among the cartesian product of multiple columns.  For example, given an input of `column_names = ['namefrst', 'namelast']`, it would return the maximum Jaro-Winkler name comparison value among the following four comparisons: 
+
+Finds the greatest Jaro-Winkler value among the cartesian product of multiple
+columns.  For example, given an input of `column_names = ['namefrst',
+'namelast']`, it would return the maximum Jaro-Winkler name comparison value
+among the following four comparisons: 
+
 ```
-[('namefrst_a', 'namefrst_b'),
- ('namefrst_a', 'namelast_b'),
- ('namelast_a', 'namefrst_b'),
- ('namelast_a', 'namelast_b')]
- ```
+a.namefrst, b.namefrst
+a.namefrst, b.namelast
+a.namelast, b.namefrst
+a.namelast, b.namelast
+```
+
 * Attributes:
-  * `column_names` -- Type: list of strings.  Required.  The list of columns used as input for the set of comparisons generated by taking the cartesian product.
+  * `column_names` -- Type: list of strings.  Required.  The list of columns
+    used as input for the set of comparisons, which are generated by taking the
+    Cartesian product of the set of input columns with itself.
 
  ```
 [[comparison_features]]

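Reviewer note: the reworded `maximum_jaro_winkler` description above can be illustrated with a small standalone sketch. This is not hlink's implementation; the function names, the dictionary-based records, and the 0.1 prefix scaling factor are assumptions made for illustration. It computes a textbook Jaro-Winkler similarity and takes the maximum over the Cartesian product of `column_names` with itself, as the new text describes.

```python
from itertools import product


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as
    # half a transposition each.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def maximum_jaro_winkler(record_a: dict, record_b: dict, column_names: list) -> float:
    """Hypothetical helper: best score over column_names x column_names."""
    return max(
        jaro_winkler(record_a[col_a], record_b[col_b])
        for col_a, col_b in product(column_names, repeat=2)
    )


# A pair of records whose first and last names are swapped.
a = {"namefrst": "john", "namelast": "smith"}
b = {"namefrst": "smith", "namelast": "john"}
score = maximum_jaro_winkler(a, b, ["namefrst", "namelast"])
```

With `column_names = ['namefrst', 'namelast']` the helper evaluates the same four pairs listed in the new docs, so a record pair with swapped first and last names still scores highly through the cross-column comparisons.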
diff --git a/sphinx-docs/config.md b/sphinx-docs/config.md
@@ -671,18 +671,18 @@ feature_name = "byrdiff"
 threshold_expr = "<= 10"
 ```
 
-## [Comparison Features](comparison_types)
+## [Comparison Features](comparison_features)
 
 * Header name: `comparison_features`
-* Description: A list of comparison features to create when comparing records. Comparisons for individual and household linking rounds are both represented here -- no need to duplicate comparisons if used in both rounds, simply specify the `column_name` in the appropriate `training` or `hh_training` section of the config.  See the [comparison types](comparison_types) section for more information.
+* Description: A list of comparison features to create when comparing records. Comparisons for individual and household linking rounds are both represented here -- no need to duplicate comparisons if used in both rounds, simply specify the `column_name` in the appropriate `training` or `hh_training` section of the config.  See the [comparison features documentation page](comparison_features) for more information.
 * Required: True
 * Type: List
 * Attributes:
   * `alias` -- Type: `string`. Optional. The name of the comparison feature column to be generated.  If not specified, the output column will default to `column_name`.
   * `column_name` -- Type: `string`. The name of the columns to compare.
-  * `comparison_type` -- Type: `string`. The name of the comparison type to use. See the [comparison types](comparison_types) section for more information.
+  * `comparison_type` -- Type: `string`. The name of the comparison type to use.
   * `categorical` -- Type: `boolean`.  Optional.  Whether the output data should be treated as categorical data (important information used during one-hot encoding and vectorizing in the machine learning pipeline stage).
-  * Other attributes may be included as well depending on `comparison_type`.  See the [comparison types](comparison_types) section for details on each comparison type.
+  * Other attributes may be included as well depending on `comparison_type`.  See the [comparison features page](comparison_features) for details on each comparison type.
 
 ```
 [[comparison_features]]

diff --git a/sphinx-docs/index.rst b/sphinx-docs/index.rst
@@ -25,7 +25,7 @@ Configuration API
 
    Column Mappings <column_mappings.md>
    comparisons
-   Comparison Types <comparison_types.md>
+   Comparison Features <comparison_features.md>
    Feature Selection <feature_selection_transforms.md>
    Pipeline Features <pipeline_features.md>
    substitutions

diff --git a/sphinx-docs/link_tasks.md b/sphinx-docs/link_tasks.md
@@ -84,7 +84,7 @@ are grouped into the same blocking bucket.
 on each record. These features may be passed to a machine learning model through the
 [`training`](config.html#training-and-models) section and/or passed to deterministic
 rules with the [`comparisons`](config.html#comparisons) section. There are many
-different [comparison types](comparison_types) available for use with
+different [comparison types](comparison_features) available for use with
 `comparison_features`.
 * [`pipeline_features`](pipeline_features.html#pipeline-generated-features) are machine learning transformations
 useful for reshaping and interacting data before they are fed to the machine learning