Commit f569256

[#137] Add documentation for the new blocking.or_group feature
1 parent 1383518 commit f569256

4 files changed, 36 insertions(+), 1 deletion(-)

docs/_sources/config.md.txt

Lines changed: 12 additions & 0 deletions
@@ -568,6 +568,18 @@ expression = "sex == 1"
 * `dataset` -- Type: `string`. Optional. Must be `a` or `b` and used in conjunction with `explode`. Will only explode the column from the `a` or `b` dataset when specified.
 * `derived_from` -- Type: `string`. Used in conjunction with `explode = true`. Specifies an input column from the existing dataset to be exploded.
 * `expand_length` -- Type: `integer`. When `explode` is used on a column that is an integer, this can be specified to create an array with a range of integer values from (`original_value` minus `expand_length`) to (`original_value` plus `expand_length`). For example, if the input column value for birthyr is 1870, explode is true, and the expand_length is 3, the exploded column birthyr_3 value would be the array [1867, 1868, 1869, 1870, 1871, 1872, 1873].
+* `or_group` -- Type: `string`. Optional. The "OR group" to which this
+blocking table belongs. Blocking tables that belong to the same OR group
+are joined by OR in the blocking condition instead of AND. By default each
+blocking table belongs to a different OR group. For example, suppose that
+your dataset has 3 possible birthplaces BPL1, BPL2, and BPL3 for each
+record. If you don't provide OR groups when blocking on each BPL variable,
+then you will get a blocking condition like `(a.BPL1 = b.BPL1) AND (a.BPL2
+= b.BPL2) AND (a.BPL3 = b.BPL3)`. But if you set `or_group = "BPL"` for
+each of the 3 variables, then you will get a blocking condition like this
+instead: `(a.BPL1 = b.BPL1 OR a.BPL2 = b.BPL2 OR a.BPL3 = b.BPL3)`. Note
+the parentheses around the entire OR group condition. Other OR groups would
+be connected to the BPL OR group with an AND condition.
 
 
 ```

docs/config.html

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -619,6 +619,17 @@ <h2>Blocking<a class="headerlink" href="#blocking" title="Link to this heading">
 <li><p><code class="docutils literal notranslate"><span class="pre">dataset</span></code> – Type: <code class="docutils literal notranslate"><span class="pre">string</span></code>. Optional. Must be <code class="docutils literal notranslate"><span class="pre">a</span></code> or <code class="docutils literal notranslate"><span class="pre">b</span></code> and used in conjunction with <code class="docutils literal notranslate"><span class="pre">explode</span></code>. Will only explode the column from the <code class="docutils literal notranslate"><span class="pre">a</span></code> or <code class="docutils literal notranslate"><span class="pre">b</span></code> dataset when specified.</p></li>
 <li><p><code class="docutils literal notranslate"><span class="pre">derived_from</span></code> – Type: <code class="docutils literal notranslate"><span class="pre">string</span></code>. Used in conjunction with <code class="docutils literal notranslate"><span class="pre">explode</span> <span class="pre">=</span> <span class="pre">true</span></code>. Specifies an input column from the existing dataset to be exploded.</p></li>
 <li><p><code class="docutils literal notranslate"><span class="pre">expand_length</span></code> – Type: <code class="docutils literal notranslate"><span class="pre">integer</span></code>. When <code class="docutils literal notranslate"><span class="pre">explode</span></code> is used on a column that is an integer, this can be specified to create an array with a range of integer values from (<code class="docutils literal notranslate"><span class="pre">original_value</span></code> minus <code class="docutils literal notranslate"><span class="pre">expand_length</span></code>) to (<code class="docutils literal notranslate"><span class="pre">original_value</span></code> plus <code class="docutils literal notranslate"><span class="pre">expand_length</span></code>). For example, if the input column value for birthyr is 1870, explode is true, and the expand_length is 3, the exploded column birthyr_3 value would be the array [1867, 1868, 1869, 1870, 1871, 1872, 1873].</p></li>
+<li><p><code class="docutils literal notranslate"><span class="pre">or_group</span></code> – Type: <code class="docutils literal notranslate"><span class="pre">string</span></code>. Optional. The “OR group” to which this
+blocking table belongs. Blocking tables that belong to the same OR group
+are joined by OR in the blocking condition instead of AND. By default each
+blocking table belongs to a different OR group. For example, suppose that
+your dataset has 3 possible birthplaces BPL1, BPL2, and BPL3 for each
+record. If you don’t provide OR groups when blocking on each BPL variable,
+then you will get a blocking condition like <code class="docutils literal notranslate"><span class="pre">(a.BPL1</span> <span class="pre">=</span> <span class="pre">b.BPL1)</span> <span class="pre">AND</span> <span class="pre">(a.BPL2</span> <span class="pre">=</span> <span class="pre">b.BPL2)</span> <span class="pre">AND</span> <span class="pre">(a.BPL3</span> <span class="pre">=</span> <span class="pre">b.BPL3)</span></code>. But if you set <code class="docutils literal notranslate"><span class="pre">or_group</span> <span class="pre">=</span> <span class="pre">&quot;BPL&quot;</span></code> for
+each of the 3 variables, then you will get a blocking condition like this
+instead: <code class="docutils literal notranslate"><span class="pre">(a.BPL1</span> <span class="pre">=</span> <span class="pre">b.BPL1</span> <span class="pre">OR</span> <span class="pre">a.BPL2</span> <span class="pre">=</span> <span class="pre">b.BPL2</span> <span class="pre">OR</span> <span class="pre">a.BPL3</span> <span class="pre">=</span> <span class="pre">b.BPL3)</span></code>. Note
+the parentheses around the entire OR group condition. Other OR groups would
+be connected to the BPL OR group with an AND condition.</p></li>
 </ul>
 </li>
 </ul>

docs/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

sphinx-docs/config.md

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -568,6 +568,18 @@ expression = "sex == 1"
 * `dataset` -- Type: `string`. Optional. Must be `a` or `b` and used in conjunction with `explode`. Will only explode the column from the `a` or `b` dataset when specified.
 * `derived_from` -- Type: `string`. Used in conjunction with `explode = true`. Specifies an input column from the existing dataset to be exploded.
 * `expand_length` -- Type: `integer`. When `explode` is used on a column that is an integer, this can be specified to create an array with a range of integer values from (`original_value` minus `expand_length`) to (`original_value` plus `expand_length`). For example, if the input column value for birthyr is 1870, explode is true, and the expand_length is 3, the exploded column birthyr_3 value would be the array [1867, 1868, 1869, 1870, 1871, 1872, 1873].
+* `or_group` -- Type: `string`. Optional. The "OR group" to which this
+blocking table belongs. Blocking tables that belong to the same OR group
+are joined by OR in the blocking condition instead of AND. By default each
+blocking table belongs to a different OR group. For example, suppose that
+your dataset has 3 possible birthplaces BPL1, BPL2, and BPL3 for each
+record. If you don't provide OR groups when blocking on each BPL variable,
+then you will get a blocking condition like `(a.BPL1 = b.BPL1) AND (a.BPL2
+= b.BPL2) AND (a.BPL3 = b.BPL3)`. But if you set `or_group = "BPL"` for
+each of the 3 variables, then you will get a blocking condition like this
+instead: `(a.BPL1 = b.BPL1 OR a.BPL2 = b.BPL2 OR a.BPL3 = b.BPL3)`. Note
+the parentheses around the entire OR group condition. Other OR groups would
+be connected to the BPL OR group with an AND condition.
 
 
 ```

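Illustration of the `or_group` rule documented in this commit: the grouping logic can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the project's actual implementation. The function name `build_blocking_condition`, the use of plain dictionaries for blocking tables, and the `column_name` key are all hypothetical choices made for the example; only `or_group` and the resulting condition strings come from the documentation text in the diff.

```python
def build_blocking_condition(blocking):
    """Combine blocking tables into one SQL-like condition string.

    Tables that share an `or_group` are joined with OR inside a single pair
    of parentheses. The resulting groups are then joined with AND. A table
    without an `or_group` forms its own single-member group.
    """
    groups = {}  # insertion-ordered: group key -> list of equality terms
    for index, table in enumerate(blocking):
        column = table["column_name"]  # hypothetical key, assumed for this sketch
        # Assumed default: a key unique to this table, so that ungrouped
        # tables are never merged with each other.
        key = table.get("or_group", ("__ungrouped__", index))
        groups.setdefault(key, []).append(f"a.{column} = b.{column}")
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups.values())


# The two cases described in the documentation, plus a mixed case.
no_groups = [{"column_name": c} for c in ("BPL1", "BPL2", "BPL3")]
bpl_group = [{"column_name": c, "or_group": "BPL"} for c in ("BPL1", "BPL2", "BPL3")]
mixed = bpl_group + [{"column_name": "SEX"}]

print(build_blocking_condition(no_groups))
print(build_blocking_condition(bpl_group))
print(build_blocking_condition(mixed))
```

The first two calls reproduce the AND-only and OR-grouped conditions quoted in the new documentation. The third shows an ungrouped column being attached to the parenthesized BPL group with AND.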