Update developer documentation by @riley-harper in #204
Refactor column mapping transforms by @riley-harper in #207
Document 5 column mapping transforms by @riley-harper in #212
Add github workflow to build and publish sphinx docs to github pages by @joegrover in #210
Don't create a history file on startup by @riley-harper in #215
Update pyproject to switch to new license spec format by @riley-harper in #216
Support custom column mapping transforms by @riley-harper in #213
Simplify handling of the deprecated training.param_grid attribute by @riley-harper in #217
Clean up Sphinx docs workflow by @riley-harper in #218
Remove notes about XGBoost being unstable by @riley-harper in #219
Bump the version to 4.2.0 by @riley-harper in #220

Full Changelog: v4.1.0...v4.2.0

Contributors

joegrover and riley-harper

15 Apr 21:14

riley-harper

v4.1.0

07ad22b

v4.1.0

What's Changed

Require setuptools >= 71 by @riley-harper in #198
Remove restriction of scikit-learn < 1.6 for xgboost optional feature by @riley-harper in #196
Fix threshold ratio bug by @riley-harper in #200
Allow rematching in households by @riley-harper in #201
Save hh training metadata as step 3 of hh_training by @joegrover in #202
Updated the project version in pyproject.toml to 4.1.0 by @joegrover in #203

New Contributors

@joegrover made their first contribution in #202

Full Changelog: v4.0.0...v4.1.0

Contributors

joegrover and riley-harper

07 Apr 17:08

riley-harper

v4.0.0

5e4e93c

v4.0.0

Overview

This version of hlink contains a large update to the model exploration task, several bug fixes, and a few breaking changes. For a curated list of changes, check out the changelog at https://hlink.docs.ipums.org/changelog.html.

What's Changed

Refactor nested cross validation by @ccdavis in #169
Add Randomized Parameter Search by @riley-harper in #168
Update linking.core.classifier and linking.core.threshold by @riley-harper in #175
Model exploration metrics by @ccdavis in #177
Remove "suspicious data" functionality from model exploration by @riley-harper in #178
Add the F-measure model metric, restructure for clarity by @riley-harper in #180
Allow setting the checkpoint directory through SparkConnection by @riley-harper in #182
Remove deprecated code for version 4 by @riley-harper in #184
Use tomli instead of the toml package by default by @riley-harper in #185
Fix a bug where model_metrics.mcc() < -1.0 by @riley-harper in #188
Create a changelog file by @riley-harper in #189
Add docs for Model Exploration by @riley-harper in #190
Update docs for training.param_grid by @riley-harper in #191
Version 4.0.0 by @riley-harper in #186
Bump the version to 4.0.0 by @riley-harper in #192
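
Two of the entries above concern model metrics: the new F-measure and the fix for model_metrics.mcc() returning values below -1.0. As a rough illustration only, here is how both metrics can be computed from confusion-matrix counts. The function names, the NaN handling, and the clamping step are assumptions made for this sketch and are not taken from hlink's source; floating-point rounding is one plausible way for a correlation to drift just outside its valid range, but the notes do not say what caused the original bug.

```python
import math


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure (F1): the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    # With no true positives, false positives, or false negatives,
    # the metric is undefined, so return NaN.
    return 2 * tp / denom if denom else math.nan


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, clamped to [-1.0, 1.0]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return math.nan
    raw = (tp * tn - fp * fn) / denom
    # Assumption: guard against rounding error pushing the ratio
    # slightly outside the mathematically valid range.
    return max(-1.0, min(1.0, raw))


print(f_measure(tp=8, fp=2, fn=2))
print(mcc(tp=0, tn=0, fp=5, fn=5))
```

Clamping keeps downstream code that assumes a value in [-1, 1] from failing on a result that is only off by rounding error.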

New Contributors

@ccdavis made their first contribution in #169

Full Changelog: v3.8.0...v4.0.0

Contributors

ccdavis and riley-harper

10 Mar 14:33

riley-harper

v4.0.0b1

27a07c9

v4.0.0b1 Pre-release


Overview

This is the first beta release for version 4. We do not expect to do any more feature work or make further breaking changes for version 4 after this release, so if all goes well, the interface should now be fairly stable. Like the alpha release, this is a pre-release, so pip should not install it unless you specifically request it. Listed below are the changes from 4.0.0a1 to 4.0.0b1, which include a few small breaking changes.

There is now a user-facing changelog for hlink which is more carefully curated than these auto-generated release notes! You can see the changelog, which has a preview of v4.0.0, here.

What's Changed

Remove deprecated code for version 4 by @riley-harper in #184
Use tomli instead of the toml package by default by @riley-harper in #185
Fix a bug where model_metrics.mcc() < -1.0 by @riley-harper in #188
Create a changelog file by @riley-harper in #189
Add docs for Model Exploration by @riley-harper in #190
Update docs for training.param_grid by @riley-harper in #191
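
For readers unfamiliar with tomli, the sketch below shows the general parsing pattern. It relies on the fact that the standard-library tomllib module (Python 3.11 and later) and the third-party tomli package expose the same load/loads interface. The config text and its keys are placeholders invented for this example and do not describe a real hlink configuration, and this is not how hlink itself loads config files.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # third-party package with the same interface

# Illustrative config text; the keys are hypothetical placeholders.
CONFIG_TEXT = """
[training]
dataset = "training_data"
n_iterations = 3
"""

# loads() parses a string; load() expects a file opened in binary mode ("rb").
config = tomllib.loads(CONFIG_TEXT)
print(config["training"]["n_iterations"])
```

One practical difference from the older toml package is that load() takes a binary file object, so code that opened config files in text mode needs a small adjustment.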

Full Changelog: v4.0.0a1...v4.0.0b1

Contributors

riley-harper

13 Dec 22:10

riley-harper

v4.0.0a1

8bfe87e

v4.0.0a1 Pre-release


Version 4.0.0 Alpha 1

This pre-release has upcoming changes for version 4 of hlink. Since this includes breaking changes and an overhaul of the model exploration task, we'd like to test it out a bit before creating a full release. Documentation and code cleanup are still to be done, so the documentation for these changes and new features is lacking so far. Here is a preview of the version 4 highlights (so far!):

Completely overhauled the model exploration task, switching to a nested cross-validation algorithm.
Added support for a third strategy for generating models to test in model exploration. Along with "explicit" (take exactly what's in training.model_parameters) and grid search, there is now randomized search. Randomized search takes a certain number of samples from a distribution defined in training.model_parameters.
Added the F-measure metric to the model exploration output, and simplified the output so that it always has the same columns.
Removed the training.output_suspicious_TD configuration option because it was rarely used and presented code and performance issues. Removing output_suspicious_TD makes the model exploration code more maintainable and helps it run more quickly.
Disentangled two core modules (classifier and pipeline) from the configuration format by changing the arguments to a couple of functions. This should help separate those concerns more neatly and make changes to the configuration easier if we end up doing that in the future.
Changed SparkConnection to require a checkpoint_dir argument, which fixes a bug related to Spark configuration.
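
To make the randomized search idea above more concrete, here is a minimal, library-independent sketch of drawing a fixed number of parameter sets from a search space. The spec format used here (a list means "pick one", a two-element tuple means "draw from this range", anything else is a fixed value), the function name, and the parameter names are all hypothetical; they are not hlink's training.model_parameters format.

```python
import random
from typing import Any


def sample_parameters(
    space: dict[str, Any], num_samples: int, seed: int = 0
) -> list[dict[str, Any]]:
    """Draw num_samples parameter sets from a simple search space."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    samples = []
    for _ in range(num_samples):
        params = {}
        for name, spec in space.items():
            if isinstance(spec, list):
                # A list of choices: pick one uniformly at random.
                params[name] = rng.choice(spec)
            elif isinstance(spec, tuple):
                low, high = spec
                if isinstance(low, int) and isinstance(high, int):
                    params[name] = rng.randint(low, high)  # inclusive range
                else:
                    params[name] = rng.uniform(low, high)
            else:
                # Anything else is treated as a fixed value.
                params[name] = spec
        samples.append(params)
    return samples


space = {
    "type": "random_forest",
    "maxDepth": [3, 5, 7],
    "numTrees": (50, 200),
    "threshold": (0.5, 0.9),
}
candidates = sample_parameters(space, num_samples=4)
for candidate in candidates:
    print(candidate)
```

Unlike grid search, the number of models tested here is set directly by num_samples rather than by the size of the parameter grid.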

04 Dec 20:36

riley-harper

v3.8.0

85a1818

v3.8.0

What's Changed

Added optional support for two new gradient boosting ML libraries: XGBoost and LightGBM. You can read more about these libraries and how to install them with their dependencies in the docs here. PR #165
Added a new hlink.linking.transformers.RenameVectorAttributes transformer which can rename the attributes or "slots" of Spark vector columns. Hlink uses this to support LightGBM, which disallows certain characters in its feature names. PR #165
Documented comparisons, which are not the same as comparison features. Previously the documentation was misleading and seemed to indicate that these were the same thing. PR #159
Fixed a bug in the substitution file documentation. The documentation had the meaning of the substitution file columns flip-flopped, which was confusing. PR #166
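
The motivation for renaming vector attributes can be illustrated without Spark. The sketch below replaces characters that a downstream library might reject in feature names. It is not the RenameVectorAttributes transformer and does not use its API; the helper name is hypothetical, and the set of replaced characters (JSON punctuation) is an assumption chosen for illustration.

```python
import re

# Assumed set of problematic characters (JSON punctuation); adjust as needed.
_DISALLOWED = re.compile(r'[",:\[\]{}]')


def sanitize_feature_names(names: list[str], replacement: str = "_") -> list[str]:
    """Return copies of the names with disallowed characters replaced."""
    return [_DISALLOWED.sub(replacement, name) for name in names]


print(sanitize_feature_names(["age", "region:north", "name[0]"]))
```

Note that a replacement like this can make two distinct names collide, so a real implementation would also need to check that the renamed features remain unique.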

Developer-Facing Changes

Updated Sphinx to 8.1.3 and fixed two Sphinx build warnings. PR #159
Updated CI/CD to automatically run only on PRs and on pushes to main. You can also now manually trigger a CI/CD run from the Actions tab in GitHub. Also removed the custom "quickcheck" pytest marker in favor of using pytest -k and removed flake8 from CI/CD because it kept causing more trouble than it was worth. PR #164

Full Changelog: v3.7.0...v3.8.0

10 Oct 17:16

riley-harper

v3.7.0

c1713e5

v3.7.0

What's Changed

Add tests to cover several untested sections of code by @riley-harper in #147
Refactor core.transforms.generate_transforms() for readability and maintainability; improve documentation and type hints by @riley-harper in #148
Fix tests for Python 3.12 and clarify Python 3.12 support and dependence on PySpark by @riley-harper in #151
Improve logging by writing to module-level loggers instead of the root logger by @riley-harper in #152
Support setting the app name via an optional argument in SparkConnection. The default behavior of setting the app name to "linking" is unchanged. By @riley-harper in #156
Improve model_exploration step 2 terminal output, logging, and documentation to make the step more understandable by @riley-harper in #155
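
The logging change above follows a common Python convention, sketched here with the standard library only. The function run_step and the in-memory handler are hypothetical and exist only to make the example self-contained; they are not part of hlink.

```python
import logging

# Module-level logger named after the module, instead of the root logger.
logger = logging.getLogger(__name__)


def run_step(step_name: str) -> None:
    # Records carry this module's name, so applications can filter by it.
    logger.info("Starting step: %s", step_name)


class ListHandler(logging.Handler):
    """Collects formatted messages in memory (for demonstration only)."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f"{record.name}: {record.getMessage()}")


handler = ListHandler()
logger.addHandler(handler)
logger.setLevel(logging.INFO)
run_step("explode")
print(handler.messages)
```

Because each record is tagged with its logger's name, an application that embeds the library can raise or lower the log level for that library alone without touching its own root logger configuration.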

Full Changelog: v3.6.1...v3.7.0

Contributors

riley-harper

14 Aug 21:26

riley-harper

v3.6.1

54d4820

v3.6.1

What's Changed

Support blocking sections with multiple exploded columns by @riley-harper in #143. This fixes a bug that caused a crash in Matching step 0 - explode.

Full Changelog: v3.6.0...v3.6.1

Contributors

riley-harper

Releases: ipums/hlink
