
Releases: ipums/hlink

v4.2.1

19 Aug 14:38
c378436

What's Changed

Full Changelog: v4.2.0...v4.2.1

v4.2.0

30 Apr 15:50
423e231

What's Changed

Full Changelog: v4.1.0...v4.2.0

v4.1.0

15 Apr 21:14
07ad22b

What's Changed

New Contributors

Full Changelog: v4.0.0...v4.1.0

v4.0.0

07 Apr 17:08
5e4e93c

Overview

This version of hlink contains a large update to the model exploration task, several bug fixes, and a few breaking changes. For a curated list of changes, check out the changelog at https://hlink.docs.ipums.org/changelog.html.

What's Changed

New Contributors

Full Changelog: v3.8.0...v4.0.0

v4.0.0b1

10 Mar 14:33
Pre-release

Overview

This is the first beta release for version 4. We do not expect to be doing any more feature work or breaking changes for version 4 after this release, so if all goes well, the interface should be pretty stable now. Like the alpha release, this is a pre-release, and so pip should not install it unless you specifically request it. Listed below are the changes from 4.0.0a1 to 4.0.0b1, which include a few small breaking changes.

There is now a user-facing changelog for hlink, which is more carefully curated than these auto-generated release notes! You can see the changelog, which has a preview of v4.0.0, at https://hlink.docs.ipums.org/changelog.html.

What's Changed

Full Changelog: v4.0.0a1...v4.0.0b1

v4.0.0a1

13 Dec 22:10
Pre-release

Version 4.0.0 Alpha 1

This pre-release has upcoming changes for version 4 of hlink. Since this includes breaking changes and an overhaul of the model exploration task, we'd like to test it out a bit before creating a full release. Part of the work yet to be done is documentation and code cleanup. The documentation for these changes and new features is lacking so far. Here is a preview of the version 4 highlights (so far!):

  • Completely overhauled the model exploration task, switching to a nested cross-validation algorithm.
  • Added support for a third strategy for generating models to test in model exploration. Along with "explicit" (take exactly what's in training.model_parameters) and grid search, there is now randomized search, which draws a set number of samples from the distributions defined in training.model_parameters.
  • Added the F-measure metric to the model exploration output, and simplified the output so that it always has the same columns.
  • Removed the training.output_suspicious_TD configuration option because it was rarely used and presented code and performance issues. Removing output_suspicious_TD makes the model exploration code more maintainable and helps it run more quickly.
  • Disentangled two core modules (classifier and pipeline) from the configuration format by changing the arguments to a couple of functions. This should help separate those concerns more neatly and make changes to the configuration easier if we end up doing that in the future.
  • Changed SparkConnection to require a checkpoint_dir argument, which fixes a bug related to Spark configuration.
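The three strategies for generating models differ mainly in how they turn training.model_parameters into a list of parameter sets to evaluate. The Python sketch below illustrates that idea under stated assumptions: the function name `generate_parameter_sets`, the dict-based input, the per-model `num_samples` count, and the `("uniform", low, high)` / `("randint", low, high)` tuple encoding for distributions are all hypothetical and are not hlink's configuration format or API.

```python
import itertools
import random
from typing import Any, Optional


def generate_parameter_sets(
    model_parameters: list[dict[str, Any]],
    strategy: str,
    num_samples: int = 10,
    seed: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Hypothetical sketch of the explicit, grid, and randomized strategies."""
    if strategy == "explicit":
        # Take each parameter dict exactly as written.
        return [dict(params) for params in model_parameters]

    if strategy == "grid":
        # Treat list values as options and expand every combination.
        result = []
        for params in model_parameters:
            keys = list(params)
            value_lists = [v if isinstance(v, list) else [v] for v in params.values()]
            for combo in itertools.product(*value_lists):
                result.append(dict(zip(keys, combo)))
        return result

    if strategy == "randomized":
        # Assumption: draw num_samples parameter sets per model entry.
        rng = random.Random(seed)
        result = []
        for params in model_parameters:
            for _ in range(num_samples):
                sample = {}
                for key, value in params.items():
                    if isinstance(value, list):
                        sample[key] = rng.choice(value)  # pick one listed option
                    elif isinstance(value, tuple) and value[0] == "uniform":
                        sample[key] = rng.uniform(value[1], value[2])
                    elif isinstance(value, tuple) and value[0] == "randint":
                        sample[key] = rng.randint(value[1], value[2])
                    else:
                        sample[key] = value  # fixed value, used as-is
                result.append(sample)
        return result

    raise ValueError(f"unknown strategy: {strategy!r}")


params = [{"type": "random_forest", "maxDepth": [3, 5], "numTrees": [10, 20]}]
print(len(generate_parameter_sets(params, "grid")))  # 4
```

Explicit returns one set per entry, grid grows multiplicatively with the number of listed values, and randomized keeps the cost fixed at the number of samples regardless of how wide the distributions are.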

v3.8.0

04 Dec 20:36
85a1818

What's Changed

  • Added optional support for two new gradient boosting ML libraries: XGBoost and LightGBM. You can read more about these libraries and how to install them with their dependencies in the docs here. PR #165
  • Added a new hlink.linking.transformers.RenameVectorAttributes transformer, which can rename the attributes or "slots" of Spark vector columns. hlink uses this to support LightGBM, which disallows certain characters in its feature names. PR #165
  • Documented comparisons, which are not the same as comparison features. Previously the documentation was misleading and seemed to indicate that these were the same thing. PR #159
  • Fixed a bug in the substitution file documentation, which had the meanings of the substitution file columns swapped and was confusing as a result. PR #166
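The renaming that RenameVectorAttributes performs can be pictured on a plain list of attribute names. This is a minimal Python sketch of the idea only: the real transformer works on Spark vector column metadata, and the function name `rename_attributes`, its parameters, and the choice of replacing `:` with `_` are illustrative assumptions, not the transformer's actual parameters or defaults.

```python
def rename_attributes(names: list[str], to_replace: list[str], replacement: str) -> list[str]:
    """Hypothetical helper: replace each disallowed substring in every name."""
    renamed = []
    for name in names:
        for bad in to_replace:
            name = name.replace(bad, replacement)
        renamed.append(name)
    return renamed


# Example: a feature name containing a colon becomes LightGBM-friendly.
print(rename_attributes(["age:sex", "bpl_match"], [":"], "_"))  # ['age_sex', 'bpl_match']
```

Only the names change; the vector values and their positions stay the same, which is why renaming can be done as a separate transformer step before the model sees the features.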

Developer-Facing Changes

  • Updated Sphinx to 8.1.3 and fixed two Sphinx build warnings. PR #159
  • Updated CI/CD to automatically run only on PRs and on pushes to main. You can also now manually trigger a CI/CD run from the Actions tab in GitHub. Also removed the custom "quickcheck" pytest marker in favor of using pytest -k and removed flake8 from CI/CD because it kept causing more trouble than it was worth. PR #164

Full Changelog: v3.7.0...v3.8.0

v3.7.0

10 Oct 17:16
c1713e5

What's Changed

  • Add tests to cover several untested sections of code by @riley-harper in #147
  • Refactor core.transforms.generate_transforms() for readability and maintainability; improve documentation and type hints by @riley-harper in #148
  • Fix tests for Python 3.12 and clarify Python 3.12 support and dependence on PySpark by @riley-harper in #151
  • Improve logging by writing to module-level loggers instead of the root logger by @riley-harper in #152
  • Support setting the app name via an optional argument in SparkConnection. The default behavior of setting the app name to "linking" is unchanged. By @riley-harper in #156
  • Improve model_exploration step 2 terminal output, logging, and documentation to make the step more understandable by @riley-harper in #155

Full Changelog: v3.6.1...v3.7.0

v3.6.1

14 Aug 21:26
54d4820

What's Changed

  • Support blocking sections with multiple exploded columns by @riley-harper in #143. This fixes a bug that caused a crash in Matching step 0 - explode.
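Exploding a column turns one record into several rows, one per value, so that blocking can join on each value. The plain-Python sketch below shows what exploding more than one column yields, assuming each exploded column is expanded independently so a record produces one row per combination of values (the same result as chaining Spark's explode on each column). The function `explode_rows` and the column names are hypothetical and are not hlink's implementation.

```python
import itertools
from typing import Any


def explode_rows(rows: list[dict[str, Any]], explode_cols: list[str]) -> list[dict[str, Any]]:
    """Expand each row into one row per combination of values in explode_cols."""
    exploded = []
    for row in rows:
        value_lists = [row[col] for col in explode_cols]
        for combo in itertools.product(*value_lists):
            new_row = dict(row)  # copy so the input row is not modified
            new_row.update(zip(explode_cols, combo))
            exploded.append(new_row)
    return exploded


rows = [{"id": 1, "birthyr": [1900, 1901], "bpl": [10, 20]}]
print(len(explode_rows(rows, ["birthyr", "bpl"])))  # 4
```

The row count multiplies with each additional exploded column, which is worth keeping in mind when combining several exploded blocking columns on large inputs.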

Full Changelog: v3.6.0...v3.6.1

v3.6.0

18 Jun 20:12
94f0e8b

What's Changed

  • Support OR conditions in blocking by @riley-harper in #138. This new feature supports connecting some or all blocking conditions together with ORs instead of with ANDs. You can read more documentation about it under the "or_group" bullet point here.
  • Unskip several skipped tests by @riley-harper in #139. This is a development change that should not affect users.
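OR conditions in blocking can be read as an AND of ORs. This plain-Python sketch assumes that blocking conditions sharing an or_group are combined with OR and that distinct groups are combined with AND, so giving every condition its own group reproduces the old all-AND behavior. It evaluates the rule on a single pair of records and is not hlink's Spark-based implementation; the function `passes_blocking`, the group names, and the columns `bpl`, `sex`, and `birthyr` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

# Hypothetical record and condition types for illustration only.
Record = dict[str, Any]
Condition = Callable[[Record, Record], bool]


def passes_blocking(a: Record, b: Record, conditions: list[tuple[str, Condition]]) -> bool:
    """OR the conditions within each group, then AND the groups together."""
    groups: dict[str, list[Condition]] = defaultdict(list)
    for or_group, condition in conditions:
        groups[or_group].append(condition)
    return all(any(cond(a, b) for cond in conds) for conds in groups.values())


conditions = [
    ("bpl", lambda a, b: a["bpl"] == b["bpl"]),
    ("sex_or_age", lambda a, b: a["sex"] == b["sex"]),
    ("sex_or_age", lambda a, b: abs(a["birthyr"] - b["birthyr"]) <= 1),
]
a = {"bpl": 10, "sex": 1, "birthyr": 1900}
b = {"bpl": 10, "sex": 2, "birthyr": 1901}
print(passes_blocking(a, b, conditions))  # True: bpl matches and birth years are within 1
```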

Full Changelog: v3.5.5...v3.6.0