v4.0.0a1

Pre-release
@riley-harper riley-harper released this 13 Dec 22:10
· 0 commits to main since this release

Version 4.0.0 Alpha 1

This pre-release contains upcoming changes for version 4 of hlink. Because it includes breaking changes and an overhaul of the model exploration task, we'd like to test it out a bit before creating a full release. Documentation and code cleanup are still in progress, so the documentation for these changes and new features is incomplete. Here is a preview of the version 4 highlights (so far!):

  • Completely overhauled the model exploration task, switching to a nested cross-validation algorithm.
  • Added support for a third strategy for generating models to test in model exploration. Along with "explicit" (take exactly what's in training.model_parameters) and grid search, there is now randomized search. Randomized search takes a certain number of samples from a distribution defined in training.model_parameters.
  • Added the F-measure metric to the model exploration output, and simplified the output so that it always has the same columns.
  • Removed the training.output_suspicious_TD configuration option because it was rarely used, complicated the code, and hurt performance. Without it, the model exploration code is more maintainable and runs more quickly.
  • Disentangled two core modules (classifier and pipeline) from the configuration format by changing the arguments to a couple of functions. This should help separate those concerns more neatly and make changes to the configuration easier if we end up doing that in the future.
  • Changed SparkConnection to require a checkpoint_dir argument, which fixes a bug related to Spark configuration.
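
To make the difference between the three model-generation strategies concrete, here is a minimal, library-independent sketch in plain Python. It is not hlink's implementation and does not use hlink's configuration format: the function names, the parameter names (maxDepth, threshold), and the convention that a list means "choose one of these" while a (low, high) tuple means "sample uniformly from this range" are all assumptions made for illustration.

```python
import itertools
import random


def explicit(param_sets):
    """Explicit: test exactly the parameter sets that were listed."""
    return [dict(params) for params in param_sets]


def grid_search(param_grid):
    """Grid search: test every combination of the listed values."""
    names = sorted(param_grid)
    combos = itertools.product(*(param_grid[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def randomized_search(param_distributions, num_samples, seed=None):
    """Randomized search: draw a fixed number of samples.

    Hypothetical convention for this sketch only: a list is sampled
    with a uniform choice, and a (low, high) tuple is sampled as a
    uniform float between low and high.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        sample = {}
        for name in sorted(param_distributions):
            dist = param_distributions[name]
            if isinstance(dist, list):
                sample[name] = rng.choice(dist)
            else:
                low, high = dist
                sample[name] = rng.uniform(low, high)
        samples.append(sample)
    return samples


if __name__ == "__main__":
    grid = {"maxDepth": [3, 5, 7], "threshold": [0.5, 0.8]}
    print(len(grid_search(grid)))  # one model per combination

    dists = {"maxDepth": [3, 5, 7], "threshold": (0.5, 0.9)}
    for params in randomized_search(dists, num_samples=4, seed=0):
        print(params)
```

Grid search grows multiplicatively with the number of values per parameter, while randomized search tests a number of models that is fixed up front, which is one reason it can be attractive when many parameters are being explored.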