v4.0.0a1
Pre-release
Version 4.0.0 Alpha 1
This pre-release contains upcoming changes for version 4 of hlink. Since it includes breaking changes and an overhaul of the model exploration task, we'd like to test it out a bit before creating a full release. Documentation and code cleanup are still in progress, so the documentation for these changes and new features is incomplete. Here is a preview of the version 4 highlights (so far!):
- Completely overhauled the model exploration task, switching to a nested cross-validation algorithm.
- Added support for a third strategy for generating models to test in model exploration. Along with "explicit" (take exactly what's in `training.model_parameters`) and grid search, there is now randomized search. Randomized search takes a certain number of samples from a distribution defined in `training.model_parameters`.
- Added the F-measure metric to the model exploration output, and simplified the output so that it always has the same columns.
- Removed the `training.output_suspicious_TD` configuration option because it was rarely used and presented code and performance issues. Removing `output_suspicious_TD` makes the model exploration code more maintainable and helps it run more quickly.
- Disentangled two core modules (`classifier` and `pipeline`) from the configuration format by changing the arguments to a couple of functions. This should help separate those concerns more neatly and make changes to the configuration easier if we end up doing that in the future.
- Changed `SparkConnection` to require a `checkpoint_dir` argument, which fixes a bug related to Spark configuration.
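
To illustrate the general idea behind the randomized search strategy mentioned above, here is a minimal Python sketch. It is not hlink's implementation and does not use hlink's configuration format: the `search_space` layout, the `("randint", low, high)` and `("uniform", low, high)` tuples, and the `sample_models` helper are all hypothetical names chosen for this example. It only shows how a fixed number of candidate models can be drawn from per-parameter distributions instead of enumerating a full grid.

```python
import random

# Hypothetical search space for illustration only; this is NOT hlink's
# configuration format. Tuples describe a distribution to sample from,
# and any other value is treated as fixed.
search_space = {
    "type": "random_forest",             # fixed value, copied into every model
    "maxDepth": ("randint", 1, 10),      # integer drawn from [1, 10], inclusive
    "threshold": ("uniform", 0.5, 0.9),  # float drawn from [0.5, 0.9]
}


def sample_models(space, num_samples, seed=None):
    """Draw num_samples parameter sets from the given search space."""
    rng = random.Random(seed)
    models = []
    for _ in range(num_samples):
        model = {}
        for name, spec in space.items():
            if isinstance(spec, tuple) and spec[0] == "randint":
                model[name] = rng.randint(spec[1], spec[2])
            elif isinstance(spec, tuple) and spec[0] == "uniform":
                model[name] = rng.uniform(spec[1], spec[2])
            else:
                # Not a distribution: use the value as-is.
                model[name] = spec
        models.append(model)
    return models


models = sample_models(search_space, num_samples=5, seed=42)
for model in models:
    print(model)
```

Unlike grid search, the number of models tested here is set directly by `num_samples` rather than by the size of the grid, which keeps the cost predictable as more parameters are added.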