Version 4.0.0 Alpha 1

This pre-release previews the changes planned for version 4 of hlink. Because it includes breaking changes and an overhaul of the model exploration task, we'd like to test it before creating a full release. Documentation and code cleanup are still in progress, so the documentation for these changes and new features is incomplete. Here is a preview of the version 4 highlights (so far!):

Completely overhauled the model exploration task, switching to a nested cross-validation algorithm.
Added support for a third strategy for generating models to test in model exploration. Along with "explicit" (take exactly what's in training.model_parameters) and grid search, there is now randomized search. Randomized search takes a certain number of samples from a distribution defined in training.model_parameters.
Added the F-measure metric to the model exploration output, and simplified the output so that it always has the same columns.
Removed the training.output_suspicious_TD configuration option because it was rarely used and caused code complexity and performance problems. Without it, the model exploration code is more maintainable and runs more quickly.
Disentangled two core modules (classifier and pipeline) from the configuration format by changing the arguments to a couple of functions. This should separate those concerns more cleanly and make any future changes to the configuration format easier.
Changed SparkConnection to require a checkpoint_dir argument, which fixes a bug related to Spark configuration.
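
As a rough illustration of the idea behind randomized search, here is a minimal Python sketch that draws a fixed number of parameter sets from per-parameter distributions. It is not hlink's implementation or configuration format: the `param_spec` layout, the distribution names, the parameter names, and the `sample_parameters` helper are all hypothetical and exist only for this example.

```python
import random

# Hypothetical parameter specification. The names and layout are
# illustrative only and are NOT hlink's configuration format.
param_spec = {
    "maxDepth": ("randint", 1, 10),          # integer in [1, 10], inclusive
    "threshold": ("uniform", 0.5, 1.0),      # float between 0.5 and 1.0
    "numTrees": ("choice", [50, 100, 200]),  # one value from the list
}


def sample_parameters(spec, num_samples, seed=None):
    """Draw num_samples parameter dictionaries from the given spec."""
    rng = random.Random(seed)  # a seeded generator makes the draws repeatable
    samples = []
    for _ in range(num_samples):
        params = {}
        for name, (kind, *args) in spec.items():
            if kind == "randint":
                params[name] = rng.randint(*args)
            elif kind == "uniform":
                params[name] = rng.uniform(*args)
            elif kind == "choice":
                params[name] = rng.choice(args[0])
            else:
                raise ValueError(f"unknown distribution: {kind}")
        samples.append(params)
    return samples


# Five candidate parameter sets; each one would be trained and scored.
candidates = sample_parameters(param_spec, num_samples=5, seed=42)
for params in candidates:
    print(params)
```

Unlike grid search, the number of candidate models here is set directly by `num_samples` rather than by the size of the parameter grid.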
