Adjust how choose_classifier handles seed parameters #222

riley-harper · 2025-08-15T21:04:04Z

This closes #221.
It also closes #224, a CI/CD bug.

Previously, we manually set the seeds for Spark's built-in ML models but not for XGBoost and LightGBM. This inconsistency was an oversight on my part when I added XGBoost and LightGBM. Because no seed was set for XGBoost or LightGBM, the models trained by these libraries differed slightly on each run of hlink, which caused some inconsistent matching results.

Also, the manual setting of the seeds for the Spark models did not allow users to pass in their own seeds, so they were stuck with the single seed we had chosen.

Now all of these models are handled uniformly. We accept the seed set by the user if there is one. If there is no seed in the params dictionary, then we add a "seed": 2133 entry before passing the parameters to the classifier. This fixes both issues.
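As a rough illustration of the behavior described above, here is a minimal sketch. The helper name `with_default_seed` and the constant `DEFAULT_SEED` are hypothetical and are not taken from hlink's code; only the `"seed"` key and the value 2133 come from this description.

```python
DEFAULT_SEED = 2133  # default value named in this PR description


def with_default_seed(params: dict) -> dict:
    """Return a copy of params that always contains a "seed" entry.

    Hypothetical helper for illustration only; not hlink's actual code.
    """
    result = dict(params)  # copy so the caller's dictionary is not mutated
    result.setdefault("seed", DEFAULT_SEED)  # keep a user-supplied seed if present
    return result


# No seed supplied: the default is added.
print(with_default_seed({"maxDepth": 5}))
# Seed supplied by the user: it is left untouched.
print(with_default_seed({"maxDepth": 5, "seed": 42}))
```

The same step would apply to every model type, which is what makes the handling uniform across Spark ML, XGBoost, and LightGBM.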

For #224: our CI/CD was recently updated automatically to Debian trixie with Java 21, which seems to cause problems for the current version of XGBoost. To avoid this, CI/CD is now pinned to Debian bookworm and Java 17.

riley-harper added 3 commits August 15, 2025 20:40

[#221] Allow users to optionally set a seed for Spark ML models

9a4ca54

[#221] Always set the seed parameter for XGBoost and LightGBM

b8e297b

[#224] Pin to Debian bookworm and Java 17 in CI/CD

d5ee25e

We recently got automatically updated to Debian trixie with Java 21. But that seems to cause problems for the current version of XGBoost.

riley-harper merged commit 159f8da into main Aug 18, 2025
6 checks passed

riley-harper deleted the classifier_seeds branch August 18, 2025 15:01
