Is there any rule of thumb difference between settings blocking rule and "training" rules? #1543
Unanswered
jkginfinite
asked this question in
Q&A
Replies: 0 comments
I am wondering whether there is any logic behind choosing different rules for the "blocking_rules_to_generate_predictions" setting:
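import splink.duckdb.comparison_library as cl
import splink.duckdb.comparison_template_library as ctl
from splink.duckdb.blocking_rule_library import block_on
from splink.duckdb.linker import DuckDBLinker

settings = {
    "link_type": "dedupe_only",
    "comparisons": [
        ctl.name_comparison("first_name"),
        ctl.name_comparison("surname"),
        ctl.date_comparison("dob", cast_strings_to_date=True),
        cl.exact_match("city", term_frequency_adjustments=True),
        ctl.email_comparison("email", include_username_fuzzy_level=False),
    ],
    # Pairs that satisfy any one of these rules are scored at prediction time.
    "blocking_rules_to_generate_predictions": [
        block_on("first_name"),
        block_on("surname"),
    ],
    "retain_matching_columns": True,
    "retain_intermediate_calculation_columns": True,
}
# df is the input DataFrame of records to deduplicate.
linker = DuckDBLinker(df, settings)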
and the rules used for a training session:
training_blocking_rule = block_on(["first_name", "surname"])
training_session_fname_sname = linker.estimate_parameters_using_expectation_maximisation(training_blocking_rule)
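To make the comparison concrete, here is a rough plain-Python sketch of how I understand the two sets of rules. It does not use Splink at all: the toy records and the helper `pairs_blocked_on` are hypothetical, made up for illustration. It assumes that the rules listed under "blocking_rules_to_generate_predictions" are combined as a union (a pair is kept if any one rule matches), and that `block_on(["first_name", "surname"])` keeps only pairs that agree on both columns.

```python
from itertools import combinations

# Hypothetical toy records, invented purely for illustration.
records = [
    {"id": 1, "first_name": "john", "surname": "smith"},
    {"id": 2, "first_name": "john", "surname": "smith"},
    {"id": 3, "first_name": "john", "surname": "jones"},
    {"id": 4, "first_name": "mary", "surname": "smith"},
    {"id": 5, "first_name": "anna", "surname": "brown"},
]


def pairs_blocked_on(rows, columns):
    """Return the id pairs whose values agree on every column in `columns`."""
    return {
        (a["id"], b["id"])
        for a, b in combinations(rows, 2)
        if all(a[c] == b[c] for c in columns)
    }


# Assumption: prediction rules are OR-ed together, so take the union
# of the pairs produced by each single-column rule.
prediction_pairs = (
    pairs_blocked_on(records, ["first_name"])
    | pairs_blocked_on(records, ["surname"])
)

# Assumption: the training rule requires agreement on both columns at once.
training_pairs = pairs_blocked_on(records, ["first_name", "surname"])

print("prediction pairs:", sorted(prediction_pairs))
print("training pairs:  ", sorted(training_pairs))
```

If that reading is right, the training rule looks at a much smaller, stricter set of pairs than the prediction rules do, which is part of why I am unsure how the two should be chosen relative to each other.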