Fix a bug with the override_column_X attributes in conf_validations.py #131

Merged (8 commits, Feb 20, 2024)

@riley-harper (Contributor) commented Feb 19, 2024

Fixes #130.

In issue #118 we documented the override_column_a and override_column_b attributes for column mappings and added some tests for them. I forgot to check the config validation, though, and it had no logic for handling the override_column_X attributes. So if you used an override_column_X attribute in a column mapping, you'd still get a config validation error from the conf_validations module. If you ran hlink anyway, it probably ran without errors: the bug was only in the analysis code, not in the core module code.
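A minimal sketch of the kind of check that was missing, assuming each dataset's columns are available as a plain set of column names. The helpers columns_for_mapping() and check_column_mapping() are hypothetical and are not the actual code in conf_validations.py; the only behavior taken from this PR is that override_column_a / override_column_b name the column to read from that dataset instead of column_name.

```python
# Hypothetical sketch, not hlink's conf_validations code.
# Dataset columns are modeled as plain sets of names rather than DataFrames.


def columns_for_mapping(mapping: dict) -> tuple:
    """Return (column read from dataset A, column read from dataset B).

    override_column_a / override_column_b replace column_name for their
    dataset; otherwise column_name is used for both datasets."""
    column_name = mapping.get("column_name")
    column_a = mapping.get("override_column_a", column_name)
    column_b = mapping.get("override_column_b", column_name)
    return column_a, column_b


def check_column_mapping(mapping: dict, columns_a: set, columns_b: set) -> None:
    """Raise ValueError if the mapping reads a column missing from its dataset."""
    column_a, column_b = columns_for_mapping(mapping)
    if column_a not in columns_a:
        raise ValueError(f"column {column_a!r} not found in dataset A")
    if column_b not in columns_b:
        raise ValueError(f"column {column_b!r} not found in dataset B")


# override_column_a points dataset A at AGE_A, so this mapping is valid even
# though dataset A has no column named "age".
check_column_mapping(
    {"column_name": "age", "override_column_a": "AGE_A"},
    columns_a={"AGE_A"},
    columns_b={"age"},
)
```

A validator that ignored override_column_a would look for age in dataset A and reject this mapping, which is the false config validation error described above.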

I've added some more unit tests for this section of the code and refactored the check_column_mappings() function to make it easier to work with.

The changes to hlink/linking/matching/link_step_explode.py are black formatting changes. black has been updated since we last made a change to hlink.

@joegrover left a comment

It's crazy how readable this is while at the same time being so confusing. The logic all tracks, though I'm not entirely sure what it is trying to do. Or at least it seems like it is doing more than what I initially assumed was a simple "call COLA from file A COLB in file B". Specifically the concept of a "previous alias" caught me off guard. Do these things stack?

@riley-harper (Contributor, Author)

Haha, yeah this is something that feels very simple at first but is more complicated than it seems. There is some documentation for column mappings here that might be helpful.

Yes, the previous aliases do stack. So if you have a column mapping in your file that reads in a column as AGE, you can have another column mapping later that uses the column AGE as input. In hlink/linking/preprocessing/link_step_prep_dataframes.py, line 106, the column mappings are selected one at a time.
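A rough sketch of how stacking aliases affect validation: the mappings are checked in order, and each mapping's alias (or its column_name when no alias is given) becomes a valid input for every later mapping. This is a hypothetical illustration, not the real check_column_mappings(); it assumes raw columns stay available after mapping and that a previous alias can be used as input for either dataset.

```python
# Hypothetical sketch, not hlink's check_column_mappings().
# Assumes raw columns remain available and earlier aliases can feed later
# mappings for both datasets.


def check_column_mappings(mappings: list, columns_a: set, columns_b: set) -> tuple:
    """Validate mappings in order, tracking aliases produced so far.

    Returns the final sets of names available in datasets A and B."""
    available_a = set(columns_a)
    available_b = set(columns_b)
    for mapping in mappings:
        column_name = mapping.get("column_name")
        column_a = mapping.get("override_column_a", column_name)
        column_b = mapping.get("override_column_b", column_name)
        if column_a not in available_a:
            raise ValueError(f"column {column_a!r} not found in dataset A or previous aliases")
        if column_b not in available_b:
            raise ValueError(f"column {column_b!r} not found in dataset B or previous aliases")
        # Only after this mapping passes does its output name become usable,
        # so a mapping cannot read its own alias.
        alias = mapping.get("alias", column_name)
        if alias is not None:
            available_a.add(alias)
            available_b.add(alias)
    return available_a, available_b


# The first mapping reads AGE_A (dataset A) / age (dataset B) in as AGE;
# the second mapping then uses AGE as its input.
first = {"column_name": "age", "override_column_a": "AGE_A", "alias": "AGE"}
second = {"column_name": "AGE", "alias": "age_at_census"}
available_a, available_b = check_column_mappings([first, second], {"AGE_A"}, {"age"})
```

Swapping the two mappings makes validation fail, because AGE does not exist yet when the second mapping is checked. That ordering dependence is why the mappings have to be processed one at a time.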

@joegrover

Ah, got it. Yeah that makes sense then.

@riley-harper riley-harper merged commit 12ff643 into main Feb 20, 2024
6 checks passed
@riley-harper riley-harper deleted the bug_fix_column_mappings_validation branch February 20, 2024 19:14
Development: merging this pull request may close the issue "Config validation doesn't handle column_mappings' override_column_a and override_column_b attributes".
Participants: 2