Remove the "suspicious data" logic from model exploration #176

Open
riley-harper opened this issue Dec 9, 2024 · 0 comments · May be fixed by #186

Comments

Contributor

riley-harper commented Dec 9, 2024

The "suspicious data" logic accounts for a large share of the complexity of model exploration and is slow to compute. Researchers at IPUMS do not use it at all, and creating high-quality training data is outside the scope of hlink. We should remove this feature in v4 to simplify and streamline model exploration.

riley-harper added this to the v4.0.0 milestone Dec 9, 2024
riley-harper added a commit that referenced this issue Dec 10, 2024
Using a single select() should let us take better advantage of Spark's
parallel/distributed computing. My initial results profiling this are
pretty promising.
riley-harper linked a pull request Mar 6, 2025 that will close this issue