Missing Data Filter #2700

Open
phorne-uncharted opened this issue Jun 7, 2021 · 0 comments

Depending on the column type, a filter on a column can also exclude rows where that column has missing data. For example, if field A is a timestamp with missing values, excluding rows where A falls between 2010-01-01 and 2011-01-01 also excludes rows where A is missing. It is unclear whether that is the desired behaviour. Letting the user choose how missing values are treated could also get messy quickly, since the appropriate query would have to be generated for every filter, which is not trivial, especially when a column's type is switched.
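A minimal sketch of why this happens, assuming the exclusion filter is ultimately evaluated as a SQL predicate of the form `NOT (A BETWEEN lo AND hi)` (an assumption; the issue does not say how Distil builds its queries). Under SQL's three-valued logic, any comparison with NULL yields UNKNOWN, `NOT UNKNOWN` is still UNKNOWN, and `WHERE` keeps only rows that evaluate to TRUE, so missing rows silently disappear. The helpers below model that in plain Python with `None` standing in for NULL; `keep_missing` is a hypothetical option corresponding to appending `OR A IS NULL` to the predicate.

```python
from datetime import date
from typing import Optional


def sql_between(value, lo, hi) -> Optional[bool]:
    # SQL three-valued logic: a comparison involving NULL (None) is UNKNOWN (None).
    if value is None:
        return None
    # SQL BETWEEN is inclusive on both ends.
    return lo <= value <= hi


def sql_not(x: Optional[bool]) -> Optional[bool]:
    # NOT UNKNOWN remains UNKNOWN; only TRUE/FALSE are flipped.
    return None if x is None else not x


def exclude_range(rows, lo, hi, keep_missing=False):
    """Model `WHERE NOT (A BETWEEN lo AND hi)`, optionally `OR A IS NULL`.

    `keep_missing` is a hypothetical user-facing option, not an existing Distil setting.
    """
    kept = []
    for value in rows:
        cond = sql_not(sql_between(value, lo, hi))
        if cond is None:
            # WHERE discards UNKNOWN rows unless missing values are explicitly kept.
            if keep_missing:
                kept.append(value)
        elif cond:
            kept.append(value)
    return kept
```

With `rows = [date(2009, 6, 1), date(2010, 6, 1), None, date(2012, 1, 1)]`, excluding 2010-01-01 to 2011-01-01 returns only the 2009 and 2012 dates; the `None` row is dropped along with the excluded one. Passing `keep_missing=True` returns the 2009 date, `None`, and the 2012 date. Supporting that choice per filter is what makes query generation fiddly, since the extra `IS NULL` clause has to be added (or not) for every filter and kept consistent when the column type changes.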

There is also a more fundamental problem with filters and missing data: the primitives themselves do not support missing data and crash if ANY row has a missing value (definitely for datetime and numerical columns; unclear for other types). The only real way to handle this would be to add an initial filter on every field that could contain missing data, explicitly excluding empty rows BEFORE parsing the data to the right type. That quickly becomes cumbersome, and potentially for little gain.
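The issue does not name the primitives, so the sketch below models them as a plain parse function that fails on an empty value, the way Python's `float("")` and `datetime.fromisoformat("")` both raise `ValueError`. It illustrates the "initial filter" workaround under that assumption: rows with missing raw values are dropped before parsing, and the surviving row indices are returned so the other columns can be realigned. `is_missing` and `parse_column` are hypothetical helpers, not part of Distil.

```python
from datetime import datetime
from typing import Callable, List, Tuple


def is_missing(raw: str) -> bool:
    # Hypothetical definition of "missing": an empty or whitespace-only string.
    return raw.strip() == ""


def parse_column(raw_values: List[str], parse: Callable[[str], object]) -> Tuple[List[int], list]:
    """Exclude missing rows first, then parse the rest to the target type.

    Returns (kept_row_indices, parsed_values). Without the is_missing check,
    a single empty string would make `parse` raise ValueError for the whole column.
    """
    indices, parsed = [], []
    for i, raw in enumerate(raw_values):
        if is_missing(raw):
            continue  # drop the row before it ever reaches the parser
        indices.append(i)
        parsed.append(parse(raw))
    return indices, parsed
```

For example, `parse_column(["1.5", "", "2.0"], float)` returns `([0, 2], [1.5, 2.0])`, and `parse_column(["2010-03-01", " ", "2011-07-15"], datetime.fromisoformat)` returns `([0, 2], [datetime(2010, 3, 1), datetime(2011, 7, 15)])`. The catch is that this pre-filter has to run for every column that might contain missing values, and the dropped rows then have to be reconciled across all columns, which is the overhead the issue is wary of.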

At some point a decision needs to be made about exactly what kind of missing-data support Distil should offer.
