Text Query Default Handling #2683

Open
phorne-uncharted opened this issue Jun 3, 2021 · 0 comments


Text queries match on stemmed versions of words so that similar words are aggregated together. For example, singular and plural forms are reduced to a common form. However, stop words are not included in the stemmed word catalogue, so the current text queries simply drop them.

If the field really is a text field, this isn't too bad: the facet simply reflects a word count that excludes the stop words. If the field is a misclassified categorical field, however, the facet may present inaccurate information to the user. If one of the categories is a stop word (e.g. "a", "and"), the facet will not display it and the total count will be wrong, which makes it harder for the user to understand what is going on. The attached CSV file is a simple example dataset in which one of the categories is "a" and the field is classified as a text field on ingest. Note that the relevant facet does not display that category, making it look like only 15 rows exist when there are actually 20.
timestamp_empty.zip
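
To make the failure mode concrete, here is a minimal Python sketch of the behaviour described above. It is not the project's query code: the stop-word list, the naive `stem` helper, and the row values are all made up for illustration, and only the 15-versus-20 row counts are taken from this issue.

```python
from collections import Counter

# Hypothetical stand-ins: a tiny stop-word list and a naive plural-stripping stemmer.
STOP_WORDS = {"a", "and", "the", "of"}


def stem(word):
    """Return a naive stem, or None when the word is a stop word."""
    w = word.lower()
    if w in STOP_WORDS:
        return None
    return w[:-1] if w.endswith("s") and len(w) > 1 else w


def text_facet(values):
    """Count stems across rows, silently dropping words that have no stem."""
    counts = Counter()
    for value in values:
        for word in value.split():
            s = stem(word)
            if s is not None:
                counts[s] += 1
    return counts


# Assumed data: 20 rows of a categorical column treated as text,
# 5 of which hold the category "a".
rows = ["a"] * 5 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5
facet = text_facet(rows)

print(dict(facet))
print(sum(facet.values()), "of", len(rows), "rows represented")
```

In this sketch the "a" category never appears in the facet, so the bucket counts add up to 15 even though 20 rows were ingested.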

The text queries should be updated to default to the empty string for words that do not match a stemmed version.
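
One possible reading of this proposal, sketched in Python under stated assumptions: words with no stemmed version are counted under an empty-string bucket instead of being dropped. The stop-word list, the `stem_or_default` helper, and the row values are hypothetical and do not come from the project's code.

```python
from collections import Counter

STOP_WORDS = {"a", "and", "the", "of"}  # hypothetical stop-word list


def stem_or_default(word, default=""):
    """Return a naive stem, falling back to `default` for stop words."""
    w = word.lower()
    if w in STOP_WORDS:
        return default
    return w[:-1] if w.endswith("s") and len(w) > 1 else w


def text_facet_with_default(values):
    """Count stems across rows, keeping unmatched words under the default key."""
    counts = Counter()
    for value in values:
        for word in value.split():
            counts[stem_or_default(word)] += 1
    return counts


# Assumed data: 20 rows, 5 of which hold the stop-word category "a".
rows = ["a"] * 5 + ["b"] * 5 + ["c"] * 5 + ["d"] * 5
fixed = text_facet_with_default(rows)

print(dict(fixed))
print(sum(fixed.values()) == len(rows))
```

With the fallback in place the bucket counts in this sketch add up to all 20 rows, so the facet total matches the dataset even though the stop-word category shows up under the empty string rather than under its own label.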
