enhancement(match_datadog_query): add is_phrase flag to equals method#1334
enhancement(match_datadog_query): add is_phrase flag to equals method#1334PSeitz wants to merge 2 commits intovectordotdev:mainfrom
Conversation
Update the Filter trait's equals signature to include an `is_phrase` boolean flag. Without that information there is no way to distinguish between phrased and non-phrased queries.
| &self, | ||
| field: Field, | ||
| to_match: &str, | ||
| is_phrase: bool, |
There was a problem hiding this comment.
Which implementation needs this bool?
Also, please avoid passing flags if possible e.g. we can introduce a new phrase_equals.
There was a problem hiding this comment.
This is for matching on default fields (typically message) in datadog. The matching behavior for phrases is different than non-phrased. In a phrase the tokens need to be in the same order. E.g.
Hello nice world => "Hello world" no match
Hello nice world => Hello world matches
Btw. there is an alternative solution where the tokenizer would emit multiple tokens.
Hello world would expand to the equivalent of_default_:Hello AND _default_:World.
"Hello world" would expand to the equivalent of_default_:"Hello World"
I considered adding a new method phrase_equals, but wasn't sure, since this behavior applies only to default fields.
There was a problem hiding this comment.
updated to introduce phrase fn callback
|
Thanks @PSeitz, this looks better now. Correct me if I am wrong, but this a breaking change? Since the filter matcher is now stricter. |
| Field::Default(_) => { | ||
| let re = word_regex(phrase); | ||
| Ok(resolve_value( | ||
| buf, | ||
| Run::boxed(move |value| match value { | ||
| Value::Bytes(val) => re.is_match(&String::from_utf8_lossy(val)), | ||
| _ => false, | ||
| }), | ||
| )) | ||
| } |
There was a problem hiding this comment.
How is this different from the condition in equals below? It looks word-for-word identical but maybe I missed something. I'd also like to see some unit test cases that demonstrate the different behavior.
There was a problem hiding this comment.
It is the same, since I didn't know the impact of the change, so currently it's a non-breaking behavior.
Currently the non-phrase behavior is also behaving incorrectly, they both would need to be adjusted to include tokenization (and probably some adaption on the query parser)
I implemented it partially here https://github.com/DataDog/pomsky/pull/79. For full compatibility I'd need to have a closer look on how percolation tokenizes.
I think yes, but nothing disruptive with what's done in datadog (we want to get more in line with the datadog log explorer behaviour) |
Summary
Update the Filter trait's equals signature to include an
is_phraseboolean flag.Without that information there is no way to distinguish between phrased and non-phrased queries. They behave differently in datadog on default fields
This is for matching on default fields (typically
message) in datadog. The matching behavior for phrases is different than non-phrased. In a phrase the tokens need to be in the same order. E.g.Hello nice world=>"Hello world"no matchHello nice world=>Hello worldmatchesAlternative Options
There is an alternative solution where the tokenizer would emit multiple tokens (only for default fields):
Hello worldwould expand to the equivalent of_default_:HelloAND_default_:World."Hello world"would expand to the equivalent of_default_:"Hello World"Change Type
Is this a breaking change?
Does this PR include user facing changes?
our guidelines.
Checklist
run
dd-rust-license-tool writeand commit the changes. More details here.