ENH: improve regex validation message #5447

Open
wants to merge 8 commits into
base: main

chenqi0805
Collaborator

Description

This PR

  • adds missing Jakarta validation for regex pattern plugin setting values
  • improves the error message on regex pattern validation.

Issues Resolved

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

        return grokCompiler.compile(item, grokProcessorConfig.isNamedCapturesOnly());
    } catch (IllegalArgumentException e) {
        throw new RuntimeException(
                String.format("Invalid regex pattern in match.%s", entry.getKey()), e);
Collaborator

Do we want to stop stream processing when we encounter an IllegalArgumentException, or do we want to collect the errors and continue processing?

Also, the exception we throw attaches the original e as its cause. Depending on what handles this exception, it could print the entire stack trace for each failure and create a lot of noise in the logs.

Collaborator Author

This will be a config validation error, which means Data Prepper will crash at runtime. The change I made improves the clarity of the failure message:

previous:

s3-log-pipeline.processor.grok: caused by: Exception thrown from plugin "grok". caused by: No definition for key 'ses_logs' found, aborting

now:

2025-02-20T13:14:40,784 [main] ERROR org.opensearch.dataprepper.core.validation.LoggingPluginErrorsHandler - 1. waf-access-log-pipeline.processor.grok: caused by: Exception thrown from plugin "grok". caused by: Invalid regex pattern in match.message caused by: No definition for key 'CUSTOM_PATTERN_FROM_FILE' found, aborting
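
For readers following along, the "caused by:" chaining in the message above can be illustrated with a minimal sketch. This is not Data Prepper's actual error handler; `CauseChainMessages` and `flatten` are hypothetical names, and the sketch only assumes that each exception in the cause chain contributes its own message.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Data Prepper.
final class CauseChainMessages {
    private CauseChainMessages() {
    }

    // Joins the message of each throwable in the cause chain with " caused by: ".
    static String flatten(final Throwable top) {
        final List<String> parts = new ArrayList<>();
        Throwable current = top;
        // The depth cap guards against a cyclic cause chain.
        for (int depth = 0; current != null && depth < 32; depth++) {
            if (current.getMessage() != null) {
                parts.add(current.getMessage());
            }
            current = current.getCause();
        }
        return String.join(" caused by: ", parts);
    }
}
```

With a wrapper exception carrying "Invalid regex pattern in match.message" and the original IllegalArgumentException as its cause, the flattened text names the offending config key before the underlying grok error.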

Collaborator

Got it. It is not a data processing error 👍

Collaborator

Just one thought: instead of RuntimeException, we could throw InvalidPluginConfigurationException to be more explicit.
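
A rough sketch of what that suggestion could look like. The `InvalidPluginConfigurationException` declared here is a local stand-in so the example compiles on its own; the project's real class may have different constructors. `MatchPatternCompiler`, `compileAll`, and the `compiler` parameter are hypothetical and stand in for the grok compile call in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Local stand-in for illustration; the project's own class may differ.
class InvalidPluginConfigurationException extends RuntimeException {
    InvalidPluginConfigurationException(final String message, final Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical helper, not the PR's actual code.
final class MatchPatternCompiler {
    private MatchPatternCompiler() {
    }

    // Compiles every pattern under "match" and reports the failing key on error.
    static <T> Map<String, T> compileAll(final Map<String, String> match,
                                         final Function<String, T> compiler) {
        final Map<String, T> compiled = new LinkedHashMap<>();
        for (final Map.Entry<String, String> entry : match.entrySet()) {
            try {
                compiled.put(entry.getKey(), compiler.apply(entry.getValue()));
            } catch (final IllegalArgumentException e) {
                // Same message as in the diff, but with a config-specific exception type.
                throw new InvalidPluginConfigurationException(
                        String.format("Invalid regex pattern in match.%s", entry.getKey()), e);
            }
        }
        return compiled;
    }
}
```

A dedicated type lets whatever handles plugin errors tell a configuration problem apart from an unexpected runtime failure without parsing the message.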

Signed-off-by: George Chen <[email protected]>
}

    private boolean validateRegex(final String pattern) {
        if (pattern != null && !Objects.equals(pattern, "")) {
Collaborator

Is an empty or null regex pattern valid?

Collaborator Author

In this config context it is valid.


    private static Stream<Arguments> provideFromKeyRegexAndIsValid() {
        return Stream.of(
                Arguments.of("", true),
Collaborator

You can add a null case here too.

        return validateRegex(delimiterRegex);
    }

    private boolean validateRegex(final String pattern) {
Collaborator

This method is repeated multiple times. It is probably a good idea to move it into a static util class.
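
One possible shape for such a shared helper, as a sketch only. The full body of `validateRegex` is not visible in this diff, so the use of `java.util.regex.Pattern.compile` is an assumption, and `RegexValidationUtil` and `isValidOrAbsent` are hypothetical names. It keeps the behavior discussed earlier in this thread, where a null or empty pattern counts as valid.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical shared helper; the name and location are assumptions.
final class RegexValidationUtil {
    private RegexValidationUtil() {
    }

    // Returns true when the pattern is null, empty, or compiles as a Java regex.
    static boolean isValidOrAbsent(final String pattern) {
        if (pattern == null || pattern.isEmpty()) {
            // Absent patterns are treated as valid in this config context.
            return true;
        }
        try {
            Pattern.compile(pattern);
            return true;
        } catch (final PatternSyntaxException e) {
            return false;
        }
    }
}
```

Each config class could then delegate its Jakarta `@AssertTrue` check to this one method, and the null and empty cases would only need to be tested once.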


    private static Stream<Arguments> provideDelimiterRegexAndIsValid() {
        return Stream.of(
                Arguments.of("", true),
Collaborator

Adding a null case would help here too.

Signed-off-by: George Chen <[email protected]>
pluginMetrics, grokProcessorConfig, expressionEvaluator));
assertThat("No definition for key 'CUSTOMBIRTHDAYPATTERN' found, aborting", equalTo(throwable.getMessage()));
assertThat(throwable.getCause(), instanceOf(IllegalArgumentException.class));
assertThat("No definition for key 'CUSTOMBIRTHDAYPATTERN' found, aborting", equalTo(throwable
Member

Could we remove the "aborting" part of this message? It is a small thing, but it seems a little weird for users to see.

Collaborator Author

Signed-off-by: George Chen <[email protected]>
Signed-off-by: George Chen <[email protected]>
4 participants