-
Notifications
You must be signed in to change notification settings - Fork 686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SOLR-17181: Using apache commons implementation for wildcard matching for glob patterns #2301
SOLR-17181: Using apache commons implementation for wildcard matching for glob patterns #2301
Conversation
We just got rid of commons-io, https://issues.apache.org/jira/browse/SOLR-15784 Can you find an alternative that is already a dependency, or copy just the code we need into one of Solr's util classes? |
Do we have a test case here that includes Unicode in the file name? For example https://docs.modular.com/mojo/faq.html#why-does-mojo-have-the-file-extension Edit: that might not apply, I skimmed the change without realising we're talking about field names here |
Updated this to just reuse the applicable code from commons-io without the dependency. I'd like to use that code since we know it will perform the same as previous Solr versions. |
Not including a unicode test here since field names should be alphanumeric and that is what current use cases are checking. |
@risdenk I noticed that for the one test failing that seems to be unrelated to changes here, but more of a random occurrence. Looking at the code, I think
|
@justinrsweeney would probably open a separate MR to fix that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
@@ -27,11 +28,161 @@ public class GlobPatternUtil { | |||
* PathMatcher to match glob patterns in the same way to how glob patterns are matches for file | |||
* paths, rather than implementing our own glob pattern matching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the javadoc to reflect the change
… for glob patterns (#2301) * Using apache commons implementation for wildcard matching for glob patterns * Removing commons io dependency and instead reusing applicable code from that library * Fixing comments with Javadoc links * Actually fixing comments with Javadoc links * Fixing javadoc comments
Hmm, problem with copying code out of other libraries is we won't get any bug-fixes with new versions or be aware of relevant CVE's in the copied code. Might be better to just have the dependency? |
For SolrJ we really have to keep removing external deps, not adding. We can of course keep looking for a performant glob method in one of the other existing libs we have in SolrJ as a replacement. But it is not worth re-adding entire commons-io for this one method imo. |
The functionality we are trying to implement here is not super complicated so I don't think we likely see it change that much in the future or needing to incorporate commons-io changes necessarily. The FilenameUtils approach is mostly just a convenience that it does what we need and could certainly be changed replaced on its own in the Solr codebase. |
None the less there should be some mechanism to track it so that if a CVE for commons IO does come out somebody looks at it to verify that that CVE has nothing to do with glob matching. This particular case probably low risk, but pulling code out of dependencies doesn't reduce our exposure to CVE's in other dependencies, and there's also still a license question. I think we track NOTICE.txt entries based on what libs we use right? Well we still need to atribute this, and I'm pretty sure this PR is not complying with the final bullet here:
If we put an entry back in NOTICE.txt, how to we keep it from getting removed when someone reviews dependency information. |
Are you suggesting we add a snippet to solr/NOTICE.txt with a mention of the copied content, in addition to the generic attribution we have in place for commons-in? |
I haven't taken time to verify it but I presume the removal of commons-io would have also removed references to it from NOTICE.txt? If we are pulling their code back in that has to return. If it was missed and still is there, that's good. Either way, there's a risk that at a later date someone will audit NOTICE.txt, not find commons-io in the output of |
Commons-io is still a core dependency and will likely be forever, so both license and notice is still there. But it is not a dependency of solrj. |
https://issues.apache.org/jira/browse/SOLR-17181
Description
This is intended to fix a bug raised where glob pattern matching suffered a performance degradation after #1996.
Solution
This reverts back to using the Apache Commons IO FilenameUtils.wildcardMatch method to do glob pattern matching. This is the original implementation used prior to 9.5 and thus should match performance. This does introduce commons-io as a new library for solrj since glob pattern matching is needed in solrj as well to be available for all use cases.
Tests
Using existing tests to validate functionality.
Checklist
Please review the following and check all that apply:
main
branch../gradlew check
.