-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[plsql,tsql] Fix CPD being case sensitive in PLSQL and TSQL #4943
Open
oowekyala
wants to merge
22
commits into
pmd:master
Choose a base branch
from
oowekyala:issue4396-cpd-case-sensitive
base: master
Could not load branches
Branch not found: {{ refName }}
Could not load tags
Nothing to show
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from 10 commits
Commits
Show all changes
22 commits
Select commit
Hold shift + click to select a range
44f29c3
Fix #4396 - Fix PLSQL CPD being case-sensitive
oowekyala 72408ca
Normalize image of PLSQL tokens to uppercase, reuse strings
oowekyala 1c23df7
Fix some weird things in PLSQL tokens
oowekyala 0cb2e37
Update reference files
oowekyala ab80b24
Also add this ability for Antlr lexers, adapt TSQL
oowekyala 41c0135
Fix things
oowekyala f484c75
Add test for PLSQL ignore literals
oowekyala 8f1e6b0
Fix exclusive end index in antlr token
oowekyala 835abc8
Add back ctor for compatibility
oowekyala 10dfb45
Replace numbers with names
oowekyala c482666
Merge branch 'master' into issue4396-cpd-case-sensitive
oowekyala 06eb7ea
review comments
oowekyala d45aef8
Apply suggestions from code review
oowekyala b56925f
Merge remote-tracking branch 'origin/issue4396-cpd-case-sensitive' in…
oowekyala f4e7541
Treat unquotable identifiers as unquoted in PLSQL
oowekyala b931c2f
Fix name of String literal token
oowekyala 95721ef
Trick javacc into giving string literal a non-literal image
oowekyala 838df27
Normalize token images also in PMD parser
oowekyala 75e50df
Make @Image have old behavior, remove KEYWORD_UNRESERVED from tree
oowekyala 7484186
Merge branch 'master' into issue4396-cpd-case-sensitive
oowekyala 2e9aa06
Fix API compatibility
oowekyala ffc71e8
Don"t add publicly supported API
oowekyala File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
32 changes: 32 additions & 0 deletions
32
pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/impl/antlr4/AntlrLexerBehavior.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
/** | ||
* BSD-style license; for more info see http://pmd.sourceforge.net/license.html | ||
*/ | ||
|
||
package net.sourceforge.pmd.lang.ast.impl.antlr4; | ||
|
||
import org.antlr.v4.runtime.Token; | ||
|
||
import net.sourceforge.pmd.cpd.CpdLanguageProperties; | ||
|
||
/** | ||
* Strategy to customize some aspects of the mapping | ||
* from Antlr tokens to PMD/CPD tokens. | ||
*/ | ||
public class AntlrLexerBehavior { | ||
|
||
|
||
/** | ||
* Return the image that the token should have, possibly applying a transformation. | ||
* The default just returns {@link Token#getText()}. | ||
* Transformations here are usually normalizations, for instance, mapping | ||
* the image of all keywords to uppercase/lowercase to implement case-insensitivity, | ||
* or replacing the image of literals by a placeholder to implement {@link CpdLanguageProperties#CPD_ANONYMIZE_LITERALS}. | ||
* | ||
* @param token A token from the Antlr Lexer | ||
* | ||
* @return The image | ||
*/ | ||
protected String getTokenImage(Token token) { | ||
return token.getText(); | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Up to now these things would have been done in the token filter, which is supported by both Javacc and antlr.
I'm a little wary of adding yet another mechanism that can overlap. The only reason to get something into the lexer itself (the way Javacc is implemented) is because the behavior should apply both to cpd and pmd, as it's inherent to the language. The way this is hooked into antlr languages it's only applicable to cpd.
Yes, the tokenfilter was a catch all that made for some complex implementations and would probably use some cleanup, but I'm unsure this is is.
I feel we need to have a better defined boundary / limit extension points. Once this is published we will have to support it. I'd love to see this behavior pushed into the lexer rather the token / token manager as is the case for Javacc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean having this behavior pushed into the Antlr lexer, so that the Antlr parser also sees the normalized image? In that case I agree, it would be nicer. I can look into it.
I think overall using the token filter to do that is an abuse when the language itself applies specific normalizations. That's why I believe our Javacc extension point is justified, and we need a mirror extension point for Antlr.
Edit: I see now that on the line you commented on, the reference to CPD_ANONYMIZE_LITERALS should be removed, as this part should be exclusive to the CpdLexer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you got my intent right.
Try and push this down to the antlr lexer, the same way the Javacc one works. It will seamlessly apply to pmd and cpd, make it non optional (it's language behavior), and exclude it as an extension point.
Other cpd specific behaviors such as anonymize literals should be done in cpd specific code. Right now that is the token filter, but as we both agree, at some point that api will need to evolve.