-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Add a flexible way to assign files with arbitrary names to languages #4945
base: master
Are you sure you want to change the base?
Changes from all commits
63a6849
1630237
a6caf80
fd283bf
bd05b72
628fabd
257f6b1
de63594
027a6a7
01684f2
165092e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -59,6 +59,18 @@ The tool comes with a rather extensive help text, simply running with `--help`! | |
<p>See also [Providing the auxiliary classpath](pmd_languages_java.html#providing-the-auxiliary-classpath).</p>" | ||
languages="Java" | ||
%} | ||
{% include custom/cli_option_row.html options="--assign-language" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We'll need to add the same for CPD CLI doc. |
||
option_arg="langId regex" | ||
description="Specify that files whose name match the regex should be assigned the given language. | ||
This option is useful when the default extension-based matching does not match some file names in | ||
your project. In particular, languages like XML have a variety of extensions, but by default only the `.xml` | ||
extension will be recognized. | ||
|
||
This option may be repeated to make several assignments. If several regexes match the same file, only | ||
the last occurrence of the option is taken into account. If the language is not loaded, the assignment | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would vote for defining the first match/occurrence wins... |
||
is still valid and no further regexes/extension matching will occur, but the file will be ignored. A warning | ||
will be logged once on the CLI before the execution. | ||
%} | ||
{% include custom/cli_option_row.html options="--benchmark,-b" | ||
description="Enables benchmark mode, which outputs a benchmark report upon completion. | ||
The report is sent to standard error." | ||
|
@@ -279,7 +291,19 @@ All formats are described at [PMD Report formats](pmd_userdocs_report_formats.ht | |
|
||
### Analyze other xml formats | ||
|
||
If your xml language doesn't use `xml` as file extension, you can still use PMD with `--force-language`: | ||
If your XML language doesn't use `xml` as file extension, you can match these files with the option `--assign-language`: | ||
|
||
{% include cli_example.html | ||
id="force" | ||
linux="pmd check -d src/xml-file.ext -f text -R ruleset.xml --assign-language xml '.*\.ext'" | ||
windows="pmd.bat check -d src\xml-file.ext -f text -R ruleset.xml --assign-language xml '.*\.ext'" %} | ||
|
||
You can repeat the `--assign-language` option several time to match different patterns. The first argument | ||
is a language ID, the second is a Java regular expression. You can therefore write eg `--assign-language xml '.*\.(ext|ext2)|foo.myxmlformat'`. | ||
|
||
|
||
Another, less flexible way to do this is to use `--force-language`, but this will assign all files to the same language and | ||
will not allow analyzing several languages in one go: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we need to give less space to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With "--force-language java", you could e.g. parse a "File.txt" as a Java-File. With "--assign-language java '.*.txt'" you could achieve the same, correct? If so, we might consider to deprecate "force-language" then... as it seems all use cases are covered by assign-language. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's correct. You are probably right that we should deprecate it, I don't see a use case for it anymore |
||
|
||
{% include cli_example.html | ||
id="force" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7,15 +7,25 @@ | |
import java.net.URI; | ||
import java.nio.file.Files; | ||
import java.nio.file.Path; | ||
import java.util.ArrayList; | ||
import java.util.LinkedHashSet; | ||
import java.util.List; | ||
import java.util.Set; | ||
import java.util.Stack; | ||
import java.util.regex.Pattern; | ||
import java.util.regex.PatternSyntaxException; | ||
|
||
import org.slf4j.LoggerFactory; | ||
|
||
import net.sourceforge.pmd.AbstractConfiguration; | ||
import net.sourceforge.pmd.cli.commands.mixins.internal.EncodingMixin; | ||
import net.sourceforge.pmd.cli.internal.CliExitCode; | ||
import net.sourceforge.pmd.cli.internal.PmdRootLogger; | ||
import net.sourceforge.pmd.lang.Language; | ||
import net.sourceforge.pmd.lang.LanguageVersionDiscoverer.LanguageFilePattern; | ||
import net.sourceforge.pmd.util.log.internal.SimpleMessageReporter; | ||
|
||
import picocli.CommandLine; | ||
import picocli.CommandLine.Mixin; | ||
import picocli.CommandLine.Option; | ||
import picocli.CommandLine.ParameterException; | ||
|
@@ -29,26 +39,26 @@ public abstract class AbstractAnalysisPmdSubcommand<C extends AbstractConfigurat | |
// see the setters #setInputPaths and setPositionalInputPaths for @Option and @Parameters annotations | ||
// Note: can't use annotations on the fields here, as otherwise the complete list would be replaced | ||
// rather than accumulated. | ||
protected Set<Path> inputPaths; | ||
private Set<Path> inputPaths; | ||
|
||
@Option(names = "--file-list", | ||
description = | ||
"Path to a file containing a list of files to analyze, one path per line. " | ||
+ "One of --dir, --file-list or --uri must be provided.") | ||
protected Path fileListPath; | ||
|
||
@Option(names = { "--uri", "-u" }, | ||
description = "Database URI for sources. " | ||
+ "One of --dir, --file-list or --uri must be provided.") | ||
protected URI uri; | ||
|
||
@Option(names = "--no-fail-on-violation", | ||
description = "By default PMD exits with status 4 if violations are found. " | ||
+ "Disable this option with '--no-fail-on-violation' to exit with 0 instead and just write the report.", | ||
defaultValue = "true", negatable = true) | ||
protected boolean failOnViolation; | ||
|
||
protected List<Path> relativizeRootPaths; | ||
private List<Path> relativizeRootPaths; | ||
|
||
@Option(names = { "--relativize-paths-with", "-z"}, description = "Path relative to which directories are rendered in the report. " | ||
+ "This option allows shortening directories in the report; " | ||
|
@@ -88,6 +98,24 @@ protected void setPositionalInputPaths(final List<Path> inputPaths) { | |
this.setInputPaths(inputPaths); | ||
} | ||
|
||
private Path ignoreListPath; | ||
|
||
@Option(names = "--ignore-list", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds like ok, it seems we named it "exclude" in CPD and "ignore-list" in PMD. So, we have here another opportunity to streamline CLI paramaters... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think CPD's |
||
description = "Path to a file containing a list of files to exclude from the analysis, one path per line. " | ||
+ "This option can be combined with --dir, --file-list and --uri.") | ||
public void setIgnoreListPath(final Path ignoreListPath) { | ||
this.ignoreListPath = ignoreListPath; | ||
} | ||
|
||
@Option(names = "--assign-language", description = "Use a regex pattern to assign filenames to a language." | ||
+ " Eg `--assign-language html '.*\\.twig'` will recognize files with extension \\.twig and assign them the language HTML." | ||
+ " This only affects language assignment for the files that are mentioned with other options like --dir, it will not search for " | ||
+ " new files outside of these. These patterns take precedence over the default language assignment. If several patterns match," | ||
+ " only the latest pattern to be mentioned on the CLI will be considered. Note that the regex will be applied on the absolute path" | ||
+ " of the file, with file separators normalized to '/'.", parameterConsumer = LanguageFilePatternConverter.class) | ||
protected List<LanguageFilePattern> languageFilePatterns = new ArrayList<>(); | ||
|
||
|
||
@Override | ||
protected final void validate() throws ParameterException { | ||
super.validate(); | ||
|
@@ -99,6 +127,30 @@ protected final void validate() throws ParameterException { | |
} | ||
} | ||
|
||
protected void configureCommonOptions(C configuration) { | ||
|
||
// Setup CLI message reporter | ||
configuration.setReporter(new SimpleMessageReporter(LoggerFactory.getLogger(PmdCommand.class))); | ||
|
||
if (inputPaths != null) { | ||
configuration.setInputPathList(new ArrayList<>(inputPaths)); | ||
} | ||
configuration.setInputFilePath(fileListPath); | ||
configuration.setIgnoreFilePath(ignoreListPath); | ||
configuration.setInputUri(uri); | ||
configuration.setSourceEncoding(encoding.getEncoding()); | ||
if (relativizeRootPaths != null) { | ||
configuration.addRelativizeRoots(relativizeRootPaths); | ||
} | ||
for (LanguageFilePattern pat : languageFilePatterns) { | ||
Language lang = configuration.getLanguageRegistry().getLanguageById(pat.getLanguageid()); | ||
if (lang == null) { | ||
configuration.getReporter().warn("Language {0} mentioned in --assign-language option is not loaded.", pat.getLanguageid()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it just not loaded or does it not exist? On the CLI, you can provide arbitrary strings... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't know in the CLI :/ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this be a warning or a Also, I'd speak of a language being "unknown" instead of loaded / not loaded. This would be consistent with what we currently throw when we set an invalid language elsewhere. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe a ParameterException yes... I thought it would be useful to be able to define patterns even for languages that are not loaded... The use case I had in mind is adding patterns for POM even if it is not loaded, so that XML does not take precedence. If we make languages register their patterns upon loading then that should not be a valid use case anymore. |
||
} | ||
configuration.addLanguageFilePattern(pat); | ||
} | ||
} | ||
|
||
|
||
protected abstract C toConfiguration(); | ||
|
||
|
@@ -113,4 +165,23 @@ protected CliExitCode execute() { | |
this::doExecute); | ||
} | ||
|
||
static class LanguageFilePatternConverter implements CommandLine.IParameterConsumer { | ||
|
||
@Override | ||
public void consumeParameters(Stack<String> args, CommandLine.Model.ArgSpec argSpec, CommandLine.Model.CommandSpec commandSpec) { | ||
if (args.size() < 2) { | ||
throw new ParameterException(commandSpec.commandLine(), | ||
"Expected two arguments for the language file pattern: one language ID and one pattern."); | ||
} | ||
String langId = args.pop(); | ||
String globString = args.pop(); | ||
Pattern pattern; | ||
try { | ||
pattern = Pattern.compile(globString); | ||
} catch (PatternSyntaxException pse) { | ||
throw new ParameterException(commandSpec.commandLine(), "Invalid regex specification for language " + langId + ": " + pse.getMessage()); | ||
} | ||
((List<LanguageFilePattern>) argSpec.getValue()).add(LanguageFilePattern.ofRegex(pattern, langId)); | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
<?xml version="1.0"?> | ||
<ruleset name="Test Ruleset" xmlns="http://pmd.sourceforge.net/ruleset/2.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://pmd.sourceforge.net/ruleset/2.0.0 https://pmd.sourceforge.io/ruleset_2_0_0.xsd"> | ||
|
||
<description> | ||
Ruleset used by test RuleSetFactoryTest | ||
</description> | ||
|
||
<rule name="ReportAllRootNodes" language="dummy2" since="1.0" message="Violation from ReportAllRootNodes" | ||
class="net.sourceforge.pmd.lang.rule.xpath.XPathRule" | ||
externalInfoUrl="${pmd.website.baseurl}/rules/test/TestRuleset3.xml#Ruleset3Rule1"> | ||
<description>Just for test</description> | ||
<priority>3</priority> | ||
<properties> | ||
<!-- Reports one violation per file. --> | ||
<property name="xpath" value="(/)"/> | ||
</properties> | ||
</rule> | ||
|
||
</ruleset> |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -208,6 +208,19 @@ protected void checkLanguageIsAcceptable(Language lang) throws UnsupportedOperat | |
// do nothing | ||
} | ||
|
||
/** | ||
* Add a pattern that will be matched to a language. If the language is unknown, | ||
* return false. File patterns are matched in the reverse order they were added. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "... matched in reverse order..." - Can we make it the other way round? IMHO this makes it a bit more difficult to grasp. It would be easier, if we said "language file patterns are applied in the order they are added here - the first match wins". Maybe we can give the impl. detail, that patterns from CLI are added first and take therefore precedence over the default patterns from the language modules. Not sure, if that makes the implementation more complicated or simpler - I didn't look at the actual implementation yet. Ok, so the patterns from CLI are actually added in AbstractAnalysisPmdSubcommand - in the order they are given in the CLI. I would find it more intuitive, if we preserve the order - the first match given with "--assign-language" wins. |
||
* This behavior allows later patterns to take precedence over already added patterns. | ||
* The first match stops the search. If the language is unknown (not loaded), the | ||
* search is stopped anyway. | ||
* | ||
* @param pattern A pattern | ||
*/ | ||
public void addLanguageFilePattern(LanguageVersionDiscoverer.LanguageFilePattern pattern) { | ||
languageVersionDiscoverer.addLanguageFilePattern(pattern); | ||
} | ||
|
||
/** | ||
* Get the LanguageVersion of the source file with given name. This depends | ||
* on the fileName extension, and the java version. | ||
|
@@ -220,9 +233,7 @@ protected void checkLanguageIsAcceptable(Language lang) throws UnsupportedOperat | |
* Name of the file, can be absolute, or simple. | ||
* @return the LanguageVersion | ||
*/ | ||
// FUTURE Delete this? I can't think of a good reason to keep it around. | ||
// Failure to determine the LanguageVersion for a file should be a hard | ||
// error, or simply cause the file to be skipped? | ||
// TODO Delete this, only used in tests, duplicates logic from FileCollector. | ||
public @Nullable LanguageVersion getLanguageVersionOfFile(String fileName) { | ||
LanguageVersion forcedVersion = getForceLanguageVersion(); | ||
if (forcedVersion != null) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to add an example on how to externalize the CLI parameters and load them with
@
...Otherwise we didn't fix #461