Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong information showing on terminal #353

Closed
Yao-Wen-Chang opened this issue May 3, 2024 · 3 comments · Fixed by #365
Closed

Wrong information showing on terminal #353

Yao-Wen-Chang opened this issue May 3, 2024 · 3 comments · Fixed by #365
Labels
enhancement New feature or request

Comments

@Yao-Wen-Chang
Copy link

Command$ guarddog pypi scan --help

Output:

Usage: guarddog pypi scan [OPTIONS] TARGET

Scan a given PyPI package

Options:
--exit-non-zero-on-finding Exit with a non-zero status code if at least
one issue is identified
-v, --version TEXT Specify a version to scan
--output-format [json]
-x, --exclude-rules [cmd-overwrite|exfiltrate-sensitive-data|npm-serialize-environment|npm-install-script|release_zero|repository_integrity_mismatch|potentially_compromised_email_domain|npm-exec-base64|steganography|typosquatting|exec-base64|clipboard-access|silent-process-execution|npm-silent-process-execution|download-executable|npm-obfuscation|shady-links|unclaimed_maintainer_email_domain|code-execution|obfuscation|empty_information|single_python_file]
-r, --rules [cmd-overwrite|exfiltrate-sensitive-data|npm-serialize-environment|npm-install-script|release_zero|repository_integrity_mismatch|potentially_compromised_email_domain|npm-exec-base64|steganography|typosquatting|exec-base64|clipboard-access|silent-process-execution|npm-silent-process-execution|download-executable|npm-obfuscation|shady-links|unclaimed_maintainer_email_domain|code-execution|obfuscation|empty_information|single_python_file]
--help Show this message and exit.

Issue:
The information displayed in the console is confusing because the PyPI scan should not include npm-related information.
The same situation occurred in the results returned by the scanner.

@christophetd christophetd added the enhancement New feature or request label May 3, 2024
@jxdv
Copy link
Contributor

jxdv commented May 18, 2024

This happens with the union of SEMGREP_RULE_NAMES:

NPM_RULES = set(get_metadata_detectors(ECOSYSTEM.NPM).keys()) | SEMGREP_RULE_NAMES
PYPI_RULES = set(get_metadata_detectors(ECOSYSTEM.PYPI).keys()) | SEMGREP_RULES_NAMES

If you remove it you're left with correct metadata heuristics for each ecosystem. Adding source code heuristics can be done as follows, but not sure if this is the best solution

# Add source code rules to PYPI / NPM set
for ecosystem, rules in SOURCECODE_RULES.items():
    for rule in rules:
        rule_id = rule["id"]
        if rule_id.startswith("npm"):
            NPM_RULES.add(rule_id)
        else:
            PYPI_RULES.add(rule_id)

After that:

pypi:

Usage: python -m guarddog pypi scan [OPTIONS] TARGET

  Scan a given PyPI package

Options:
  --exit-non-zero-on-finding      Exit with a non-zero status code if at least
                                  one issue is identified
  -v, --version TEXT              Specify a version to scan
  --output-format [json]
  -x, --exclude-rules [potentially_compromised_email_domain|typosquatting|bundled_binary|unclaimed_maintainer_email_domain|repository_integrity_mismatch|single_python_file|shady-links|exec-base64|download-executable|clipboard-access|cmd-overwrite|code-execution|steganography|exfiltrate-sensitive-data|silent-process-execution|empty_information|obfuscation|release_zero]
  -r, --rules [potentially_compromised_email_domain|typosquatting|bundled_binary|unclaimed_maintainer_email_domain|repository_integrity_mismatch|single_python_file|shady-links|exec-base64|download-executable|clipboard-access|cmd-overwrite|code-execution|steganography|exfiltrate-sensitive-data|silent-process-execution|empty_information|obfuscation|release_zero]
  --help                          Show this message and exit.

npm:

Usage: python -m guarddog npm scan [OPTIONS] TARGET

  Scan a given npm package

Options:
  --exit-non-zero-on-finding      Exit with a non-zero status code if at least
                                  one issue is identified
  -v, --version TEXT              Specify a version to scan
  --output-format [json]
  -x, --exclude-rules [potentially_compromised_email_domain|unclaimed_maintainer_email_domain|direct_url_dependency|npm-silent-process-execution|bundled_binary|npm-exec-base64|typosquatting|empty_information|npm-obfuscation|npm-install-script|npm-exfiltrate-sensitive-data|npm-serialize-environment|npm_metadata_mismatch|release_zero]
  -r, --rules [potentially_compromised_email_domain|unclaimed_maintainer_email_domain|direct_url_dependency|npm-silent-process-execution|bundled_binary|npm-exec-base64|typosquatting|empty_information|npm-obfuscation|npm-install-script|npm-exfiltrate-sensitive-data|npm-serialize-environment|npm_metadata_mismatch|release_zero]
  --help                          Show this message and exit.

@sobregosodd / @christophetd any thoughts?

@Yao-Wen-Chang
Copy link
Author

After doing a thorough review of the source code, I got these findings:

  1. The issue only occurs in source code rules
  2. In
    NPM_RULES = set(get_metadata_detectors(ECOSYSTEM.NPM).keys()) | SEMGREP_RULE_NAMES
    and
    PYPI_RULES = set(get_metadata_detectors(ECOSYSTEM.PYPI).keys()) | SEMGREP_RULE_NAMES

    , it is making more sense to change to this:
NPM_RULES = set(get_metadata_detectors(ECOSYSTEM.NPM).keys()) | set([rules["id"] for rules in SOURCECODE_RULES[ECOSYSTEM.NPM]])
PYPI_RULES = set(get_metadata_detectors(ECOSYSTEM.PYPI).keys()) | set([rules["id"] for rules in SOURCECODE_RULES[ECOSYSTEM.PYPI]])
  1. In
    self.sourcecode_ruleset = SEMGREP_RULE_NAMES

    , the source code ruleset involves both NPM and PYPI.
    I would recommend to change the code to this:
self.sourcecode_ruleset = PYPI_SOURCE_RULES if ecosystem is ECOSYSTEM.PYPI else NPM_SOURCE_RULES

[OUTPUT]

  • PyPI Source Rules ['download-executable', 'clipboard-access', 'cmd-overwrite', 'obfuscation', 'exfiltrate-sensitive-data', 'steganography', 'exec-base64', 'silent-process-execution', 'shady-links', 'code-execution']

  • NPM Source Rules ['npm-exfiltrate-sensitive-data', 'npm-silent-process-execution', 'npm-serialize-environment', 'shady-links', 'npm-exec-base64', 'npm-install-script', 'npm-obfuscation']

Another recommendation:
Instead of modifying those lines of code, we can separate the source code rules into two folders.
This change will make the rules easier to maintain.

Feel free to let me know your choice and any suggestions!

I can request a PR to fix this issue.

@jxdv
Copy link
Contributor

jxdv commented May 19, 2024

After doing a thorough review of the source code, I got these findings:
, it is making more sense to change to this:

NPM_RULES = set(get_metadata_detectors(ECOSYSTEM.NPM).keys()) | set([rules["id"] for rules in SOURCECODE_RULES[ECOSYSTEM.NPM]])
PYPI_RULES = set(get_metadata_detectors(ECOSYSTEM.PYPI).keys()) | set([rules["id"] for rules in SOURCECODE_RULES[ECOSYSTEM.PYPI]])

Yes, I like this approach better than mine.

3. In https://github.com/DataDog/guarddog/blob/91450fc8d06cbd96332bc6f7a363e591ab108643/guarddog/analyzer/analyzer.py#L48
   
   , the source code ruleset involves both NPM and PYPI.
   I would recommend to change the code to this:
self.sourcecode_ruleset = PYPI_SOURCE_RULES if ecosystem is ECOSYSTEM.PYPI else NPM_SOURCE_RULES

[OUTPUT]

* PyPI Source Rules ['download-executable', 'clipboard-access', 'cmd-overwrite', 'obfuscation', 'exfiltrate-sensitive-data', 'steganography', 'exec-base64', 'silent-process-execution', 'shady-links', 'code-execution']

* NPM Source Rules ['npm-exfiltrate-sensitive-data', 'npm-silent-process-execution', 'npm-serialize-environment', 'shady-links', 'npm-exec-base64', 'npm-install-script', 'npm-obfuscation']

Another recommendation: Instead of modifying those lines of code, we can separate the source code rules into two folders. This change will make the rules easier to maintain.

Feel free to let me know your choice and any suggestions!

I can request a PR to fix this issue.

I am merely just a contributor, but I like your changes. Feel free to open a PR, and hopefully they'll review it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants