feat(heuristics): add SimilarProjectAnalyzer to detect structural similarity across packages from same maintainer #1089
base: main
…ilarity across packages from same maintainer Signed-off-by: Amine <[email protected]>
list[str]: A list of maintainers.
"""
url = f"https://pypi.org/project/{package_name}/"
response = requests.get(url, timeout=10)
Could you have a look at src/macaron/util.py and see if any of these functions may be imported and used instead of directly using the requests package?
}
return HeuristicResult.PASS, {}

def get_maintainers(self, package_name: str) -> list[str]:
I believe this functionality is offered by PyPIRegistry.get_maintainers_of_package.
Correct. That is what I used first, but @behnazh-w told me that get_maintainers_of_package no longer works because PyPI blocks it, so I rewrote it here.
If this code can obtain the maintainer info, rather than adding a new function, please update the PyPIRegistry.get_maintainers_of_package function so other heuristics can benefit from it.
Should I change it here, or create a new PR for it?
Either works, although a separate PR is preferable.
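For reference, the maintainer lookup discussed in this thread could be sketched roughly as below. This is not the PR's code: it only shows the HTML-parsing step, uses the standard library instead of whatever parser the PR relies on, and assumes that maintainer names on the PyPI project page sit in span elements with the CSS class sidebar-section__user-gravatar-text. The names MaintainerParser and parse_maintainers are hypothetical.

```python
from html.parser import HTMLParser


class MaintainerParser(HTMLParser):
    """Collect text from <span> elements carrying the assumed maintainer CSS class."""

    # Assumption: the class PyPI uses for maintainer names in the project sidebar.
    TARGET_CLASS = "sidebar-section__user-gravatar-text"

    def __init__(self) -> None:
        super().__init__()
        self._capturing = False
        self.maintainers: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self._capturing = self.TARGET_CLASS in classes

    def handle_endtag(self, tag):
        if tag == "span":
            self._capturing = False

    def handle_data(self, data):
        name = data.strip()
        # Keep first-seen order and skip duplicates.
        if self._capturing and name and name not in self.maintainers:
            self.maintainers.append(name)


def parse_maintainers(html: str) -> list[str]:
    """Return maintainer usernames found in a project page's HTML (hypothetical helper)."""
    parser = MaintainerParser()
    parser.feed(html)
    return parser.maintainers
```

Keeping the parsing separate from the HTTP call would let the same logic sit behind PyPIRegistry.get_maintainers_of_package regardless of which request helper ends up fetching the page.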
list[str]: A list of package names.
"""
url = f"https://pypi.org/user/{username}/"
response = requests.get(url, timeout=10)
Similar here with using util.py.
similar_projects_set.discard(package_name)
return list(similar_projects_set)

def fetch_sdist_url(self, package_name: str, version: str | None = None) -> str:
I wonder if it would be possible to create a PyPIPackageJsonAsset object so you may use its function get_sourcecode_url for this?
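As a point of comparison, selecting the sdist URL from the metadata that the PyPI JSON API returns can be sketched as follows. This is an illustration only, not the behaviour of PyPIPackageJsonAsset.get_sourcecode_url, which is not shown in this thread. The helper name pick_sdist_url is hypothetical, and the sketch assumes the metadata dictionary follows the JSON API layout, with a urls list for the latest release and a releases mapping keyed by version.

```python
from typing import Any, Optional


def pick_sdist_url(metadata: dict[str, Any], version: Optional[str] = None) -> Optional[str]:
    """Return the source distribution URL from PyPI JSON API metadata, if any."""
    if version is None:
        # "urls" lists the files of the latest release.
        files = metadata.get("urls", [])
    else:
        # "releases" maps each version string to its list of files.
        files = metadata.get("releases", {}).get(version, [])
    for file_info in files:
        if file_info.get("packagetype") == "sdist":
            return file_info.get("url")
    # No sdist published for this release (for example, wheel-only packages).
    return None
```

Returning None rather than raising makes the wheel-only case explicit, so the caller can decide whether to skip the heuristic.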
return None

try:
response = requests.get(sdist_url, stream=True, timeout=10)
Similar here with using util.py
@@ -381,6 +383,10 @@ def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
failed({Heuristics.CLOSER_RELEASE_JOIN_DATE.value}),
forceSetup.

% Package released that is similar to other packages maintained by the same maintainer.
Could you walk me through the rationale of why we should combine SIMILAR_PROJECTS failing with the forceSetup and quickUndetailed rules, and why these rules together are a malicious indicator with HIGH confidence?
Summary
This PR adds a new heuristic analyzer called SimilarProjectAnalyzer. It checks whether a PyPI package has a similar file/folder structure to other packages maintained by the same user. This helps identify potentially malicious packages that replicate existing structures.
Description of changes
- Adds SimilarProjectAnalyzer.
- Updates the heuristics.py registry.
- Updates detect_malicious_metadata_check.py to include and utilize the new heuristic.
Related issues
None
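To make the structural comparison in the summary concrete, here is a minimal sketch of one way two packages' file layouts could be compared. The PR's actual comparison method is not shown on this page, so Jaccard similarity over normalized file paths is a stand-in chosen for illustration, and both function names are hypothetical. It assumes each sdist wraps its files in a single top-level directory, which is stripped before comparing.

```python
from pathlib import PurePosixPath


def normalize_paths(member_names: list[str]) -> set[str]:
    """Drop the top-level directory that sdists usually wrap their files in."""
    normalized = set()
    for name in member_names:
        parts = PurePosixPath(name).parts
        # Entries with a single component are the wrapper directory itself; skip them.
        if len(parts) > 1:
            normalized.add("/".join(parts[1:]))
    return normalized


def structure_similarity(paths_a: set[str], paths_b: set[str]) -> float:
    """Jaccard similarity of two path sets: size of intersection over size of union."""
    if not paths_a and not paths_b:
        return 0.0
    return len(paths_a & paths_b) / len(paths_a | paths_b)
```

A real heuristic would also need a threshold and some handling for package directories that differ only by name; neither is specified here.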
verified label should appear next to all of your commits on GitHub.