The goal of this image is to provide a ready-to-use toolbox to perform offline scanning of a code base.
💡 The goal is to prevent any disclosure of the scanned code base.
Tool | Usage |
---|---|
Semgrep | Code scanning (SAST activity). |
Gitleaks | Search for secrets/credentials/... |
💬 When Semgrep fails to detect a problem that I know exists, I try to suggest a new rule to the Semgrep rules registry.
💡 So that proposed rules can be used while the corresponding PRs are pending, all proposed rules are imported into the folder `/tools/semgrep-rules-righettod`:
- If the PR of a rule is rejected, the rule stays permanently in this folder.
- If the PR of a rule is merged, the rule is removed from this folder because it becomes part of the Semgrep rules registry. Accepted rules are kept, as a backup, in the folder `archived-rules`.

The folder `/tools/semgrep-rules-righettod` represents my custom Semgrep rules registry.
💻 Use the following commands to build the Docker image of the toolbox:
git clone https://github.com/righettod/toolbox-codescan.git
cd toolbox-codescan
docker build . -t righettod/toolbox-codescan
💡 The image is built every week and pushed to the GitHub container registry. You can retrieve it with the following command:
docker pull ghcr.io/righettod/toolbox-codescan:main
Caution
It is important to add the option `--network none` to prevent any network I/O from the container.
💻 Use the following command to create a container of the toolbox:
docker run --rm -v "C:/Temp:/work" --network none -it ghcr.io/righettod/toolbox-codescan:main
# From here, use one of the provided scripts...
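When the container is started from automation rather than typed by hand, it is easy to forget the `--network none` option. The sketch below assembles the same `docker run` arguments as the command above. The function name `build_run_command` is hypothetical and is not part of the toolbox.

```python
import shlex

# Image name taken from the pull command shown in this document.
IMAGE = "ghcr.io/righettod/toolbox-codescan:main"


def build_run_command(host_folder: str, image: str = IMAGE) -> list[str]:
    """Return the docker arguments used to start the toolbox without network access."""
    return [
        "docker", "run", "--rm",
        "-v", f"{host_folder}:/work",  # mount the code base to scan on /work
        "--network", "none",           # always present: no network for the container
        "-it", image,
    ]


if __name__ == "__main__":
    # Print the command line so that it can be reviewed before being executed.
    print(shlex.join(build_run_command("C:/Temp")))
```

Keeping the option inside the helper means that every caller gets the isolated configuration by default.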
Note
💡 `jq` is installed and can be used to manipulate the result of a scan.
Note
💡 `regexploit` is installed and can be used to test the exposure of a regular expression to ReDoS.
Tip
📦 All scripts are stored in the folder `/tools/scripts`, which is referenced in the `PATH` environment variable, so they can be called from any folder.
Tip
Semgrep rules from other providers are stored in the corresponding folder using the naming convention `semgrep-rules-[github-org-name]`. Use `../semgrep-rules-[github-org-name]/[rules_folder_name]` as the `[RULES_FOLDER_NAME]` parameter to use them instead of the rules from the Semgrep registry.
Note
Use the command `list-rules-providers` to see the list of rules imported from other providers.
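The naming convention above can be illustrated with a small sketch. It assumes that the `[RULES_FOLDER_NAME]` parameter is resolved relative to `/tools/semgrep-rules`, which is what the `../` prefix suggests but is not confirmed here. The function names are hypothetical, and so is the `java` folder used for the `righettod` provider.

```python
import posixpath

# Assumption: the parameter is resolved relative to this folder.
DEFAULT_RULES_ROOT = "/tools/semgrep-rules"


def provider_parameter(github_org_name: str, rules_folder_name: str) -> str:
    """Build the [RULES_FOLDER_NAME] value for rules imported from another provider."""
    return f"../semgrep-rules-{github_org_name}/{rules_folder_name}"


def resolve_rules_folder(rules_folder_name: str, root: str = DEFAULT_RULES_ROOT) -> str:
    """Resolve a [RULES_FOLDER_NAME] value to an absolute folder inside the image."""
    return posixpath.normpath(posixpath.join(root, rules_folder_name))


if __name__ == "__main__":
    print(resolve_rules_folder("java"))
    print(resolve_rules_folder(provider_parameter("righettod", "java")))
```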
Script to scan the current folder with a set of Semgrep rules, using the Semgrep OSS version.
Findings will be stored in the file `findings.json`.
💡 This script can be used to obtain an overview of the findings identified and stored in the file `findings.json`. It is imported as the file `/tools/scripts/report-code.py`.
💻 Usage & Example:
$ pwd
/work/sample
$ scan-code.sh
Usage:
scan-code.sh [RULES_FOLDER_NAME]
Call example:
scan-code.sh java
scan-code.sh php
scan-code.sh json
See sub folders in '/tools/semgrep-rules'.
Findings will be stored in file 'findings.json'.
$ scan-code.sh java
┌────────────────┐
│ 1 Code Finding │
└────────────────┘
src/burp/ActivityLogger.java
❯❯❱ tools.semgrep-rules.java.lang.security.audit.formatted-sql-string
Detected a formatted string in a SQL statement. This could lead to SQL injection
if variables in the SQL statement are not properly sanitized. Use a prepared
statements (java.sql.PreparedStatement) instead. You can obtain a PreparedStatement
using 'connection.prepareStatement'.
91┆ stmt.execute(SQL_TABLE_CREATE);
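The file `findings.json` can also be explored with a few lines of Python. The sketch below is not the `report-code.py` script of the toolbox. It assumes that `findings.json` follows the usual Semgrep JSON output layout, that is a top-level `results` list whose entries carry `check_id`, `path` and `start.line`.

```python
import json
from collections import Counter


def load_results(path: str = "findings.json") -> list[dict]:
    """Load the list of findings from a Semgrep JSON report."""
    with open(path, encoding="utf-8") as handle:
        report = json.load(handle)
    # Assumption: findings are stored under the top-level "results" key.
    return report.get("results", [])


def summarize_findings(path: str = "findings.json") -> Counter:
    """Count the findings per rule identifier."""
    return Counter(result["check_id"] for result in load_results(path))


def list_locations(path: str = "findings.json") -> list[str]:
    """Return one 'file:line rule' entry per finding."""
    return [
        f'{result["path"]}:{result["start"]["line"]} {result["check_id"]}'
        for result in load_results(path)
    ]


if __name__ == "__main__":
    for rule, count in summarize_findings().most_common():
        print(f"{count:4d} {rule}")
```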
Note
In this script, the notion of technology is the same as the notion of `RULES_FOLDER_NAME` used by the script `scan-code.sh`.
Performs the same processing as the script `scan-code.sh` but scans the current folder using all Semgrep rules related to the target technology.
This script first gathers all rules provided by all rules providers for the target technology and then uses this consolidated set of rules for the scan.
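The gathering step described above can be sketched as follows. This is an illustration and not the code of the script itself. It assumes that every provider folder matching `/tools/semgrep-rules*` stores its rules in a sub-folder named after the technology, which may not hold for every provider, and that rules are YAML files.

```python
from pathlib import Path


def collect_rule_folders(technology: str, tools_root: str = "/tools") -> list[Path]:
    """Return every '<tools_root>/semgrep-rules*/<technology>' folder that exists."""
    root = Path(tools_root)
    return sorted(
        provider / technology
        for provider in root.glob("semgrep-rules*")
        if (provider / technology).is_dir()
    )


def collect_rule_files(technology: str, tools_root: str = "/tools") -> list[Path]:
    """Return the YAML rule files found for a technology across all providers."""
    rule_files: list[Path] = []
    for folder in collect_rule_folders(technology, tools_root):
        rule_files.extend(
            sorted(p for p in folder.rglob("*") if p.suffix in {".yaml", ".yml"})
        )
    return rule_files


if __name__ == "__main__":
    for rule_file in collect_rule_files("java"):
        print(rule_file)
```

The resulting list is the consolidated set of rules that a single scan could then be pointed at.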
Important
This custom configuration file is used to define detection expressions.
Script to scan the current folder using Gitleaks to find secrets in source files and git files. Git files are only scanned if a `.git` folder is present.
Leaks will be stored in the files `leaks-gitfiles.json` and `leaks-sourcefiles.json`.
💡 This script can be used to obtain an overview of the leaks identified and stored in the files `leaks-*.json`. It is imported as the file `/tools/scripts/report-secrets.py`.
💻 Usage & Example:
$ pwd
/work/sample
$ scan-secrets.sh
5:47PM INF scan completed in 78.1ms
5:47PM INF no leaks found
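When leaks are found, the `leaks-*.json` files can be summarized with a short script. The sketch below is not the `report-secrets.py` script of the toolbox. It assumes the standard Gitleaks JSON report layout, a list of objects that carry the keys `RuleID` and `File`.

```python
import glob
import json
from collections import Counter


def summarize_leaks(pattern: str = "leaks-*.json") -> Counter:
    """Count leaks per (RuleID, File) pair across Gitleaks JSON reports."""
    counts: Counter = Counter()
    for report_path in sorted(glob.glob(pattern)):
        with open(report_path, encoding="utf-8") as handle:
            # A JSON null is treated like an empty list of leaks.
            leaks = json.load(handle) or []
        for leak in leaks:
            counts[(leak["RuleID"], leak["File"])] += 1
    return counts


if __name__ == "__main__":
    for (rule_id, file_name), count in summarize_leaks().most_common():
        print(f"{count:4d} {rule_id} {file_name}")
```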
Script to scan the current folder using a dictionary of common secret variable names (source).
💡 The dictionary of common secret variable names referenced above is imported, as the file `/tools/secret-common-variable-names.txt`, during the build of the image.
💻 Usage & Example:
$ pwd
/work/sample
$ scan-secrets-extended.sh
./config/db.properties:50:DB_PASSWORD=Password2024
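The idea of a dictionary-based scan can be sketched in Python. This is an illustration and not the implementation of `scan-secrets-extended.sh`. It assumes that the dictionary contains one variable name per line and that matching is a case-insensitive substring search. The output mirrors the `file:line:content` format shown above.

```python
import os


def load_names(dictionary_path: str) -> list[str]:
    """Read the dictionary, one variable name per line, skipping empty lines."""
    with open(dictionary_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def scan_folder(folder: str, names: list[str]) -> list[str]:
    """Return 'file:line:content' entries for lines that contain a dictionary name."""
    lowered = [name.lower() for name in names]
    hits = []
    for current, dirs, files in os.walk(folder):
        # Skip git internals and walk the sub-folders in a stable order.
        dirs[:] = sorted(d for d in dirs if d != ".git")
        for file_name in sorted(files):
            path = os.path.join(current, file_name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        text = line.lower()
                        if any(name in text for name in lowered):
                            hits.append(f"{path}:{number}:{line.rstrip()}")
            except OSError:
                continue  # unreadable file: ignore it and continue the scan
    return hits


if __name__ == "__main__":
    dictionary = load_names("/tools/secret-common-variable-names.txt")
    print("\n".join(scan_folder(".", dictionary)))
```

A substring search produces false positives by design. The goal of such a scan is to surface candidates for manual review.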
Script to scan a collection of online git repositories using Gitleaks to find secrets in source files and git files.
💡 The script `scan-secrets.sh` is used to scan each git repository once it is cloned.
💡 Use the script `online-scan-secrets-consolidate.py` to consolidate the generated data into a single file.
💻 Usage & Example:
$ online-scan-secrets.sh
Usage:
online-scan-secrets.sh [FILE_WITH_COLLECTION_OF_GIT_REPO_URLS]
Call example:
online-scan-secrets.sh repositories.txt
$ online-scan-secrets.sh repositories.txt
[*] Execution context:
List of git repositories URL : repositories.txt (1030 entries)
Data collection storage folder : /work/data-collected
[*] Start repositories checking and data collection...
...
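To give an idea of what consolidating the collected data can look like, here is a sketch that merges several Gitleaks JSON reports into one file. It is not the code of `online-scan-secrets-consolidate.py`. The layout of the data folder (one or more `*.json` reports below it) and the added `SourceReport` key are assumptions made for this illustration.

```python
import json
from pathlib import Path


def consolidate(data_folder: str, output_file: str) -> int:
    """Merge every JSON report found below data_folder into one JSON list.

    The output file should live outside data_folder, otherwise a later run
    would read it again as an input report.
    """
    merged = []
    for report in sorted(Path(data_folder).rglob("*.json")):
        with report.open(encoding="utf-8") as handle:
            leaks = json.load(handle) or []  # a JSON null counts as no leak
        for leak in leaks:
            leak["SourceReport"] = str(report)  # hypothetical key: origin of the leak
            merged.append(leak)
    with open(output_file, "w", encoding="utf-8") as handle:
        json.dump(merged, handle, indent=2)
    return len(merged)


if __name__ == "__main__":
    total = consolidate("/work/data-collected", "/work/leaks-consolidated.json")
    print(f"{total} leaks consolidated")
```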
Script to filter a large leaks file that uses the Gitleaks format, for example a file generated by the script `online-scan-secrets-consolidate.py`.
💡 The output allows searching for specific secrets using `grep` with different regexes, like `grep -B 4 -E 'ey[A-Za-z0-9]{15,}\.[A-Za-z0-9]{15,}\.[A-Za-z0-9_-]*' report.txt`.
💻 Usage:
filters-secrets.py leaks-consolidated.json
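The regex used in the `grep` example above targets strings shaped like a JWT. The sketch below applies the same expression with the Python `re` module and keeps the lines that precede each hit, in the spirit of `grep -B 4`. The function names are hypothetical, and the format of `report.txt` is not assumed beyond being plain text.

```python
import re

# Same expression as in the grep example: three dot-separated parts starting with "ey".
JWT_LIKE = re.compile(r"ey[A-Za-z0-9]{15,}\.[A-Za-z0-9]{15,}\.[A-Za-z0-9_-]*")


def find_jwt_like(text: str) -> list[str]:
    """Return every JWT-like string found in the text."""
    return JWT_LIKE.findall(text)


def matches_with_context(lines: list[str], before: int = 4) -> list[list[str]]:
    """Return each matching line preceded by up to `before` lines of context."""
    hits = []
    for index, line in enumerate(lines):
        if JWT_LIKE.search(line):
            hits.append(lines[max(0, index - before): index + 1])
    return hits


if __name__ == "__main__":
    with open("report.txt", encoding="utf-8") as handle:
        for block in matches_with_context(handle.read().splitlines()):
            print("\n".join(block))
            print("--")
```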
🤔 I noticed that Semgrep, with the community set of rules for C#, is not very effective.
💡 To address this:
- I found the tool DevSkim, provided by Microsoft, and added it to the toolbox. This is why I moved the base image from Alpine to Ubuntu: I did not manage to make DevSkim run on the Alpine-based image.
- I created and added this script as `report-code-devskim.py` to explore the results of the scan.
- I created the alias `scan-code-devskim` to scan the current folder with DevSkim and generate the results in the JSON file `findings.json`, to stay consistent with the other scripts of the toolbox.