Scan GitHub and GitLab refs that aren't cloned by default #1918

Open: rgmz wants to merge 2 commits into main from feat/additional-refspecs


rgmz
Contributor

@rgmz rgmz commented Oct 18, 2023

Description:

This fixes #1588.

In my experience, this finds significantly more secrets with a negligible performance impact.

The only issue is that these secrets are technically not part of the repository, so refactoring may be necessary to indicate that a result comes from a historical PR/MR branch. The output now includes the source pull/merge request (based on git log --source). This covers the case where a commit exists only in the PR history and not in the actual repo history, which can happen when PRs are squashed.

✅ Found verified result 🐷🔑
Detector Type: AWS
Decoder Type: PLAIN
Raw result: AKIAXYZDQCEN4B6JSJQI
Resource_type: Access key
Account: 534261010715
Is_canary: true
Message: This is an AWS canary token generated at canarytokens.org, and was not set off; learn more here: https://trufflesecurity.com/canaries
Arn: arn:aws:iam::534261010715:user/canarytokens.com@@88uc3ciwodujg5f18inco2yvu
Pull Request: 2387
Commit: e47780e2e4d2dbaa3d1e63bdfe1cf00eb2c5681b
Email: Ahrav Dutta <[email protected]>
File: pkg/gitparse/gitparse_test.go
Line: 715
Link: https://github.com/trufflesecurity/trufflehog/blob/e47780e2e4d2dbaa3d1e63bdfe1cf00eb2c5681b/pkg/gitparse/gitparse_test.go#L715
Repository: https://github.com/trufflesecurity/trufflehog.git
Timestamp: 2024-02-05 17:37:58 +0000


Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint; this requires golangci-lint)?

@rgmz rgmz requested a review from a team as a code owner October 18, 2023 21:27
@rgmz rgmz force-pushed the feat/additional-refspecs branch 4 times, most recently from f64e0a3 to e2fb273 Compare October 23, 2023 21:59
@rgmz rgmz force-pushed the feat/additional-refspecs branch 2 times, most recently from b2e724c to ec2de50 Compare October 30, 2023 00:06
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 438418c to 7cb8af2 Compare April 9, 2024 15:39
@rgmz rgmz marked this pull request as draft April 12, 2024 11:25
@CLAassistant

CLAassistant commented Apr 12, 2024

CLA assistant check
All committers have signed the CLA.

@rgmz rgmz force-pushed the feat/additional-refspecs branch 10 times, most recently from 956b38d to c9a7acd Compare April 13, 2024 22:44
@rgmz rgmz marked this pull request as ready for review April 13, 2024 22:44
@rgmz rgmz requested a review from a team as a code owner April 13, 2024 22:44
Review thread on pkg/output/plain.go (outdated, resolved)
Collaborator

@rosecodym rosecodym left a comment


This is awesome! Just to make sure I understand: This PR has two discrete changes, right? (Pulling down all the refs and printing the source ref of found secrets.)

Review thread on pkg/sources/git/git.go (outdated, resolved)
Review thread on pkg/output/plain.go (outdated, resolved)
@rgmz rgmz mentioned this pull request Apr 23, 2024
2 tasks
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 9ac8dbe to d526837 Compare May 2, 2024 12:55
@rgmz
Contributor Author

rgmz commented Jun 9, 2024

I've made some progress towards using --mirror. The GitHub source has been updated to clone and open the repo as bare; however, it seems like every Git-related source will need to be changed.

...

Additionally, getting binary files currently doesn't work. I can fix this, but I am not able to do the above.

2024-06-09T11:31:08-04:00       error   trufflehog      waiting for command failed      {"source_manager_worker_id": "jGSti", "repo": "https://github.com/trufflesecurity/trufflehog.git", "error": "error waiting for command: command=/usr/bin/git -C /tmp/trufflehog-1690212-1041037951/.git cat-file blob 2db06f05767dd8475df2f071785d9775144ee549:pkg/handlers/testdata/testdir.zip, stderr=fatal: cannot change to '/tmp/trufflehog-1690212-1041037951/.git': No such file or directory\n, commit=2db06f05767dd8475df2f071785d9775144ee549: exit status 128"}

@rgmz rgmz force-pushed the feat/additional-refspecs branch from 50f192c to 24bcc74 Compare June 10, 2024 00:57
@rosecodym
Collaborator

@rgmz is it theoretically possible to split this work up into two separate changes: One that clones using --mirror and one that reports the provenance of detected secrets? I'm trying to think of ways to minimize the change risk.

@rgmz
Contributor Author

rgmz commented Jun 12, 2024

It is; I don't think it would accomplish much, as I'd consider the gitparse changes to be negligible compared to everything else.

@zricethezav
Collaborator

Thanks for the useful information @bplaxco!

It is; I don't think it would accomplish much, as I'd consider the gitparse changes to be negligible compared to everything else.

It should be separated to keep the PR focused.

I did some toying around with getting --mirror to work but was unable to get the shelled out git log -p cmd to generate any output... Ideally this change would just be:

  1. add --mirror to git clone
  2. add --no-replace-objects to git log
  3. force the bare config option so we aren't looking for a .git folder
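
The three steps above could be sketched as argument lists for the git binary. This is an illustration under assumptions, not the change made in this PR: the helper names and the destination path are hypothetical, --no-replace-objects is placed before the log subcommand because it is a top-level git option, and --git-dir is used here as one possible way to avoid looking for a .git folder.

```go
package main

import (
	"fmt"
	"strings"
)

// mirrorCloneArgs builds arguments for `git clone --mirror`, which creates a
// bare repository and copies every ref from the remote.
// Hypothetical helper for illustration only.
func mirrorCloneArgs(repoURL, dest string) []string {
	return []string{"clone", "--mirror", repoURL, dest}
}

// bareLogArgs builds arguments for `git log -p` against a bare repository.
// --git-dir points git at the repository directly, so no .git lookup happens.
// --no-replace-objects tells git to ignore replacement refs.
// Hypothetical helper for illustration only.
func bareLogArgs(gitDir string) []string {
	return []string{
		"--git-dir=" + gitDir,
		"--no-replace-objects",
		"log", "-p", "--all",
	}
}

func main() {
	url := "https://github.com/leaktk/fake-leaks.git"
	dest := "/tmp/fake-leaks.git" // hypothetical destination path

	fmt.Println("git", strings.Join(mirrorCloneArgs(url, dest), " "))
	fmt.Println("git", strings.Join(bareLogArgs(dest), " "))
}
```

To run the commands rather than print them, each slice could be passed to exec.Command("git", args...) from os/exec.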

@rgmz
Contributor Author

rgmz commented Jun 13, 2024

I did some toying around with getting --mirror to work but was unable to get the shelled out git log -p cmd to generate any output...

I'm not sure what you mean by this. From this PR, or your own experimentation?

Git itself is challenging because it's a mix of confusing, duplicated, and black-box logic. In particular, it seems to accept both a list of directories and repositories (sources.proto), along with separate scanRepos and scanDirs functions, but the OSS code only accepts a single URI.

... #1918 (comment)

Any insight into this?

@zricethezav
Collaborator

I'm not sure what you mean by this. From this PR, or your own experimentation?

From this PR

@rgmz
Contributor Author

rgmz commented Jun 14, 2024

I haven't encountered any issues, so I'm not sure what the cause could be.

If you can provide more detailed info here, or in Slack/Discord, I can try to troubleshoot.

@zricethezav
Collaborator

@rgmz

~/code/trufflesecurity/trufflehog (feat/additional-refspecs) go build && ./trufflehog git https://github.com/leaktk/fake-leaks.git
🐷🔑🐷  TruffleHog. Unearth your secrets. 🐷🔑🐷

2024-06-14T10:00:57-04:00       info-0  trufflehog      prepareRepoSinceCommit  {"commit": ""}
2024-06-14T10:00:57-04:00       info-0  trufflehog      running source  {"source_manager_worker_id": "MQh7O", "with_units": true}
2024-06-14T10:00:57-04:00       info-0  trufflehog      enumerating dirs        {"source_manager_worker_id": "MQh7O", "repo": "/var/folders/qc/__3s_93j7tlg3qrj7zxsbjwr0000gn/T/trufflehog-49059-1392975446"}
2024-06-14T10:00:57-04:00       info-0  trufflehog      scanDir {"source_manager_worker_id": "MQh7O", "unit": "/var/folders/qc/__3s_93j7tlg3qrj7zxsbjwr0000gn/T/trufflehog-49059-1392975446", "unit_kind": "dir", "dir": "/var/folders/qc/__3s_93j7tlg3qrj7zxsbjwr0000gn/T/trufflehog-49059-1392975446"}
2024-06-14T10:00:57-04:00       error   trufflehog      error getting RepoFromPath      {"source_manager_worker_id": "MQh7O", "unit": "/var/folders/qc/__3s_93j7tlg3qrj7zxsbjwr0000gn/T/trufflehog-49059-1392975446", "unit_kind": "dir", "dir": "/var/folders/qc/__3s_93j7tlg3qrj7zxsbjwr0000gn/T/trufflehog-49059-1392975446", "error": "repository does not exist"}
2024-06-14T10:00:57-04:00       info-0  trufflehog      finished scanning       {"chunks": 0, "bytes": 0, "verified_secrets": 0, "unverified_secrets": 0, "scan_duration": "430.436792ms", "trufflehog_version": "dev"}

How were you testing remote repos?

@rgmz
Contributor Author

rgmz commented Jun 14, 2024

Ah. Only GitHub and GitLab sources work right now. I have not made the necessary changes to Git because I don't know what a "repo" is vs. a "directory", or how it's called in Enterprise vs the OSS CLI.

See #1918 (comment)

@rgmz rgmz force-pushed the feat/additional-refspecs branch 2 times, most recently from c0c6955 to 8f9c202 Compare June 17, 2024 23:11
@zricethezav
Collaborator

@rgmz @bplaxco alright after banging my head against this git source I think it's better if we use "remote.origin.fetch=+refs/*:refs/remotes/origin/*". Introducing --mirror has a large blast radius. We should just be able to append -c remote.origin.fetch=+refs/*:refs/remotes/origin/* to the clone command and call it a day.
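
A minimal sketch of that suggestion, assuming a hypothetical cloneArgs helper and destination path (neither is taken from the TruffleHog code). It uses the -c option of git clone to set the extra fetch refspec in the new repository's configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// allRefsRefspec maps every ref on the remote, including hidden ones such as
// refs/pull/*, into refs/remotes/origin/ of the local clone.
const allRefsRefspec = "+refs/*:refs/remotes/origin/*"

// cloneArgs builds arguments for a regular (non-mirror) clone that also
// configures the extra fetch refspec via `-c key=value`.
// Hypothetical helper for illustration only.
func cloneArgs(repoURL, dest string) []string {
	return []string{
		"clone",
		"-c", "remote.origin.fetch=" + allRefsRefspec,
		repoURL, dest,
	}
}

func main() {
	args := cloneArgs(
		"https://github.com/trufflesecurity/trufflehog.git",
		"/tmp/trufflehog-clone", // hypothetical destination path
	)
	fmt.Println("git", strings.Join(args, " "))
}
```

Because the clone stays non-bare, code that expects a .git folder would keep working, which appears to be the smaller blast radius the comment refers to.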

@zricethezav zricethezav mentioned this pull request Jun 18, 2024
2 tasks
@rgmz rgmz force-pushed the feat/additional-refspecs branch 4 times, most recently from 983d41e to f057784 Compare June 20, 2024 15:06
@rgmz
Contributor Author

rgmz commented Jun 20, 2024

@rgmz is it theoretically possible to split this work up into two separate changes: One that clones using --mirror and one that reports the provenance of detected secrets? I'm trying to think of ways to minimize the change risk.

I've rebased this onto #2988. This now only has changes related to reporting ref provenance.

rgmz added 2 commits July 1, 2024 14:37
'Hidden' refs, such as 'refs/pull/1004/head', may cause confusion if reported upon. GitHub, for example, will display a banner saying that the commit doesn't belong to the repository.
This parses the output of 'git log --source' and converts it to a human-readable format, IF the ref is 'hidden'.
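
The conversion described in the commit message above might look roughly like the sketch below. It is not the code from this PR. The function name, the regular expressions, and the "Merge Request" label are assumptions; only the "Pull Request: 2387" form appears in the sample output in the description. The patterns accept refs both in their original location and under refs/remotes/origin/.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// GitHub pull request refs, e.g. refs/pull/2387/head.
	githubPR = regexp.MustCompile(`^refs/(?:remotes/origin/)?pull/(\d+)/(?:head|merge)$`)
	// GitLab merge request refs, e.g. refs/merge-requests/12/head.
	gitlabMR = regexp.MustCompile(`^refs/(?:remotes/origin/)?merge-requests/(\d+)/(?:head|merge)$`)
)

// describeHiddenRef returns a human-readable label for a hidden ref and
// ok=true, or an empty label and ok=false for any other ref.
// Hypothetical helper for illustration only.
func describeHiddenRef(ref string) (label string, ok bool) {
	if m := githubPR.FindStringSubmatch(ref); m != nil {
		return "Pull Request: " + m[1], true
	}
	if m := gitlabMR.FindStringSubmatch(ref); m != nil {
		return "Merge Request: " + m[1], true
	}
	return "", false
}

func main() {
	refs := []string{
		"refs/pull/2387/head",
		"refs/remotes/origin/merge-requests/12/head",
		"refs/heads/main",
	}
	for _, ref := range refs {
		if label, ok := describeHiddenRef(ref); ok {
			fmt.Printf("%s -> %s\n", ref, label)
		} else {
			fmt.Printf("%s -> (not a hidden ref)\n", ref)
		}
	}
}
```

Returning ok=false for ordinary refs lets the caller omit the extra line entirely when a commit is reachable from a normal branch.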
@rgmz rgmz force-pushed the feat/additional-refspecs branch from 005d5e1 to 805f5dc Compare July 1, 2024 18:37
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Scan GitHub and GitLab refs that aren't pulled by default
6 participants