Description
Background
Trivy recently introduced an ArtifactID field to uniquely identify scan targets across different artifact types (#9662, #9663). For container images, the current implementation uses the Image ID (config blob hash) directly as the Artifact ID.
After team discussion, we identified that this approach is insufficient for our use cases. We need a more nuanced Artifact ID generation that considers the repository context while maintaining consistency across tags of the same image.
Problem Statement
The current implementation using only Image ID as Artifact ID does not meet our requirements for distinguishing between:
- Images from different repositories within the same registry
- Images from different registries
- Images with the same content but different repository contexts
Because identical image content in different repositories or registries collapses to a single Artifact ID, vulnerabilities are deduplicated and tracked incorrectly across deployment contexts.
Requirements
The new Artifact ID generation for container images must satisfy the following requirements:
1. Same Artifact ID Requirements
Images should have the same Artifact ID when all of the following hold:
- They share the same Image ID (config blob hash)
- They are from the same registry
- They are from the same repository
- They differ only in tags
Example:
ghcr.io/aquasecurity/trivy:latest
ghcr.io/aquasecurity/trivy:v0.65.0
These should have the same Artifact ID (assuming same Image ID).
2. Different Artifact ID Requirements
Images should have different Artifact IDs when they are from:
a. Different Repositories (Same Registry)
Example:
ghcr.io/aquasecurity/trivy:v0.65.0
ghcr.io/aqua-sec/trivy:v0.65.0
Even with the same Image ID, these should have different Artifact IDs.
b. Different Registries
Example:
ghcr.io/aquasecurity/trivy:v0.65.0
docker.io/aquasecurity/trivy:v0.65.0
Even with the same Image ID, these should have different Artifact IDs.
Proposed Solution
Artifact ID Calculation
The Artifact ID for container images should be calculated as:
ArtifactID = hash(ImageID + Registry + Repository)
Where:
- ImageID: The existing image configuration blob hash (sha256:...)
- Registry: The registry hostname (e.g., ghcr.io, docker.io)
- Repository: The repository path without the tag (e.g., aquasecurity/trivy)
Implementation Details
1. Parsing Image References
The implementation must correctly parse image references to extract:
- Registry hostname
- Repository path
- Tag/digest (to be excluded from calculation)
// Example parsing
// Input: ghcr.io/aquasecurity/trivy:v0.65.0
// Parsed:
// Registry: ghcr.io
// Repository: aquasecurity/trivy
// Tag: v0.65.0 (excluded from Artifact ID)
2. Hash Function
- Use SHA256 for consistency with existing Image ID format
- Combine components in a deterministic order
- Format: sha256:<hash>
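import (
	"crypto/sha256"
	"fmt"
)

// GenerateArtifactID derives the Artifact ID from the Image ID, registry, and repository; the tag is deliberately not an input.
func GenerateArtifactID(imageID, registry, repository string) string {
	input := fmt.Sprintf("%s:%s:%s", imageID, registry, repository)
	hash := sha256.Sum256([]byte(input))
	return fmt.Sprintf("sha256:%x", hash)
}
3. Edge Cases Handling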
- Default Registry: Images without explicit registry should default to docker.io
- Port Handling: Registry URLs with ports should be normalized (e.g., localhost:5000)
- Multi-level Repositories: Handle paths like registry/org/team/image correctly
- Digest References: Images referenced by digest should still use the repository path
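The edge cases above can be sketched with a small standard-library parser. This is only an illustration: `ImageRef` and `parseImageRef` are hypothetical names, not Trivy APIs, and the host-detection heuristic (the first path component is treated as a registry if it contains `.` or `:`, or equals `localhost`) is an assumption modeled on the convention Docker clients commonly follow. The sketch keeps a port as part of the registry string and does not add Docker Hub's `library/` prefix for single-name images. A real implementation would more likely reuse an existing reference-parsing library.

```go
package main

import (
	"fmt"
	"strings"
)

// ImageRef holds the two reference components that feed the Artifact ID.
// Hypothetical type, used here for illustration only.
type ImageRef struct {
	Registry   string
	Repository string
}

// parseImageRef is a hypothetical helper that splits an image reference
// into registry and repository, discarding any tag or digest.
func parseImageRef(ref string) ImageRef {
	// Digest references: drop everything from '@' onward so that only
	// the repository path remains.
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	// Tags: a colon after the last slash separates a tag; a colon before
	// the last slash belongs to a registry port and must be kept.
	if i := strings.LastIndex(ref, ":"); i > strings.LastIndex(ref, "/") {
		ref = ref[:i]
	}
	// Default registry: assume docker.io unless the first path component
	// looks like a hostname (assumed heuristic, see lead-in).
	registry, repository := "docker.io", ref
	if i := strings.Index(ref, "/"); i >= 0 {
		first := ref[:i]
		if strings.ContainsAny(first, ".:") || first == "localhost" {
			registry, repository = first, ref[i+1:]
		}
	}
	return ImageRef{Registry: registry, Repository: repository}
}

func main() {
	refs := []string{
		"ghcr.io/aquasecurity/trivy:v0.65.0",
		"aquasecurity/trivy:latest",
		"localhost:5000/org/team/image:1.0",
		"ghcr.io/aquasecurity/trivy@sha256:abc123",
	}
	for _, ref := range refs {
		r := parseImageRef(ref)
		fmt.Printf("%s -> registry=%s repository=%s\n", ref, r.Registry, r.Repository)
	}
}
```

Whatever parser is used, the important property is that two references to the same repository always yield byte-identical `Registry` and `Repository` strings, since any difference changes the resulting hash.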
Examples
Given two images with the same Image ID sha256:abc123...:
Example 1: Same Repository, Different Tags → Same Artifact ID
Input 1: ghcr.io/aquasecurity/trivy:latest
Input 2: ghcr.io/aquasecurity/trivy:v0.65.0
Components:
ImageID: sha256:abc123...
Registry: ghcr.io
Repository: aquasecurity/trivy
Result: sha256:def456... (same for both)
Example 2: Different Repositories → Different Artifact IDs
Input 1: ghcr.io/aquasecurity/trivy:v0.65.0
Components: ImageID=sha256:abc123..., Registry=ghcr.io, Repository=aquasecurity/trivy
Result: sha256:def456...
Input 2: ghcr.io/aqua-sec/trivy:v0.65.0
Components: ImageID=sha256:abc123..., Registry=ghcr.io, Repository=aqua-sec/trivy
Result: sha256:ghi789... (different)
Example 3: Different Registries → Different Artifact IDs
Input 1: ghcr.io/aquasecurity/trivy:v0.65.0
Components: ImageID=sha256:abc123..., Registry=ghcr.io, Repository=aquasecurity/trivy
Result: sha256:def456...
Input 2: docker.io/aquasecurity/trivy:v0.65.0
Components: ImageID=sha256:abc123..., Registry=docker.io, Repository=aquasecurity/trivy
Result: sha256:jkl012... (different)
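The three examples can be checked mechanically with a short program. This is a minimal sketch: it copies the `GenerateArtifactID` function from the proposal above so that it is self-contained, and the Image ID is a made-up placeholder string rather than a real digest.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// GenerateArtifactID is copied from the proposal: it hashes the Image ID,
// registry, and repository. The tag is never an input.
func GenerateArtifactID(imageID, registry, repository string) string {
	input := fmt.Sprintf("%s:%s:%s", imageID, registry, repository)
	hash := sha256.Sum256([]byte(input))
	return fmt.Sprintf("sha256:%x", hash)
}

func main() {
	// Placeholder Image ID for illustration; not a real config blob hash.
	const imageID = "sha256:placeholder-image-id"

	// Example 1: :latest and :v0.65.0 in the same repository. The tag is
	// dropped before hashing, so both calls receive identical inputs.
	latest := GenerateArtifactID(imageID, "ghcr.io", "aquasecurity/trivy")
	tagged := GenerateArtifactID(imageID, "ghcr.io", "aquasecurity/trivy")

	// Example 2: same registry, different repository.
	otherRepo := GenerateArtifactID(imageID, "ghcr.io", "aqua-sec/trivy")

	// Example 3: same repository path, different registry.
	otherRegistry := GenerateArtifactID(imageID, "docker.io", "aquasecurity/trivy")

	fmt.Println("same repository, different tags match:", latest == tagged)
	fmt.Println("different repository matches:", tagged == otherRepo)
	fmt.Println("different registry matches:", tagged == otherRegistry)
}
```

The same comparisons could be turned into a table-driven unit test to guard the requirements against regressions.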