feat: include registry and repository in artifact ID calculation #9678

@knqyf263

Background

Trivy recently introduced an ArtifactID field to uniquely identify scan targets across different artifact types (#9662, #9663). For container images, the current implementation uses the Image ID (config blob hash) directly as the Artifact ID.

After team discussion, we identified that this approach is insufficient for our use cases. We need a more nuanced Artifact ID generation that considers the repository context while maintaining consistency across tags of the same image.

Problem Statement

The current implementation using only Image ID as Artifact ID does not meet our requirements for distinguishing between:

  • Images from different repositories within the same registry
  • Images from different registries
  • Images with the same content but different repository contexts

This leads to incorrect deduplication and tracking of vulnerabilities across different deployment contexts.

Requirements

The new Artifact ID generation for container images must satisfy the following requirements:

1. Same Artifact ID Requirements

Images should have the same Artifact ID when:

  • They share the same Image ID (config blob hash)
  • They are from the same registry
  • They are from the same repository
  • They only differ in tags

Example:

ghcr.io/aquasecurity/trivy:latest
ghcr.io/aquasecurity/trivy:v0.65.0

These should have the same Artifact ID (assuming they share the same Image ID).

2. Different Artifact ID Requirements

Images should have different Artifact IDs when they are from:

a. Different Repositories (Same Registry)

Example:

ghcr.io/aquasecurity/trivy:v0.65.0
ghcr.io/aqua-sec/trivy:v0.65.0

Even with the same Image ID, these should have different Artifact IDs.

b. Different Registries

Example:

ghcr.io/aquasecurity/trivy:v0.65.0
docker.io/aquasecurity/trivy:v0.65.0

Even with the same Image ID, these should have different Artifact IDs.

Proposed Solution

Artifact ID Calculation

The Artifact ID for container images should be calculated as:

ArtifactID = hash(ImageID + Registry + Repository)

Where:

  • ImageID: The existing image configuration blob hash (sha256:...)
  • Registry: The registry hostname (e.g., ghcr.io, docker.io)
  • Repository: The repository path without the tag (e.g., aquasecurity/trivy)

Implementation Details

1. Parsing Image References

The implementation must correctly parse image references to extract:

  • Registry hostname
  • Repository path
  • Tag/digest (to be excluded from calculation)

// Example parsing
// Input: ghcr.io/aquasecurity/trivy:v0.65.0
// Parsed:
//   Registry: ghcr.io
//   Repository: aquasecurity/trivy
//   Tag: v0.65.0 (excluded from Artifact ID)

2. Hash Function

  • Use SHA256 for consistency with existing Image ID format
  • Combine components in a deterministic order
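  • Format: sha256:<hash>

import "crypto/sha256"
import "fmt"

// GenerateArtifactID hashes the Image ID, registry, and repository, joined by ":" in a fixed order.
func GenerateArtifactID(imageID, registry, repository string) string {
    input := fmt.Sprintf("%s:%s:%s", imageID, registry, repository)
    hash := sha256.Sum256([]byte(input))
    return fmt.Sprintf("sha256:%x", hash)
}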
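
To make the intended behavior concrete, here is a small self-contained sketch that runs the proposed function against the reference pairs from the requirements. The Image ID `sha256:abc123` is a placeholder, and the registry and repository values are assumed to have been extracted already.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// GenerateArtifactID mirrors the function proposed in this issue.
func GenerateArtifactID(imageID, registry, repository string) string {
	input := fmt.Sprintf("%s:%s:%s", imageID, registry, repository)
	hash := sha256.Sum256([]byte(input))
	return fmt.Sprintf("sha256:%x", hash)
}

func main() {
	// Placeholder Image ID, for illustration only.
	imageID := "sha256:abc123"

	// The tag is never passed in, so :latest and :v0.65.0 of the same
	// repository both reduce to this single call.
	base := GenerateArtifactID(imageID, "ghcr.io", "aquasecurity/trivy")
	otherRepo := GenerateArtifactID(imageID, "ghcr.io", "aqua-sec/trivy")
	otherRegistry := GenerateArtifactID(imageID, "docker.io", "aquasecurity/trivy")

	fmt.Println(base == GenerateArtifactID(imageID, "ghcr.io", "aquasecurity/trivy")) // true
	fmt.Println(base == otherRepo)                                                    // false
	fmt.Println(base == otherRegistry)                                                // false
}
```

Because the tag is not part of the hash input, the "same Artifact ID" requirement holds by construction. The ":" separator keeps the component boundaries unambiguous, since a repository path cannot contain a colon.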

3. Edge Cases Handling

  • Default Registry: Images without an explicit registry should default to docker.io
  • Port Handling: Registry URLs with ports should be normalized (e.g., localhost:5000)
  • Multi-level Repositories: Handle paths like registry/org/team/image correctly
  • Digest References: Images referenced by digest should still use the repository path
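
The parsing step and the edge cases above could be sketched as follows. This is a simplified, standard-library-only illustration, not the proposed implementation: `splitReference` is a hypothetical helper, and the host-detection rule (a first path component containing "." or ":", or equal to "localhost") is an assumption borrowed from Docker's naming convention. A real implementation would more likely rely on an established reference parser, such as the `name` package in go-containerregistry.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultRegistry is applied when a reference names no registry.
const defaultRegistry = "docker.io"

// splitReference (hypothetical) returns the registry and repository of an
// image reference, dropping any tag or digest.
func splitReference(ref string) (registry, repository string) {
	// Drop a digest suffix such as "@sha256:...".
	if i := strings.Index(ref, "@"); i >= 0 {
		ref = ref[:i]
	}
	// Drop a tag: a colon after the last slash separates the tag, while a
	// colon before it belongs to a registry port.
	if i := strings.LastIndex(ref, ":"); i > strings.LastIndex(ref, "/") {
		ref = ref[:i]
	}
	// Assumed rule: the first path component is a registry only if it looks
	// like a host (contains "." or ":") or is exactly "localhost".
	first, rest, found := strings.Cut(ref, "/")
	if found && (strings.ContainsAny(first, ".:") || first == "localhost") {
		return first, rest
	}
	return defaultRegistry, ref
}

func main() {
	refs := []string{
		"ghcr.io/aquasecurity/trivy:v0.65.0",
		"aquasecurity/trivy:latest",             // no registry given
		"localhost:5000/org/team/image:1.0",     // port and multi-level path
		"ghcr.io/aquasecurity/trivy@sha256:abc", // digest reference (placeholder digest)
	}
	for _, ref := range refs {
		registry, repository := splitReference(ref)
		fmt.Printf("%s -> registry=%s repository=%s\n", ref, registry, repository)
	}
}
```

The sketch keeps the port as part of the registry string and does not expand single-name Docker Hub images (for example, `alpine` to `library/alpine`). Whether the Artifact ID should apply such normalization is left open here.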

Examples

Given two images with the same Image ID sha256:abc123...:

Example 1: Same Repository, Different Tags → Same Artifact ID

Input 1: ghcr.io/aquasecurity/trivy:latest
Input 2: ghcr.io/aquasecurity/trivy:v0.65.0

Components:
  ImageID: sha256:abc123...
  Registry: ghcr.io
  Repository: aquasecurity/trivy

Result: sha256:def456... (same for both)

Example 2: Different Repositories → Different Artifact IDs

Input 1: ghcr.io/aquasecurity/trivy:v0.65.0
  Components: ImageID=sha256:abc123..., Registry=ghcr.io, Repository=aquasecurity/trivy
  Result: sha256:def456...

Input 2: ghcr.io/aqua-sec/trivy:v0.65.0
  Components: ImageID=sha256:abc123..., Registry=ghcr.io, Repository=aqua-sec/trivy
  Result: sha256:ghi789... (different)

Example 3: Different Registries → Different Artifact IDs

Input 1: ghcr.io/aquasecurity/trivy:v0.65.0
  Components: ImageID=sha256:abc123..., Registry=ghcr.io, Repository=aquasecurity/trivy
  Result: sha256:def456...

Input 2: docker.io/aquasecurity/trivy:v0.65.0
  Components: ImageID=sha256:abc123..., Registry=docker.io, Repository=aquasecurity/trivy
  Result: sha256:jkl012... (different)

Metadata

Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.
  • target/container-image: Issues relating to container image scanning
