Fix image caching logic to detect changes in intermediate stage FROM images #1843
Problem
The image caching logic was incorrectly skipping builds when a Dockerfile had intermediate stages based on images that had changed. The cache validation only checked the final stage's base image, not all FROM images used throughout the Dockerfile.
Example Scenario
Consider this multi-stage Dockerfile from dotnet-buildtools-prereqs-docker:
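The actual Dockerfile is linked from the issue quoted below; the following is a simplified, illustrative sketch of its structure (image tags and paths are abbreviated placeholders, not the real ones):

```dockerfile
# Intermediate stage based on crossdeps-builder, which DID change.
FROM mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-crossdeps-builder AS builder
# ...build the cross rootfs in this stage...

# Final stage based on crossdeps-llvm, which did NOT change.
FROM mcr.microsoft.com/dotnet-buildtools/prereqs:azurelinux-3.0-crossdeps-llvm

# The final image copies artifacts out of the changed intermediate stage,
# so it must be rebuilt even though its own base image is unchanged.
COPY --from=builder /crossrootfs /crossrootfs
```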
- Before this fix: Cache hit reported ❌ (only checked `crossdeps-llvm`, missed the `crossdeps-builder` change)
- After this fix: Cache miss reported ✅ (checks both `crossdeps-builder` and `crossdeps-llvm`, detects the change)

This caused builds to skip images that actually needed to be rebuilt because they copied artifacts from changed intermediate stages.
Solution
1. Track All FROM Images
Added a `FromImages` property to `PlatformData` that stores a dictionary mapping each FROM image to its digest; a hedged sketch of the shape is shown below. This property is serialized to the image-info.json file and tracks all FROM images in the Dockerfile, not just the final stage.
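A minimal sketch of what the new property might look like (the Newtonsoft.Json attribute and surrounding class shape are assumptions; only the `FromImages` name and the image-to-digest dictionary come from this PR description):

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

public class PlatformData
{
    // ...existing properties (final-stage base image digest, etc.)...

    // Maps every FROM image referenced in the Dockerfile, including images used
    // only by intermediate stages, to the digest it resolved to at build time.
    // Omitted from image-info.json when null, so older files remain valid.
    [JsonProperty(NullValueHandling = NullValueHandling.Ignore)]
    public IDictionary<string, string>? FromImages { get; set; }
}
```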
2. Populate FROM Images During Build
Created `SetPlatformDataFromImagesAsync()` in `BuildCommand` to resolve each FROM image in the Dockerfile to its digest and populate the `FromImages` dictionary; a hedged sketch of this step follows below.
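A sketch of that step, assuming helpers exist for listing a Dockerfile's external FROM references and for querying a tag's current digest; `GetExternalFromImages` and `GetImageDigestAsync` are illustrative names, not actual ImageBuilder APIs:

```csharp
// Sketch only: helper names are placeholders for whatever ImageBuilder actually uses.
private async Task SetPlatformDataFromImagesAsync(PlatformData platform, string dockerfilePath)
{
    Dictionary<string, string> fromImages = new();

    // Walk every FROM instruction, not just the final stage's base image.
    foreach (string fromImage in GetExternalFromImages(dockerfilePath))
    {
        // Record the digest the tag currently points to.
        fromImages[fromImage] = await GetImageDigestAsync(fromImage);
    }

    platform.FromImages = fromImages;
}
```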
3. Validate All FROM Images During Cache Check
Enhanced `ImageCacheService` with `AreFromImageDigestsUpToDateAsync()`, which compares the current digest of every image recorded in `FromImages` against the cached value and reports a cache miss if any digest has changed; a hedged sketch follows below.
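A sketch of the check, folding in the backward-compatibility fallback described in step 4 below; `GetImageDigestAsync` and `IsBaseImageDigestUpToDateAsync` are placeholder names for the existing digest lookup and legacy check:

```csharp
// Sketch only: returns true when the cached image can be reused.
private async Task<bool> AreFromImageDigestsUpToDateAsync(PlatformData cachedPlatform)
{
    if (cachedPlatform.FromImages is null || cachedPlatform.FromImages.Count == 0)
    {
        // Older image-info files only recorded the final stage's base image,
        // so fall back to the legacy single-digest comparison.
        return await IsBaseImageDigestUpToDateAsync(cachedPlatform);
    }

    foreach ((string fromImage, string cachedDigest) in cachedPlatform.FromImages)
    {
        // Any changed FROM image, including ones used only by intermediate
        // stages, invalidates the cache and forces a rebuild.
        if (await GetImageDigestAsync(fromImage) != cachedDigest)
        {
            return false;
        }
    }

    return true;
}
```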
4. Maintain Backward Compatibility
If `FromImages` is null or empty (e.g., in existing image-info files), the code falls back to the legacy behavior of checking only the final stage's base image. This ensures existing workflows continue to work while new builds benefit from the improved caching logic.
Changes
- Added the `FromImages` property to `PlatformData`
- Populate `FromImages` during build
- Validate all `FromImages` digests during the cache check
Testing
Added a `BuildCommand_Caching_IntermediateStageChange` test that validates an image is rebuilt (cache miss reported) when one of its intermediate-stage FROM images changes while the final stage's base image is unchanged.
Impact
This fix ensures that images with multi-stage Dockerfiles are correctly rebuilt when any of their FROM dependencies change, not just the final stage. This is particularly important for the dotnet-buildtools-prereqs-docker repository where this issue was originally observed.
Fixes the scenario described in the issue where changes to the crossdeps-builder Dockerfile were not triggering rebuilds of the cross image that depends on it.
Implementation suggested by @lbussell in issue comments.
Original prompt
This section details the original issue you should resolve
<issue_title>Image caching logic doesn't take into account dependencies from intermediate stages</issue_title>
<issue_description>There is a scenario where the image caching logic incorrectly skips building a Dockerfile when that Dockerfile has an intermediate stage based on an image that has a cache miss. An example of this scenario is this build (internal link). That build was triggered from this PR: dotnet/dotnet-buildtools-prereqs-docker#1207. In that PR, a change was made only to the crossdeps-builder Dockerfile. The build only ended up publishing that crossdeps-builder image and none of the images that are dependent on it.
Here is an example Dockerfile with such a dependency: https://github.com/dotnet/dotnet-buildtools-prereqs-docker/blob/642ddfa47486760a15781758c7e8c0709c4c96c0/src/azurelinux/3.0/net9.0/cross/amd64/Dockerfile#L3. The image caching logic only accounts for the dependency of the final image. In this case, the final image is based on crossdeps-llvm, which has not changed. So the cross Dockerfile never gets built. But it needs to be built because the content it contains is copied from crossdeps-builder, which has changed.</issue_description>
Comments on the Issue (you are @copilot in this section)
@lbussell Instead of adding lots of complexity to our custom caching mechanism, maybe it's time to evaluate using [Docker's built-in caching](https://docs.docker.com/build/cache/backends/registry/). There have been lots of changes since BuildKit became the default. If we store the cache on the registry, Docker *should* cache most or all layers using the normal [cache invalidation rules](https://docs.docker.com/build/cache/invalidation/), which includes base image changes and individual Dockerfile instructions. If we use the "`max`" cache option, then it'll also store intermediate stages/layers on the registry.
@mthalman
> Instead of adding lots of complexity to our custom caching mechanism, maybe it's time to evaluate using [Docker's built-in caching](https://docs.docker.com/build/cache/backends/registry/).

Maybe. But that defeats the whole purpose of #1449. 🤷‍♂️
@lbussell [Triage] One way to fix this would be to include all `FROM` images in the Dockerfile in the image-info file on https://github.com/dotnet/versions. Then we could re-build when any of those images change. However, this is not a small amount of work for this one scenario that really only affects the buildtools-prereqs repo. Therefore, we should close this as not planned for now, and consider re-opening in the future if it becomes a bigger issue.
@lbussell [Triage] This would still be high-cost to fix, still only applies to the buildtools prereqs repo, and still has valid workarounds. A lower-cost solution would be to write a guide/workflow on the buildtools-prereqs repo for handling these scenarios. To validate these scenarios in PRs, it would also be good to add a no-cache build leg to that repo. Special care should be taken to avoid spending too much compute time with this.</comment_new>
<comment_new>@lbussell
This has been on the sprint column for 5 months. I am moving it back to current release. @mthalman I see this is on two boards at once. Should it be removed from the .NET Docker board?</comment_new>
<comment_new>@mthalman
Yeah, I think it's fine to remove it since it only applies to the buildtools repo.</comment_new>
<comment_new>@lbussell
If I were going to fix this today, I'd probably add a new
`FromImages` property to `PlatformData` alongside what already exists.
docker-tools/src/ImageBuilder/Models/Image/PlatformData.cs
Lines 35 to 36 in 46e34e7
You would need to add more logic to BuildCommand.cs to fill out the
`FromImages` property in `PlatformData`. I would probably add it to `SetPlatformDataBaseDigest` since it already does something similar:
docker-tools/src/ImageBuilder/Commands/BuildCommand.cs
Lines 257 to 285 in 46e34e7
Then, we would need to update ImageCacheService.cs to invalidate the cache when any of the FROM images were changed, not just the base image digest.
This would also probably get us some rudiment...
Fixes #1455