Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't expand RUN heredocs ourselves, let the shell do it #5473

Merged
merged 2 commits into from May 20, 2024

Conversation

nalind
Copy link
Member

@nalind nalind commented Apr 11, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

When handling RUN instructions that use heredoc syntax, don't bother interpolating environment variables and argument values, and let the command that's running handle it.

Try to improve error reporting when we're committing an image, so that we don't lose out-of-space errors through layers of compression and editing of file headers in the layer diffs.

How to verify it

New conformance test!

Which issue(s) this PR fixes:

Special notes for your reviewer:

We fail to build e.g. https://github.com/devfile/developer-images/tree/main/universal/ubi8 when the heredoc references environment variables which are set in the heredoc itself, since we currently expand them before passing the heredoc as an argument to the command being invoked.

Does this PR introduce a user-facing change?

Environment variables referenced in RUN instructions which use heredoc syntax will now be evaluated by the command being invoked rather than by the builder.

Copy link

Ephemeral COPR build failed. @containers/packit-build please check.

@nalind nalind force-pushed the heredoc-quoting branch 5 times, most recently from 6ad8b5f to f06f82d Compare April 12, 2024 15:46
go.mod Outdated
go 1.20
go 1.21

toolchain go1.21.9
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nalind I am not sure but I think this directive is a blocking thing because of podman, I created a thread here. #5472 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'm going to try some things in the vendor Makefile target.

@nalind nalind force-pushed the heredoc-quoting branch 2 times, most recently from b25d215 to c3c0536 Compare April 12, 2024 17:38
Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest you wait with updating go until we have formal understanding on how to do it consistently across all our repos.
I would suggest to focus on containers/skopeo#2297 first and then we apply the changes from there in the other repos.

Makefile Outdated
GO111MODULE=on $(GO) mod tidy
GO111MODULE=on $(GO) mod vendor
GO111MODULE=on $(GO) mod verify
if go env | grep -q GOTOOLCHAIN ; then go mod edit -toolchain=none ; fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will break all renovate PRs as it doesn't use our makefile

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. Backing it out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to try adding the variable to our local configuration for renovate.

Makefile Outdated
GO111MODULE=on $(GO) mod vendor
GO111MODULE=on $(GO) mod verify
if go env | grep -q GOTOOLCHAIN ; then go mod edit -toolchain=none ; fi
GOTOOLCHAIN=local GO111MODULE=on $(GO) mod tidy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why you force GOTOOLCHAIN=local here only in the vendor target.
If anything it must be set for every go command.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backing it out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, trying again with "export" this time.

@nalind nalind changed the title WIP: don't expand RUN heredocs ourselves, let the shell do it Don't expand RUN heredocs ourselves, let the shell do it Apr 16, 2024
Copy link
Collaborator

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always you're way ahead of me. Thank you. One request inline.

tests/bud/no-history/Dockerfile Outdated Show resolved Hide resolved
@@ -1,7 +1,8 @@
# The important thing about that first base image is that it has no history
# entries, but it does have at least one layer.

FROM nixery.dev/shell AS first-stage
FROM example.com/shell AS first-stage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking anything on quay.io, or basically anything that does not introduce a new failure pain point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, using example.com has the advantage of producing an image reference that can't be confused with something that exists (or might once have existed, if we come back to this in the future) in an actual registry somewhere. How strongly are you in favor of a quay.io name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, something like fakeregistry.podman.io/notreal makes the intend clear that this is not a real image. And in cases where we ever try to pull by accident we at least hit a domain under our control and name lookup should fail for that.
And ideally you add a comment in the Dockerfile that this image is being created in the test and should not be pulled.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my question was: instead of on-demand building each time, can this image be built once (with saved-committed-tracked script and instructions), say onto quay.io/libpod/nohistory:20240422, and then fetched on demand. This is a very difficult test to review. If any step ever breaks in a CI run, it will cost a lot of energy to investigate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe I'm authorized to push to that repository, so that's not a thing I can do in this PR, or set up for this PR to use. The rest I can do.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Just in case it is of interest, https://datatracker.ietf.org/doc/html/rfc6761#section-6.4 specifies that the .invalid TLD should never resolve; that might be useful for similar situations.)

tests/bud.bats Outdated
# image) with the name the Dockerfile will specify as its base image.
run_buildah tag ${configdigest%% *} example.com/shell
# Build images using our image-with-no-history as a base, to check that we
# don't trip over ourselves when doing so.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether you go the build-each-time or build-once-and-pull-when-testing approach, I would really appreciate a step such as run_buildah inspect History and confirm that it is empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding one.

Copy link
Contributor

openshift-ci bot commented Apr 29, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, nalind

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Makefile Outdated
$(GO) mod tidy
$(GO) mod vendor
$(GO) mod verify
if test -n "$(strip $(shell go env GOTOOLCHAIN))"; then go mod edit -toolchain none ; fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rather not remove the toolchain line and have behave consistently across all our projects (containers/skopeo#2297, containers/common#1963 and others ...)
I don't think this is wrong, the end result is the same as far as I can tell but consistency is key IMO

cc @mtrmac

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever changes we do here impact the vendoring validation tests. I'm not averse to splitting it out, but it has effects on things that come after it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://go.dev/ref/mod:

For reproducibility, the go command writes its own toolchain name in a toolchain line any time it is updating the go version in the go.mod file

and IIRC that’s the only time that happens. So, in c/image and Skopeo we have a toolchain 1.21.0 line; then go get does not trigger any updates to 1.21.x as long as we stay on Go 1.21.


A possible unanticipated downside of the toolchain 1.21.0 approach is that Renovate files PRs just to update the toolchain, as in containers/image#2382 .

I don’t think we know for sure what Renovate does with Go ≥ 1.21 if there is no toolchain directive at all, AFAIK all updates to 1.21 were using the toolchain 1.21.0 approach so far.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… I think what I’m saying here is that IIRC we haven’t considered the approach of just editing out the toolchain dependency.

And it might actually be a pretty good solution; if nothing else, reflects our intent more precisely than a 1.21.0 value.

… but also, I… don’t personally want to be spending much more time on toolchain. If this approach is better, I’ll happily help switch other repos, but I’d rather not invest much time evaluating it myself.

Copy link
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also as far as actual review goes I am totally confused about the totally different set of changes here. Te me this look like a bunch of independent changes thrown together.

tests/bud.bats Outdated
@@ -333,7 +333,7 @@ _EOF
}

@test "bud build with heredoc content" {
_prefetch fedora
_prefetch quay.io/fedora/python-311
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please only ever use images from quay.io/libpod/...
Second to we actually have to pull a large python image for this? I don't see the relevance of python specific code in herdoc. Can just as well test a simple shell script IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's probably some value in checking that the right thing happens when the designated consumer of the heredoc is something other than the shell. I don't think anything the quay.io/libpod organization includes a python interpreter, but if there is one, I wouldn't mind using it instead.

tests/bud.bats Outdated
@@ -4547,17 +4547,17 @@ EOM
}

@test "bud arg and env var with same name" {
_prefetch centos:8
_prefetch busybox
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for the other image please make sure to use quay.io/libpod images if we already change them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For historical reasons there's some super complicated bizarre logic in the tests such that "busybox" and "alpine" translate to quay.io/libpod. See https://github.com/containers/buildah/blob/3763681506611b476cdd3377ee669ecaa7014b8b/tests/registries.conf

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also have the impression that this is a collection of quite unrelated changes; although most are individually desirable, it’s harder to understand and discuss.

In particular, it took me quite a long time to understand that the stated topic of the PR is being achieved by an update of github.com/openshift/imagebuilder.

Makefile Outdated
@@ -1,4 +1,6 @@
export GOPROXY=https://proxy.golang.org
export GOTOOLCHAIN=local
export GO111MODULE=on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Maybe not in this PR?) I think with current Go versions (the go 1.21 directive in go.mod is being enforced, so we can assume a recent version), it should be possible to never refer to this variable, and to assume modules throughout.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“this variable” being GO111MODULE, to be explicit.

Makefile Outdated
$(GO) mod tidy
$(GO) mod vendor
$(GO) mod verify
if test -n "$(strip $(shell go env GOTOOLCHAIN))"; then go mod edit -toolchain none ; fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this go env condition checking for? It’s not absolutely obvious to me, my best guess is “is this a new enough Go to understand GOTOOLCHAIN?”

And if that’s the question, the go 1.21 directive in go.mod ensures the answer is “yes” (or, alternatively, if building with a much older Go which does not enforce the go 1.21 directive, the build is going to fail on a missing library feature anyway).

digester.go Outdated
if err == nil {
err = err1
}
pipeReader.CloseWithError(err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always worry excessively that a failing goroutine will leave the consumer hanging.

Consider something like

func someOperationGoroutine() {
	err := errors.New("Internal error: unexpected panic in someOperationGoroutine")
	defer func() { // Ensure we _always_ wake up the caller
		_ = pipeReader.CloseWithError(err) // CloseWithError(nil) is equivalent to Close(), always returns nil
                wg.Done()
	}()
	err = someOperation()
}
func someOperation() {
    // straight-line worry-free Go code with liberally sprinkled early `return`s and any other control flow
}

@nalind
Copy link
Member Author

nalind commented Apr 30, 2024

Spun out #5493, #5494, #5495, #5496, #5497, #5498, #5499.

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 3, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 6, 2024
@nalind nalind force-pushed the heredoc-quoting branch 5 times, most recently from 0823914 to c84b0f6 Compare May 9, 2024 19:29
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 10, 2024
@nalind nalind force-pushed the heredoc-quoting branch 3 times, most recently from fdcc7a6 to f837728 Compare May 16, 2024 17:25
When handling RUN instructions that use heredoc syntax, don't bother
interpolating environment variables and argument values, and let the
command that's running handle it.

Signed-off-by: Nalin Dahyabhai <[email protected]>
When we get a tried-to-write-to-closed-pipe error while encoding
something for a coprocess, try to capture error output from the
coprocess and add it to the error message, to hopefully catch a flake
we're seeing in CI.

Signed-off-by: Nalin Dahyabhai <[email protected]>
@rhatdan
Copy link
Member

rhatdan commented May 20, 2024

LGTM

@rhatdan
Copy link
Member

rhatdan commented May 20, 2024

@mtrmac @Luap99 PTAL

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 20, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 7f0a322 into containers:main May 20, 2024
32 of 34 checks passed
@nalind nalind deleted the heredoc-quoting branch May 20, 2024 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved kind/bug Categorizes issue or PR as related to a bug. lgtm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants