
run e2e test on tmpfs #22533

Merged: 4 commits merged into containers:main on May 13, 2024

Conversation

@Luap99 (Member) commented Apr 29, 2024

Follow-up to commit eaf60c7; let's see how badly things are going to break.

Does this PR introduce a user-facing change?

None

openshift-ci bot added the release-note-none, do-not-merge/work-in-progress, and approved labels on Apr 29, 2024
Ephemeral COPR build failed. @containers/packit-build please check.

@edsantiago (Collaborator)

Error logs are misleading. The actual error seems to be in BeforeEach:

  # podman [options] load -q -i /var/tmp/registry.fedoraproject.org-fedora-toolbox-36.tar
           Error: payload does not match any of the supported image formats:
            * oci: open /var/tmp/registry.fedoraproject.org-fedora-toolbox-36.tar/index.json: not a directory
            * oci-archive: loading index: open /var/tmp/container_images_oci1129938589/index.json: no such file or directory
            * docker-archive: writing blob: adding layer with blob "sha256:5067dfc06cd159373bab350ebcb8986dd729e64324592b6f28bbcdb6fc5efda0": processing tar file(write /usr/lib64/dri/nouveau_drv_video.so: no space left on device): exit status 1
            * dir: open /var/tmp/registry.fedoraproject.org-fedora-toolbox-36.tar/manifest.json: not a directory

@Luap99 (Member, Author) commented Apr 29, 2024

Error logs are misleading. The actual error seems to be in BeforeEach:

  # podman [options] load -q -i /var/tmp/registry.fedoraproject.org-fedora-toolbox-36.tar
           Error: payload does not match any of the supported image formats:
            * oci: open /var/tmp/registry.fedoraproject.org-fedora-toolbox-36.tar/index.json: not a directory
            * oci-archive: loading index: open /var/tmp/container_images_oci1129938589/index.json: no such file or directory
            * docker-archive: writing blob: adding layer with blob "sha256:5067dfc06cd159373bab350ebcb8986dd729e64324592b6f28bbcdb6fc5efda0": processing tar file(write /usr/lib64/dri/nouveau_drv_video.so: no space left on device): exit status 1
            * dir: open /var/tmp/registry.fedoraproject.org-fedora-toolbox-36.tar/manifest.json: not a directory

Yeah, I think the first step is to remove the big images we use; the image dir is already over 1.1G, so we should get that down first. Maybe we can get away with 4 GB RAM then.
Also, why the hell did this even run if the setup failed?! That needs fixing too.
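For context, a quick way to see how full the backing filesystem is and how big the cached image dir has grown (just a sketch; the paths are illustrative):

# How full is the filesystem that the test tmp dir lives on?
df -h /var/tmp
# Rough size of the cached test images (illustrative path; the real cache dir is created per run).
du -sh /var/tmp/imagecachedir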

@edsantiago (Collaborator)

This maybe?

diff --git a/test/e2e/common_test.go b/test/e2e/common_test.go
index bc1f73b82..f617c3a58 100644
--- a/test/e2e/common_test.go
+++ b/test/e2e/common_test.go
@@ -1027,6 +1027,7 @@ func (p *PodmanTestIntegration) RestoreArtifactToCache(image string) error {
 		p.Root = p.ImageCacheDir
 		restore := p.PodmanNoEvents([]string{"load", "-q", "-i", tarball})
 		restore.WaitWithDefaultTimeout()
+		Expect(restore).To(ExitCleanly())
 	}
 	return nil
 }

@Luap99 (Member, Author) commented Apr 29, 2024

This maybe?

diff --git a/test/e2e/common_test.go b/test/e2e/common_test.go
index bc1f73b82..f617c3a58 100644
--- a/test/e2e/common_test.go
+++ b/test/e2e/common_test.go
@@ -1027,6 +1027,7 @@ func (p *PodmanTestIntegration) RestoreArtifactToCache(image string) error {
 		p.Root = p.ImageCacheDir
 		restore := p.PodmanNoEvents([]string{"load", "-q", "-i", tarball})
 		restore.WaitWithDefaultTimeout()
+		Expect(restore).To(ExitCleanly())
 	}
 	return nil
 }

yes

@Luap99 (Member, Author) commented Apr 29, 2024

Q: why did debian pass?
A: debian VM images do not have tmpfs mounted at /tmp
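For reference, a quick way to check this on a VM (a small sketch; findmnt ships with util-linux):

# Print the filesystem type backing /tmp: "tmpfs" on the Fedora images,
# a regular disk filesystem on the Debian images.
findmnt -n -o FSTYPE --target /tmp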

@edsantiago (Collaborator)

A: debian VM images do not have tmpfs mounted at /tmp

Yeah... I noticed that while playing with my fedora tmpfs changes. AFAICT that is the Debian default, not something we do in VM creation, so I left it as-is.

@Luap99 (Member, Author) commented Apr 29, 2024

A: debian VM images do not have tmpfs mounted at /tmp

Yeah... I noticed that while playing with my fedora tmpfs changes. AFAICT that is the Debian default, not something we do in VM creation, so I left it as-is.

Filed containers/automation_images#350 to track this, but first let's see how much of a difference this really makes here.

@edsantiago (Collaborator)

I think you're onto something. Ballpark shows tests running in ~30m, down from ~40:

| type | distro    | user     | DB     | local  | remote | container |
|------|-----------|----------|--------|--------|--------|-----------|
| int  | rawhide   | root     |        | 27:37  | 29:44  |           |
| int  | rawhide   | rootless |        | 28:55  |        |           |
| int  | fedora-39 | root     |        | 27:42  | 28:59  | !31:15    |
| int  | fedora-39 | rootless |        | 27:02  |        |           |
| int  | fedora-38 | root     | boltdb | 38:25  | 35:19  | !29:37    |
| int  | fedora-38 | rootless | boltdb | !28:17 |        |           |
| int  | debian-13 | root     |        | 32:49  | 31:52  |           |
| int  | debian-13 | rootless |        | 29:21  |        |           |
| sys  | rawhide   | root     |        | 01:04  | 01:13  |           |
| sys  | rawhide   | rootless |        | 01:07  |        |           |

@Luap99 (Member, Author) commented Apr 29, 2024

Yeah, that looks like a solid start; not sure how accurate the time report is:
https://api.cirrus-ci.com/v1/artifact/task/4837578803773440/runner_stats/int%20podman%20fedora-39%20root%20host%20sqlite-runner_stats.log
https://api.cirrus-ci.com/v1/artifact/task/5963478710616064/runner_stats/int%20podman%20fedora-38%20root%20host%20boltdb-runner_stats.log

We clearly got the system time down a lot, which signals that IO is a problem. We also still did not max out the CPU per the stats reporting in the Cirrus UI, which surprises me a bit, because locally this goes full out and I see close to 100% CPU usage on all cores. But maybe the Cirrus graph is not counting everything?

@Luap99 force-pushed the e2e-tmp-ci branch 2 times, most recently from bd1b2c0 to 59a60bf on May 2, 2024 16:32
@Luap99 (Member, Author) commented May 2, 2024

@edsantiago What do you think about the TMPDIR change? No idea if it will pass, but it gives us a random dir each time, so it should not allow anyone to hard-code /tmp/something paths. It is always on /tmp, though, so unless we manually mount a new tmpfs to a random root dir I don't think there is a better way.
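A minimal sketch of the idea (not the exact diff in this PR; the mktemp template name is illustrative):

# Create a fresh, randomly named directory under /tmp for this run and
# export it so the test suite picks it up via TMPDIR.
TMPDIR=$(mktemp -d /tmp/ci_XXXXXX)
export TMPDIR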

@edsantiago (Collaborator)

I like it.

@edsantiago (Collaborator)

Oops: containerized

Don't re-push yet though; I'm really curious to see how this'll work out

@Luap99 (Member, Author) commented May 2, 2024

Oops: containerized

Don't re-push yet though; I'm really curious to see how this'll work out

Which is perfect, as it shows my check actually works. And yeah, I let it run and will continue tomorrow.

@edsantiago (Collaborator)

I can't reproduce. Doesn't seem to be AVC. Could it be that the new remount is adding noexec? (SWAG. I haven't looked into it deeply enough and am unlikely to do so today)

@Luap99 (Member, Author) commented May 3, 2024

I can't reproduce. Doesn't seem to be AVC. Could it be that the new remount is adding noexec? (SWAG. I haven't looked into it deeply enough and am unlikely to do so today)

Which of the failures are you talking about? There are just too many to make sense of this. /tmp does not have noexec set; if it did, no container would be able to run in the e2e tests.
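A quick way to double-check the mount flags (a sketch using findmnt):

# Show the mount options for /tmp; a "noexec" entry here would keep
# binaries under /tmp from being executed at all.
findmnt -n -o OPTIONS --target /tmp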

@Luap99 (Member, Author) commented May 3, 2024

| type | distro    | user     | DB     | local  | remote | container |
|------|-----------|----------|--------|--------|--------|-----------|
| int  | rawhide   | root     |        | 28:12  | 26:42  |           |
| int  | rawhide   | rootless |        | 27:36  |        |           |
| int  | fedora-39 | root     |        | 28:07  | 27:59  | 27:15     |
| int  | fedora-39 | rootless |        | 27:04  |        |           |
| int  | fedora-38 | root     | boltdb | 29:11  | 30:03  | 26:52     |
| int  | fedora-38 | rootless | boltdb | 29:46  |        |           |
| int  | debian-13 | root     |        | 29:36  | 28:29  |           |
| int  | debian-13 | rootless |        | !29:14 |        |           |

Seems to be a bit under 30m now; looking at other PRs, they seem to be in the range of 35-40+ minutes, so I think it is safe to say this is a noticeable change.

I will try to drop the toolbox change, as I think it also helps a bit and I only want to see the tmpfs diff.

Luap99 added a commit to Luap99/libpod that referenced this pull request May 3, 2024
The image is way too big (over 800MB), which slows tests down as we
always have to pull it; the tests themselves are also super slow due to
the entrypoint logic that we don't care about. We should be testing for
features needed and not specific tools.

I think the current changes should have similar coverage in terms of
podman features; it no longer tests toolbox, but IMO that never was a
task for podman CI tests.

The main driver for this is to make the tests run entirely on tmpfs,
and this image is just too much [1].

[1] containers#22533

Signed-off-by: Paul Holzinger <[email protected]>
Luap99 added a commit to Luap99/libpod that referenced this pull request May 3, 2024
The image is way too big (over 800MB), which slows tests down as we
always have to pull it; the tests themselves are also super slow due to
the entrypoint logic that we don't care about. We should be testing for
features needed and not specific tools.

I think the current changes should have similar coverage in terms of
podman features; it no longer tests toolbox, but IMO that never was a
task for podman CI tests.

The main driver for this is to make the tests run entirely on tmpfs,
and this image is just too much [1].

[1] containers#22533

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 (Member, Author) commented May 3, 2024

[+1075s] not ok 285 [125] podman export, alter tarball, re-import
         # tags: distro-integration
         # (in test file test/system/[125-import.bats, line 89](https://github.com/containers/podman/blob/9fa6d0d5cc375790a143f33817303c32b0846be4/test/system/125-import.bats#L89))
         #   `tar -C $PODMAN_TMPDIR -rf $PODMAN_TMPDIR/$b_cnt.tar tmp/testfile2' failed with status 2
         #
<+     > # # podman rm -t 0 --all --force --ignore
         #
<+046ms> # # podman ps --all --external --format {{.ID}} {{.Names}}
         #
<+051ms> # # podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
<+045ms> # quay.io/libpod/testimage:20240123 1f6acd4c4a1d
         #
<+608ms> # # podman build -t before_change_img /tmp/CI_GsPZ/podman_bats.HLYIBo
<+485ms> # STEP 1/3: FROM quay.io/libpod/testimage:20240123
         # STEP 2/3: ADD testfile1 /tmp
         # --> a9813c7c65da
         # STEP 3/3: WORKDIR /tmp
         # COMMIT before_change_img
         # --> ef8a0675b6e9
         # Successfully tagged localhost/before_change_img:latest
         # ef8a0675b6e9ad6e9a85d112ec86f3753290f88b763f592381b0fd406950567d
         #
<+009ms> # # podman create --name before_change_cnt before_change_img
<+085ms> # cb630d35167fa2d230f93076774d4fda2f2747f63f3fb07d3a52a25b65fe2287
         #
<+009ms> # # podman export -o /tmp/CI_GsPZ/podman_bats.HLYIBo/before_change_cnt.tar before_change_cnt
         #
<+159ms> # # podman rm -t 0 -f before_change_cnt
<+075ms> # before_change_cnt
         # # tar --delete -f (tmpdir)/before_change_cnt.tar tmp/testfile1
         # # tar -C (tmpdir) -rf (tmpdir)/before_change_cnt.tar tmp/testfile2
         # tar: Skipping to next header
         # tar: Skipping to next header
         # tar: Exiting with failure status due to previous errors

I suspect we hit another tar bug in Debian now, maybe the same as #19407?


Other issue:

rm: cannot remove '/tmp/CI_0FhD/buildah3968082470/mnt': Permission denied

Seems to be the removal of the new TMPDIR I create; it is not seen on every run but seems to be rootless-only, so I guess it could be due to a missing podman unshare and leaked files with different owners...
But the real issue is that we are leaking temporary buildah files, which is really not good.
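A rootless cleanup sketch (assumption: the leaked files are owned by a subordinate UID from the user namespace, which is why a plain rm fails with permission denied):

# Re-enter the rootless user namespace so files owned by subordinate
# UIDs become removable, then delete the leaked buildah dir from the log above.
podman unshare rm -rf /tmp/CI_0FhD/buildah3968082470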

Luap99 added a commit to Luap99/libpod that referenced this pull request May 8, 2024
The image is way too big (over 800MB), which slows tests down as we
always have to pull it; the tests themselves are also super slow due to
the entrypoint logic that we don't care about. We should be testing for
features needed and not specific tools.

I think the current changes should have similar coverage in terms of
podman features; it no longer tests toolbox, but IMO that never was a
task for podman CI tests.

The main driver for this is to make the tests run entirely on tmpfs,
and this image is just too much [1].

[1] containers#22533

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 force-pushed the e2e-tmp-ci branch 2 times, most recently from 1829e7f to 0f955f4 on May 8, 2024 14:50
@cevich (Member) commented May 8, 2024

Yes sure, that is the next step, but one step at a time; for now let's get the tmpfs change done.

Yep, np. Was just concerned there might be some failures due to OOM (I haven't looked).
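One way to check for that after a run (a sketch, assuming journal access on the CI VM):

# Look for OOM-killer activity in the kernel log.
journalctl -k | grep -i 'out of memory'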

lsm5 pushed a commit to lsm5/podman that referenced this pull request May 10, 2024
The image is way too big (over 800MB), which slows tests down as we
always have to pull it; the tests themselves are also super slow due to
the entrypoint logic that we don't care about. We should be testing for
features needed and not specific tools.

I think the current changes should have similar coverage in terms of
podman features; it no longer tests toolbox, but IMO that never was a
task for podman CI tests.

The main driver for this is to make the tests run entirely on tmpfs,
and this image is just too much [1].

[1] containers#22533

Signed-off-by: Paul Holzinger <[email protected]>
Cockpit tests failed for commit 385c493. @martinpitt, @jelly, @mvollmer please check.

Follow-up to commit eaf60c7: with the toolbox image removal it is
possible to run all tests from tmpfs.

Signed-off-by: Paul Holzinger <[email protected]>
First, set up a custom TMPDIR to ensure we have no special assumptions
about hard-coded paths. Second, make sure it is actually on a tmpfs so
we can catch regressions in the VM setup immediately.

Signed-off-by: Paul Holzinger <[email protected]>
This reverts commit 02b8fd7.
The new CI images should have an apparmor workaround.

Fixes containers#22625

Signed-off-by: Paul Holzinger <[email protected]>
Ephemeral COPR build failed. @containers/packit-build please check.

@Luap99 changed the title from "WIP: run e2e test on tmpfs" to "run e2e test on tmpfs" on May 13, 2024
@Luap99 marked this pull request as ready for review on May 13, 2024 15:50
openshift-ci bot removed the do-not-merge/work-in-progress label on May 13, 2024
@Luap99 (Member, Author) commented May 13, 2024

This is good to review now; I think tests are going to pass this time around.
@cevich @edsantiago @containers/podman-maintainers PTAL

Cockpit tests failed for commit 9233864. @martinpitt, @jelly, @mvollmer please check.

@edsantiago (Collaborator) left a comment

Nice and clean. LGTM with one suggestion if you need to re-push for other reasons.

Keeping my fingers crossed about the new pasta.

export TMPDIR
fstype=$(findmnt -n -o FSTYPE --target $TMPDIR)
if [[ "$fstype" != "tmpfs" ]]; then
die "The CI test TMPDIR is not on a tmpfs mount, we need tmpfs to make the tests faster"
Review comment from @edsantiago (Collaborator) on the snippet above:

Not worth a re-push, but: in case of failure, this will be very difficult to debug. A more helpful message might be something like

    die "The CI test TMPDIR ($TMPDIR) fs type is '$fstype'; it should be 'tmpfs' (PR #22533)"

openshift-ci bot (Contributor) commented May 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago (Collaborator)

Timing results:

| type | distro    | user     | DB     | local | remote | container |
|------|-----------|----------|--------|-------|--------|-----------|
| int  | rawhide   | root     |        | 31:02 | 26:26  |           |
| int  | rawhide   | rootless |        | 29:28 |        |           |
| int  | fedora-40 | root     |        | 28:04 | 26:47  | 26:28     |
| int  | fedora-40 | rootless |        | 27:42 |        |           |
| int  | fedora-39 | root     | boltdb | 28:11 | 30:25  | 28:29     |
| int  | fedora-39 | rootless | boltdb | 29:47 |        |           |
| int  | debian-13 | root     |        | 29:23 | 28:56  |           |
| int  | debian-13 | rootless |        | 25:56 |        |           |

Seems to shave 2-5 minutes on podman local, maybe 5-8 remote.

@rhatdan (Member) commented May 13, 2024

/lgtm

openshift-ci bot added the lgtm label on May 13, 2024
openshift-merge-bot merged commit c9808e7 into containers:main on May 13, 2024
88 of 91 checks passed
@Luap99 deleted the e2e-tmp-ci branch on May 13, 2024 18:32