Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go.mod: github.com/containerd/containerd v1.7.18, switch to github.com/containerd/errdefs module #599

Merged
merged 2 commits into from
Aug 1, 2024

Conversation

thaJeztah
Copy link
Member

go.mod: github.com/containerd/containerd v1.7.18

full diff: containerd/containerd@v1.7.7...v1.7.18

switch to github.com/containerd/errdefs module

containerd 1.7.18 and up alias the errdefs package to the new module,
and deprecate the package.

@thaJeztah thaJeztah closed this Jun 24, 2024
@thaJeztah thaJeztah reopened this Jun 24, 2024
@thaJeztah thaJeztah marked this pull request as ready for review June 26, 2024 10:20
@thaJeztah
Copy link
Member Author

Hmm... looks like there's failures in CI related to OTEL; looks like multiple metrics servers are tried to be set up on port 9110;

time="2024-06-26T10:15:05.945761805Z" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1

...

time="2024-06-26T10:15:05.946779174Z" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"

...

umount globally shared mountpoint
umount: /var/lib/containerd-nydus/mnt: no mount point specified.
ls: cannot access '/run/containerd-nydus/containerd-nydus-grpc.sock': No such file or directory
Fail(1). Retrying...
time="2024-06-26T10:15:08.579292914Z" level=info msg="Start nydus-snapshotter. Version: fba89c3, PID: 143, FsDriver: fusedev, DaemonMode: multiple"
time="2024-06-26T10:15:08.581727603Z" level=info msg="Trying to translate bucket records..."
time="2024-06-26T10:15:08.581798015Z" level=info msg="Trying to update bucket records from v1.0 to v1.1 ..."
time="2024-06-26T10:15:08.582908629Z" level=info msg="parsed cgroup config: cgroup.Config{MemoryLimitInBytes:-1}"
time="2024-06-26T10:15:08.582980705Z" level=info msg="cgroup mode: legacy"
time="2024-06-26T10:15:08.583967256Z" level=info msg="create cgroup (v1) successful, state: thawed"
time="2024-06-26T10:15:08.584895731Z" level=info msg="Run daemons monitor..."
time="2024-06-26T10:15:08.585325775Z" level=fatal msg="failed to start nydus-snapshotter" error="failed to initialize snapshotter: start metrics HTTP server: metrics server listener, addr=:9110: listen tcp :9110: bind: address already in use"
ls: cannot access '/run/containerd-nydus/containerd-nydus-grpc.sock': No such file or directory

@thaJeztah
Copy link
Member Author

Looks related to this code, but I'm wondering now if the snapshotted itself should setup a metrics server, or if that's something that only containerd should do

metricServer, err := metrics.NewServer(
ctx,
metrics.WithProcessManagers(fsManagers),
)
if err != nil {
return nil, errors.Wrap(err, "create metrics server")
}
// Start to collect metrics.
if cfg.MetricsConfig.Address != "" {
if err := metrics.NewMetricsHTTPListenerServer(cfg.MetricsConfig.Address); err != nil {
return nil, errors.Wrap(err, "start metrics HTTP server")
}
go func() {
if err := metricServer.StartCollectMetrics(ctx); err != nil {
log.L.WithError(err).Errorf("Failed to start collecting metrics")
}
}()
log.L.Infof("Started metrics HTTP server on %q", cfg.MetricsConfig.Address)
}

@thaJeztah
Copy link
Member Author

Code above was added in fe511b8, which is part of this PR:

@thaJeztah
Copy link
Member Author

cc @cpuguy83 @dmcgowan perhaps one of you knows?

@sctb512
Copy link
Member

sctb512 commented Jun 26, 2024

Code above was added in fe511b8, which is part of this PR:

I think the CI failure is not caused by this patch. It seems the :9110 is already in use.

Can we retry this CI?

@sctb512
Copy link
Member

sctb512 commented Jun 26, 2024

Can we retry this CI?

Otherwise, we can disable metrics server in E2E test.

@sctb512
Copy link
Member

sctb512 commented Jun 26, 2024

The reason why the :9110 port is already in use is the old "containerd-nydus-grpc process is not killed successfully.

https://github.com/containerd/nydus-snapshotter/blob/main/integration/entrypoint.sh#L140

@thaJeztah
Copy link
Member Author

thaJeztah commented Jun 26, 2024

Ah! I guess I misinterpreted the cfg.MetricsConfig.Address to be at the "global" level (so for containerd as a whole), but it's for this snapshotter.

The reason why the :9110 port is already in use is the old "containerd-nydus-grpc process is not killed successfully.

Good one; maybe? I don't have permissions to restart Ci on this repo; I can do a quick close & reopen to try, or perhaps you're able to kick only the failing ones.

This failure didn't happen on my other PR though, and I've seen it fail twice (both before and after rebasing), so .. it's still possible there's a regression somewhere (or change in behavior)

@thaJeztah
Copy link
Member Author

Let me just do a close/reopen to get a fresh run

@thaJeztah thaJeztah closed this Jun 26, 2024
@thaJeztah thaJeztah reopened this Jun 26, 2024
@apostasie
Copy link
Contributor

The reason why the :9110 port is already in use is the old "containerd-nydus-grpc process is not killed successfully.

https://github.com/containerd/nydus-snapshotter/blob/main/integration/entrypoint.sh#L140

Facing the same issue.
The process gets killed, but it is slow to exit: https://github.com/containerd/nydus-snapshotter/blob/main/integration/entrypoint.sh#L145

0.5s is not enough on my machine.

2s does work better. YMMV

Anyhow, this is inherently racy - maybe we can replace by an explicit check (curl the endpoint maybe?).

@imeoer
Copy link
Collaborator

imeoer commented Jul 24, 2024

0.5s is not enough on my machine.

2s does work better. YMMV

Anyhow, this is inherently racy - maybe we can replace by an explicit check (curl the endpoint maybe?).

Thanks for the tip, I will work on this.

@apostasie
Copy link
Contributor

@thaJeztah
Sebastiaan, #606 fixes most of the CI issues (except the kubeconf regression) - if you can rebase on it as soon as it merges, that would be really nice.

Would love to see your PR go through ASAP - as it will definitely help with work going on in #604.

@thaJeztah
Copy link
Member Author

Let me already to a quick rebase to get a fresh run ahead of time. @apostasie are there plans to do a release before switching to containerd v2? If so, I can do one more follow-up to get it to containerd v1.7.20 (using the separate API module).

I'm trying to get rid of the "github.com/containerd/containerd/errdefs" package in our dependency tree, and the nydus-snapshotter is one of the last remaining ones using it 😅

@apostasie
Copy link
Contributor

Let me already to a quick rebase to get a fresh run ahead of time. @apostasie are there plans to do a release before switching to containerd v2? If so, I can do one more follow-up to get it to containerd v1.7.20 (using the separate API module).

I'm trying to get rid of the "github.com/containerd/containerd/errdefs" package in our dependency tree, and the nydus-snapshotter is one of the last remaining ones using it 😅

@thaJeztah I am not a maintainer here, so, do not know about timeline for releases.

My interest is in getting nerdctl to containerd v2 ASAP (we have a working build but it is using my forks: containerd/nerdctl#3173 ), so, I am onto getting dependencies to ctd v2 (on main).

I got stargz-snapshotter, data-accelerator/zdfs, and accelerated-image merged.
soci accepted the PR but want to wait for ctd v2 official release (awslabs/soci-snapshotter#1305).

nydus and hcsshim are the last on my list.

@apostasie
Copy link
Contributor

@thaJeztah CI fixes got merged.
Would you rebase this and get it in?
Then I would rebase #604 on top of this.

containerd 1.7.18 and up alias the errdefs package to the new module,
and deprecate the package.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah
Copy link
Member Author

Rebased 👍 CI looks to be happy now 🎉

Also prepared a follow-up for containerd v1.7.20, which deprecates another package (to help transition to v2); #610

@thaJeztah
Copy link
Member Author

@imeoer ptal 🤗

@imeoer imeoer merged commit 08b8d7e into containerd:main Aug 1, 2024
16 checks passed
@thaJeztah thaJeztah deleted the c8d_errdefs branch August 1, 2024 06:58
@thaJeztah
Copy link
Member Author

Thx! Rebasing #610 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants