Support passing FDs (socket activation) #6296

Open
flokli opened this issue May 3, 2024 · 19 comments
Labels
discussion 💬 The right solution needs to be found feature ⚙️ New feature or request

Comments

@flokli

flokli commented May 3, 2024

I'd like to use caddy in socket-activated environments, using FDs passed down from the service manager rather than binding to addresses on its own.

Combined with signalling readiness (which caddy already does), this gives zero-downtime (re)deployments on Linux systems using systemd (if .socket files are used) by simply restarting the process: the socket is held open by systemd, and new connections are passed in once caddy is ready to accept new requests. In these cases, there would no longer be a need for complicated reload logic.
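
For readers unfamiliar with the readiness half of this, here is a minimal sketch of the notification step using only the Go standard library. It assumes the usual systemd convention that the service manager exports a NOTIFY_SOCKET variable naming a unix datagram socket and expects a READY=1 datagram on it. The function name notifyReady is made up for this example, and this is not a description of how caddy itself implements it.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// notifyReady sends "READY=1" to the datagram socket at socketPath.
// It reports false with a nil error when no socket path is given, which is
// the normal case when the process was not started by a notifying manager.
// (On Linux, Go's net package treats a leading '@' as an abstract socket.)
func notifyReady(socketPath string) (bool, error) {
	if socketPath == "" {
		return false, nil
	}
	addr := &net.UnixAddr{Name: socketPath, Net: "unixgram"}
	conn, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return false, err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte("READY=1")); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	// Assumption: NOTIFY_SOCKET is set by the service manager (Type=notify).
	sent, err := notifyReady(os.Getenv("NOTIFY_SOCKET"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "notify failed:", err)
		os.Exit(1)
	}
	fmt.Println("readiness notification sent:", sent)
}
```

Until that datagram arrives, the manager keeps queueing connections on the socket it holds open, which is what makes the plain restart safe.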

github.com/coreos/go-systemd/activation provides the necessary methods to check whether FDs are passed, including identifying them by their socket name. https://vincent.bernat.ch/en/blog/2018-systemd-golang-socket-activation gives a nice introduction to the feature itself.

If no explicit listen addresses are specified, caddy could default to using the passed FDs rather than binding on its own, if it detects it's running in such an environment.
Additionally, the Caddyfile could be extended to allow specifying these passed FDs as network addresses (something like sd-listen:$name or sd-listen:$idx, maybe). This becomes useful when you want to expose different things on different sockets.
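
To make the detection step concrete, here is a rough standard-library sketch of the protocol that libraries such as go-systemd wrap. It assumes the documented systemd convention: LISTEN_PID must match the current process, LISTEN_FDS holds the number of descriptors, they start at descriptor 3, and LISTEN_FDNAMES is a colon-separated list of names. The type and function names (passedFD, parsePassedFDs, listenerFor) are invented for illustration, and this is neither go-systemd's API nor caddy's code.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
	"strings"
)

// firstPassedFD is the first descriptor number used by the protocol.
const firstPassedFD = 3

// passedFD describes one inherited descriptor and its optional name.
type passedFD struct {
	FD   int
	Name string
}

// parsePassedFDs interprets LISTEN_PID, LISTEN_FDS and LISTEN_FDNAMES.
// getenv and pid are parameters so the parsing can be exercised without
// touching the real process environment.
func parsePassedFDs(getenv func(string) string, pid int) ([]passedFD, error) {
	pidStr := getenv("LISTEN_PID")
	if pidStr == "" {
		return nil, nil // not socket-activated
	}
	listenPID, err := strconv.Atoi(pidStr)
	if err != nil {
		return nil, fmt.Errorf("invalid LISTEN_PID %q: %w", pidStr, err)
	}
	if listenPID != pid {
		return nil, nil // descriptors were meant for another process
	}
	count, err := strconv.Atoi(getenv("LISTEN_FDS"))
	if err != nil || count < 0 {
		return nil, fmt.Errorf("invalid LISTEN_FDS %q", getenv("LISTEN_FDS"))
	}
	var names []string
	if v := getenv("LISTEN_FDNAMES"); v != "" {
		names = strings.Split(v, ":")
	}
	fds := make([]passedFD, 0, count)
	for i := 0; i < count; i++ {
		p := passedFD{FD: firstPassedFD + i}
		if i < len(names) {
			p.Name = names[i]
		}
		fds = append(fds, p)
	}
	return fds, nil
}

// listenerFor wraps an inherited stream-socket descriptor in a net.Listener.
func listenerFor(p passedFD) (net.Listener, error) {
	f := os.NewFile(uintptr(p.FD), p.Name)
	if f == nil {
		return nil, fmt.Errorf("invalid file descriptor %d", p.FD)
	}
	defer f.Close() // net.FileListener works on a duplicate of the descriptor
	return net.FileListener(f)
}

func main() {
	fds, err := parsePassedFDs(os.Getenv, os.Getpid())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(fds) == 0 {
		fmt.Println("no sockets passed; a server would bind on its own here")
		return
	}
	for _, p := range fds {
		ln, err := listenerFor(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("fd %d (%q) is listening on %s\n", p.FD, p.Name, ln.Addr())
		ln.Close()
	}
}
```

Because the check is just three environment variables plus a PID comparison, a process started normally sees no passed sockets and can keep its current behaviour.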

@francislavoie
Member

francislavoie commented May 3, 2024

Are you looking for the bind directive? https://caddyserver.com/docs/caddyfile/directives/bind

And see https://caddyserver.com/docs/conventions#network-addresses, you can use unix sockets in reverse_proxy upstreams.

I'm not sure what you're asking for if not that.

@mholt
Member

mholt commented May 3, 2024

It sounds like what is being asked for is graceful upgrades/restarts.

Caddy 1 had this feature, and I quite liked how it worked: pass the socket directly to the next process. It worked on all Unix systems without relying on a separate system service, and it was smart enough to understand Caddy configuration: if the new config didn't use a socket, it wouldn't be kept, rather than blindly moving all the sockets over.

I'd probably rather bring the implementation from Caddy 1 into Caddy 2.

@mholt mholt added feature ⚙️ New feature or request discussion 💬 The right solution needs to be found labels May 3, 2024
@flokli
Author

flokli commented May 3, 2024

> It sounds like what is being asked for is graceful upgrades/restarts.

No, getting this for free is only one side effect of supporting socket activation.

Socket activation also causes caddy to be started lazily when the first connection to the (externally configured) socket address arrives, which simplifies declaring service dependencies too.

The article linked in my first comment elaborates a bit more on this.

> Caddy 1 had this feature, and I quite liked how it worked: pass the socket directly to the next process.

This still requires caddy to do manual coordination with its new process and pass the socket around explicitly. The point of simply taking the FDs passed by the service manager is that caddy does not have to be aware of whether it's the first process started on the system or a new version started with another config. caddy simply gets an FD on which new connections will appear.

@flokli
Author

flokli commented May 3, 2024

Ah yes, and because caddy just takes FDs, it doesn't need to bind() on its own, which allows applying stronger sandboxing from the outside.

@mholt
Member

mholt commented May 3, 2024

@flokli

> This still requires caddy to do manual coordination with its new process and pass the socket around explicitly. The point of simply taking the FDs passed by the service manager is that caddy does not have to be aware of whether it's the first process started on the system or a new version started with another config. caddy simply gets an FD on which new connections will appear.

But what is Caddy supposed to do with that socket? How does it know the configuration associated with it? You can't just hand a server a socket and expect it to know what to do with it, without any configuration... maybe I am missing something about how it works.

@flokli
Author

flokli commented May 3, 2024

Sockets can have names attached (so the user can name them http and https, for example, or api and metrics), and we could add a syntax to refer to them by these names in the Caddyfile. I could say I want an HTTP server on sd-listen:http, which would then expect a listener named http to be passed to caddy.

All these passed FDs also give you a net.Listener interface, so even without explicit config caddy could still check the properties of it and apply some heuristics too (detect port 80 and 443 if you got two unnamed TCP sockets), if we want to apply some out-of-the-box behaviour in these scenarios. But getting the basic support for it (using an externally-passed FD by its name/index) and defining the syntax for it would be a nice first step.

You can play around with this through systemd-socket-activate -l 8088 -l 8089 --fdname=foo:bar -- /path/to-caddy, which will give you two TCP sockets listening on the two ports, named foo and bar.
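
The port heuristic mentioned above could look roughly like the sketch below, which inspects a listener's local address and guesses a role from it. The role names and both function names are placeholders chosen for this example. Nothing here is existing caddy behaviour.

```go
package main

import (
	"fmt"
	"net"
)

// roleForPort maps a TCP port to a role. Only the 80/443 mapping comes from
// the discussion; the returned strings are illustrative.
func roleForPort(port int) string {
	switch port {
	case 80:
		return "http"
	case 443:
		return "https"
	default:
		return "unknown"
	}
}

// guessRole looks at the address a listener is already bound to.
func guessRole(ln net.Listener) string {
	tcpAddr, ok := ln.Addr().(*net.TCPAddr)
	if !ok {
		return "unknown" // e.g. a unix socket has no port to inspect
	}
	return roleForPort(tcpAddr.Port)
}

func main() {
	// Stand-in for an inherited listener: bind an ephemeral local port.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	fmt.Printf("%s -> %s\n", ln.Addr(), guessRole(ln))
}
```

With the systemd-socket-activate command above, both sockets (8088 and 8089) would come out as "unknown", which is one reason explicit names or indexes are the more predictable first step.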

@mholt
Member

mholt commented May 3, 2024

Oh I see, so you'd still have your Caddy config, you'd just specify a different network name for the listener address, and Caddy will then get it from the service manager rather than binding a new socket.

@flokli
Author

flokli commented May 3, 2024

Yes! Or well, I don't want caddy to do any bind on its own at all, but pass in every socket via this mechanism.

@mholt
Member

mholt commented May 3, 2024

In that case you can use bind in your site blocks to get the socket from the service manager. We'd just need to implement a package that calls caddy.RegisterNetwork(). For example the caddy-tailscale package does this so that Tailscale can provide a listener.

Anyone is welcome to pick this up.

@WeidiDeng
Member

@mholt I did some experiments with registering a custom network; it's too much trouble to be worth it. Every site block needs an explicit bind, and that includes the HTTP port and the HTTP/3 UDP socket.

@flokli I'm thinking that on Unix, we can try preferring socket activation but fall back to the old behavior. What do you think? Or should caddy just exit unsuccessfully if socket activation environment variables are found but no sockets matching the listening criteria are found? Or should it just emit warning logs?

As mentioned above, you are responsible for passing every socket yourself, including 80/tcp and 443/udp if automatic HTTP->HTTPS redirects and HTTP/3 are enabled, respectively, and the admin socket as well if it is enabled, assuming you restart caddy instead of reloading it.
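
The three options raised here (use a matching passed socket, fall back to binding with a warning, or exit) can be written down as a small decision function. This is only a sketch of the trade-off: the strict flag, the "network/address" map key, and all identifiers are assumptions made for the example, not a proposal for caddy's actual configuration.

```go
package main

import (
	"fmt"
	"net"
)

type action int

const (
	useInherited  action = iota // a matching passed socket exists
	bindOurselves               // fall back to the old behaviour
	fail                        // refuse to start
)

// decide picks an action. found: a passed socket matches the configured
// address; activated: socket-activation variables are present; strict: a
// hypothetical switch selecting "exit unsuccessfully" over "fall back".
func decide(found, activated, strict bool) action {
	switch {
	case found:
		return useInherited
	case activated && strict:
		return fail
	default:
		return bindOurselves
	}
}

// acquire applies decide to one configured address. inherited is keyed by
// "network/address"; that key format is an assumption of this sketch.
func acquire(inherited map[string]net.Listener, activated, strict bool, network, addr string) (net.Listener, error) {
	ln, found := inherited[network+"/"+addr]
	switch decide(found, activated, strict) {
	case useInherited:
		return ln, nil
	case fail:
		return nil, fmt.Errorf("socket activation detected, but no passed socket matches %s/%s", network, addr)
	default:
		if activated {
			fmt.Printf("warning: no passed socket for %s/%s, binding directly\n", network, addr)
		}
		return net.Listen(network, addr)
	}
}

func main() {
	// No sockets passed and no activation detected: plain bind.
	ln, err := acquire(nil, false, false, "tcp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```

The strict variant fits the sandboxing use case better, since a process that is not allowed to bind() would fail on the fallback anyway.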

@climba03003

I would really like to see this happen, as it can greatly reduce my network stack complexity.
Currently, I have two Caddy instances in front of the server, and I face a lot of instability because of podman networking.
I changed to using sockets to see if it works better (no more DNS resolution).

flowchart TD
    A[Caddy] -->|Reverse Proxy| B{Container Network}
    B -->|Serve Frontend| C[Caddy]
    C -->|Reverse Proxy| D[Server]

Once socket activation becomes a thing, it can also reduce resource usage, because the middle caddy can be terminated when no one has been connected for some time. If the outer one can be socket-activated, it will pass the socket directly to the inner one and benefit from a direct network connection.

@flokli
Author

flokli commented May 6, 2024

> @mholt I did some experiments with registering a custom network; it's too much trouble to be worth it. Every site block needs an explicit bind, and that includes the HTTP port and the HTTP/3 UDP socket.

> @flokli I'm thinking that on Unix, we can try preferring socket activation but fall back to the old behavior. What do you think? Or should caddy just exit unsuccessfully if socket activation environment variables are found but no sockets matching the listening criteria are found? Or should it just emit warning logs?

I think it makes sense to first land the feature with explicit configuration, which might mean explicit bind statements, and once that's in, think about having more opinionated defaults in case we are in a socket-activated environment.

The good thing is, it's pretty safe to detect whether caddy is running in a socket-activated environment or not, so we are able to change defaults in this case without breaking existing use cases.

@WeidiDeng
Member

@flokli So that means you're fine with mixing passed FDs and the current binding behavior? And since you will use bind explicitly, it's an error to bind to a nonexistent FD.

The problem with names is that one name can map to many sockets with different addresses. How do you think caddy should handle this situation?

@eliasp

eliasp commented May 6, 2024

Until this is implemented: those who just care about binding to ports <1024 AND not running Caddy as root can use systemd's SocketBindAllow= (available since systemd 249)

@mohammed90
Member

> those who just care about binding to ports <1024 AND not running Caddy as root

There was never a need to run Caddy as root on Linux. Our standard systemd unit file is shipped with CAP_NET_BIND_SERVICE, which allows the service to run without root. SocketBindAllow and SocketBindDeny allow further restriction to specific ports rather than any port below 1024.

@flokli
Author

flokli commented May 8, 2024

I'm aware of CAP_NET_BIND_SERVICE to allow non-root processes to bind to lower ports, that's not why I'm advocating for this feature.

What I'm advocating for is the option to move the whole socket-binding business entirely out of caddy, both for sandboxing (caddy doesn't need to be allowed to bind() if it doesn't have to, and doesn't even need access to the network namespace the bind happens in) and for zero-downtime restarts and configuration updates controlled by the service manager.

> @flokli So that means you're fine with mixing passed FDs and the current binding behavior? And since you will use bind explicitly, it's an error to bind to a nonexistent FD.

Yes, I think the bind syntax should be extended to allow specifying "use this passed FD rather than binding yourself". Slightly unfortunate name, but well 🤷.

This would also mean caddy would still bind on its own wherever we don't explicitly configure it to use the FD(s).

> The problem with names is that one name can map to many sockets with different addresses. How do you think caddy should handle this situation?

Indeed, the FileDescriptorName= documentation says that the name applies to all sockets in that .socket file, so sd-listen:http could identify multiple FDs, not just a single one.

I think I'd be fine with first landing support that requires explicit bind statements everywhere and working out the syntax for it. Once that's stabilized, I'd think about what a nice out-of-the-box behaviour could look like if caddy detects it is running in a socket-activated scenario.
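
As a sketch of how such a reference could be resolved, the function below takes the list of names from LISTEN_FDNAMES (in descriptor order) and returns every matching position, so a shared name yields several sockets and a missing one is an error. The sd-listen: prefix is only the syntax floated earlier in this thread, and resolve is a name invented for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const prefix = "sd-listen:"

// resolve returns the positions (0-based, in passing order) of the passed
// descriptors selected by ref, which is "sd-listen:<name>" or
// "sd-listen:<index>". A purely numeric selector is treated as an index, so
// a socket literally named "0" could not be addressed by name in this sketch.
func resolve(ref string, names []string) ([]int, error) {
	if !strings.HasPrefix(ref, prefix) {
		return nil, fmt.Errorf("%q is not an sd-listen reference", ref)
	}
	sel := strings.TrimPrefix(ref, prefix)
	if idx, err := strconv.Atoi(sel); err == nil {
		if idx < 0 || idx >= len(names) {
			return nil, fmt.Errorf("no passed socket at index %d", idx)
		}
		return []int{idx}, nil
	}
	var matches []int
	for i, n := range names {
		if n == sel {
			matches = append(matches, i)
		}
	}
	if len(matches) == 0 {
		return nil, fmt.Errorf("no passed socket named %q", sel)
	}
	return matches, nil
}

func main() {
	// One .socket unit with two listen addresses shares the name "http".
	names := []string{"http", "http", "metrics"}
	for _, ref := range []string{"sd-listen:http", "sd-listen:2", "sd-listen:api"} {
		idxs, err := resolve(ref, names)
		fmt.Println(ref, idxs, err)
	}
}
```

Returning all matches leaves it to the caller to decide whether a server listens on every socket sharing a name or rejects the ambiguity.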

@WeidiDeng
Member

@flokli You can try it with a plugin for now, xcaddy build --with github.com/WeidiDeng/caddy-socket-activation. Let me know what you think.

@caddyserver caddyserver deleted a comment from Karanvarm May 13, 2024
@balki

balki commented May 22, 2024

I wrote a small Go library to listen on socket-activated FDs.
https://github.com/balki/anyhttp/blob/main/anyhttp.go#L147

@francislavoie
Member

Interesting, this could be turned into a Caddy plugin by using caddy.RegisterNetwork() @balki
