engine: nested exec simplifications and service fixes #7213

sipsma · 2024-04-28T03:11:00Z

This is a combo of:

A few parts extracted out from: [WIP] Finish isolating sessions to each client #6916
A cherry-pick of @vito's fix here: namespace services by server, not by client #6914

Basically, I finally got back to #6916 and found the bare minimum changes needed to get #6914 working fully, which is the first commit here. But then I also realized that I actually needed #6914 in order for services to work and existing tests to pass. In short, the changes ended up being codependent, so I had to cherry-pick that one.

Either way, each commit message has the details. Besides the service bug fix, it amounts to a lot of simplifications:

ModuleCallerDigest is gone (it's just client ID now)
All nested execs share a server with the main client caller (whereas previously module functions shared the server but plain execs that were nested did not)
(from @vito's commit) ParentClientIDs is all gone too

Also worth noting: this is hopefully just the beginning. These changes, along with some other recent ones, make further simplifications possible:

ServerID and entire concept of DaggerServer can be rm'd (will most likely end up doing that as part of [WIP] Finish isolating sessions to each client #6916 now after revisiting it)
ftp proxy hack can go away (helped enormously by removing all these IDs, but the new custom executor added in CA support should seal the deal here)

Added an integ test for the fixed service bug and a backfilled one for #6951

sipsma · 2024-04-30T16:04:13Z

The only test failures are in compute cache, which has upstream buildkit fixes pending: #6111

Not even bothering with reruns since it's so flaky; it passes locally and is a known separate issue

engine/buildkit/socket.go

This is an internal only refactor, though it fixes a few bugs while also simplifying quite a bit and setting us up for more simplifications soon.

The biggest change is that nested execs connect back to the same server as the main client caller rather than being completely independent.
* This is required for the fix to services used in modules (separate PR) to fully work
* It also should fix the lack of docker auth in many of our integ tests, specifically those that use nested execs, which leads to dockerhub rate limiting

Along the way it also does some consolidation of IDs, removing ModuleCallerDigest and just exclusively using ClientID. This requires that we tell module functions and other nested execs which ID to use, but that itself is setup for even more simplifications in follow-ups (we can remove the need for the current DaggerServer construct entirely, among other things).

Signed-off-by: Erik Sipsma <[email protected]>
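To make the "same server" idea concrete, here is a minimal sketch, not the engine's actual code. The `engine`, `server`, and `connect` names and the `registryAuth` field are all hypothetical; the sketch only assumes that state attached by the main client (such as registry credentials) lives on the server, so a nested exec that is told which server ID to use can see it, while one that ends up on an independent server cannot.

```go
package main

import "fmt"

// server is a hypothetical stand-in for the per-session state that
// all clients attached to the same server share.
type server struct {
	id           string
	clients      map[string]bool
	registryAuth string // illustrative: credentials supplied by the main client
}

// engine is a hypothetical container of servers keyed by server ID.
type engine struct {
	servers map[string]*server
}

func newEngine() *engine {
	return &engine{servers: make(map[string]*server)}
}

// connect attaches clientID to the server with the given ID,
// creating the server if it does not exist yet.
func (e *engine) connect(serverID, clientID string) *server {
	s, ok := e.servers[serverID]
	if !ok {
		s = &server{id: serverID, clients: make(map[string]bool)}
		e.servers[serverID] = s
	}
	s.clients[clientID] = true
	return s
}

func main() {
	e := newEngine()

	// The main client creates the server and attaches its credentials.
	root := e.connect("server-1", "main-client")
	root.registryAuth = "credentials-from-main-client"

	// A nested exec that is told to use the caller's server ID
	// lands on the same server and shares its state.
	nested := e.connect("server-1", "nested-exec-1")
	fmt.Println(nested == root, nested.registryAuth != "", len(root.clients))

	// A nested exec on its own, independent server starts with nothing.
	independent := e.connect("server-2", "nested-exec-1")
	fmt.Println(independent.registryAuth != "")
}
```

The point of the sketch is only the lookup key: once every nested exec is handed the caller's server ID, sharing falls out of an ordinary map lookup.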

Previously it was possible to start a dependent service in one module API call, and then use it again in a later call, only to have it fail because it cannot resolve the service address, even though it's still running.

This happened because each invocation has its own client ID, and client IDs were used to build service addresses.

This change brings service addresses into alignment with the recent change to uniq them by service ID instead of client ID. The overall effect is that services are deduped within a Dagger invocation, even across module calls.

So with this change, the service will just stay running and be re-used by a later call, thanks to the grace period.

Signed-off-by: Alex Suraci <[email protected]>
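The failure mode described in this commit message can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `caller` type, the `hostByClient` and `hostByServer` helpers, and the hashing scheme are hypothetical and are not how Dagger actually derives service addresses. The sketch only shows why a namespace that changes per call breaks address resolution, while one that is stable for the whole invocation does not.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// caller is a hypothetical description of one module API call:
// each call has its own client ID but shares the server ID.
type caller struct {
	ClientID string
	ServerID string
}

// hashHost derives a short, stable hostname from a namespace and a
// service ID. The scheme is illustrative only.
func hashHost(namespace, serviceID string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + serviceID))
	return hex.EncodeToString(sum[:8])
}

// hostByClient namespaces the address by client ID (the old behavior
// described above): every call computes a different address.
func hostByClient(serviceID string, c caller) string {
	return hashHost(c.ClientID, serviceID)
}

// hostByServer namespaces the address by server ID: every call in the
// same invocation computes the same address.
func hostByServer(serviceID string, c caller) string {
	return hashHost(c.ServerID, serviceID)
}

func main() {
	first := caller{ClientID: "client-aaa", ServerID: "server-1"}
	later := caller{ClientID: "client-bbb", ServerID: "server-1"}
	const svc = "some-service-id"

	// Different client IDs give different addresses for a running service.
	fmt.Println(hostByClient(svc, first) == hostByClient(svc, later))
	// A shared server ID gives the same address across calls.
	fmt.Println(hostByServer(svc, first) == hostByServer(svc, later))
}
```

With the server-scoped variant, the later call resolves the address the first call started the service under, which is what lets the still-running service be re-used.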

Signed-off-by: Erik Sipsma <[email protected]>

We previously never explicitly removed client ID -> secret token mappings because it theoretically opened more possibilities for malicious attempts to register a client ID with a different token.

However, we need to deregister these now since Client IDs are a content hash of the function call/nested exec definition, which means the same client ID can connect and disconnect multiple times per server.

The security implications of this also end up being extremely minimal. Registering a client ID with a different secret token was and still is possible *before* a client fully connects. It is now also possible after a client disconnects, but this would only amount to a DoS since the "real" client would just be unable to connect. No information would be leaked. It also would have to be in the same server (i.e. a module or nested exec called by the main client directly or transitively).

This issue can also be squashed by not leaking the buildkit sock to nested execs/modules, which is possible now by migrating functionality from our shim to our custom executor. There are no immediate plans to do this, but the possibility is open whenever needed (or when we make that change for other reasons).

Signed-off-by: Erik Sipsma <[email protected]>
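A minimal sketch of the register/deregister behavior described above, under stated assumptions: the `tokenRegistry` type and its methods are hypothetical names, and the sketch assumes that each connection presents its own secret token, so a reconnect under the same content-hashed client ID arrives with a new token. It is not the engine's implementation.

```go
package main

import (
	"crypto/subtle"
	"errors"
	"fmt"
	"sync"
)

var errTokenMismatch = errors.New("client ID already registered with a different token")

// tokenRegistry is a hypothetical client ID -> secret token map.
type tokenRegistry struct {
	mu     sync.Mutex
	tokens map[string]string
}

func newTokenRegistry() *tokenRegistry {
	return &tokenRegistry{tokens: make(map[string]string)}
}

// Register stores the token for a new client ID, or checks it against
// the stored token if the client ID is already known.
func (r *tokenRegistry) Register(clientID, token string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if existing, ok := r.tokens[clientID]; ok {
		// Constant-time comparison avoids leaking token contents via timing.
		if subtle.ConstantTimeCompare([]byte(existing), []byte(token)) != 1 {
			return errTokenMismatch
		}
		return nil
	}
	r.tokens[clientID] = token
	return nil
}

// Deregister drops the mapping once the client disconnects, so the
// same client ID can connect again later.
func (r *tokenRegistry) Deregister(clientID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.tokens, clientID)
}

func main() {
	reg := newTokenRegistry()
	const clientID = "content-hash-of-some-function-call" // illustrative value

	fmt.Println(reg.Register(clientID, "token-1")) // first connection is accepted
	fmt.Println(reg.Register(clientID, "token-2")) // rejected while the mapping exists
	reg.Deregister(clientID)                       // client disconnects
	fmt.Println(reg.Register(clientID, "token-2")) // reconnect is accepted
}
```

The sketch also shows the trade-off the message discusses: after `Deregister`, whoever registers first wins, so a hostile registration can only lock the real client out rather than read anything.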

Signed-off-by: Erik Sipsma <[email protected]>

vito

So happy to see this I might even attempt to explain it to my parents on vacation.

vito · 2024-04-30T19:29:49Z

core/git.go

- // NB: only configure search domains if we're directly using a service, or
- // if we're nested beneath another search domain.
- //
- // we have to be a bit selective here to avoid breaking Dockerfile builds
- // that use a Buildkit frontend (# syntax = ...) that doesn't have the
- // networks API cap.
- //
- // TODO: add API cap


👍 - I think this comment has been outdated for a while, should be fine to simplify. I might have only even noticed it while testing Bass's Dockerfile # syntax support, which has always been janky because of weird Docker <-> Buildkit <-> buildx versioning weirdness anyway (oci-layout:// source doesn't work for example, so no Container.import). Either way, the c2c networking code doesn't use a networks API cap anymore.

edit: oh wait, this is my own change lol. well I APPROVE

vito · 2024-04-30T19:32:15Z

core/integration/module_test.go

@@ -5628,6 +5628,118 @@ func TestModuleUnicodePath(t *testing.T) {
 require.JSONEq(t, `{"test":{"hello":"hello"}}`, out)
 }

+func TestModuleStartServices(t *testing.T) {


* engine: consolidate IDs and re-use servers for nested execs

This is an internal only refactor, though it fixes a few bugs while also simplifying quite a bit and setting us up for more simplifications soon.

The biggest change is that nested execs connect back to the same server as the main client caller rather than being completely independent.
  * This is required for the fix to services used in modules (separate PR) to fully work
  * It also should fix the lack of docker auth in many of our integ tests, specifically those that use nested execs, which leads to dockerhub rate limiting

Along the way it also does some consolidation of IDs, removing ModuleCallerDigest and just exclusively using ClientID. This requires that we tell module functions and other nested execs which ID to use, but that itself is setup for even more simplifications in follow-ups (we can remove the need for the current DaggerServer construct entirely, among other things).

Signed-off-by: Erik Sipsma <[email protected]>

* namespace services by server, not by client

Previously it was possible to start a dependent service in one module API call, and then use it again in a later call, only to have it fail because it cannot resolve the service address, even though it's still running.

This happened because each invocation has its own client ID, and client IDs were used to build service addresses.

This change brings service addresses into alignment with the recent change to uniq them by service ID instead of client ID. The overall effect is that services are deduped within a Dagger invocation, even across module calls.

So with this change, the service will just stay running and be re-used by a later call, thanks to the grace period.

Signed-off-by: Alex Suraci <[email protected]>

* fix routing of host services to correct client

Signed-off-by: Erik Sipsma <[email protected]>

* deregister secret tokens once client disconnects

We previously never explicitly removed client ID -> secret token mappings because it theoretically opened more possibilities for malicious attempts to register a client ID with a different token.

However, we need to deregister these now since Client IDs are a content hash of the function call/nested exec definition, which means the same client ID can connect and disconnect multiple times per server.

The security implications of this also end up being extremely minimal. Registering a client ID with a different secret token was and still is possible *before* a client fully connects. It is now also possible after a client disconnects, but this would only amount to a DoS since the "real" client would just be unable to connect. No information would be leaked. It also would have to be in the same server (i.e. a module or nested exec called by the main client directly or transitively).

This issue can also be squashed by not leaking the buildkit sock to nested execs/modules, which is possible now by migrating functionality from our shim to our custom executor. There are no immediate plans to do this, but the possibility is open whenever needed (or when we make that change for other reasons).

Signed-off-by: Erik Sipsma <[email protected]>

* add integ test coverage

Signed-off-by: Erik Sipsma <[email protected]>

---------

Signed-off-by: Erik Sipsma <[email protected]>
Signed-off-by: Alex Suraci <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
Co-authored-by: Alex Suraci <[email protected]>
Signed-off-by: kpenfound <[email protected]>


sipsma requested review from vito and jedevc April 28, 2024 03:11

sipsma force-pushed the consolidate-server-and-ids branch from a9c1c7b to 0940f0c Compare April 30, 2024 03:03

sipsma added this to the v0.11.3 milestone Apr 30, 2024

sipsma commented on engine/buildkit/socket.go Apr 30, 2024 (outdated, resolved)

sipsma and others added 5 commits April 30, 2024 12:33

fix routing of host services to correct client

a8785b8

Signed-off-by: Erik Sipsma <[email protected]>

add integ test coverage

67c1050

Signed-off-by: Erik Sipsma <[email protected]>

sipsma force-pushed the consolidate-server-and-ids branch from 9dc8ff2 to 67c1050 Compare April 30, 2024 19:33

vito approved these changes Apr 30, 2024


sipsma merged commit 16f018a into dagger:main Apr 30, 2024
41 of 44 checks passed

This was referenced Apr 30, 2024

namespace services by server, not by client #6914

Closed

[WIP] Finish isolating sessions to each client #6916

Draft

sipsma mentioned this pull request May 10, 2024

Support Socket args from the CLI #6747

Open
