Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

engine: nested exec simplifications and service fixes #7213

Merged
merged 5 commits into from Apr 30, 2024

Conversation

sipsma
Copy link
Contributor

@sipsma sipsma commented Apr 28, 2024

This is a combo of:

Basically, I finally got back to #6916 and found the bare minimum changes needed to get #6914 working fully, which is the first commit here. But then I also realized that I actually needed #6914 in order for services to work and existing tests to pass; basically, the changes ended up being codependent and I had to cherry pick that one.

Either way, each commit has the details in the message. It amounts to a lot of simplifications though besides the service bug fix:

  • ModuleCallerDigest is gone (it's just client ID now)
  • All nested execs share a server with the main client caller (whereas previously module functions shared the server but plain execs that were nested did not)
  • (from @vito's commit) ParentClientIDs is all gone too

Also worth noting this is hopefully just the beginning of simplifications, more that are made possible by these changes and some other recent ones:

  • ServerID and entire concept of DaggerServer can be rm'd (will most likely end up doing that as part of [WIP] Finish isolating sessions to each client #6916 now after revisiting it)
  • ftp proxy hack can go away (helped enormously by removing all these IDs, but the new custom executor added in CA support should seal the deal here)

Added an integ test for the service bug fixed and a backfilled one for #6951

@sipsma sipsma requested review from vito and jedevc April 28, 2024 03:11
@sipsma sipsma force-pushed the consolidate-server-and-ids branch from a9c1c7b to 0940f0c Compare April 30, 2024 03:03
@sipsma sipsma added this to the v0.11.3 milestone Apr 30, 2024
@sipsma
Copy link
Contributor Author

sipsma commented Apr 30, 2024

Only test failures are compute cache which has upstream buildkit fixes pending: #6111

Not even bothering with reruns since it's so flaky, passes locally and a known separate issue

engine/buildkit/socket.go Outdated Show resolved Hide resolved
sipsma and others added 5 commits April 30, 2024 12:33
This is an internal only refactor, though it fixes a few bugs while also
simplifying quite a bit and setting us up for more simplifications soon.

The biggest change is that nested execs connect back to the same server
as the main client caller rather than being completely independent.
* This is required for the fix to services used in modules (separate PR)
  to fully work
* It also should fix the lack of docker auth in many of our integ tests,
  specifically those that use nested execs, which leads to dockerhub rate
  limiting

Along the way it also does some consolidation of IDs, removing
ModuleCallerDigest and just exclusively using ClientID. This requires
that we tell module functions and other nested execs which ID to use,
but that itself is setup for even more simplifications in follow-ups (we
can remove the need for the current DaggerServer construct entirely,
among other things).

Signed-off-by: Erik Sipsma <[email protected]>
Previously it was possible to start a dependent service in one module
API call, and then use it again in a later call, only to have it fail
because it cannot resolve the service address, even though it's still
running. This happened because each invocation has its own client ID,
and client IDs were used to build service addresses.

This change brings service addresses into alignment with the recent
change to uniq them by service ID instead of client ID. The overall
effect is that services are deduped within a Dagger invocation, even
across module calls. So with this change, the service will just stay
running and be re-used by a later call, thanks to the grace period.

Signed-off-by: Alex Suraci <[email protected]>
We previously never explicitly removed client ID -> secret token
mappings because it theoretically opened more possibilities for
malicious attempts to register a client ID with a different token.

However, we need to deregister these now since Client IDs are a content
hash of the function call/nested exec definition, which means the same
client ID can connect and disconnect multiple times per server.

The security implications of this also end up being extremely minimal.
Registering a client ID with a different secret token was and still is
possible *before* a client fully connects. It is possible to after a
client disconnects now but this would only amount to a DOS since the
"real" client would just be unable to connect. No information would be
leaked. It also would have to be in the same server (i.e. a module or
nested exec called by the main client directly or transitively).

This issue can also be squashed by not leaking the buildkit sock to
nested execs/modules, which is possible now by migrating functionality
from our shim to our custom executor. There's no immediate plans to do
this but the possibility is open whenever needed (or when we make that
change for other reasons).

Signed-off-by: Erik Sipsma <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
@sipsma sipsma force-pushed the consolidate-server-and-ids branch from 9dc8ff2 to 67c1050 Compare April 30, 2024 19:33
Copy link
Contributor

@vito vito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So happy to see this I might even attempt to explain it to my parents on vacation.

Comment on lines -108 to -115
// NB: only configure search domains if we're directly using a service, or
// if we're nested beneath another search domain.
//
// we have to be a bit selective here to avoid breaking Dockerfile builds
// that use a Buildkit frontend (# syntax = ...) that doesn't have the
// networks API cap.
//
// TODO: add API cap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 - I think this comment has been outdated for a while, should be fine to simplify. I might have only even noticed it while testing Bass's Dockerfile # syntax support, which has always been janky because of weird Docker <-> Buildkit <-> buildx versioning weirdness anyway (oci-layout:// source doesn't work for example, so no Container.import). Either way, the c2c networking code doesn't use a networks API cap anymore.

edit: oh wait, this is my own change lol. well I APPROVE

@@ -5628,6 +5628,118 @@ func TestModuleUnicodePath(t *testing.T) {
require.JSONEq(t, `{"test":{"hello":"hello"}}`, out)
}

func TestModuleStartServices(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

@sipsma sipsma merged commit 16f018a into dagger:main Apr 30, 2024
41 of 44 checks passed
kpenfound pushed a commit to kpenfound/dagger that referenced this pull request May 2, 2024
* engine: consolidate IDs and re-use servers for nested execs

This is an internal only refactor, though it fixes a few bugs while also
simplifying quite a bit and setting us up for more simplifications soon.

The biggest change is that nested execs connect back to the same server
as the main client caller rather than being completely independent.
* This is required for the fix to services used in modules (separate PR)
  to fully work
* It also should fix the lack of docker auth in many of our integ tests,
  specifically those that use nested execs, which leads to dockerhub rate
  limiting

Along the way it also does some consolidation of IDs, removing
ModuleCallerDigest and just exclusively using ClientID. This requires
that we tell module functions and other nested execs which ID to use,
but that itself is setup for even more simplifications in follow-ups (we
can remove the need for the current DaggerServer construct entirely,
among other things).

Signed-off-by: Erik Sipsma <[email protected]>

* namespace services by server, not by client

Previously it was possible to start a dependent service in one module
API call, and then use it again in a later call, only to have it fail
because it cannot resolve the service address, even though it's still
running. This happened because each invocation has its own client ID,
and client IDs were used to build service addresses.

This change brings service addresses into alignment with the recent
change to uniq them by service ID instead of client ID. The overall
effect is that services are deduped within a Dagger invocation, even
across module calls. So with this change, the service will just stay
running and be re-used by a later call, thanks to the grace period.

Signed-off-by: Alex Suraci <[email protected]>

* fix routing of host services to correct client

Signed-off-by: Erik Sipsma <[email protected]>

* deregister secret tokens once client disconnects

We previously never explicitly removed client ID -> secret token
mappings because it theoretically opened more possibilities for
malicious attempts to register a client ID with a different token.

However, we need to deregister these now since Client IDs are a content
hash of the function call/nested exec definition, which means the same
client ID can connect and disconnect multiple times per server.

The security implications of this also end up being extremely minimal.
Registering a client ID with a different secret token was and still is
possible *before* a client fully connects. It is possible to after a
client disconnects now but this would only amount to a DOS since the
"real" client would just be unable to connect. No information would be
leaked. It also would have to be in the same server (i.e. a module or
nested exec called by the main client directly or transitively).

This issue can also be squashed by not leaking the buildkit sock to
nested execs/modules, which is possible now by migrating functionality
from our shim to our custom executor. There's no immediate plans to do
this but the possibility is open whenever needed (or when we make that
change for other reasons).

Signed-off-by: Erik Sipsma <[email protected]>

* add integ test coverage

Signed-off-by: Erik Sipsma <[email protected]>

---------

Signed-off-by: Erik Sipsma <[email protected]>
Signed-off-by: Alex Suraci <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
Co-authored-by: Alex Suraci <[email protected]>
Signed-off-by: kpenfound <[email protected]>
vikram-dagger pushed a commit to vikram-dagger/dagger that referenced this pull request May 3, 2024
* engine: consolidate IDs and re-use servers for nested execs

This is an internal only refactor, though it fixes a few bugs while also
simplifying quite a bit and setting us up for more simplifications soon.

The biggest change is that nested execs connect back to the same server
as the main client caller rather than being completely independent.
* This is required for the fix to services used in modules (separate PR)
  to fully work
* It also should fix the lack of docker auth in many of our integ tests,
  specifically those that use nested execs, which leads to dockerhub rate
  limiting

Along the way it also does some consolidation of IDs, removing
ModuleCallerDigest and just exclusively using ClientID. This requires
that we tell module functions and other nested execs which ID to use,
but that itself is setup for even more simplifications in follow-ups (we
can remove the need for the current DaggerServer construct entirely,
among other things).

Signed-off-by: Erik Sipsma <[email protected]>

* namespace services by server, not by client

Previously it was possible to start a dependent service in one module
API call, and then use it again in a later call, only to have it fail
because it cannot resolve the service address, even though it's still
running. This happened because each invocation has its own client ID,
and client IDs were used to build service addresses.

This change brings service addresses into alignment with the recent
change to uniq them by service ID instead of client ID. The overall
effect is that services are deduped within a Dagger invocation, even
across module calls. So with this change, the service will just stay
running and be re-used by a later call, thanks to the grace period.

Signed-off-by: Alex Suraci <[email protected]>

* fix routing of host services to correct client

Signed-off-by: Erik Sipsma <[email protected]>

* deregister secret tokens once client disconnects

We previously never explicitly removed client ID -> secret token
mappings because it theoretically opened more possibilities for
malicious attempts to register a client ID with a different token.

However, we need to deregister these now since Client IDs are a content
hash of the function call/nested exec definition, which means the same
client ID can connect and disconnect multiple times per server.

The security implications of this also end up being extremely minimal.
Registering a client ID with a different secret token was and still is
possible *before* a client fully connects. It is possible to after a
client disconnects now but this would only amount to a DOS since the
"real" client would just be unable to connect. No information would be
leaked. It also would have to be in the same server (i.e. a module or
nested exec called by the main client directly or transitively).

This issue can also be squashed by not leaking the buildkit sock to
nested execs/modules, which is possible now by migrating functionality
from our shim to our custom executor. There's no immediate plans to do
this but the possibility is open whenever needed (or when we make that
change for other reasons).

Signed-off-by: Erik Sipsma <[email protected]>

* add integ test coverage

Signed-off-by: Erik Sipsma <[email protected]>

---------

Signed-off-by: Erik Sipsma <[email protected]>
Signed-off-by: Alex Suraci <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
Co-authored-by: Alex Suraci <[email protected]>
vikram-dagger pushed a commit to vikram-dagger/dagger that referenced this pull request May 3, 2024
* engine: consolidate IDs and re-use servers for nested execs

This is an internal only refactor, though it fixes a few bugs while also
simplifying quite a bit and setting us up for more simplifications soon.

The biggest change is that nested execs connect back to the same server
as the main client caller rather than being completely independent.
* This is required for the fix to services used in modules (separate PR)
  to fully work
* It also should fix the lack of docker auth in many of our integ tests,
  specifically those that use nested execs, which leads to dockerhub rate
  limiting

Along the way it also does some consolidation of IDs, removing
ModuleCallerDigest and just exclusively using ClientID. This requires
that we tell module functions and other nested execs which ID to use,
but that itself is setup for even more simplifications in follow-ups (we
can remove the need for the current DaggerServer construct entirely,
among other things).

Signed-off-by: Erik Sipsma <[email protected]>

* namespace services by server, not by client

Previously it was possible to start a dependent service in one module
API call, and then use it again in a later call, only to have it fail
because it cannot resolve the service address, even though it's still
running. This happened because each invocation has its own client ID,
and client IDs were used to build service addresses.

This change brings service addresses into alignment with the recent
change to uniq them by service ID instead of client ID. The overall
effect is that services are deduped within a Dagger invocation, even
across module calls. So with this change, the service will just stay
running and be re-used by a later call, thanks to the grace period.

Signed-off-by: Alex Suraci <[email protected]>

* fix routing of host services to correct client

Signed-off-by: Erik Sipsma <[email protected]>

* deregister secret tokens once client disconnects

We previously never explicitly removed client ID -> secret token
mappings because it theoretically opened more possibilities for
malicious attempts to register a client ID with a different token.

However, we need to deregister these now since Client IDs are a content
hash of the function call/nested exec definition, which means the same
client ID can connect and disconnect multiple times per server.

The security implications of this also end up being extremely minimal.
Registering a client ID with a different secret token was and still is
possible *before* a client fully connects. It is possible to after a
client disconnects now but this would only amount to a DOS since the
"real" client would just be unable to connect. No information would be
leaked. It also would have to be in the same server (i.e. a module or
nested exec called by the main client directly or transitively).

This issue can also be squashed by not leaking the buildkit sock to
nested execs/modules, which is possible now by migrating functionality
from our shim to our custom executor. There's no immediate plans to do
this but the possibility is open whenever needed (or when we make that
change for other reasons).

Signed-off-by: Erik Sipsma <[email protected]>

* add integ test coverage

Signed-off-by: Erik Sipsma <[email protected]>

---------

Signed-off-by: Erik Sipsma <[email protected]>
Signed-off-by: Alex Suraci <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
Co-authored-by: Alex Suraci <[email protected]>
vikram-dagger pushed a commit to vikram-dagger/dagger that referenced this pull request May 3, 2024
* engine: consolidate IDs and re-use servers for nested execs

This is an internal only refactor, though it fixes a few bugs while also
simplifying quite a bit and setting us up for more simplifications soon.

The biggest change is that nested execs connect back to the same server
as the main client caller rather than being completely independent.
* This is required for the fix to services used in modules (separate PR)
  to fully work
* It also should fix the lack of docker auth in many of our integ tests,
  specifically those that use nested execs, which leads to dockerhub rate
  limiting

Along the way it also does some consolidation of IDs, removing
ModuleCallerDigest and just exclusively using ClientID. This requires
that we tell module functions and other nested execs which ID to use,
but that itself is setup for even more simplifications in follow-ups (we
can remove the need for the current DaggerServer construct entirely,
among other things).

Signed-off-by: Erik Sipsma <[email protected]>

* namespace services by server, not by client

Previously it was possible to start a dependent service in one module
API call, and then use it again in a later call, only to have it fail
because it cannot resolve the service address, even though it's still
running. This happened because each invocation has its own client ID,
and client IDs were used to build service addresses.

This change brings service addresses into alignment with the recent
change to uniq them by service ID instead of client ID. The overall
effect is that services are deduped within a Dagger invocation, even
across module calls. So with this change, the service will just stay
running and be re-used by a later call, thanks to the grace period.

Signed-off-by: Alex Suraci <[email protected]>

* fix routing of host services to correct client

Signed-off-by: Erik Sipsma <[email protected]>

* deregister secret tokens once client disconnects

We previously never explicitly removed client ID -> secret token
mappings because it theoretically opened more possibilities for
malicious attempts to register a client ID with a different token.

However, we need to deregister these now since Client IDs are a content
hash of the function call/nested exec definition, which means the same
client ID can connect and disconnect multiple times per server.

The security implications of this also end up being extremely minimal.
Registering a client ID with a different secret token was and still is
possible *before* a client fully connects. It is possible to after a
client disconnects now but this would only amount to a DOS since the
"real" client would just be unable to connect. No information would be
leaked. It also would have to be in the same server (i.e. a module or
nested exec called by the main client directly or transitively).

This issue can also be squashed by not leaking the buildkit sock to
nested execs/modules, which is possible now by migrating functionality
from our shim to our custom executor. There's no immediate plans to do
this but the possibility is open whenever needed (or when we make that
change for other reasons).

Signed-off-by: Erik Sipsma <[email protected]>

* add integ test coverage

Signed-off-by: Erik Sipsma <[email protected]>

---------

Signed-off-by: Erik Sipsma <[email protected]>
Signed-off-by: Alex Suraci <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
Co-authored-by: Alex Suraci <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants