New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
engine: nested exec simplifications and service fixes #7213
Conversation
a9c1c7b
to
0940f0c
Compare
Only test failures are Not even bothering with reruns since it's so flaky, passes locally and a known separate issue |
This is an internal only refactor, though it fixes a few bugs while also simplifying quite a bit and setting us up for more simplifications soon. The biggest change is that nested execs connect back to the same server as the main client caller rather than being completely independent. * This is required for the fix to services used in modules (separate PR) to fully work * It also should fix the lack of docker auth in many of our integ tests, specifically those that use nested execs, which leads to dockerhub rate limiting Along the way it also does some consolidation of IDs, removing ModuleCallerDigest and just exclusively using ClientID. This requires that we tell module functions and other nested execs which ID to use, but that itself is setup for even more simplifications in follow-ups (we can remove the need for the current DaggerServer construct entirely, among other things). Signed-off-by: Erik Sipsma <[email protected]>
Previously it was possible to start a dependent service in one module API call, and then use it again in a later call, only to have it fail because it cannot resolve the service address, even though it's still running. This happened because each invocation has its own client ID, and client IDs were used to build service addresses. This change brings service addresses into alignment with the recent change to uniq them by service ID instead of client ID. The overall effect is that services are deduped within a Dagger invocation, even across module calls. So with this change, the service will just stay running and be re-used by a later call, thanks to the grace period. Signed-off-by: Alex Suraci <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
We previously never explicitly removed client ID -> secret token mappings because it theoretically opened more possibilities for malicious attempts to register a client ID with a different token. However, we need to deregister these now since Client IDs are a content hash of the function call/nested exec definition, which means the same client ID can connect and disconnect multiple times per server. The security implications of this also end up being extremely minimal. Registering a client ID with a different secret token was and still is possible *before* a client fully connects. It is possible to after a client disconnects now but this would only amount to a DOS since the "real" client would just be unable to connect. No information would be leaked. It also would have to be in the same server (i.e. a module or nested exec called by the main client directly or transitively). This issue can also be squashed by not leaking the buildkit sock to nested execs/modules, which is possible now by migrating functionality from our shim to our custom executor. There's no immediate plans to do this but the possibility is open whenever needed (or when we make that change for other reasons). Signed-off-by: Erik Sipsma <[email protected]>
Signed-off-by: Erik Sipsma <[email protected]>
9dc8ff2
to
67c1050
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So happy to see this I might even attempt to explain it to my parents on vacation.
// NB: only configure search domains if we're directly using a service, or | ||
// if we're nested beneath another search domain. | ||
// | ||
// we have to be a bit selective here to avoid breaking Dockerfile builds | ||
// that use a Buildkit frontend (# syntax = ...) that doesn't have the | ||
// networks API cap. | ||
// | ||
// TODO: add API cap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 - I think this comment has been outdated for a while, should be fine to simplify. I might have only even noticed it while testing Bass's Dockerfile # syntax
support, which has always been janky because of weird Docker <-> Buildkit <-> buildx versioning weirdness anyway (oci-layout://
source doesn't work for example, so no Container.import). Either way, the c2c networking code doesn't use a networks API cap anymore.
edit: oh wait, this is my own change lol. well I APPROVE
@@ -5628,6 +5628,118 @@ func TestModuleUnicodePath(t *testing.T) { | |||
require.JSONEq(t, `{"test":{"hello":"hello"}}`, out) | |||
} | |||
|
|||
func TestModuleStartServices(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍
* engine: consolidate IDs and re-use servers for nested execs This is an internal only refactor, though it fixes a few bugs while also simplifying quite a bit and setting us up for more simplifications soon. The biggest change is that nested execs connect back to the same server as the main client caller rather than being completely independent. * This is required for the fix to services used in modules (separate PR) to fully work * It also should fix the lack of docker auth in many of our integ tests, specifically those that use nested execs, which leads to dockerhub rate limiting Along the way it also does some consolidation of IDs, removing ModuleCallerDigest and just exclusively using ClientID. This requires that we tell module functions and other nested execs which ID to use, but that itself is setup for even more simplifications in follow-ups (we can remove the need for the current DaggerServer construct entirely, among other things). Signed-off-by: Erik Sipsma <[email protected]> * namespace services by server, not by client Previously it was possible to start a dependent service in one module API call, and then use it again in a later call, only to have it fail because it cannot resolve the service address, even though it's still running. This happened because each invocation has its own client ID, and client IDs were used to build service addresses. This change brings service addresses into alignment with the recent change to uniq them by service ID instead of client ID. The overall effect is that services are deduped within a Dagger invocation, even across module calls. So with this change, the service will just stay running and be re-used by a later call, thanks to the grace period. Signed-off-by: Alex Suraci <[email protected]> * fix routing of host services to correct client Signed-off-by: Erik Sipsma <[email protected]> * deregister secret tokens once client disconnects We previously never explicitly removed client ID -> secret token mappings because it theoretically opened more possibilities for malicious attempts to register a client ID with a different token. However, we need to deregister these now since Client IDs are a content hash of the function call/nested exec definition, which means the same client ID can connect and disconnect multiple times per server. The security implications of this also end up being extremely minimal. Registering a client ID with a different secret token was and still is possible *before* a client fully connects. It is possible to after a client disconnects now but this would only amount to a DOS since the "real" client would just be unable to connect. No information would be leaked. It also would have to be in the same server (i.e. a module or nested exec called by the main client directly or transitively). This issue can also be squashed by not leaking the buildkit sock to nested execs/modules, which is possible now by migrating functionality from our shim to our custom executor. There's no immediate plans to do this but the possibility is open whenever needed (or when we make that change for other reasons). Signed-off-by: Erik Sipsma <[email protected]> * add integ test coverage Signed-off-by: Erik Sipsma <[email protected]> --------- Signed-off-by: Erik Sipsma <[email protected]> Signed-off-by: Alex Suraci <[email protected]> Signed-off-by: Erik Sipsma <[email protected]> Co-authored-by: Alex Suraci <[email protected]> Signed-off-by: kpenfound <[email protected]>
* engine: consolidate IDs and re-use servers for nested execs This is an internal only refactor, though it fixes a few bugs while also simplifying quite a bit and setting us up for more simplifications soon. The biggest change is that nested execs connect back to the same server as the main client caller rather than being completely independent. * This is required for the fix to services used in modules (separate PR) to fully work * It also should fix the lack of docker auth in many of our integ tests, specifically those that use nested execs, which leads to dockerhub rate limiting Along the way it also does some consolidation of IDs, removing ModuleCallerDigest and just exclusively using ClientID. This requires that we tell module functions and other nested execs which ID to use, but that itself is setup for even more simplifications in follow-ups (we can remove the need for the current DaggerServer construct entirely, among other things). Signed-off-by: Erik Sipsma <[email protected]> * namespace services by server, not by client Previously it was possible to start a dependent service in one module API call, and then use it again in a later call, only to have it fail because it cannot resolve the service address, even though it's still running. This happened because each invocation has its own client ID, and client IDs were used to build service addresses. This change brings service addresses into alignment with the recent change to uniq them by service ID instead of client ID. The overall effect is that services are deduped within a Dagger invocation, even across module calls. So with this change, the service will just stay running and be re-used by a later call, thanks to the grace period. Signed-off-by: Alex Suraci <[email protected]> * fix routing of host services to correct client Signed-off-by: Erik Sipsma <[email protected]> * deregister secret tokens once client disconnects We previously never explicitly removed client ID -> secret token mappings because it theoretically opened more possibilities for malicious attempts to register a client ID with a different token. However, we need to deregister these now since Client IDs are a content hash of the function call/nested exec definition, which means the same client ID can connect and disconnect multiple times per server. The security implications of this also end up being extremely minimal. Registering a client ID with a different secret token was and still is possible *before* a client fully connects. It is possible to after a client disconnects now but this would only amount to a DOS since the "real" client would just be unable to connect. No information would be leaked. It also would have to be in the same server (i.e. a module or nested exec called by the main client directly or transitively). This issue can also be squashed by not leaking the buildkit sock to nested execs/modules, which is possible now by migrating functionality from our shim to our custom executor. There's no immediate plans to do this but the possibility is open whenever needed (or when we make that change for other reasons). Signed-off-by: Erik Sipsma <[email protected]> * add integ test coverage Signed-off-by: Erik Sipsma <[email protected]> --------- Signed-off-by: Erik Sipsma <[email protected]> Signed-off-by: Alex Suraci <[email protected]> Signed-off-by: Erik Sipsma <[email protected]> Co-authored-by: Alex Suraci <[email protected]>
* engine: consolidate IDs and re-use servers for nested execs This is an internal only refactor, though it fixes a few bugs while also simplifying quite a bit and setting us up for more simplifications soon. The biggest change is that nested execs connect back to the same server as the main client caller rather than being completely independent. * This is required for the fix to services used in modules (separate PR) to fully work * It also should fix the lack of docker auth in many of our integ tests, specifically those that use nested execs, which leads to dockerhub rate limiting Along the way it also does some consolidation of IDs, removing ModuleCallerDigest and just exclusively using ClientID. This requires that we tell module functions and other nested execs which ID to use, but that itself is setup for even more simplifications in follow-ups (we can remove the need for the current DaggerServer construct entirely, among other things). Signed-off-by: Erik Sipsma <[email protected]> * namespace services by server, not by client Previously it was possible to start a dependent service in one module API call, and then use it again in a later call, only to have it fail because it cannot resolve the service address, even though it's still running. This happened because each invocation has its own client ID, and client IDs were used to build service addresses. This change brings service addresses into alignment with the recent change to uniq them by service ID instead of client ID. The overall effect is that services are deduped within a Dagger invocation, even across module calls. So with this change, the service will just stay running and be re-used by a later call, thanks to the grace period. Signed-off-by: Alex Suraci <[email protected]> * fix routing of host services to correct client Signed-off-by: Erik Sipsma <[email protected]> * deregister secret tokens once client disconnects We previously never explicitly removed client ID -> secret token mappings because it theoretically opened more possibilities for malicious attempts to register a client ID with a different token. However, we need to deregister these now since Client IDs are a content hash of the function call/nested exec definition, which means the same client ID can connect and disconnect multiple times per server. The security implications of this also end up being extremely minimal. Registering a client ID with a different secret token was and still is possible *before* a client fully connects. It is possible to after a client disconnects now but this would only amount to a DOS since the "real" client would just be unable to connect. No information would be leaked. It also would have to be in the same server (i.e. a module or nested exec called by the main client directly or transitively). This issue can also be squashed by not leaking the buildkit sock to nested execs/modules, which is possible now by migrating functionality from our shim to our custom executor. There's no immediate plans to do this but the possibility is open whenever needed (or when we make that change for other reasons). Signed-off-by: Erik Sipsma <[email protected]> * add integ test coverage Signed-off-by: Erik Sipsma <[email protected]> --------- Signed-off-by: Erik Sipsma <[email protected]> Signed-off-by: Alex Suraci <[email protected]> Signed-off-by: Erik Sipsma <[email protected]> Co-authored-by: Alex Suraci <[email protected]>
* engine: consolidate IDs and re-use servers for nested execs This is an internal only refactor, though it fixes a few bugs while also simplifying quite a bit and setting us up for more simplifications soon. The biggest change is that nested execs connect back to the same server as the main client caller rather than being completely independent. * This is required for the fix to services used in modules (separate PR) to fully work * It also should fix the lack of docker auth in many of our integ tests, specifically those that use nested execs, which leads to dockerhub rate limiting Along the way it also does some consolidation of IDs, removing ModuleCallerDigest and just exclusively using ClientID. This requires that we tell module functions and other nested execs which ID to use, but that itself is setup for even more simplifications in follow-ups (we can remove the need for the current DaggerServer construct entirely, among other things). Signed-off-by: Erik Sipsma <[email protected]> * namespace services by server, not by client Previously it was possible to start a dependent service in one module API call, and then use it again in a later call, only to have it fail because it cannot resolve the service address, even though it's still running. This happened because each invocation has its own client ID, and client IDs were used to build service addresses. This change brings service addresses into alignment with the recent change to uniq them by service ID instead of client ID. The overall effect is that services are deduped within a Dagger invocation, even across module calls. So with this change, the service will just stay running and be re-used by a later call, thanks to the grace period. Signed-off-by: Alex Suraci <[email protected]> * fix routing of host services to correct client Signed-off-by: Erik Sipsma <[email protected]> * deregister secret tokens once client disconnects We previously never explicitly removed client ID -> secret token mappings because it theoretically opened more possibilities for malicious attempts to register a client ID with a different token. However, we need to deregister these now since Client IDs are a content hash of the function call/nested exec definition, which means the same client ID can connect and disconnect multiple times per server. The security implications of this also end up being extremely minimal. Registering a client ID with a different secret token was and still is possible *before* a client fully connects. It is possible to after a client disconnects now but this would only amount to a DOS since the "real" client would just be unable to connect. No information would be leaked. It also would have to be in the same server (i.e. a module or nested exec called by the main client directly or transitively). This issue can also be squashed by not leaking the buildkit sock to nested execs/modules, which is possible now by migrating functionality from our shim to our custom executor. There's no immediate plans to do this but the possibility is open whenever needed (or when we make that change for other reasons). Signed-off-by: Erik Sipsma <[email protected]> * add integ test coverage Signed-off-by: Erik Sipsma <[email protected]> --------- Signed-off-by: Erik Sipsma <[email protected]> Signed-off-by: Alex Suraci <[email protected]> Signed-off-by: Erik Sipsma <[email protected]> Co-authored-by: Alex Suraci <[email protected]>
This is a combo of:
Basically, I finally got back to #6916 and found the bare minimum changes needed to get #6914 working fully, which is the first commit here. But then I also realized that I actually needed #6914 in order for services to work and existing tests to pass; basically, the changes ended up being codependent and I had to cherry pick that one.
Either way, each commit has the details in the message. It amounts to a lot of simplifications though besides the service bug fix:
Also worth noting this is hopefully just the beginning of simplifications, more that are made possible by these changes and some other recent ones:
DaggerServer
can be rm'd (will most likely end up doing that as part of [WIP] Finish isolating sessions to each client #6916 now after revisiting it)Added an integ test for the service bug fixed and a backfilled one for #6951