Contextual modules #7199

shykes opened this issue Apr 26, 2024 · 40 comments

@shykes
Contributor

shykes commented Apr 26, 2024

Problem

There are two types of Dagger modules: those that are standalone software projects, and those that exist in the context of another software project. Let's call those standalone modules and contextual modules, respectively.

Dagger is designed to support both, which is good, but causes friction for users in some areas.

  • Contextual modules almost always need access to their context directory. Today this requires an explicit argument (which usually looks like dagger call MYFUNC --source=.), which is verbose.
  • Contextual modules often need to filter their context directory, usually to avoid the prohibitive cost of uploading a huge monorepo on every task.
  • The best practice for contextual modules is to make the context directory itself a module, and store the module source code in a configurable subdirectory. This additional concept of "module source" being different from "module root" adds complexity for the user.
  • For standalone modules, the recommended value for source is "." (the current directory). For contextual modules, it is "./dagger". The current default value is "./dagger", which hurts the experience of creating a standalone module. If we change it to ".", it will hurt the experience of creating contextual modules. Either way, the user experience suffers.

Solution

I propose making contextual modules a first-class concept in Dagger. Here's how it would work.

What is a contextual module?

A contextual module is a module that exists in the context of a larger software project, and needs special access to its context directory to perform tasks.

This is similar to Dockerfiles, which usually exist in a context directory, which they can access with operations such as COPY. These Dockerfiles are contextual.

A module that doesn't have a context is called a standalone module.

Conventions for contextual modules

The only difference between a contextual module and a standalone module is its path. The contents of the module directory are always the same.

A contextual module is recognizable by its directory name: .dagger. The module's parent directory (which contains .dagger) is the context.

Creating a contextual module

By default, dagger init will create a contextual module: the current directory is used as context, and .dagger is created to contain the module. Use this when using Dagger to configure your project's CI.

To create a standalone module, call dagger init --standalone. This will initialize the module in the current directory, without creating .dagger. Use this when creating a standalone module that is its own software project.

Loading

Contextual modules can be loaded directly (at their exact path), or indirectly (at the path of their context).

If the context for a module is itself a module, the context wins.

To summarize the loader algorithm:

flowchart LR
    start([Start]) --> check1{Is there a module at $target?}
    check1 -->|Yes| return1[Return that module ✅]
    check1 -->|No| check2{Is there a module at $target/.dagger?}
    check2 -->|Yes| return2[Return that module ✅]
    check2 -->|No| error[Return an error ❌]

Installing

Since a contextual module can be loaded (either directly or indirectly), it can also be installed.

Accessing the context directory

Functions in a contextual module can access their context directory with a new core API call:

{
 "Interact with the current context"
 context {
  "Access the context directory"
  directory(path: String!=".", include: [String!], exclude: [String!]): Directory!
  "Access a file in the context directory"
  file(path: String!): File!
 }
}

Example in Go:

func (m *MyModule) Source() *Directory {
 return dag.
  Context().
  Directory(ContextDirectoryOpts{Include: []string{"*.go"}})
}
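
The proposal does not spell out how include and exclude interact. As a rough sketch only, the Go program below shows one plausible reading, under these assumptions: patterns are plain globs matched against the file's base name, an empty include list selects everything, and exclude takes priority over include. FilterPaths and matchesAny are hypothetical helpers, not part of any Dagger API.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchesAny reports whether name matches at least one glob pattern.
// Malformed patterns are treated as non-matching.
func matchesAny(patterns []string, name string) bool {
	for _, p := range patterns {
		if ok, err := filepath.Match(p, name); err == nil && ok {
			return true
		}
	}
	return false
}

// FilterPaths (hypothetical helper) keeps the paths selected by include and
// not rejected by exclude. Assumptions: patterns match the base name only,
// an empty include list selects everything, and exclude wins over include.
func FilterPaths(paths, include, exclude []string) []string {
	var kept []string
	for _, p := range paths {
		base := filepath.Base(p)
		if len(include) > 0 && !matchesAny(include, base) {
			continue
		}
		if matchesAny(exclude, base) {
			continue
		}
		kept = append(kept, p)
	}
	return kept
}

func main() {
	paths := []string{"main.go", "main_test.go", "README.md", "cmd/tool.go"}
	// Keep Go files, but drop test files.
	fmt.Println(FilterPaths(paths, []string{"*.go"}, []string{"*_test.go"}))
}
```

The point of filtering is that only the kept paths would need to be uploaded, which is what makes this useful in a large monorepo.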

Future expansion of context API

In the future, the context() API could be expanded to centralize access to the current execution context.

These are not in scope for this proposal, but here are examples to give a general idea of what could be added later:

  • context().service() to connect to network services in the caller's context (this would replace host.service())
  • context().module() to replace currentModule()
  • context().terminal() to access the caller's terminal
  • context().status() for a more advanced status API than "return error or not"? Perhaps a possible bridge to integrating with eg. Github Checks and CI job status
  • context().config() to expose config files in the context directory as a GraphQL schema?
  • context().ssh().agent() to get an ssh agent socket
  • context().docker().socket() to get a docker engine unix socket
  • context().docker().auth() to get docker credentials
  • context().aws().auth() to get aws credentials
  • context().platform() to get client's platform information
  • context().gpu() to access GPUs (insert future webgpu device streaming here)
  • context().watch() to watch for changes on the context directory (for running dev environments)
  • context().environment() to get environment variables from the context (this is not an endorsement! my objections to doing this are well-known... but listing for completeness)
  • context().editor() to open the user's file editor (for IDE integration?)
  • context().window() or context().dom() to render a web window to the user (to add GUI capabilities, backstage-style)
  • context().cache() for hypothetical future interaction with the caching subsystem (?)
  • context().peer() for hypothetical future clustering features: lookup another engine to dispatch jobs to. A peer exposes its own dag object recursively.

FYI @vito @sipsma @jedevc @helderco, lots of speculation and extrapolation in this part, let me know if any of them triggers a positive or negative reaction. ☝️

Do standalone modules have a context?

  • Every module has an execution context, which includes a context directory.
  • But only contextual modules (modules in a .dagger directory) have a non-empty context directory.
  • Standalone modules can access their context directory, but it will always be empty

Deprecations

With native support for contextual modules, some features become redundant and can be deprecated:

  • Deprecate dagger develop --source and dagger init --source
  • Deprecate the source field in dagger.json
  • Deprecate "directory views". They were meant to be a stopgap anyway

Status

Request For Bikeshedding :)

cc @sagikazarmark @helderco @vito @sipsma @jedevc @kpenfound @jpadams

@jpadams
Contributor

jpadams commented Apr 26, 2024

deprecating dir views: #6857
Yes, would love to keep things simpler.

In a standalone module, the dagger.json would live at the root of the module, right?

standalone (dir contents accessible via dag.client().context())
└── dagger.json, (rest of module implementation)

Where does the dagger.json live in a contextual module? Inside the .dagger directory, I'm guessing, right? Thus the module dir contents are the same in either case.

contextual (dir contents accessible via dag.client().context())
├── .dagger
│   └── dagger.json, (rest of module implementation)
├── bar
├── baz
└── foo

@jpadams
Contributor

jpadams commented Apr 26, 2024

This also allowed, right?

contextual
├── .dagger
│   └── dagger.json, (rest of module implementation)
├── bar
│   └── .dagger
│       └── dagger.json, (rest of module implementation)
├── baz
│   └── .dagger
│       └── dagger.json, (rest of module implementation)
└── foo
    └── .dagger
        └── dagger.json, (rest of module implementation)

The top-level module (under contextual/) could invoke

dag.client().context(include=foo)

and perhaps dagger-in-dagger call that module's functions which would have access to the foo/ directory through dag.client().context().

@sagikazarmark
Contributor

I like the idea of calling it "context". It's familiar terminology from Docker/Buildkit.

It would probably make sense to default to contextual modules in dagger init. I'd expect most people to write those, not standalone/reusable modules.

Standalone modules could be created with dagger init --standalone.

Although I've been advocating for making contextual modules first-class citizen in Dagger, I'm not sure how I would explain the difference to someone. Starting with the name: what does contextual mean? There is a context for standalone modules after all.

When you think about "modules" in the software world in general, there is no such distinction between reusable packages and applications (eg. Go modules).

So while I like the idea of making the user experience better for this use case, I'm not sure about the name and how to explain the difference to others.

@sagikazarmark
Contributor

What happens when I try to dagger install a contextual module? Should that be allowed?

@shykes shykes closed this as completed Apr 26, 2024
@sagikazarmark
Contributor

???

That was fast! 😄

@shykes shykes reopened this Apr 26, 2024
@shykes
Contributor Author

shykes commented Apr 26, 2024

Sorry I fat-fingered while reading your insightful comments 😁

@smolinari
Contributor

I'd just like to confirm this suggestion is a start to simplify my "Dagger functioning in a Temporal Activity" use case, where the Temporal Activity is the "wrapping context" needing a ./dagger installation of Dagger. Is this correct?

It would probably make sense to default to contextual modules in dagger init. I'd expect most people to write those, not standalone/reusable modules.

From my lurking in the community, the Dagger team has said a couple of times, IIRC, that this kind of usage is considered the "niche" way to use Dagger (and why Dagger has gone CLI-centric). However, I also agree with your suggestion and if my question above is confirmed, I applaud this push!!! 👏 😁

Scott

@wingyplus
Contributor

I just hit this problem with a monorepo at work recently. I really like the idea. I also have a couple of questions:

  • Can a standalone module import another standalone module?
  • What if we used dagger.json to determine the root context directory? For example, we would run dagger init at the root of the repository, then access files through the context directory API. That way we would not need to distinguish between standalone modules and module libraries.

@shykes
Contributor Author

shykes commented Apr 26, 2024

In a standalone module, the dagger.json would live at the root of the module, right?

dagger.json always lives at the root of the module. The contents of a module is always the same: the only difference between a standalone and contextual module is their path.

standalone (dir contents accessible via dag.client().context())
└── dagger.json, (rest of module implementation)

This part is correct:

standalone
└── dagger.json, (rest of module implementation)

But this part is not:

standalone ~(dir contents accessible via dag.client().context())

A standalone module by definition doesn't have a context. If it tries to access its context, it will get either an error or an empty directory (TBD).

Where does the dagger.json live in a contextual module? Inside the .dagger directory, I'm guessing, right? Thus the module dir contents are the same in either case.

Yes exactly.

context (dir contents accessible via dag.client().context())
├── .dagger (this is the contextual module)
│   └── dagger.json, (rest of module implementation)
├── bar
├── baz
└── foo

I renamed contextual to context, to clarify that .dagger is the contextual module.

@shykes
Contributor Author

shykes commented Apr 26, 2024

What happens when I try to dagger install a contextual module? Should that be allowed?

Yes, all modules should always be loadable, and therefore installable. See the section "Loading":

Contextual modules can be loaded directly (at their exact path), or indirectly (at the path of their context).
If the context for a module is itself a module, the context wins.

So, in the context of your project:

github.com/openmeterio/openmeter
└── .dagger
    ├── dagger.json
    ├── go.mod
    ├── ...
    └── main.go

If you remove the root dagger.json, these two install commands would be equivalent:

  • dagger install github.com/openmeterio/openmeter (no standalone module, fall back to contextual module)
  • dagger install github.com/openmeterio/openmeter/.dagger (import contextual module directly)

@shykes
Contributor Author

shykes commented Apr 26, 2024

It would probably make sense to default to contextual modules in dagger init. I'd expect most people to write those, not standalone/reusable modules.

Standalone modules could be created with dagger init --standalone.

Agreed, I added this to the proposal. Thanks!

Although I've been advocating for making contextual modules first-class citizen in Dagger, I'm not sure how I would explain the difference to someone. Starting with the name: what does contextual mean? There is a context for standalone modules after all.

When you think about "modules" in the software world in general, there is no such distinction between reusable packages and applications (eg. Go modules).

So while I like the idea of making the user experience better for this use case, I'm not sure about the name and how to explain the difference to others.

I tried to address this in the proposal, by making the comparison to Dockerfiles more explicit.

@shykes
Contributor Author

shykes commented Apr 26, 2024

This also allowed, right?

contextual
├── .dagger
│   └── dagger.json, (rest of module implementation)
├── bar
│   └── .dagger
│       └── dagger.json, (rest of module implementation)
├── baz
│   └── .dagger
│       └── dagger.json, (rest of module implementation)
└── foo
    └── .dagger
        └── dagger.json, (rest of module implementation)

The top-level module (under contextual/) could invoke

I'm guessing you mean the module under contextual/.dagger, just double-checking. Also as mentioned in my other reply, the directory contextual is actually the context.

dag.client().context(include=foo)

and perhaps dagger-in-dagger call that module's functions which would have access to the foo/ directory through dag.client().context().

Yes that would be possible, BUT it would not be necessary, because you can simply dagger install ../foo and let foo worry about accessing its context.

Example:

// contextual/.dagger/main.go
// Build foo and bar together
func Build() *Directory {
  fooSource := dag.Foo().Source()
  barSource := dag.Bar().Source()
  // continue integration logic
}

// contextual/foo/.dagger/main.go
// Source returns only the files foo needs from its own context
func Source() *Directory {
  return dag.Client().Context(ClientContextOpts{Include: []string{"*.js", "*.mjs", "package.json"}})
}

// contextual/bar/.dagger/main.go
// Source returns only the files bar needs from its own context
func Source() *Directory {
  return dag.Client().Context(ClientContextOpts{Include: []string{"*.go", "go.mod", "go.sum"}})
}

This way the include/exclude logic is neatly encapsulated into each module.

This greatly improves the experience in large monorepos, because the dependency graph of the monorepo's components can be modeled directly as a dependency graph of dagger modules.

@helderco
Contributor

From #7199 (comment):

When you think about "modules" in the software world in general, there is no such distinction between reusable packages and applications (eg. Go modules).

Not in Go, but it’s actually pretty common in my experience, when you have a package registry and a manifest file. I see the terms “library” vs “application” more commonly. Libraries are meant for distribution (and thus installing and importing) while applications are meant to be facing users directly.

There’s not necessarily a technical limitation that prevents you from distributing an application or vice versa; it’s just that distribution comes with more requirements, because you need the right package metadata or file structure. But there are also all sorts of conventions and best practices when building a library, for example not using upper-bound version constraints.

In our case it’s mostly intent, besides more access to the current “environment”. If it’s a reusable module you want to be a good citizen in thinking how others will use it, and not make any assumptions about their environment. While an application module is likely to be supporting a specific code base and so it benefits from having easier access to it and you don’t care that it’s reusable or not.

This brings me back to one of the early ideas in:

Formalize the ideas of "main" and "library" modules, make reading env vars an exclusive feature of "main" modules

  • We've talked vaguely about how there are certain modules that are meant to be invoked directly (what I called a "main" module here) and those that are just providing re-usable functionality for other modules ("library" modules)
  • The idea here would be to actually encode that idea in the module API. Main modules have to be invoked directly by a caller (i.e. either the CLI or an SDK running outside of a module). They are still sandboxed but are given extra abilities for more directly interacting with the caller.

So I’m wondering if “contextual modules” could have a bit more access and solve a few other issues as well. As for the trust model, we could for example have a limitation that a main module can’t be installed unless it’s from a relative path inside the same git repo. That would allow bringing together application silos in a monorepo (like our own repo with the SDKs), but not be able to run a module off the Internet that could be accessing something that it shouldn’t.

Not necessarily pushing for this specific idea to solve that, especially not right now (my preference is in another solution). Just food for thought as contextual modules looks like a step in that direction by starting this distinction.

@helderco
Contributor

helderco commented Apr 27, 2024

I love how it makes things simpler, both in removing the need for source and that you can define include/exclude patterns for a module that’s tied to a code base.

I wonder if anyone will miss the source property and being able to put sources in any depth. In our monorepo we want SDKs to have their own contextual modules that are then installed and used by the root’s “ci” contextual module. In sdk/python, I have several modules in development, and thinking about how your proposal changes things, it feels simpler and very usable!

Deprecate "directory views". They were meant to be a stopgap anyway

This isn’t the only use case for it. Views are useful in standalone modules too:

Let’s say you have a function for linting go files so you only want *.go, go.mod, etc. If you pass any directory via the CLI, it’ll be unfiltered because you don’t currently have any way to filter these arguments through the CLI except to create views.

But I’d rather define those patterns in code somehow, or make the view names match <object name>.<function name>.<arg name>. That way users wouldn’t have to specify the view name in dagger call.

By definition, a standalone module does not have a context. When calling client.context() from a standalone module, the result will always be an empty directory.

👍

Bikeshedding

Name: “contextual module”

I’m not sure about this name. Since we’re making a distinction here, if we want to expand this module’s capabilities a bit at some point, we may need to change the name again as it’s very tied to the parent directory.

To be clear, we already have a “context” directory currently, and it’s accessible, but only via normal OS file system access since dag.Host has been disabled in functions. So this proposal adds the formal API that makes it ok in this specific case.

More context for those that don’t know what I mean

Modules have 3 directories:

  • Source directory: which is where the SDK specific sources are and defaults to the root directory.
  • Root directory: the directory where dagger.json is, a.k.a., the “directory of the module”.
  • Context directory: The directory where .git is, or the root directory if not in a git repo.

In runtime modules, the working directory is in an empty /scratch directory, but the context directory is mounted on /src. The root and source directories are naturally mounted under /src. So if a module is at the root of a repo or not in a repo and "source": ".", then it’s all the same directory, mounted in /src.

Directory name: .dagger

Not entirely convinced of the name .dagger. While it follows .workflows and other CI directories, those are usually yaml files, not real code like dagger has. Even though IDEs usually display these directories (maybe if not in .gitignore, I don’t know), usually they’re hidden otherwise. Also, I’ve imagined at one point in the future we’d need a .dagger directory to store “runtime” data from running some CLI commands, similar to .direnv or .devenv (we also had it in the CUE days).

On the other hand, I like that it stands out and isn’t confused for any normal dagger directory.

API name: client

“Client” is very overloaded in Dagger alone:

  • dag is an instance of dagger.Client which wraps the GraphQL client and the query builder for making API requests.
  • You have a dag.GraphQLClient() for the underlying object that’s responsible for making the API requests.
  • There’s been references to “external client” or “custom client” to mean pre-modules dagger code that runs through dagger run or automatic provisioning.

We have the following right now:

dag.CurrentModule().Source()

Even if we deprecate --source, you still need to access a module’s sources apart from the context, so I’d just add this:

dag.CurrentModule().Context()

@shykes
Contributor Author

shykes commented May 1, 2024

Thanks to the feedback here, and a mind-bending discussion on Discord, we are hitting on some interesting ideas.

Here is a slightly updated thought experiment:

  • Modules are always contextual. Their directory is always called .dagger.
  • To daggerize a project, create a module at .dagger in the project directory. This is done with dagger init.
  • The entire contents of the module are in .dagger, including .dagger/dagger.json. To un-daggerize a directory, remove .dagger and you are done.
  • A dagger module can itself be daggerized! So you can create a module to help develop your module. That module will be in .dagger/.dagger
  • The directory containing a module is called its context directory. A module can read and write to its context directory.

Comparison: .git

FYI @aluzzardi @kpenfound @helderco @sagikazarmark

@shykes
Contributor Author

shykes commented May 1, 2024

I just hit this problem with monorepo at my work recently. I really like the idea.

Nice! I really want the experience for monorepos to be excellent. I think this could help.

  • Can a standalone module import another standalone module?

Yes, in my opinion any module should always be importable by any other module (as long as you can reach the files of course).

  • What if we used dagger.json to determine the root context directory? For example, we would run dagger init at the root of the repository, then access files through the context directory API. That way we would not need to distinguish between standalone modules and module libraries.

That is kind of what we had before with the root field in dagger.json, now deprecated. It gave lots of flexibility to module developers, but also made modules potentially more complex, and their loading less intuitive.

I'm looking for the perfect balance of control and constraints. If we find it, the experience will be 10x better than today.

@shykes
Contributor Author

shykes commented May 1, 2024

“Client” is very overloaded in Dagger alone:

What about just "context"?

func Source() *Directory {
 return dag.Context().Directory(".", ContextDirectoryOpts{Include: []string{"*.go", "go.mod", "go.sum"}})
}

func Dockerfile() *File {
 return dag.Context().File("Dockerfile")
}

Note that I'm keeping the option open to adding more resources to the context, beyond files and directories :)

  • I think dag.Context() could potentially replace dag.Host(). It's a more accurate word.
  • One low-hanging fruit: dag.CurrentModule() could become dag.Context().Module() which is a tiny detail, but I like the consolidation.
  • Other random ideas: dag.Context().Terminal() to access the caller's terminal?
  • dag.Context().Status() for a more advanced status API than "return error or not"?
  • dag.Context().Config() to expose config files in the context directory as a GraphQL schema?

@sagikazarmark
Contributor

  • Modules are always contextual. Their directory is always called .dagger.

That's going to end up with a file tree like this for "standalone" modules:

module
└── .dagger
    └── dagger.json

2 directories, 1 file

Or with dev/test modules:

module
├── .dagger
│   └── dagger.json
└── dev
    └── .dagger
        └── dagger.json

4 directories, 2 files

It feels kinda weird to be honest.

.dagger/.dagger isn't much better IMO.

Also, how do you interact with it? dagger call -m .dagger? I can already see the confusion on people's faces (or mine): which .dagger am I in? What happens when I do dagger call? Or is it dagger call -m .dagger?

Comparison: .git

I'm not sure .git is a good analogy: you generally don't interact with files directly in .git.

Also, the contents of .git is not the primary content of a repository, your code is. That's not true for a standalone module: the source code is the content.

From what I can tell there are two questions to be answered:

  • What is the root/context of the module?
  • Where is the source located for the module?

The answer to the first question seems easier to answer: wherever dagger.json/.dagger is.

The second answer depends on the type of module (application or library). Maybe the complexity can be hidden inside Dagger: dagger.json always denotes the root of the module, thus the context. If there is a .dagger directory, look for the source there. Otherwise look in the same directory as dagger.json. Or maybe even keep the source parameter as a hidden option and to keep things backwards compatible.

@sagikazarmark
Contributor

Note that I'm keeping the option open to adding more resources to the context, beyond files and directories :)

  • I think dag.Context() could potentially replace dag.Host(). It's a more accurate word.
  • One low-hanging fruit: dag.CurrentModule() could become dag.Context().Module() which is a tiny detail, but I like the consolidation.

Exactly what I was about to suggest!

@shykes
Contributor Author

shykes commented May 1, 2024

Or with dev/test modules:

Actually there would no longer be the need for dev, so the tree would be simpler:

module
└── .dagger
    ├── dagger.json
    └── .dagger
        └── dagger.json

It feels kinda weird to be honest.
.dagger/.dagger isn't much better IMO.

Maybe the feeling of weirdness is superficial, and will subside if it actually solves a bunch of painful problems for us? Give it a chance :)

Also, how do you interact with it? dagger call -m .dagger? I can already see the confusion on people's faces (or mine): which .dagger am I in? What happens when I do dagger call? Or is it dagger call -m .dagger?

This is mostly confusing because we're trying to hold two systems in our heads simultaneously: the system we know, and the one described here. Beginners won't have this problem though. If all they know is the new system, then I think it can be as simple to understand as today, and probably simpler.

It's just a few simple rules:

  • TLDR: it works like git
  • Since modules are always contextual, they are always loaded from their context directory. So the address of a module is the address of its context directory.
  • If -m is not specified, the current context is found by walking up to the nearest .dagger. The same way git finds .git.

@sagikazarmark
Copy link
Contributor

Maybe the feeling of weirdness is superficial, and will subside if it actually solves a bunch of painful problems for us? Give it a chance :)

I'm willing to give it a chance and fully admit that this was only my first reaction.

But the feeling of weirdness is mostly due to this:

Comparison: .git

I'm not sure .git is a good analogy: you generally don't interact with files directly in .git.

Also, the contents of .git is not the primary content of a repository, your code is. That's not true for a standalone module: the source code is the content.

Maybe a better analogy would be Git with submodules, but we all hate submodules, so let's not do that. :)

This is mostly confusing because we're trying to hold two systems in our heads simultaneously

Maybe I wasn't clear: the confusion comes from the fact that both modules are called .dagger. You can pretty much do that today as well, it has nothing to do with context. Moving back and forth, I imagine it's going to be easy to get lost.

@shykes
Contributor Author

shykes commented May 1, 2024

Also, the contents of .git is not the primary content of a repository, your code is. That's not true for a standalone module: the source code is the content.

What if standalone modules were really overlays for existing upstream repos? Taking your daggerverse repo as an example:

To enable this, modules could optionally configure a remote source as their context directory, with a new context field in dagger.json. Example:

dagger init gh --context https://github.com/cli/cli
dagger init bats --context https://github.com/bats-core/bats-core
dagger init kafka --context https://github.com/apache/kafka
dagger init checksum # no remote context

With this pattern, every daggerverse repo can basically become a distro :)

@shykes
Contributor Author

shykes commented May 1, 2024

Continuing the list of possible extensions of dag.Context...

  • context().ssh().agent() to get an ssh agent socket
  • context().docker().socket() to get a docker engine unix socket
  • context().docker().auth() to get docker credentials
  • context().aws().auth() to get aws credentials
  • context().platform() to get client's platform information
  • context().gpu() to access GPUs (insert future webgpu device streaming here)
  • context().watch() to watch for changes on the context directory (for running dev environments)
  • context().environment() to get environment variables from the context (this is not an endorsement! my objections to doing this are well-known... but listing for completeness)
  • context().editor() to open the user's file editor (for IDE integration?)
  • context().window() or context().dom() to render a web window to the user (to add GUI capabilities, backstage-style)
  • context().cache() for hypothetical future interaction with the caching subsystem (?)
  • context().peer() for hypothetical future clustering features: lookup another engine to dispatch jobs to. A peer exposes its own dag object recursively.

Example of job dispatch with context().peer():

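func Build(source *Directory, arch string) *Container {
  // Build locally when this engine already runs on the requested architecture
  if dag.Context().Platform().Arch() == arch {
    return actualBuildLogic(source)
  }
  // Otherwise dispatch the same call to a peer engine with the right architecture
  return dag.Context().Peer(PeerOpts{Arch: arch}).Build(source, arch)
}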

cc @sipsma @jedevc, join me in this madness :-P

@sagikazarmark
Contributor

Would it make sense to split this into two issues?

Code organization and context seem like two separate topics.

@sagikazarmark
Contributor

To enable this, modules could optionally configure a remote source as their context directory, with a new context field in dagger.json. Example:

Aren't we back to the root/source/context conundrum with that? :)

It feels like these are separate issues.

@shykes
Contributor Author

shykes commented May 1, 2024

I think it would be hard to split 1) contextual modules, and 2) how to manage the context, since "context" doesn't exist without contextual modules. They are part of the same design.

Aren't we back to the root/source/context conundrum with that? :)

Obviously it's a related problem ("where do the directories go?") but "context" in this design is not the same as "source" in the current design or "root" in the earlier versions. These words are tightly coupled to the overall design they're part of.

@sagikazarmark
Contributor

I think it would be hard to split 1) contextual modules, and 2) how to manage the context.

I don't see how dag.context().platform() depends on the code organization discussed here. I meant making that a separate proposal.

@shykes
Contributor Author

shykes commented May 1, 2024

I think it would be hard to split 1) contextual modules, and 2) how to manage the context.

I don't see how dag.context().platform() depends on the code organization discussed here. I meant making that a separate proposal.

Ah I see. Yes that's fine. I'll go over my last comments, and either update this proposal or start a new one. Agreed on scope.

@shykes
Contributor Author

shykes commented May 2, 2024

Follow-up after discussing with @kpenfound

  • Contextual modules make more sense than ever. They solve a lot of problems
  • Still probably makes sense to keep standalone modules also. There's a place for both
  • Example: daggerizing kubernetes, and developing a kubernetes integration module, are not the same thing. The first makes sense as a contextual module. The second as a standalone module. Possibly, the standalone module could import the contextual one, for building kubernetes itself.
  • We discussed a future "overlay" feature which would be really nice, for more flexibility in importing contextual modules. To be discussed later :)
  • When installing a module that is standalone and is itself daggerized (i.e. developing a Dagger module with Dagger), it makes sense for the standalone module to win, with the contextual module as a fallback.
  • When developing a module locally and calling dagger without specifying a module, there is no good answer. If the contextual module wins, it's more convenient to develop the module, but inconsistent with the install logic. If the standalone module wins, it becomes very inconvenient to use .dagger: you have to explicitly run dagger call -m .dagger, which is bad. So I'm not sure how to resolve that.
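
The install-time rule above (standalone wins, contextual is the fallback) could be sketched roughly as follows. This is only an illustration, not Dagger's loader: `resolveInstallTarget` and the `exists` callback are hypothetical names, and it assumes a standalone module is marked by a dagger.json at the root while a contextual module is marked by a dagger.json inside .dagger. The unresolved local-development case is deliberately not covered.

```go
package main

import (
	"fmt"
	"path"
)

// resolveInstallTarget picks which module to load when a directory is
// installed as a dependency. Assumption (not confirmed by the proposal):
// a standalone module has dagger.json at its root, a contextual module has
// dagger.json inside a .dagger subdirectory.
func resolveInstallTarget(root string, exists func(string) bool) (string, error) {
	// The standalone module wins when both are present.
	if exists(path.Join(root, "dagger.json")) {
		return root, nil
	}
	// The contextual module is only a fallback.
	if exists(path.Join(root, ".dagger", "dagger.json")) {
		return path.Join(root, ".dagger"), nil
	}
	return "", fmt.Errorf("no module found at %q", root)
}

func main() {
	// A fake file system: "repo" is standalone and daggerized, "app" is only daggerized.
	files := map[string]bool{
		"repo/dagger.json":         true,
		"repo/.dagger/dagger.json": true,
		"app/.dagger/dagger.json":  true,
	}
	exists := func(p string) bool { return files[p] }

	for _, root := range []string{"repo", "app", "empty"} {
		target, err := resolveInstallTarget(root, exists)
		fmt.Println(root, "->", target, err)
	}
}
```

Passing the file check in as a function keeps the precedence rule separate from any real file system or git access.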

@sagikazarmark
Contributor

sagikazarmark commented May 3, 2024

Accessing the context directory

What happened to the dag.Context().[Source|Env|Platform]() proposal? I think I liked it better.

Do standalone modules have a context?

How about the module source (dag.CurrentModule().Source())? Making that context seems logical to me.

When calling client.context() from a standalone module, the result will always be an empty directory.

Maybe don't generate that function at all for standalone modules? Or would that introduce too much complexity into codegen?

Contextual modules can be loaded directly (at their exact path), or indirectly (at the path of their context).

Does this serve a specific purpose? Doing the same thing one way is better IMO.


Maybe don't autopublish contextual modules to daggerverse.dev?

@shykes
Contributor Author

shykes commented May 3, 2024

What happened to the dag.Context().[Source|Env|Platform]() proposal? I think I liked it better.

Me too! That's an oversight, I'll update the proposal.

Do standalone modules have a context?

How about the module source (dag.CurrentModule().Source())? Making that context seems logical to me.

I thought about it, but I think it would create inconsistency. There's already a separate call for getting your source code. In fact that could be moved to context.Module().Source() as we discussed in the comments :)

When calling client.context() from a standalone module, the result will always be an empty directory.

Maybe don't generate that function at all for standalone modules? Or would that introduce too much complexity into codegen?

I want to leave the door open to the same module code being usable with or without context. Also it feels weird to change the API depending on a property of your module.

Contextual modules can be loaded directly (at their exact path), or indirectly (at the path of their context).

Does this serve a specific purpose? Doing the same thing one way is better IMO.

Mostly that if we picked only one way to do it, the more logical one to keep would be the actual path of the module. That means we would be forced to type dagger call -m github.com/dagger/dagger/.dagger, which is very inconvenient. So we add a convenience to load from the context. But if we only supported loading from the context, it would be confusing: loading a module directory would sometimes work and sometimes not, depending on whether the module is contextual.

Anyway, module loading is definitely an area that needs some bikeshedding.

Maybe don't autopublish contextual modules to daggerverse.dev?

We would definitely not publish them in the same section. But daggerverse could now show you a fantastic catalog of examples of daggerized projects. Also, there are legitimate use cases for installing a contextual module. Why bother downloading a binary build of kubectl, when I can dagger install github.com/kubernetes/kubernetes, and build it myself from source? :)

@shykes
Contributor Author

shykes commented May 3, 2024

What happened to the dag.Context().[Source|Env|Platform]() proposal? I think I liked it better.

Me too! That's an oversight, I'll update the proposal.

Done

Do standalone modules have a context?

I also updated that section, because expanding the context API beyond the context directory changes the definition of "context". Now every module always has a context: it just may or may not have a non-empty context directory.

@nipuna-perera

I finally got through most of this discussion. I like where we stand now in terms of evolution of this idea. I think the "context" will make more sense to a lot of folks (myself included). I love the similarities to docker and git.

context().environment() to get environment variables from the context (this is not an endorsement! my objections to doing this are well-known... but listing for completeness)

This is funny, but something worth re-visiting if we are planning on exposing a bunch of contextual constructs to the module. The local environment would also fall in that category. Will definitely have to think through how to put some guardrails to prevent developers from writing context bounded modules that aren't very reproducible in a different context. But then again, that's where standalone modules come into play right? Feels good to have that flexibility.

All in all, great idea!

@aweris
Contributor

aweris commented May 4, 2024

I finally got through most of this discussion.

Plus one to this one.

I like the high-level proposal, and it solves so many issues I encountered before. I hope this will bring sub-modules the love they deserve.

About future extensions: I agree with the scope, and the host side of the future extensions is especially exciting.

cc @sipsma @jedevc, join me in this madness :-P

I like this madness @shykes 🔥

@sipsma
Contributor

sipsma commented May 4, 2024

My general thoughts are:

  • Directly exposing the full context directory to some modules and not exposing any to others makes sense. Originally we were considering those to be enabling "shadow arguments" but I think it's too opinionated and not worth it.
    • I do think we still need to support include/exclude on those directories since even a contextual module is not always going to want to load every single file in a massive monorepo.
  • Centralizing some of the existing APIs to dag.Context() makes sense
  • I don't see why this is really tied to things like the module existing in .dagger/. It seems like whether or not a module has a context directory can just be configuration on it and where it exists is independent.
    • But no problem with calling these modules "contextual" or some other term, separating them conceptually. Also no problem with making certain patterns best practices, like daggerizing a project meaning to put a module with a context directory under .dagger in the root of the repo. I just don't see why they need to be inherently tied to that. Maybe that's the idea here already, but want to get clarification.

In terms of implementing, if a first step is just:

  • expose dag.Context().Directory()
  • make whether a module has a context directory or not configurable
  • consolidate some of the existing APIs under CurrentModule to dag.Context()

Then it's very straightforward. The context directory is already available internally, we just chose not to expose it to clients. And the rest is just re-arrangement/configuration bikeshedding.

@shykes
Contributor Author

shykes commented May 6, 2024

My general thoughts are:

  • Directly exposing the full context directory to some modules and not exposing any to others makes sense. Originally we were considering those to be enabling "shadow arguments" but I think it's too opinionated and not worth it.

I do think it was the right call to not rush to doing it right away. The resulting design will be better, because it will be more cleanly layered, and informed by specific user feedback.

  • I do think we still need to support include/exclude on those directories since even a contextual module is not always going to want to load every single file in a massive monorepo.

I completely agree. The proposal has include and exclude arguments to Context.directory(). Do you think that's sufficient, or is there a need for something else?
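
To make the include/exclude idea concrete, here is a minimal sketch of one possible filtering rule: a file is kept if it matches an include pattern (or if no include patterns are given) and matches no exclude pattern. The function names are hypothetical, and the pattern syntax is simply Go's `path.Match` (which has no `**` support), an assumption made for illustration rather than the semantics the proposal would actually adopt.

```go
package main

import (
	"fmt"
	"path"
)

// matchAny reports whether name matches at least one of the patterns.
func matchAny(patterns []string, name string) (bool, error) {
	for _, p := range patterns {
		ok, err := path.Match(p, name)
		if err != nil {
			return false, err // malformed pattern
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

// filterContext keeps the files that match include (or every file when
// include is empty) and that do not match exclude. Exclude takes priority.
func filterContext(files, include, exclude []string) ([]string, error) {
	var kept []string
	for _, f := range files {
		if len(include) > 0 {
			ok, err := matchAny(include, f)
			if err != nil {
				return nil, err
			}
			if !ok {
				continue
			}
		}
		excluded, err := matchAny(exclude, f)
		if err != nil {
			return nil, err
		}
		if !excluded {
			kept = append(kept, f)
		}
	}
	return kept, nil
}

func main() {
	files := []string{"main.go", "docs/readme.md", "docs/logo.png", "node_modules/x.js"}
	kept, err := filterContext(files, []string{"*.go", "docs/*"}, []string{"docs/*.md"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(kept)
}
```

Filtering before upload is what avoids sending an entire monorepo to the engine on every call; whether exclude should override include is one of the details a real design would need to settle.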

  • I don't see why this is really tied to things like the module existing in .dagger/. It seems like whether or not a module has a context directory can just be configuration on it and where it exists is independent.

I think it's important to make .dagger more than a convention, for 2 reasons:

  1. naming a module .dagger encodes unique information: whether a module expects a context directory. In that way, the .dagger directory name becomes part of the Dagger API.

That's actually an argument in favor of forbidding non-contextual modules from ever accessing a context directory. So if we add a dagger call --context argument (which makes sense for conveniently overriding the context directory), that flag would cause an error when applied to a non-contextual module. If we don't do this, it will be impossible to know which modules require a context, and which don't. We will start seeing READMEs saying "this module requires a context, make sure to pass --context. Here are examples of a valid context directory.". To avoid fragmenting the user experience in this way, we should enforce the simple rule ".dagger = contextual module = has a context directory".

  2. Loading convenience. If the module loader is not aware of the special meaning of naming a module .dagger, then loading that module from its context directory won't work. Users will need to point to the full address ending in /.dagger, which is awkward and limits what you can do with contextual modules.
  • But no problem with calling these modules "contextual" or some other term, separating them conceptually. Also no problem with making certain patterns best practices, like daggerizing a project meaning to put a module with a context directory under .dagger in the root of the repo. I just don't see why they need to be inherently tied to that. Maybe that's the idea here already, but want to get clarification.

See above for why core feature vs. best practice.

  • expose dag.Context().Directory()

  • make whether a module has a context directory or not configurable

Agreed, in the current proposal that configuration is done via the module name (.dagger means contextual). Open to bikeshedding alternative forms of configuration. I don't see a good alternative at the moment, but open to discussing anything as usual!

  • consolidate some of the existing APIs under CurrentModule to dag.Context()

Then it's very straightforward. The context directory is already available internally, we just chose not to expose it to clients. And the rest is just re-arrangement/configuration bikeshedding.

I have a question about that. @helderco also mentioned that we already have a concept of "context directory" internally. Are you referring to the same thing? And if so - are we sure that internal "context directory" is exactly the same as the user-facing context directory proposed here?

If you're talking about the git repository root that contains the module - a directory the current loader needs in order to support local dependencies within a repo - then I see that as different. In this proposal, a monorepo could contain multiple contextual modules, each named .dagger at a different location in the repo. Each of those modules would have a different context - mapped to their respective parent directories. Sorry if I'm completely off-topic here.
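
As a rough illustration of the rule ".dagger = contextual module = has a context directory", the mapping from a module path to its context could look like this. `contextDir` is a hypothetical helper, not an existing Dagger function, and it only encodes the naming rule described in this thread.

```go
package main

import (
	"fmt"
	"path"
)

// contextDir returns the context directory of a module and whether the
// module is contextual. Under the proposed rule, a module is contextual
// exactly when its directory is named .dagger, and its context is the
// parent directory.
func contextDir(modulePath string) (dir string, contextual bool) {
	clean := path.Clean(modulePath)
	if path.Base(clean) != ".dagger" {
		return "", false // standalone module: no context directory
	}
	return path.Dir(clean), true
}

func main() {
	// A monorepo with several contextual modules and one standalone module.
	modules := []string{
		".dagger",
		"services/api/.dagger",
		"services/web/.dagger",
		"modules/checksum",
	}
	for _, m := range modules {
		dir, ok := contextDir(m)
		fmt.Printf("%-24s contextual=%-5v context=%q\n", m, ok, dir)
	}
}
```

Each .dagger directory gets its own context, independent of where the git repository root is, which is the distinction from the loader's internal notion of context.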

@shykes
Contributor Author

shykes commented May 7, 2024

I finally got through most of this discussion. I like where we stand now in terms of evolution of this idea. I think the "context" will make more sense to a lot of folks (myself included). I love the similarities to docker and git.

Thanks Nipuna! This iteration feels right to me too.

context().environment() to get environment variables from the context (this is not an endorsement! my objections to doing this are well-known... but listing for completeness)

This is funny, but something worth re-visiting if we are planning on exposing a bunch of contextual constructs to the module. The local environment would also fall in that category. Will definitely have to think through how to put some guardrails to prevent developers from writing context bounded modules that aren't very reproducible in a different context. But then again, that's where standalone modules come into play right? Feels good to have that flexibility.

I think the key is to shift how we use env variables:

  • Usually we think of them as a very loose low-level config API: no typing, no static checking, no namespacing, mix of tool-specific and user-specific conventions. Messy but unavoidable. I strongly feel this approach must die. It's unfixable and will corrupt any platform that ingests it into its core API.
  • Instead, we should wrap it in a domain-specific typed API. Why call context.getenv("SSH_AUTH_SOCK") then context.unixSocket() when you can call context.ssh().Auth() and get a socket object? This is especially useful when the task to accomplish (for example "authenticate to a docker registry") requires more than environment variables. Perhaps there is a native auth helper to execute. That can't be solved by exposing getenv(). But context.docker().auth() could wrap all that as well.
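
A minimal sketch of what "wrap it in a domain-specific typed API" might mean in practice: callers ask for an SSH agent socket and never see the variable name. The `SSH` type, `NewSSH`, and `AgentSocket` are hypothetical names invented for this example, and returning a plain path stands in for the socket object mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// SSH is a hypothetical typed accessor for SSH-related context.
// lookup abstracts the environment so it can be replaced in tests.
type SSH struct {
	lookup func(key string) (string, bool)
}

// NewSSH returns an accessor backed by the real process environment.
func NewSSH() SSH {
	return SSH{lookup: os.LookupEnv}
}

// AgentSocket returns the path of the SSH agent socket. The variable name
// is an implementation detail hidden from callers.
func (s SSH) AgentSocket() (string, error) {
	sock, ok := s.lookup("SSH_AUTH_SOCK")
	if !ok || sock == "" {
		return "", errors.New("no ssh agent available in this context")
	}
	return sock, nil
}

func main() {
	sock, err := NewSSH().AgentSocket()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("agent socket:", sock)
}
```

Because the lookup lives behind a typed method, the same call could later run a native helper or consult another source without changing any caller.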

So, in my opinion we have an opportunity to solve the underlying problem that prompts users to ask for context.getenv(), without actually shipping it :)

@helderco
Contributor

helderco commented May 7, 2024

Then it's very straightforward. The context directory is already available internally, we just chose not to expose it to clients. And the rest is just re-arrangement/configuration bikeshedding.

I have a question about that. @helderco also mentioned that we already have a concept of "context directory" internally. Are you referring to the same thing? And if so - are we sure that internal "context directory" is exactly the same as the user-facing context directory proposed here?

The current internal "context directory" is different from this proposal, yes.

If you're talking about the git repository root that contains the module - a directory the current loader needs in order to support local dependencies within a repo - then I see that as different.

That's right. The internal "context" is either the repo root or the root directory if not in a repo.

In this proposal, a monorepo could contain multiple contextual modules, each named .dagger at a different location in the repo. Each of those modules would have a different context - mapped to their respective parent directories. Sorry if I'm completely off-topic here.

Yes, that's right. They're different concepts but may overlap if .dagger is next to .git. However, it may be easy to adapt anyway. Each module has its own context directory regardless of another module's context directory.

Not familiar with how it factors into being able to reference another module in the same monorepo by relative path though, or for a go.mod to reference a "parent" go.mod in the monorepo use case for example.

@shykes
Contributor Author

shykes commented May 7, 2024

Not familiar with how it factors into being able to reference another module in the same monorepo by relative path though, or for a go.mod to reference a "parent" go.mod in the monorepo use case for example.

My thinking is that we would keep that orthogonal, so relative imports and shared sdk material would work the same. But open to changing that ofc.

@shykes
Contributor Author

shykes commented May 11, 2024

  • I don't see why this is really tied to things like the module existing in .dagger/. It seems like whether or not a module has a context directory can just be configuration on it and where it exists is independent.

I think it's important to make .dagger more than a convention, for 2 reasons:

@sipsma just double-checking how you feel about my response to this particular concern of yours?


9 participants