Improve handling of parallel CUDA stacks #179

Open
ncoghlan opened this issue May 22, 2025 · 6 comments
Labels

  • Affects: Metadata (Affects the stack output metadata)
  • Affects: Spec Format (Affect the stack specification format)
  • Category: Enhancement (New feature or request)

Comments

@ncoghlan
Collaborator

The Python ecosystem doesn't currently have a universal way of handling variations in low-level hardware support beyond the combination of operating system and CPU architecture represented in wheel platform tags.

While there is work in progress to comprehensively address that issue via "wheel variants", venvstacks still needs its own mechanism for handling this problem (preferably in a way that will work with, rather than against, the ongoing wheel variant design work).

As a concrete example, consider the following scenario:

  • access to docling is to be exposed via a venvstacks application layer
  • both CPU-only and CUDA 12.8 based versions of the application should be made available

For this scenario, the desired outcomes are that:

  • there is only a single docling framework layer that the app layer combines with the relevant pytorch layers
  • version consistency is enforced across the layer definitions that correspond to what will become variant wheels

Some potential approaches are considered in the comments below.

Other issues potentially impacted by this one:

@ncoghlan added the Category: Enhancement, Affects: Metadata, and Affects: Spec Format labels on May 22, 2025
@ncoghlan
Collaborator Author

ncoghlan commented May 22, 2025

Possible approach: explicit matrix layers

Define a matrix layer (for example pytorch); higher layers that depend on it are automatically checked for compatibility with all the matrix entries. (Edit: the original proposal automatically duplicated the higher layers instead. Duplication was a dealbreaker, so this switched to being a different way of requesting the "compatible framework layers" functional behaviour.)

Sketch (requires reserving { and } in layer names to specify where the matrix fragment goes - improved explicit layer versioning support would probably be a simpler place to introduce that):

[[runtimes]]
name = "cpython-3.11"
# ...

[[frameworks]]
name = "pytorch-{variant}"
requirements = [
    "torch==2.7.0",
]
variants = {
    "cpu" = {
        "requirements" = [],
    },
    "cu128" = {
        "requirements" = [
            "torch @ https://download.pytorch.org/whl/cu128/torch-2.7.0%2Bcu128-cp311-cp311-win_amd64.whl",
        ],
    },
}
# ...

[[frameworks]]
name = "docling"
# Builds against pytorch-cpu, runs against any variant
frameworks = ["pytorch-{cpu}"]
# ...

[[applications]]
name = "docling-{variant}"
# Builds against every pytorch variant
# Use a comma separated list to build against a subset of variants
# Only one `*` is permitted in the framework list
frameworks = ["pytorch-{*}"]
# ...

The application definition above would be a shorthand for:

[[applications]]
name = "docling-{variant}"
variants = {
    "cpu" = {
        "frameworks" = ["pytorch-cpu"],
    },
    "cu128" = {
        "frameworks" = ["pytorch-cu128"],
    },
}
# ...
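A minimal Python sketch of how that shorthand expansion could work, assuming layer specs have already been loaded as plain dictionaries. The `expand_star_variants` helper and the `variants_by_layer` mapping are hypothetical names used only for illustration (they are not part of venvstacks), and the comma separated subset form is not handled here:

```python
STAR = "-{*}"  # marker for "build against every variant", as in the sketch above


def expand_star_variants(app, variants_by_layer):
    """Expand a single 'layer-{*}' framework reference into explicit variants.

    Hypothetical helper: `app` is a dict form of an [[applications]] entry and
    `variants_by_layer` maps a matrix layer's base name to its variant names.
    """
    starred = [f for f in app["frameworks"] if f.endswith(STAR)]
    if len(starred) > 1:
        raise ValueError("only one `*` is permitted in the framework list")
    if not starred:
        return dict(app)  # nothing to expand
    star_ref = starred[0]
    base = star_ref[: -len(STAR)]
    variants = {}
    for variant in variants_by_layer[base]:
        variants[variant] = {
            "frameworks": [
                f"{base}-{variant}" if f == star_ref else f
                for f in app["frameworks"]
            ]
        }
    return {"name": app["name"], "variants": variants}


app = {"name": "docling-{variant}", "frameworks": ["pytorch-{*}"]}
expanded = expand_star_variants(app, {"pytorch": ["cpu", "cu128"]})
```

Here `expanded["variants"]` has the same shape as the explicit `variants` table in the expanded application definition.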

Frameworks would also be permitted to specify star-variant dependencies, which would be suitable for use cases like splitting pytorch and cuda into separate framework layers.

The sketch uses the "variant" terminology because this feature is intended specifically for the same cases as the "wheel variants" work. Locking two different variants of a layer should give a consistent set of requirements. If they're arbitrarily different, then those are different layer definitions, not layer variants. The syntax is designed such that as the standardisation work progresses, we should be able to specify the relevant wheel variant selection criteria as part of the layer variant definitions.
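One possible reading of "a consistent set of requirements" is that any package locked in more than one variant is pinned to the same public version, ignoring local version labels such as `+cu128`. The Python sketch below checks that property under this assumption; the `inconsistent_pins` helper, the lock data layout, and the package versions other than torch 2.7.0 are hypothetical and only serve as an illustration:

```python
def strip_local(version):
    """Drop a local version label, e.g. '2.7.0+cu128' -> '2.7.0'."""
    return version.split("+", 1)[0]


def inconsistent_pins(locks):
    """Return packages whose public versions differ between variants.

    `locks` maps a variant name to a {package: pinned version} dict.
    Packages that only appear in a single variant are not reported.
    """
    seen = {}
    for variant, pins in locks.items():
        for package, version in pins.items():
            seen.setdefault(package, {})[variant] = strip_local(version)
    return {
        package: by_variant
        for package, by_variant in seen.items()
        if len(set(by_variant.values())) > 1
    }


# Illustrative lock contents (made-up versions apart from torch 2.7.0)
consistent = inconsistent_pins({
    "cpu": {"torch": "2.7.0+cpu", "numpy": "2.2.6"},
    "cu128": {"torch": "2.7.0+cu128", "numpy": "2.2.6", "extra-cuda-lib": "1.0"},
})
mismatched = inconsistent_pins({
    "cpu": {"torch": "2.7.0+cpu", "numpy": "2.2.6"},
    "cu128": {"torch": "2.7.0+cu128", "numpy": "2.1.0"},
})
```

With this reading, variant-only extras (such as CUDA support libraries) are allowed, while a shared package drifting between variants would flag the definitions as separate layers rather than variants.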

@ncoghlan
Collaborator Author

ncoghlan commented May 22, 2025

Possible approach: compatible framework layers

Instead of automatically duplicating higher layers, allow higher layers to declare "One of X" style dependencies (potentially based on a layer tagging mechanism rather than specifically naming layers). One layer is nominated as the layer to use when building, and the resulting layer lock is checked for consistency with the other compatible layers.

Edit: after expanding on the explicit matrix idea, it's hard to see any real benefits in requiring users to define each compatible layer separately. That kind of flexibility makes sense when deployment is a true "mix and match" exercise, but outside wheel variants, that isn't the intended usage model for venvstacks.

@neilmehta24
Member

Just to make sure we're on the same page: the problem we want to avoid is having two copies of the same docling dependencies on users' machines. When we want to deploy an incremental upgrade to the docling framework, we want to deploy only one docling framework layer and two app layers: one for CUDA and one for CPU.

The matrix solution sounds like there would be two copies of the docling framework on the users' machines.

I think these two issues are relevant:

@ncoghlan
Collaborator Author

Given the dealbreaker, I reworked the matrix idea to allow for framework layers that are built against a default version of a matrix layer, but are expected to be runtime compatible with all the variants of that layer.

@ncoghlan
Collaborator Author

Pre-requisite steps before embarking on the wheel variant support:

  • add the show command
  • add syntactic support for explicit versioning (to build in the field substitution support feature)
  • replace direct URLs with a better wheel override mechanism (this isn't strictly required, but the UX of direct URLs is problematic due to the way it interacts with the locking mechanism)

@neilmehta24
Member

The matrix idea looks reasonable to me. I wanted to point out two things: (1) the variants entry is not valid TOML; and (2) the dynlib_exclude setting would need to be defined per variant.
