Improve handling of parallel CUDA stacks #179

ncoghlan · 2025-05-22T12:46:51Z

The Python ecosystem doesn't currently have a universal way of handling variations in low level hardware support outside the combination of operating system and CPU architecture represented in wheel platform tags.

While there is work in progress to comprehensively address that issue via "wheel variants", venvstacks still needs its own mechanism for handling this problem (preferably in a way that will work with, rather than against, the ongoing wheel variant design work).

As a concrete example, consider the following scenario:

- access to docling is to be exposed via a venvstacks application layer
- both CPU-only and CUDA 12.8 based versions of the application should be made available

For this scenario, the desired outcomes are that:

- there is only a single docling framework layer that the app layer combines with the relevant pytorch layers
- version consistency is enforced across the layer definitions that correspond to what will become variant wheels

Some potential approaches are considered in the comments below.

Other issues potentially impacted by this one:

- Allow specification of package index URLs #144 (due to the way PyTorch exposes its CUDA variants)
- Explicit support for direct wheel URL overrides #157 (for the same reason)
- Add a show command that summarises a parsed stack definition #159 (as the mapping from the stack config as written to the deployed stacks gets more complex, it becomes harder to visualise)


ncoghlan · 2025-05-22T12:50:12Z

Possible approach: explicit matrix layers

Define a matrix layer (for example pytorch). Higher layers that depend on it are automatically ~~duplicated~~ checked for compatibility with all the matrix entries. (Edit: duplication was a dealbreaker, so this switched to being a different way of requesting the "compatible framework layers" functional behaviour)

Sketch (requires reserving { and } in layer names to specify where the matrix fragment goes - improved explicit layer versioning support would probably be a simpler place to introduce that):

[[runtimes]]
name = "cpython-3.11"
# ...

[[frameworks]]
name = "pytorch-{variant}"
requirements = [
    "torch==2.7.0",
]
variants = {
    "cpu" = {
        "requirements" = [],
    },
    "cu128" = {
        "requirements" = [
            "torch @ https://download.pytorch.org/whl/cu128/torch-2.7.0%2Bcu128-cp311-cp311-win_amd64.whl",
        ],
    },
}
# ...

[[frameworks]]
name = "docling"
# Builds against pytorch-cpu, runs against any variant
frameworks = ["pytorch-{cpu}"]
# ...

[[applications]]
name = "docling-{variant}"
# Builds against every pytorch variant
# Use a comma separated list to build against a subset of variants
# Only one `*` is permitted in the framework list
frameworks = ["pytorch-{*}"]
# ...

The application definition above would be a shorthand for:

[[applications]]
name = "docling-{variant}"
variants = {
    "cpu" = {
        "frameworks" = ["pytorch-cpu"],
    },
    "cu128" = {
        "frameworks" = ["pytorch-cu128"],
    },
}
# ...
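
The shorthand expansion described above can be sketched in Python. This is only an illustration under stated assumptions: `expand_variants` and its `layer_variants` argument are hypothetical names, not part of venvstacks, and the rule that at most one framework entry may expand to multiple variants is an assumption generalised from the "only one `*`" comment in the sketch.

```python
import re

# Matches the proposed "name-{selector}" spelling from the sketch above.
_MATRIX_REF = re.compile(r"^(?P<base>.+)-\{(?P<selector>[^{}]+)\}$")


def expand_variants(frameworks, layer_variants):
    """Expand a frameworks list into a per-variant mapping (hypothetical helper).

    layer_variants maps a matrix layer base name (e.g. "pytorch") to the
    variant names that layer defines.
    """
    entries = []   # one item per framework entry, in declared order
    matrix = None  # variant names selected by the single expanding entry
    for entry in frameworks:
        match = _MATRIX_REF.match(entry)
        if match is None:
            entries.append(entry)  # plain layer name, shared by all variants
            continue
        base, selector = match["base"], match["selector"]
        defined = layer_variants[base]
        if selector == "*":
            selected = list(defined)
        else:
            selected = [name.strip() for name in selector.split(",")]
        unknown = [name for name in selected if name not in defined]
        if unknown:
            raise ValueError(f"unknown variants for {base!r}: {unknown}")
        if selector != "*" and len(selected) == 1:
            # A single named variant (e.g. "pytorch-{cpu}") is a fixed dependency
            entries.append(f"{base}-{selected[0]}")
            continue
        if matrix is not None:
            raise ValueError("only one multi-variant framework entry is supported")
        matrix = selected
        entries.append({name: f"{base}-{name}" for name in selected})
    if matrix is None:
        return {}  # nothing to expand
    return {
        variant: {
            "frameworks": [
                item[variant] if isinstance(item, dict) else item
                for item in entries
            ]
        }
        for variant in matrix
    }


expanded = expand_variants(["pytorch-{*}"], {"pytorch": ["cpu", "cu128"]})
assert expanded == {
    "cpu": {"frameworks": ["pytorch-cpu"]},
    "cu128": {"frameworks": ["pytorch-cu128"]},
}
```

The result mirrors the longhand `variants` table shown above, with any plain framework names repeated unchanged in every variant.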

Frameworks would also be permitted to specify star-variant dependencies, which would be suitable for use cases like splitting pytorch and cuda into separate framework layers.

The sketch uses the "variant" terminology because this feature is intended specifically for the same cases as the "wheel variants" work. Locking two different variants of a layer should give a consistent set of requirements. If they're arbitrarily different, then those are different layer definitions, not layer variants. The syntax is designed such that as the standardisation work progresses, we should be able to specify the relevant wheel variant selection criteria as part of the layer variant definitions.
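
As a rough illustration of what "a consistent set of requirements" could mean in practice, the Python sketch below compares the pins from two variant locks. It rests on assumptions that are not settled in this thread: locks are treated as simple `name==version` lines, only packages present in both locks are compared, and local version labels such as `+cu128` are ignored. The function names and the `example-dep` package are hypothetical.

```python
def parse_pins(lock_lines):
    """Collect name -> version from 'name==version' lines (simplified parser)."""
    pins = {}
    for line in lock_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and hash fragments
        if "==" not in line:
            continue  # skip blanks, direct URL references, options
        name, version = line.split("==", 1)
        pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def public_version(version):
    """Strip a local version label, so '2.7.0+cu128' compares as '2.7.0'."""
    return version.split("+", 1)[0]


def variant_mismatches(default_lock, variant_lock):
    """Return {package: (default_version, variant_version)} for conflicting pins."""
    default_pins = parse_pins(default_lock)
    variant_pins = parse_pins(variant_lock)
    mismatches = {}
    # Assumption: packages that only one variant needs are not a conflict
    for name in sorted(default_pins.keys() & variant_pins.keys()):
        default_version = default_pins[name]
        variant_version = variant_pins[name]
        if public_version(default_version) != public_version(variant_version):
            mismatches[name] = (default_version, variant_version)
    return mismatches


cpu_lock = ["torch==2.7.0", "example-dep==1.0"]
cuda_lock = ["torch==2.7.0+cu128", "example-dep==1.0", "cuda-only-dep==3.1"]
assert variant_mismatches(cpu_lock, cuda_lock) == {}
```

Under this reading, two locks that disagree on the public version of a shared package would be reported as separate layer definitions rather than variants of one layer.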

ncoghlan · 2025-05-22T12:53:29Z

Possible approach: compatible framework layers

Instead of automatically duplicating higher layers, allow higher layers to declare "One of X" style dependencies (potentially based on a layer tagging mechanism rather than specifically naming layers). One layer is nominated as the layer to use when building, and the resulting layer lock is checked for consistency with the other compatible layers.

Edit: after expanding on the explicit matrix idea, it's hard to see any real benefits in requiring users to define each compatible layer separately. That kind of flexibility makes sense when deployment is a true "mix and match" exercise, but outside wheel variants, that isn't the intended usage model for venvstacks.

neilmehta24 · 2025-05-22T15:06:41Z

Just to make sure we're on the same page: the problem we want to avoid is having two copies of the same docling dependencies on users' machines. When we want to deploy an incremental upgrade to the docling framework, we want to deploy only one docling framework layer, and two app layers -- one for CUDA and one for CPU.

The matrix solution sounds like there would be two copies of the docling framework on the users' machines.

I think these two issues are relevant:

- Support excluding packages from the build #95: Perhaps we can mark some dependencies as "installed elsewhere"
- Add a "conceptual" layer archiving mode #146: This would give us true flexibility for the deployed layers while minimizing size-on-disk

ncoghlan · 2025-05-22T16:23:54Z

Given the dealbreaker, I reworked the matrix idea to allow for framework layers that are built against a default version of a matrix layer, but are expected to be runtime compatible with all the variants of that layer.

ncoghlan · 2025-05-22T18:24:10Z

Pre-requisite steps before embarking on the wheel variant support:

- add the show command
- add syntactic support for explicit versioning (to build in the field substitution support feature)
- replace direct URLs with a better wheel override mechanism (this isn't strictly required, but the UX of direct URLs is problematic due to the way it interacts with the locking mechanism)

neilmehta24 · 2025-05-23T15:52:27Z

The matrix idea looks reasonable to me. Wanted to point out two things: (1) the `variants` entry is not valid TOML; and (2) the `dynlib_exclude` setting would need to be defined per variant.

ncoghlan added the Category: Enhancement, Affects: Metadata, and Affects: Spec Format labels on May 22, 2025
