Skip to content

Proposal: handling conflicting libraries in the same package #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
114 changes: 114 additions & 0 deletions seps/sep-0002.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
# Use-variants: for packages with conflicting libraries

## The problem

The `intel-oneapi-mkl` package is installed with many libraries. Some
of these libraries define the same symbols but differ in their
implementation. For example there is a set of libraries built with
threading support, and another set which offers the same
functionality but are built without threading support.

Right now, the only mechanism Spack offers to choose between these
libraries is a variant: you could add a `threading` variant and then
choose which libraries to supply to dependents based on the value
of that variant. The problem with a variant is that different
variant values generate distinct installations of a package: we would
install `intel-oneapi-mkl` once for each choice of `threading`, even
though the installations would be the same.

See also: https://github.com/spack/spack/discussions/22749

## Proposed Changes

* Modify variant declarations with a flag to indicate that the variant
determines how the package is used rather than how it is installed.
* During concretization, this variant is treated like any other
variant (no changes to the concretization algorithm are required).
* After concretization, the variant is stripped from the node and
stored on the edge connecting the node to its parent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are three features mentioned in this SEP, but their semantics aren't really laid out here. I see:

  • interface=True
  • request=True
  • provider=mkl

I think the first two should really be type=interface, type=request, etc.

The main concern I have is that the proposed changes should really include a description of how each of these options affects concretization semantics and the final DAG.

Copy link
Member Author

@scheibelp scheibelp May 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interface=True has the same concretization semantics as other variants (the difference is in terms of hashing).

Likewise provider=mkl doesn't modify this, only restricts when the variant affects the hash.

request=True doesn't modify the concretization semantics or the hash: it is only theoretically useful in terms of saving time when implementing .libs or .headers for a dependency but is not required (it doesn't have the same importance as interface=True, which is required for correct semantics).

It is true that interface=True and request=True are mutually exclusive (interface=True is used in .libs and .headers like request=True, but established additional constraints), and for that reason I suppose they could be options for a variant type. Originally I found it confusing though since their use cases are different.

Copy link
Member Author

@scheibelp scheibelp May 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for that reason I suppose they could be options for a variant type. Originally I found it confusing though since their use cases are different.

More than that, I should say that the Boost case is simple enough that it may not need to be addressed at all here (it can definitely be addressed separately) - that section doesn't state this explicitly but I think it makes a case for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add this stuff to the SEP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but to be clear I think this is a reasonable way to get to that: IMO it's worth making sure we agree before I push changes to this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

41565f3 adds some clarifications based on this conversation


## Example Use Cases

### Intel oneAPI MKL

Described in the introduction. This would be resolved by adding a
`threading` variant:

```
variant('threading', values=['openmp', 'sequential'],
interface=True)
Comment on lines +38 to +39
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a more generic argument that can take a single value or a list of them:

Suggested change
variant('threading', values=['openmp', 'sequential'],
interface=True)
variant('threading', values=['openmp', 'sequential'], type='interface')

In this way we would use a single directive to declare variants that are relevant for both deploying a package and using it. With the proposed interface such a case would need two declarations:

variant('libs', values=('shared', 'static'), multi=True, interface=True)
variant('libs', values=('shared', 'static'), multi=True, interface=False)

while using a more generic argument one suffices:

variant('libs', values=('shared', 'static'), multi=True, type=('deploy', 'interface'))

Another minor benefit of this would be that we'll leave the type of variants open for extensions, even though it's very likely that with these two different types ("deploy" and "interface") we'll be good for a long time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this though not necessarily on the names "interface" and "deploy", but we can work those out later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgamblin Sure, even "type" as the name of the argument is just a placeholder at the moment.

```

The new logic is the specification of `interface=True`: this
indicates that this variant describes how the package is used rather
than how it is installed.

Dependents do not have to specify a value for `threading`: if they
don't choose a value, then all dependents still need to "agree" on
using the same libraries. Both the new and old concretizer already
enforce consistency of variants amongst dependents.

### Boost

If users want to create a boost installation with many components,
then dependents may only need a subset of the Boost libraries; this
is a bigger issue if the unneeded Boost libraries define conflicting
symbols (not with one another, but with symbols defined in the
dependent or some sibling dependency of Boost).

Because Boost component libraries don't conflict with each other,
they do not need to be constrained like the dependents of
`intel-oneapi-mkl`. However, when we call
`root_spec['boost'].libs`, we want to know which libraries the
dependent actually needs: to handle this, the dependent can
recompute its dependencies based on its concrete spec, then pass the
dependency variants it requires as parameters to the dependency when
requesting `.libs`.

We could avoid doing this universally by marking variants where this
is a concern:

```
variant('regex', request=True)
```

This is not essential though: the dependent could universally
recompute dependency information without any hints from dependents.
For this reason, the Boost use case can be handled completely
separately, since any variant modifications would only be used
to speed up computations and are not required to achieve the
semantics desired for the use case.

### Intel Parallel Studio

This provides multiple virtual packages: `mkl`, `blas`, `lapack`,
`mpi`, and more. Each implementation has multiple sets of conflicting
libraries. If a dependent is only using `intel-parallel-studio` for
`mpi`, then we don't want use variants related to `mkl` to be
recorded.

This is murky pending a completed implementation of finer selection
of virtual providers: Currently we cannot say that we depend on
`intel-parallel-studio` *only* for `mpi` - if you ask for
`intel-parallel-studio` it is assumed that you are using it for
everything it provides (and there is no alternative `blas`
implementation for example).

Once there is a means of recording which virtuals are being used
by a dependent, we will want to only propagate an interface variant
to the dependent edge if the dependent actually uses the virtual
associated with that variant. For that purpose, I think it would
be suitable to name the associated virtual in the variant
declaration:

```
variant('threading', interface=True, provider='mkl')
```

In other words, this has the same concretization semantics discussed
for the Intel oneAPI MKL use case (i.e. that this is treated
exactly the same as other variants during concretization), but
adds controls on when the variant influences the hash of dependents:
it only affects the dependent's hash when the dependent is actually
using the package for the virtual named in the variant's `provider=`
option.