Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proposal E #13

Merged
merged 2 commits into from
May 1, 2024
Merged

Add proposal E #13

merged 2 commits into from
May 1, 2024

Conversation

sudo-bmitch
Copy link
Collaborator

@sudo-bmitch sudo-bmitch commented Mar 5, 2024

This proposal uses the previously reserved "platform.features" field for specifying strict hardware requirements. This is based on a discussion with a few of the containerd maintainers.

Fixes #12.

@sudo-bmitch
Copy link
Collaborator Author

sudo-bmitch commented Mar 5, 2024

cc @stevvooe @estesp

@vsoch
Copy link
Collaborator

vsoch commented Mar 6, 2024

From my standpoint, this is functionally almost equivalent to proposal A, it's just putting the metadata attributes in a slightly different spot. I'm indifferent to the two I think, and defer to the runtime tool maintainers.

@sudo-bmitch
Copy link
Collaborator Author

Switching from user extendable annotations to spec defined values means that scenarios like the MPI version dependency wouldn't be possible unless image-spec decided to support it and containerd implemented it. There's a trade-off between flexibility and control. I'm not pushing for this over A, just getting it documented so it's available as an option if we decide to combine a few of these proposals together later.

@@ -0,0 +1,138 @@
# Proposal E - Platform Features

This proposal leverages the previously reserved `features` field in the platform to give a finer grain of control to runtimes selecting an image to run.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's also the idea I mentioned in the proposal C for image selection: #8 (comment)

@vsoch
Copy link
Collaborator

vsoch commented Mar 6, 2024

If there is still going to be an external artifact, then I can remain indifferent! But if this is just it, this is probably far worse for all the use cases I care about.

@mfranczy
Copy link
Collaborator

Overall the proposal looks good to me, but I am wondering if platform.features should be array of strings or key value pair.
Also it would be good to hear from @stevvooe @estesp if they would consider usage of the platform.features for image selection.

@sudo-bmitch
Copy link
Collaborator Author

Overall the proposal looks good to me, but I am wondering if platform.features should be array of strings or key value pair.

It's already defined in the spec, so changing the data structure would be a breaking change.

https://github.com/opencontainers/image-spec/blob/main/image-index.md

@mfranczy
Copy link
Collaborator

@sudo-bmitch right, but:

This property is RESERVED for future versions of the specification.

At least in theory, no one has used that field so far.

@sudo-bmitch
Copy link
Collaborator Author

sudo-bmitch commented Mar 27, 2024

@sudo-bmitch right, but:

This property is RESERVED for future versions of the specification.

At least in theory, no one has used that field so far.

Damn, I just got called a "no one" 😂
https://github.com/regclient/regclient/blob/1eccd0755c352e37e56a170917857e6dd5711985/types/platform/platform.go#L50-L51

If we need a map instead of a slice of strings, that would probably best be done with a new field. But before making a change like that, I'd want to establish the need.

@vsoch
Copy link
Collaborator

vsoch commented Mar 27, 2024

Damn, I just got called a "no one" 😂

Can be an honor depending on who does the calling!

https://youtu.be/rywUS-ohqeE?si=bj_HxsY4CRikiUou

@mfranczy
Copy link
Collaborator

mfranczy commented Apr 5, 2024

In some aspects this overlaps in the ongoing discussion for proposal H.

If we need a map instead of a slice of strings, that would probably best be done with a new field. But before making a change like that, I'd want to establish the need.

I think it would be a good idea, as @ChaoyiHuang proposed, to introduce a new field platform.compatibilities.
Additionally I would add very fundamental labels that are supported by OCI and cover generic use cases.

Those would be (open for discussion):

cpu.features.<FEATURE_NAME>.present: <bool>
cpu.model.vendor_id: <string>
cpu.model.family: <int>
cpu.model.id: <int>
pci.<class>.<vendor>.<id>.present: <bool>
usb.<class>.<vendor>.<id>.present: <bool>
kernel.config.<CONFIG>: <string>
kernel.modules.<NAME>: <string>
kernel.modules.<NAME>.<PARAMETER>: <string>
kernel.version.range: <string>

I would also allow for custom labels, OCI would not care what is in there.
Example of that

custom.sr_iov.enabled: true // telco specific use case
custom.mpi.version: 5.0 // hpc specific use case
...

My reasoning behind that is: we have to come up with a standard way of describing generic compatibility. Other specific use cases (like telco, hpc) would have to use custom labels. We agreed that we have to build our own tools to solve our specific use cases, so the tools would recognize the custom labels which are known to the use case.

Finally, if community and OCI see a need to graduate some specific custom label to well known labels we could do it carefully over time.

EDIT: Note that fields with (like cpu.features.<FEATURE_NAME>) do not imply to list all possible features under OCI. We have to meet the half way and trust that whatever users defined in there potentially reflects in the reality on the host. If it doesn't it's user's business to fix this. Keeping all the possible cpu features, kernel modules etc. would be very difficult.

@sudo-bmitch
Copy link
Collaborator Author

Additionally I would add very fundamental labels that are supported by OCI and cover generic use cases.

I think we want that, but that's not what this specific proposal is looking at. This is focused at things like CPU features that would prevent a binary from executing on the host, and a finer grain difference than linux/amd64/v1 vs linux/amd64/v2 that pull in a collection of CPU features. The more generic use cases would be better handled by the other proposals.

The goal of submitting this is two fold. First, it's putting up the scenario we discussed with the containerd maintainers so we can properly compare it to the other proposals (it's hard to compare something that doesn't get written down). And second is to have it as a potential building block in an eventual proposal that we adopt, not used on its own, but as part of a collection of solutions that work together.

"platform": {
"architecture": "amd64",
"os": "linux",
"features": [ "avx2", "bmi2" ]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My last question is about this as a list - is it already defined as a list? And the idea is that we'd need an ICD style design of features where every combination of features is represented as a single identifier?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially it would be a list of CPU features. So you could build a generic amd64/v1 image, and a second image for amd64/v1 plus sse3 to run some application more efficiently.

For context, the OCI platform definition is based on the Go architecture list: https://tip.golang.org/wiki/MinimumRequirements#microarchitecture-support

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes, it is already defined in the spec as a list.

ChaoyiHuang
ChaoyiHuang previously approved these changes Apr 23, 2024
Copy link
Collaborator

@ChaoyiHuang ChaoyiHuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

This proposal uses the previously reserved "platform.features" field for specifying strict hardware requirements.

Signed-off-by: Brandon Mitchell <[email protected]>
@sudo-bmitch
Copy link
Collaborator Author

sudo-bmitch commented Apr 23, 2024

I've updated this with a link to a Go package that defines a list of possible feature values to further clarify the proposal. I'm leaning on Go since OCI already does that for the architecture support today.

Edit: related is the Go supported platform details, where OCI defines the variants: https://go.dev/wiki/MinimumRequirements#microarchitecture-support

```

In the above scenario, the generic `linux/amd64` entry is listed first and should be used by default.
Runtimes that recognize and support the `avx2` AND `bmi2` features on a "v2" or newer variant would prefer the second entry as a better match.
Copy link
Collaborator

@ChaoyiHuang ChaoyiHuang Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the proposal for its simplicity in image selection as I mentioned in previous comment.

just one question here, does it assume that the matching criteria is the more features match, the more prefered image. How does the runtime decide when two entries have same number of features, and both matched with the node's CPU features. (I am OK first come first serve)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Entries with features listed would want to be listed after entries without any features to avoid breaking older clients that are unaware of features and would pick the first entry. I think among multiple matching entries with features specified, picking the first may be the most flexible for implementation.

Otherwise, trying to count the number of matching features and using the first match for ties could also work. My concern with that approach is that I could foresee the fallback to fast instruction being a combination of several slower instructions, and we don't want to prioritize the latter.

@mfranczy mfranczy merged commit 95026a4 into opencontainers:main May 1, 2024
2 checks passed
@sudo-bmitch sudo-bmitch deleted the pr-proposal-e branch May 1, 2024 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a proposal for using platform.features
5 participants