-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce RFC 18 Workload Report Artifact Provenance #519
base: main
Are you sure you want to change the base?
Conversation
✔️ Deploy Preview for elated-stonebraker-105904 canceled. 🔨 Explore the source changes: 2dd4ee0 🔍 Inspect the deploy log: https://app.netlify.com/sites/elated-stonebraker-105904/deploys/61fb0b5bc32ed7000746e187 |
- id: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 | ||
- config: | ||
id: 22fe69c924b55327ce3e8ca079d7295a947a3641b5ab74bdf7541dc680258c81 | ||
config: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could get pretty big. RFC 14 proposed that this should be shasum(giant-yaml)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't object to encoding the yaml in order to shrink it down. But I wouldn't hash the value. This RFC should make the flow of information through the supply chain observable. Hashing the value makes it possible to see that some change has occurred, but not what that change is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also a 1MB limit on the size of a single resource, and the values might be repeated twice due to the addition of managedFields
. I'd be careful about including large other documents directly for that reason. It seems like it would be better to include a resourceVersion
or generation
field referencing the resource's version, rather than the entire contents.
Copying over my comments from RFC14 that are still relevant: In the office hours this week (Jan 10th, 2022), we talked about how we could extend this type of artifact tracing to track artifacts across clusters. Assuming you started with a SupplyChain that looked something like the following (from git-writer example): [source-provider] -> [image-builder] -> [config] -> [git-writer] But you made a small change and wrapped the git-writer in the SupplyChain with a So your workload status might end up looking like this: artifacts:
- source:
id: 000
url: https://github.com/my-org/my-source
revision: abc123
passed:
- resource-name: source-provider
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
- image:
id: 111
image: gcri.io/asdfsdfasdf
passed:
- resource-name: image-builder
apiVersion: kpack.io/v1alpha2
kind: Image
from:
- id: 000
- config:
id: 222
config: shasum(of-some-config)
passed:
- resource-name: my-config
apiVersion: v1
kind: ConfigMap
from:
- id: 111
- source:
id: 333
url: https://github.com/my-org/my-config
revision: xyz789
passed:
- resource-name: config-writer
apiVersion: carto.run/v1alpha1
kind: Runnable
from:
- id: 222 Now lets assume you are following the basic-delivery example on another cluster. You would end up with a Deliverable with the following status: artifacts:
- source:
id: 333
url: https://github.com/my-org/my-config
revision: xyz789
passed:
- resource-name: config-provider
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
- source: some other-artifact
... You can see how we could correlate these two completely independent resources, based solely on their artifacts. |
Copying over my comments from RFC14 that are still relevant: @rawlingsj made a really insightful comment in another unrelated thread that we should consider here:
My comment here definitely assumes read access to all clusters. So maybe we should start thinking about writing out artifacts as k8s events at the same time as writing them to the workload/deliverable status. That way if someone is already emitting k8s events from their prod clusters, we can integrate into their existing tooling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of capturing a reference to the resources created for a Workload on the status, but I'm skeptical that capturing the output values will give users the information they think they are getting. There's little point in merely duplicating state that could instead be looked up on the child resources directly unless the aggregation provides some value. It's not clear what questions you're tying to answer by providing this metadata.
# namespace of the resource | ||
namespace: <:string> | ||
# resource version of the object | ||
resourceVersion: <:string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resource version is highly volatile. Any change (or even no change for an "empty" update/patch) to the resource will create a new version. This value is typically only used for direct comparisons (are these two resource the same).
Unless you are very careful, it's easy to get into an infinite loop. As updating the Workload status will cause the Workload to be reprocessed, which will cause the templated resource to be updated, even if that update is a noop the resrouceVersion will bump, which need to be reflected on the Workload's status...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the underlying User Story/Axiom for wanting the RV?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As updating the Workload status will cause the Workload to be reprocessed, which will cause the templated resource to be updated, even if that update is a noop the resrouceVersion will bump
When the workload reconciles, Cartographer first checks for changes to a stamped object. If there is no proposed change to the spec, Cartographer will not update the object. This should protect us from the infinite loop scenario posited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the underlying User Story/Axiom for wanting the RV?
Given an artifact, I would like to know the object and spec that created it. The other information can tell us what GVK/Name/Namespace the object was. But that object could have changed in arbitrary ways. The resourceVersion will point to a unique spec for the given object.
Leveraging this knowledge in practice will require additional historical persistence of objects, as etcd is only gauranteed to keep the resourceVersions for 5 minutes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As updating the Workload status will cause the Workload to be reprocessed, which will cause the templated resource to be updated, even if that update is a noop the resourceVersion will bump
When the workload reconciles, Cartographer first checks for changes to a stamped object. If there is no proposed change to the spec, Cartographer will not update the object. This should protect us from the infinite loop scenario posited.
I think Scott's point is that resourceVersion
can change even if the underlying resource's spec
does not. For CRDs, the metadata.generation
field tracks spec
and metadata
changes (but not status
changes).
- id: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 | ||
- config: | ||
id: 22fe69c924b55327ce3e8ca079d7295a947a3641b5ab74bdf7541dc680258c81 | ||
config: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this example be a map[string]string instead of a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config artifact has one field: config
which is a string. Similar to the image artifact.
@jwntrs I think perhaps this thinking is solid, and should be enforced, that all steps, no matter what, produce an artifact. Even in the case of a kpack deploy, we should have a kpack deploy ID or something of that nature, as an artifact. That way, every Supply Chain Resource has representation in the artifacts list. |
The problem that we are trying to address is make it easy for the user to be able to identify the artifact graph that gets produced by a supply chain ( |
You'll never actually know that " |
Some open questions that I've heard raised:
|
No. A resource should reflect the current known state of the world on its status to the best of its ability. There will inherently be latency that's ok and expected. The status should indicate the recency of the status' content. (Generation/ObservedGeneration, LastTransitionTime, etc)
A controller should almost never wait for an async condition to become true. The general pattern is to reflect the current state and watch "interesting" resources for changes. When one of the watched resources changes, reprocess the whole control loop and pickup the latest "current" values. |
I would support such an RFC. @squeedee Do you think this should be a separate RFC, or folded into this one. I'm going to fold it into this one! |
@scothis @jwntrs This is an excellent discussion to move to RFC 20, our attempt to ensure that we can tie output N to input M. |
@waciumawanjohi any thoughts on this point?
|
Cartographer's current behavior is to continue promoting artifacts that were stamped with a previous workload spec. As long as that is the behavior, we should not wipe out artifacts from the workload status simply because the workload spec changes. What would be the implication of no longer promoting artifacts built from previous workloads? Let's consider an example: a workload paired with a simple source (flux gitrepo) -> image (kpack) -> deploy (knative) supply chain. The workload has been doing work and there are valid artifacts at each stage. The user then updates the
To handle this scenario we would need to do one of two things.
What would we gain by adopting one of these strategies? Well one thing that isn't gained is determinism about what is deployed to prod. The old workload spec would have delivered an arbitrary number of revisions through the supply chain. What we gain is clarity about which of those revisions were driven by the old workload spec and which were driven by the new workload spec. There is a simpler way to achieve that goal: consider the workload version as a field to be captured in the |
Yes. As long as there is a ClusterTemplate we can capture information about what objects were created on the cluster and what inputs led to them. I'll add that to the RFC. |
The rfc specifies that artifacts are |
Also link to RFC 14 discussion
d9f124d
to
ef44ba4
Compare
RFC 20 now proposes exactly this. Assuming RFC 20 is accepted, we could then either wipe the artifact field for a workload when the spec is updated or note the generation of the workload in the |
Borrowing from the intent behind this RFC and 20, I'd like to propose an alternate status shape for Workloads. Not all of the fields are expected to be populatable immediately, but it gives a direction to grow towards while leaving room for additions over time. Other changes to the cartographer model will be required to be able to surface some of these fields. Key differences:
...
status:
observedGeneration: # int64
conditions: # []metav1.Condition
supplyChainRef:
apiVersion: # string
kind: # string
namespace: # string (empty for cluster scope)
name: # string
resources:
- name: # string (resource name as defined by the supply chain)
ref: # corev1.ObjectReference (stamped resource)
apiVersion: # string
kind: # string
namespace: # string (empty for cluster scope)
name: # string
templateRef: # corev1.ObjectReference (template resource)
apiVersion: # string
kind: # string
namespace: # string (empty for cluster scope)
name: # string
inputs: # (I'm not sure how practical this is today given Cartographer's loose relationship between templates)
- name: # string (resource name)
value: # optional json.RawExtension
outputs:
- name: # string
value: # json.RawExtension (may be too large to reasonably include)
lastTransitionTime: # metav1.Time
# these fields are modeled after metav1.Condition
type: # enum 'Source', 'Image', or 'Config' (consider dropping if redundant with templateRef.kind)
status: # enum 'True', 'False', or 'Unknown' (based on metav1.ConditionStatus)
observedGeneration: # int64 (alternate to resourceVersion)
lastTransitionTime: # metav1.Time
reason: # string
message: # string |
interesting @scothis, trying to make some sense of it making a little more status:
resources:
- name: source-provider
ref:
kind: GitRepository ...
templateRef:
kind: ClusterSourceTemplate ...
inputs: []
outputs:
- name: ?
value: {url: git.io/foo/bar, ref: b4db33f}
lastTransitionTime: 1
- name: ?
value: {url: git.io/foo/bar, ref: c001b33f}
lastTransitionTime: 2
type: Source
status: True
lastTransitionTime: 2
- name: image-builder
ref:
kind: Image
templateRef:
kind: ClusterImageTemplate
inputs:
- name: source-provider
value: {url: git.io/foo/bar, ref: b4db33f
- name: source-provider
value: {url: git.io/foo/bar, ref: c001b33f
outputs:
- name: ? # (!!)
value: {image: registry.io/foo@sha256:010101}
lastTransitionTime: 3
type: Image
status: True
lastTransitionTime: 3 in the example above, just looking at thx! |
@cirocosta inputs/outputs were not intended to be a historical record of how a value changed over time, but to capture the "current" output/input for a resource. The definition of current will likely evolve based on other proposals. So what you sketched as: outputs:
- name: ?
value: {url: git.io/foo/bar, ref: b4db33f}
lastTransitionTime: 1
- name: ?
value: {url: git.io/foo/bar, ref: c001b33f}
lastTransitionTime: 2 would actually be outputs:
- name: url
value: git.io/foo/bar
lastTransitionTime: 1
- name: ref
value: c001b33f
lastTransitionTime: 2 Note: the transition times are different because the url didn't change, but the ref did in your example. |
@scothis |
Why do we want this to be typed? Aren't these two stipulations enough?
|
In the workshopped format, the type is the top level field. Shall we change that to: status:
artifacts:
- # the sha256 of the ordered JSON of all other non-from fields
id: <SHA:string>
# optional: oneOf(source,image,config)
source:
# exposed fields - in this case url and revision
uri: <:string>
revision: <:string>
# the object which produced this artifact
resource:
# name of the resource in the supply chain
resource-name: <:string>
# GVK of the resource
kind: <:string>
apiVersion: <:string>
# name of the resource on the cluster
name: <:string>
# namespace of the resource
namespace: <:string>
# resource version of the object
resourceVersion: <:string>
from:
- # id of any artifact(s) that were inputs to the template
id: <SHA:string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few notes, but I'd be super-excited to even start with the list of resources / stages / artifacts managed by the Workload, and add in the "triggered by" separately.
# the object which produced this artifact | ||
resource: | ||
# name of the resource in the supply chain | ||
resource-name: <:string> | ||
# GVK of the resource | ||
kind: <:string> | ||
apiVersion: <:string> | ||
# name of the resource on the cluster | ||
name: <:string> | ||
# namespace of the resource | ||
namespace: <:string> | ||
# resource version of the object | ||
resourceVersion: <:string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be an ObjectReference
or TypedLocalObjectReference
?
id: <SHA:string> | ||
# exposed fields - in this case url and revision | ||
uri: <:string> | ||
revision: <:string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be top-level fields, or possibly under an outputs
key?
from: | ||
- # id of any artifact(s) that were inputs to the template | ||
id: <SHA:string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does Cartographer track the from
information -- does it require that each resource expose an observedGeneration
or equivalent in status
? If not, Cartographer may observe a resource's status
before it has reconciled a spec
change and assume that the status
goes with the current spec
, when it actually applies to an older spec
.
- id: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 | ||
- config: | ||
id: 22fe69c924b55327ce3e8ca079d7295a947a3641b5ab74bdf7541dc680258c81 | ||
config: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also a 1MB limit on the size of a single resource, and the values might be repeated twice due to the addition of managedFields
. I'd be careful about including large other documents directly for that reason. It seems like it would be better to include a resourceVersion
or generation
field referencing the resource's version, rather than the entire contents.
namespace: my-namespace | ||
resourceVersion: "11125545" | ||
from: | ||
- id: 23156c7ac2170fe95f85a1ad42522c408e67038f8393eea6cd8551e07457c5d7 # <--- The id noted above, as the source-tester consumes the source artifact from source-provider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a component like kpack
can generate a new build based on external stimulus (i.e. addition of a new base image), what happens to from
here?
name: RFC 18
about: Workload Report Artifact Provenance
Readable
Changes proposed by this PR
Opens RFC 18
Release Note
PR Checklist
Note: Please do not remove items. Mark items as done
[x]
or usestrikethroughif you believe they are not relevant- [ ] Linked to a relevant issue. Eg:Fixes #123
orUpdates #123
- [ ] Removed non-atomic orwip
commits- [ ] Filled in the Release Note section above- [ ] Modified the docs to match changes