-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
manifest delete is leaking indexes when deleting client-managed referrer #1228
Comments
ORAS will not clean up old referrers index after attaching using referrers tag schema. If I understand correctly(see below), both the old index and old referrers are garbage. To do GC from client side, you need to delete both flowchart LR
A("old referrer index\n29cd94faa1e") --> C("old referrer\n78bab7e405d")
With that implementation |
BTW ORAS CLI used to support referrers index cleanup in v1.1.0-rc.1 but it's too error prone and got reverted in the v1.1.0 release. |
@toddysm didn’t we decide to not clean up the old manifest because docker hub didn’t support manifest delete? Should we have flag and make delete the default given that DH is now turning out to be a special case? |
I think in the context of gc, these are two different things. The old referrer is the thing the user deleted, so not only garbage but it should be deleted. Whereas, the index is managed by the client, and there's only an "old" one because we can't edit it in place. So the new index clobbers the old in that we move the
I think that makes sense. I couldn't think of a use case where we'd want to retain the old index. If the only reason is that DH doesn't allow deletion, that seems more like a workaround to me. So, agree with you @sajayantony seems like GC by default would be good, with an optional flag to workaround it as needed. |
@sajayantony my recollection of the discussion was to have the flag to force the deletion, but the default behavior is to not delete (i.e. user needs to explicitly ask for deletion). This is inconvenient and inconsistent because you normally don't know if the registry supports deletion until you hit the error but this is the most we can do to support all registries. The other option we discussed is to degrade the deletion error to warning to ensure automation passes (else automated tasks break for registries that don't support deletion). Hope this helps. |
@sajayantony @toddysm Here is a chronological view of supporting GC old referrers index
|
Thanks for the conversation, all! I certainly understand the goal to not have an action like "manifest push" or "attach" result in a weird error about deletion failing. But it seems like we traded off working around the error on some registries by keeping the stale copies always.
How does a user do that? That would work (but I still think the default should be the other way around). But, there is no option that I can see to ask for this deletion? I have to say, the use of the term garbage collection confuses me here, and I wonder if some of the conversation isn't in context of referrers, vs the stale index copy. These stale indexes are a side effect of how the client manages updates to the referrers index. It doesn't seem like garbage collection, but more a step in that operation that we aren't doing. To further complicate things on ECR, I cannot even delete the referrer I am trying to delete, because ECR protects against deletion of manifests which are referred to in an existing index. Since the old stale index is still in the repo, I have to manually delete it. I might still be misunderstanding the past conversations here, but it seems the default behavior should be to delete the stale and now useless copy of the index, with an option to retain in on those registries that don't permit me to delete it (like Docker Hub, which implements delete through the DH API vs the OCI API). It could also certainly work to have an option to force delete (though again, that seems like the expected behavior and should probably be default?). |
Managing referrers index is just a client-side workaround when Referrers API is not available. As a workaround, it's not perfect and there are several known issues of the workaround, e.g. there is no way to guarantee the data consistency when a referrers index is updated concurrently. To provide production-level registry service, I believe eventually ECR will implement Referrers API. If ECR uses referrers index as a temporary solution of Referrers API, ORAS CLI can bring back the |
Yes, I fully understand and am really looking forward to a spec release for
For posterity, the implementation is done (and has been for a long while) and we maintain it against the upstream changes. We just need a spec to GA so we can release it. All this said, there will no doubt be 1.0-only registries going into the future, as some registries may never adopt the proposed 1.1 features. Unless the project intends to deprecate the client-managed options proposed in the Distribution spec 1.1 RC, I think we should consider this as a permanent option.
Ok, I'll look into a PR for this for your review. I think the name is a bit confusing, what do you think about |
Just curious, why did we decide to deprecate the flag in RC2? I may have missed that and justification for it. |
@jlbutler @qweeah @sajayantony @toddysm Thanks for the conversation above related to garbage collecting the outdated referrers index. To put us on the same page, I'd like to summarize the issue statement, its cause, and related history. Issue Analysis
According to @jlbutler, ECR does not support the referrers API and falls back to the referrers tag schema. Since ECR ensures graph consistency, the last step (step 7) fails as the target referrer manifest is still referenced by the outdated referrers index. As @qweeah mentioned, bring back the flag of setting Related DiscussionsThe With the same rationale as the #1228 (comment) by @jlbutler, While the flag |
Thanks for this detailed summary, @shizhMSFT. It's great for this conversation as well as for posterity's sake. One remaining question regarding a new option and its default value. In 1.0 without options, the stale index was deleted by default and there was no option to skip that. In 1.1 the default is to not delete the stale index, and there is no option to delete it. What are our thoughts at this point around backward compatibility? Is it ok to once again delete it as with 1.0, and offer an option to preserve? Or should we not delete it as with 1.1, and offer an option to delete? Once we are agreed on this, I'll follow up with a PR. |
Based on the intention of #1041 (especially the alignment with registry behavior), I prefer not changing the default behavior. In other words, I prefer
|
The question in my opinion is really just around backward compatibility of the Oras client, rather than alignment with registry behavior. As we have discussed in this thread, registry behavior between implementations is inconsistent. These differences are OCI compliant, in that registries MAY implement deletion. To bring the context back to the history of client-managed index behavior, from above:
To me, 4) seems like a workaround for an integration bug found with 1.0 for registries that don't support deletion, while 5) changes default behavior to work around that bug permanently, but introduces other issues by dropping the option altogether. 1.0 traded an integration issue for a different one in 1.1. It also introduced a breaking change, not by dropping an experimental option, but by changing default behavior. This 1.1 default behavior also has the side effect of creating stale content in the repository. Given all this, I lean toward wanting to revert to 1.0 default behavior, with an optional skip for delete (as in As I said, I'll support maintainers' preference on this, I just want to make sure we're looking at the whole issue. And I know I missed prior conversations, but I think this statement from #1041 was fairly hopeful:
I think the project should assume that even beyond 1.1 GA it's likely that some registries won't ever support its features. It seems in a lot of threads the client-managed index is seen as temporary, but it might be worth considering it more of a legacy support option. Sorry this got long, just want to make sure I'm making my point clearly. I'll start going in the direction of default retain stale index, but we can hopefully discuss further. |
@jlbutler If we agree that this is an valid enhancement request, would you be able to own it? Do you want to bring this fix into the upcoming release (ORAS v1.2.0) by Feb 6? |
Obviously I didn't get this handled by Feb 6. I'm still willing to do it, just got pulled away on some other things. Can get back to it in the next week or so. |
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days. |
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days. |
This issue was closed because it has been stalled for 30 days with no activity. |
What happened in your environment?
oras manifest delete
failed to delete a referrer in an ECR repo. ECR will protect against deletion of content listed in an index, but the operation removes the reference, then deletes.It turns out that when the index is updated, a new one is pushed but the old one remains. ECR will not delete the manifest as it is still listed in the old index.
What did you expect to happen?
Manifest deleted, updated index in place with referrer removed
How can we reproduce it?
// current client-managed referrers index
$ curl -sX GET -H "Authorization: Basic $TOKEN" https://510431938379.dkr.ecr.us-west-2.amazonaws.com/v2/test-delete/manifests/sha256:29cd94faa1e63cf0052e95e39b222fb3e4bea8de4025b92817c1e447f51737f4 | jq . | grep 78ba
"digest": "sha256:78bab7e405d0d2bac6b410e8873e67a004780e57e71006be8f5c85700cbc080a",
// delete fails
$ oras manifest delete --force --distribution-spec v1.1-referrers-tag 510431938379.dkr.ecr.us-west-2.amazonaws.com/test-delete@sha256:78bab7e405d0d2bac6b410e8873e67a004780e57e71006be8f5c85700cbc080a
Error: failed to delete 510431938379.dkr.ecr.us-west-2.amazonaws.com/test-delete@sha256:78bab7e405d0d2bac6b410e8873e67a004780e57e71006be8f5c85700cbc080a: DELETE "https://510431938379.dkr.ecr.us-west-2.amazonaws.com/v2/test-delete/manifests/sha256:78bab7e405d0d2bac6b410e8873e67a004780e57e71006be8f5c85700cbc080a": response status code 500: Internal Server Error
// new index has been updated
$ curl -sX GET -H "Authorization: Basic $TOKEN" https://510431938379.dkr.ecr.us-west-2.amazonaws.com/v2/test-delete/manifests/sha256:d49c218d6a49444fbba31546481402bd2d18db14072b304463d210515e3e2df8 | jq . | grep 78ba
// old index remains
$ curl -sX GET -H "Authorization: Basic $TOKEN" https://510431938379.dkr.ecr.us-west-2.amazonaws.com/v2/test-delete/manifests/sha256:29cd94faa1e63cf0052e95e39b222fb3e4bea8de4025b92817c1e447f51737f4 | jq . | grep 78ba
"digest": "sha256:78bab7e405d0d2bac6b410e8873e67a004780e57e71006be8f5c85700cbc080a",
What is the version of your ORAS CLI?
1.1.0
What is your OS environment?
macOS / M2
Are you willing to submit PRs to fix it?
The text was updated successfully, but these errors were encountered: