
A stop-gap solution for blob implementation until invocation spec #1339

Open · vasco-santos opened this issue Mar 22, 2024 · 9 comments
vasco-santos (Contributor) commented Mar 22, 2024

See blob protocol storacha/specs#115

  1. In the meantime I suggest we go with `blob/*` instead of `space/content/*/blob`, and `web3.storage/blob/*` instead of `service/blob/*`, for new capabilities until the invocation spec lands, given we are still not at UCAN 1.0 and links to tasks vs. invocations are going to be a pain to deal with otherwise ✅
  2. I think we have a similar pattern as w3-filecoin. Here is where my mind is (see the sketch after this list):
    a. `blob/add` queues the blob for allocation if the content is not already allocated in the given space (the resource of the invocation).
    b. a `blobAllocateQueue` consumer reads blob allocation requests and self-invokes (as w3s) `blob/allocate`, which, if the space has enough bytes available, inserts an `allocationStore` record with content + space.
  3. We store this input in our store/table (the counterpart of `allocationStore`). Looking at `blob/allocate` we are missing issuer and size in the args. Size is part of blob, so everything is fine there. We have the taskCid though, which would allow us to find out the issuer if needed, but it may be an old artefact and not needed ✅
  4. I assume we are opting, in the implementation, to invoke `blob/add` a second time once the bytes have been written to the presigned URL. Can you confirm my assumption? ✅
  5. car/multihash checks on the migration path ✅
    a. we currently have carpark-prod-0 in R2/S3. When we perform a `store/add` we check whether the bytes are already there, so we only ask users to write them when needed.
    b. we now move to writing things keyed by multihash, and I am wondering if you have any thoughts on deduping here? Could we simply test the multihash with the CAR codec and issue a location claim?
    c. actually the same also happens in our storeTable today. We key it by the link…
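
A minimal sketch of the flow in point 2. Everything here is a hypothetical stand-in (the `AllocationStore`/`Queue` interfaces, handler names, and status values), not the actual w3infra API:

```ts
// Hypothetical interfaces standing in for the real w3infra stores/queues.
interface AllocationStore {
  exists(content: Uint8Array, space: string): Promise<boolean>
  insert(record: { content: Uint8Array; space: string; size: number }): Promise<void>
}
interface Queue<T> {
  send(message: T): Promise<void>
}

type BlobAddRequest = {
  space: string // DID of the space, i.e. the resource of the invocation
  blob: { content: Uint8Array; size: number } // content = multihash bytes
}

// 2a. blob/add handler: queue for allocation only if the content is not
// already allocated in the given space.
export const handleBlobAdd = async (
  request: BlobAddRequest,
  allocationStore: AllocationStore,
  blobAllocateQueue: Queue<BlobAddRequest>
) => {
  if (await allocationStore.exists(request.blob.content, request.space)) {
    return { status: 'already-allocated' as const }
  }
  await blobAllocateQueue.send(request)
  return { status: 'queued' as const }
}

// 2b. queue consumer: self-invoke blob/allocate (issued by w3s) and, if the
// space has enough bytes available, insert the allocation record.
export const handleBlobAllocate = async (
  request: BlobAddRequest,
  allocationStore: AllocationStore,
  hasEnoughBytes: (space: string, size: number) => Promise<boolean>
) => {
  if (!(await hasEnoughBytes(request.space, request.blob.size))) {
    return { status: 'insufficient-storage' as const }
  }
  await allocationStore.insert({
    content: request.blob.content,
    space: request.space,
    size: request.blob.size,
  })
  return { status: 'allocated' as const }
}
```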
Gozala (Contributor) commented Mar 22, 2024

> I assume we are opting, in the implementation, to invoke blob/add a second time once the bytes have been written to the presigned URL. Can you confirm my assumption?

In terms of implementing `/service/blob/accept`, and specifically how the client signals that they've completed the upload, I suggest we do whichever is the easier of the following two options:

  1. Make the client send a second `blob/add` invocation after they've finished the upload, so we can perform a lookup.
  2. Add another temporary capability with empty output, either very specific like `blob/add/poll` or very general like `invocation/wake { task: CID }` (see the sketch below).
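
For option 2, an illustrative TypeScript shape for the very general variant; the capability name and `task` field come from the comment above, but nothing here is spec'd yet:

```ts
import { CID } from 'multiformats'

// Illustrative shape for a temporary invocation/wake capability; the names
// and field semantics are assumptions, not part of the spec.
interface InvocationWake {
  can: 'invocation/wake'
  with: `did:${string}` // space (or service) DID the task was invoked with
  nb: {
    task: CID // link to the pending blob/add task to wake
  }
}
```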

Gozala (Contributor) commented Mar 22, 2024

> In the meantime I suggest we go with blob/* for new capabilities until the invocation spec, given the current implementation limitation with task/delegations. Just not sure if we also want a temporary prefix for the service ones?

The spec is written in the UCAN 1.0 format and assumes storacha/RFC#12; however, in terms of immediate implementation I suggest that we instead:

  1. Use blob/add instead of /space/content/add/blob
  2. Use web3.storage/blob/* in place of /service/blob/*

I suggest the above because I think it would make more sense to transition to storacha/RFC#12 once we have [email protected] implemented, because I suspect links to tasks vs. invocations are going to be a pain to deal with otherwise. This will give us a cleaner break.
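
Summarizing the proposed temporary names as plain constants; the `accept` mapping is an extrapolation from the `/service/blob/*` rule, and the real definitions would live in the capabilities package, so this map is purely illustrative:

```ts
// Temporary capability names until the invocation spec lands.
const TEMP_CAPABILITY_NAMES = {
  'space/content/add/blob': 'blob/add',
  'service/blob/allocate': 'web3.storage/blob/allocate',
  'service/blob/accept': 'web3.storage/blob/accept', // assumed mapping
} as const
```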

Gozala (Contributor) commented Mar 22, 2024

> I think we have a similar pattern as w3-filecoin. Here is where my mind is:
> a. blob/add queues the blob for allocation if the content is not already allocated in the given space (the resource of the invocation).
> b. a blobAllocateQueue consumer reads blob allocation requests and self-invokes (as w3s) blob/allocate, which, if the space has enough bytes available, inserts an allocationStore record with content + space.

I wrote some comments on how I imagine this could work, but I think we need to try to determine whether it is in fact viable.

Gozala (Contributor) commented Mar 22, 2024

> 3. We store this input in our store/table (counterpart of allocationStore). Looking at blob/allocate we are missing issuer and size in the args. We have the taskCid though, which would allow us to find out those two; however, I would prefer to make them just part of the params instead. Do you think this is reasonable? I honestly do not know why we store the issuer and the size. Size may be nice to have, to query all items in a space and get the sum of their sizes without a HEAD request, and we actually return it in the proposed receipt. So I think I would definitely add it.

We aren't missing size, because `Blob` is `{ size, content }`; see:

https://github.com/web3-storage/specs/blob/9cc1b039431ed2853b30e5874c5fe3a1acf7876e/w3-blob.md?plain=1#L263-L267
https://github.com/web3-storage/specs/blob/9cc1b039431ed2853b30e5874c5fe3a1acf7876e/w3-blob.md?plain=1#L188-L191
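
For reference, a TypeScript rendering of that shape from the linked spec lines (the type name is mine; field semantics per the spec):

```ts
// Blob per w3-blob.md: content is the multihash digest bytes of the blob,
// size is its byte length.
interface Blob {
  content: Uint8Array
  size: number
}
```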

I wonder why we care about the issuer though; I'll try to investigate. Without further context I would argue we should not care who the issuer is, as long as the space has authorized them to perform the invocation, but maybe I'm missing some context here?

I would also suggest that we can simply require the linked task cause to be embedded in the invocation; that way we'll have the issuer info (at least before [email protected]). But even so, we should probably figure out why we have it in the first place.

Gozala (Contributor) commented Mar 22, 2024

I did a little bit of archaeological digging to find out where the issuer is coming from, and it looks like it came from this PR: storacha/w3infra#48

Which in turn seems to have come from Hugo's table: https://github.com/web3-storage/w3up/blob/cc0f4f44c06a5884c6af50cf739dad6164d8e468/packages/access-api/migrations/0000_create_spaces_table.sql

So I have a feeling that nothing really depends on this and we could probably drop the field, but we should change the types and see if anything depends on that field.

Gozala (Contributor) commented Mar 22, 2024

> car/multihash checks on the migration path
> a. we currently have carpark-prod-0 in R2/S3. When we perform a store/add we check whether the bytes are already there, so we only ask users to write them when needed.
> b. we now move to writing things keyed by multihash, and I am wondering if you have any thoughts on deduping here? Could we simply test the multihash with the CAR codec and issue a location claim?
> c. actually the same also happens in our storeTable today. We key it by the link…

I'm not sure I fully understand the question here, but I'll try my best to address what I think you're getting at:

a. We can derive the CAR CID from the blob multihash, simply by calling `Link.create` with `CAR.code` and the multihash, so we could perform the same check (see the sketch after this list).
b. Not sure I fully understand, but yes, we could check if we have a CAR for that multihash and, if we do, issue a location claim that strips off the CID header, basically taking `cid.multihash.bytes`.
c. I'm not entirely sure what the assumptions in the store table are, but we should:

  1. Consider whether we should reuse the store table or create another one.
  2. We could always map the multihash to a CID to keep the link invariant, but I would suggest using the RAW codec instead of CAR if we go that route.
  3. Consider what queries we run over that table and how those would be affected by non-CAR CIDs.
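
A sketch of (a) and (b), assuming the multiformats link/digest helpers; `CAR_CODE` is the CAR multicodec (0x0202), and the helper names are mine:

```ts
import * as Link from 'multiformats/link'
import * as Digest from 'multiformats/hashes/digest'

const CAR_CODE = 0x0202 // CAR multicodec code

// (a) Derive the legacy CAR CID from raw multihash bytes, so the same
// existence check can be performed against carpark/store-table.
export const carLinkFromMultihash = (multihashBytes: Uint8Array) =>
  Link.create(CAR_CODE, Digest.decode(multihashBytes))

// (b) For the location claim, strip the CID wrapper back off, keeping only
// the multihash bytes.
export const multihashBytesFromLink = (link: { multihash: { bytes: Uint8Array } }) =>
  link.multihash.bytes
```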

vasco-santos (Contributor Author) commented:
Thanks for the follow-ups @Gozala, and for looking into the issuer thing. I think we are quite aligned on 1, 3 and 4 ✅. I will follow up on the remaining discussion items.

vasco-santos (Contributor Author) commented Mar 25, 2024

On point 5, @Gozala, your assumptions were about right for the first part. So yes, my proposal was to do exactly that ("simply by calling Link.create with CAR.code and multihash"), but the general question/discussion topic is IF and HOW we want to deal with the old world.

More specifically, for both stores we have at the moment: carpark (a bucket for CAR bytes keyed as link/link.car) and the store-table. If we do not check these by injecting the CAR codec to compute such a link, we will receive a lot of repeated content. But needing to deal with two versions of the world is also hard, and makes everything more expensive, since we'd need to verify for every write whether these tables/buckets hold new or old keys... So we need to define what migration/rollout strategy we would like here (see the dual-lookup sketch below). Probably worth involving Hannah, so I will write a more specific issue for this.

EDIT: #1343
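
A sketch of the dual-world check during migration, assuming the legacy carpark key format link/link.car from the comment above; the multihash-derived key encoding (base58btc) and the `Bucket` interface are assumptions for illustration:

```ts
import * as Link from 'multiformats/link'
import * as Digest from 'multiformats/hashes/digest'
import { base58btc } from 'multiformats/bases/base58'

const CAR_CODE = 0x0202 // CAR multicodec code

// Hypothetical bucket interface with an existence check.
interface Bucket {
  has(key: string): Promise<boolean>
}

// Check both worlds: the new multihash-derived key and the legacy
// link/link.car key computed by injecting the CAR codec.
export const hasBlob = async (bucket: Bucket, multihashBytes: Uint8Array) => {
  const digest = Digest.decode(multihashBytes)
  const newKey = base58btc.encode(digest.bytes) // assumed multihash key format
  const carLink = Link.create(CAR_CODE, digest)
  const oldKey = `${carLink}/${carLink}.car` // legacy carpark key format
  return (await bucket.has(newKey)) || (await bucket.has(oldKey))
}
```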

vasco-santos (Contributor Author) commented Mar 25, 2024

@Gozala, regarding point 2, as well as your thoughts provided on #1340 (comment):

  1. OK, so we are actually more closely aligned than I thought, based on the previous code snippet shared in the spec. As I understood it, that was an example of the more futuristic invocation spec, and not what we can do today, or we would need to block on accept.
  2. I will answer on feat: blob implementation #1342 (review)
