
Implementation outline router.storage #25

Open · wants to merge 2 commits into base: main
Conversation

@hannahhoward (Member) commented Apr 19, 2024:

This attempts to specify the router.storage operation based on the designs we discussed for the federated MVP.

I am not sure I am getting the UCAN handoffs correct. The re-invoking of blob/allocate and blob/accept on the delegated node feels off... but there also doesn't seem to be an obvious way to not re-invoke -- because blob/allocate inherently involves a negotiation process. Maybe it would be better to try to do all of blob/allocate synchronously, because then blob/allocate and blob/accept could be direct calls on the chosen node? I'm struggling to get my delegations vs. invocations right.

Another question in my mind: is this all compatible with one of these storage nodes also serving customers directly (i.e. receiving /space/content/add/blob calls)?

@Gozala (Contributor) left a comment:

Thanks @hannahhoward for putting this together. I think this is a great first draft, although we probably need to iterate a bit more on the selection of storage nodes.

Also, for what it's worth, I have been sketching a rules syntax for the kind of tasks that simply coordinate invocations: https://observablehq.com/@gozala/invocation-rules-syntax. I think it might be a good idea to describe coordination somewhat along those lines; that way all of it could be compiled into the `next` field and we'd only need to implement some generic operators to do candidate selection. I'll try to take a stab at it and share here.

```
sub: SpaceDID
args: {
  blob: Blob
  storagePreferances?: StoragePreferances
```
Contributor:

I think it is worth considering whether we want to allow the user to change the preference after the fact -- as in, is the preference mutable? If so, I would suggest we do that at the space level (as opposed to the per-blob level), because that is our primary primitive for mutability. If we do not expect those preferences to ever change, it is still worth a thought whether preference at the blob level is better than at the space level.

Personally I'm inclined towards space-level preferences, as I expect changing those preferences is going to be a thing; as the comment below suggests, we intend to expand preferences over time.
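
To make the space-level option concrete, here is a minimal sketch of what a mutable, space-scoped preference update could look like, in the UCAN 1.0 invocation style used elsewhere in this thread. The `/space/preferences/update` command and the args shape are hypothetical, not part of the RFC:

```json
{
  "cmd": "/space/preferences/update",
  "sub": "did:key:zAlice",
  "args": {
    // hypothetical preference fields; the RFC intentionally leaves
    // StorageProperties blank to expand over time
    "preferences": {
      "regions": ["asia", "europe"],
      "replication": 2
    }
  }
}
```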


```
// intentionally left blank for now to expand over time
type StorageProperties = {
}
```
Contributor:

Something in me begged to call this StorageCapabilities, which in turn made me wonder if what you have in mind should indeed be described in terms of capabilities provided.

For what it's worth, my long-term goal had been the ability for a ucanto endpoint to self-describe the protocol it provides (kind of like GraphQL does). If we had something like that, it would remove the need for the comment saying the endpoint should implement these things, and it could probably cover the stuff here as well.

In other words, I'm guessing this is for negotiation purposes, which is exactly what the client (rewrite) provides: you can tell it "I want to do this kind of thing" and it will find you capabilities (from stored delegations) or tell you "no can do". I think it would be a good idea to leverage that here also.

Anyway, I did not mean to imply we should block this on any of the above, but I thought it was worth sharing.

@hannahhoward (Member Author):

To be clear, I'm thinking about things that might line up with storage preferences here -- e.g. location, SOC-2 compliance, price, etc.

I dunno if those are true capabilities.

@Gozala (Contributor) commented May 8, 2024:

So, what we have coming in UCAN 1.0, and something I have already implemented (but not shipped), is the UCAN policy system, which is roughly the equivalent of a WHERE clause in SQL. So when I say StorageCapabilities I imply something like:

```js
{ cmd: '/memory/allocate',
  with: 'did:...',
  where: [
    // size of the blob should be <= 1024
    ["<=", ".blob.size", 1024],
    // region should be asia or europe
    ["some",
       ["asia", "europe"],
       ["==", ".region", "."]
    ]
  ]
}
```

As mentioned, the benefit of using the UCAN policy system would be that it is well defined and would not require us to roll out custom code for figuring out which node providers satisfy the desired criteria, as we would be able to reuse most of the logic. It would also give us a framework for expressing properties, which will be limiting, but in my experience that makes for good guardrails.


#### Router handling of blob accept

After the user puts their blob to the presigned URL, they signal to the router that `blob/accept` can proceed by sending it the UCAN receipt for `http/put` (currently this is handled differently between legacy and 1.0 UCAN).
Contributor:

Is there a reason why we need to route these, as opposed to letting the user deal directly with the storage node that allocated and letting them perform their own accept? I would expect that blob/add simply describes coordination with storage nodes by forking a set of allocation → put → accept task flows and then combining their results into an output.

Contributor:

I'm not going to comment on this section of the spec, as I think we probably should not try to route blob/accept but rather combine the outputs from the storage nodes. That would also remove the dependency on local state, which otherwise requires side effects and can be racy.

@hannahhoward (Member Author):

How does the router know about the resulting location commitment then?

Arguably, I guess, instead of keeping a table of blob allocations it could just query our indexing system.

Contributor:

Why does it need to know about it? If it does, we can add a step in the pipeline to that effect. That said, I don't think the router needs to know or care about it: if the user wants to utilize the received commitment, they would have to authorize the gateway by delegating the commitment, effectively giving it permission to perform reads at the user's expense. If it is never shared with a gateway or libp2p node, the user does not wish to share read permission, and that seems a very reasonable choice IMO.


4. Router fetches the location commitment and creates a new commitment for the user.

**Note**: initially I thought this was a redelegation, but I think it's maybe not -- the commitment here is to a limited download URL of JUST the blob -- it's not a w3s gateway URL with full download capabilities.
Contributor:

I do think it should be a re-delegation, which BTW can impose extra restrictions. I think it should be a re-delegation because it would enable us to track reads.

I think gateway URLs are a separate deal. Specifically, I think the user should re-delegate the location commitment to the gateway to authorize it to serve content through the gateway, because the user will be billed for each read the gateway performs. That would also enable the user to revoke such a delegation to stop the gateway from serving said content at their expense.
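
A rough sketch of such a re-delegation: the space re-delegates its location commitment to the gateway, and can later revoke it to stop serving. The `/assert/location` command name and the field shapes here are assumptions for illustration, not taken from the RFC:

```json
{
  // user (space) re-delegates the location commitment to the gateway,
  // authorizing it to serve reads at the user's expense
  "iss": "did:key:zAlice",
  "aud": "did:web:w3s.link",
  "cmd": "/assert/location",
  "sub": "did:web:node1.storage",
  // link to the original commitment issued by the storage node
  "proofs": [{ "/": "bafy...commitment" }]
}
```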

- the web3.storage read pipeline
- the web3.storage write pipeline

As long as we're the only ones running the read pipeline, it can all go through w3s.link, but I imagine a future situation where other people want to use another read pipeline provider.
Contributor:

Exactly, which is why I think the user can delegate the location commitment to, say, did:web:w3s.link, authorizing it to serve this content at their expense; but they could also go and authorize other gateways or libp2p nodes for different read pipelines.

@hannahhoward (Member Author):

makes sense

```json
{
  "iss": "did:web:node1.storage",
  "aud": "did:web:web3.storage",
```
Contributor:

Makes me wonder if we actually want to be in the middle of the authorization chain, or whether storage nodes should delegate to the space directly. I imagine we may want a different choice here for different use cases. If we are in the middle, we take on some custody but could potentially reuse a stored blob (e.g. store once but charge multiple users); if things fail, we're accountable. If we aren't in the middle, then it is all between the user (space) and the storage node itself: no deduplication opportunity, and if things fail it's the storage node that is accountable to the user (space).
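
For contrast with the snippet above, where the storage node delegates to did:web:web3.storage, a direct-to-space commitment might instead look like the following sketch (fields other than `iss`/`aud` left as in the RFC example):

```json
{
  // storage node commits directly to the user's space, keeping
  // web3.storage out of the custody and accountability chain
  "iss": "did:web:node1.storage",
  "aud": "did:key:zAlice"
  // ...remaining fields as in the RFC example
}
```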

["==", ".range[1]", 2_097_152],
],
// does not expire
"exp": null
Contributor:

I think if we are in the middle we could issue non-expiring commitments and take on the task of renewing deals with storage nodes to uphold this commitment. On the other hand, if we aren't in the middle, then the commitment really should have an expiry.

@vasco-santos (Contributor) left a comment:

This is interesting, as it goes in a bit of a different direction than I have been thinking. I would like to call out how I was imagining these interactions would look; it would be great to at least compare trade-offs.

I would expect the web3.storage DID to always be the audience of blob/allocate and blob/accept (this also allows the client to derive their task CIDs in order to query receipts, given it would have all the information).

The core difference in my mind would be that web3.storage would rely on this kind of router when running the allocate task, with a new set of capabilities. Let's consider a flow (a hedged sketch of the step 4 invocation follows at the end of this comment):

  1. client invokes blob/add to be executed by did:web:web3.storage
  2. client derives the blob/allocate and blob/accept task CIDs and subscribes to receipts available for them
  3. did:web:web3.storage schedules the allocate task to run on its own schedule
  4. once blob/allocate runs, it invokes a storage-node/offer to a router principal
  5. the router principal discovers available hot storage nodes on the network to store the content, based on the provided StorageConfiguration
  6. blob/allocate finishes when it can find a receipt for storage-node/offer with the write target address for the client to use
  7. client writes the bytes
  8. client issues a receipt for http/put
  9. did:web:web3.storage verifies the bytes are there and runs blob/accept

As previously mentioned, I think the core advantage here is that the client has all the information to derive the task CIDs for the pending tasks the service will run. If blob/allocate and blob/accept become dynamic, that would not happen; we could still put effects on blob/allocate for the storage-node/offer + storage-node/accept receipts so that the user can follow the entire receipt chain. Moreover, this provides a much easier avenue to introduce the billing/usage capabilities, as well as metrics tracking.

On the billing side of things, it is not clear to me how, in this approach, we would have a billing-related capability tied to allocation if allocation runs on a third party.
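
A minimal sketch of what the storage-node/offer invocation in step 4 could look like, following the UCAN 1.0 shape used in this thread. The did:web:router.storage audience and the args are assumptions for illustration only:

```json
{
  "cmd": "/storage-node/offer",
  "sub": "did:web:web3.storage",
  "aud": "did:web:router.storage",
  "args": {
    "space": "did:key:zAlice",
    // digest and size let the router match candidates against preferences
    "blob": {
      "digest": "mh...digest",
      "size": 2097152
    },
    // the StorageConfiguration from step 5; shape not yet specified
    "configuration": {}
  }
}
```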

@Gozala (Contributor) commented Apr 22, 2024:

I put together a sketch of what I think it could look like with the rule syntax I linked to earlier:

```
;; Type definition for the Blob. It has no input attributes and an
;; output with two attributes `digest` and `size`.
;; ℹ️ It is kind of like a rule without a body
(Blob : { :digest Bytes :size Int})

;; Site is also a type definition with no input and only an output
;; with two attributes, `url` and `headers`
(Site : { :url String :headers (Map :key String :value String)})

;; Result is a type constructor that has two input attributes. Primary
;; unlabeled attribute addressed by the `.self` symbol and a labeled
;; attribute `error`, addressed by `.error`. Result is a variant (sum) type
;; that is either {:ok .self} or {:error .error}.
(Result : [ :ok .self :error .error])

(Error : {:message String})

;; Storage node is a DID and UCAN authorization chains (in bytes) that
;; authorize the router to invoke `memory/allocate` and `memory/commit`
;; capabilities on that storage node.
(StorageNode :
  {:did ProviderDID
   :access {:allocate Bytes :commit Bytes}})

;; Allocated memory represents a memory address on a storage node where
;; the corresponding blob can be stored. It has a `headers` attribute that is
;; expected to provide the authorization needed to perform the `http/put`
;; operation.
(AllocatedMemory :
  {:url URL
   :headers (Map :key String :value String)
   :expires Int})

;; User space for data, analogous to a bucket in S3. It is expected to have
;; various preferences like replication factor, region, etc. So far those
;; are omitted as they are not yet defined.
(Space : {:did DID})


;; Protocol that a Storage Node MUST implement

;; Allocates memory for a blob on a storage node. Can fail if the node is
;; unable to allocate memory for the given blob.
(memory/allocate
 :self ProviderDID
 :blob Blob
 :space Space
 : (Result AllocatedMemory :error Error))

;; Validates memory and issues a commitment
(memory/commit
 :self ProviderDID
 :memory AllocatedMemory
 : (Result Commitment :error Error))


;; Describes a storage node candidate by annotating it with a rate and status.
;; A higher rate corresponds to user preference. An error status indicates the
;; candidate is disabled due to some error, most likely a previously failed
;; attempt.
(StorageCandidate :
  {:node StorageNode
   ;; The higher the rate the better. If status is an error the candidate
   ;; is not to be considered.
   :status (Result {:rate Int} :error Error)})


;; Internal function that is used to choose the most suitable storage node
;; given a blob, a target space, and the state of the list of storage node
;; candidates.
(candidate/choose
  :blob Blob
  :space Space
  :candidates StorageNodeCandidates
  : (Result StorageCandidate :error Error))

;; Internal function that is used to disable a candidate due to some error.
(candidate/disable
  :candidates StorageNodeCandidates
  :node ProviderDID
  :error Error
  : (Result StorageNodeCandidates :error Error))

;; Internal function that returns the list of all storage node candidates
(candidate/list : (Result StorageNodeCandidates :error Error))

;; Internal function that is used to derive a DID from the blob digest such
;; that the blob holder can also derive a DID for it.
(did/from/digest
  :digest Bytes
  : (Result {:did DID} :error Error))

(blob/add
  :self DID
  :blob Blob
  : (Result {:site Site} :error Error)
  {
   ;; Derive the DID for the actor responsible for uploading the blob
   :user (did/from/digest .blob.digest)
   ;; get the list of all available storage node candidates
   :candidates (candidate/list)
   ;; get the space for the given DID
   :space (space/get .self)
   ;; invoke the blob/add router to handle the rest of the task
   : (router/blob/add
      :blob .blob
      :user .user.ok.did
      :space .space.ok
      :candidates .candidates.ok) })

(router/blob/add
  :space Space
  :user DID
  :blob Blob
  :candidates (Map :key ProviderDID :value StorageCandidate)
  : (Result {:site Site} :error Error)

  {
   ;; chooses the best storage node candidate
   :storage (candidate/choose :blob .blob :space .space :candidates .candidates)
   ;; allocates memory on the chosen storage node for the given blob
   :memory (memory/allocate
            :self .storage.ok.node.did
            :space .space
            :blob .blob)
   ;; request the blob to be uploaded to the given url
   :put (http/put
         :self .user
         :url .memory.ok.url
         :headers .memory.ok.headers
         :body .blob.digest)

   :commit (memory/commit
            :self .storage.ok.node.did
            :memory .memory.ok
            :: .put.ok)

   ;; If the candidate failed to allocate memory we want to retry with
   ;; another candidate. Note that this only runs if memory allocation
   ;; failed, because we reference `.memory.error`, which does not exist
   ;; if memory allocation was successful.
   :retry (router/blob/add
           :space .space
           :user .user
           :blob .blob
           :candidates (candidate/disable
                         :candidates .candidates
                         :node .storage.ok.node.did
                         :error .memory.error))

   ;; If allocation failed we want to retry with another candidate,
   ;; otherwise we commit with the current candidate. Note that we put
   ;; retry first; it will fail if memory allocation was successful,
   ;; as there will be no error branch for retry to use.
   : (result/or [.retry .commit])})
```

@Gozala (Contributor) commented Apr 22, 2024:

> I would expect the web3.storage DID to always be the audience of blob/allocate and blob/accept (this also allows the client to derive their task CIDs in order to query receipts, given it would have all the information).

Please note that per my sketch all of those will be in the `next` field, so the client will not need to derive them itself, but it would be able to query by receipt.

@Gozala (Contributor) commented Apr 22, 2024:

> The core difference in my mind would be that web3.storage would rely on this kind of router when running the allocate task, with a new set of capabilities. Let's consider a flow:

I think in your version the routing logic is made opaque, meaning it is an implementation detail not exposed to the caller; nor does it provide a trail of how candidate selection occurred or whether the system had to backtrack and try different candidates.

There are different tradeoffs at play: making that opaque means the router gets far more flexibility to make choices or to change behavior as it sees fit. It is probably simpler to implement and faster to iterate on.

The alternative version exposes the whole node decision-making process and attempts to tie it to preferences on the space. It will produce a much more complete trail of the steps and decisions made, and potentially provide more granular status updates along the way.

Now that I think about it, I imagine candidate/choose would probably be a capability the user can install on their space so that it can reflect their preferences. Obviously we would have a default in place, kind of like the auto region in CF.

> On the billing side of things, it is not clear to me how, in this approach, we would have a billing-related capability tied to allocation if allocation runs on a third party.

I think billing commands would be part of the sketched workflow; specifically, we would issue a billing command when allocation succeeds, with the duration and blob size. But I think we need to spec out the new billing interface before we can do that.

@vasco-santos (Contributor) commented Apr 22, 2024:

> There are different tradeoffs at play: making that opaque means the router gets far more flexibility to make choices or to change behavior as it sees fit. It is probably simpler to implement and faster to iterate on.

Yes, that is why I think this RFC should include these tradeoffs and we should decide on what makes sense. For instance, in w3filecoin it makes complete sense that you can follow the whole flow, because it also MAY take a long time. However, here we should be optimising for all of this flow to be fast, and therefore I am looking to understand what the advantages of opting for a full receipt chain solution would be. There may be merits to it, but so far I am failing to understand them.

> The alternative version exposes the whole node decision-making process and attempts to tie it to preferences on the space. It will produce a much more complete trail of the steps and decisions made, and potentially provide more granular status updates along the way.
>
> Now that I think about it, I imagine candidate/choose would probably be a capability the user can install on their space so that it can reflect their preferences. Obviously we would have a default in place, kind of like the auto region in CF.

Here I think we can agree, as well as on making something with space-specific characteristics.

For instance, billing capabilities will probably be opaque, unless we need some kind of on-chain primitive that makes it required to show a proof. Here, I think it MAY actually make sense to treat this as 2 different systems, each with its own receipt chain that you can track. So web3.storage would be a kind of client of router.storage, like a user is a client of web3.storage. So, on blob/allocate, I would expect a well-known invocation to discover memory to allocate, with an audience called router.storage. The router.storage receipt could then have a CID where it links to all the "internal" receipts that were made.
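
A sketch of that idea, assuming the `ran`/`out`/`fx` receipt fields from the UCAN invocation spec; the exact linkage shown here is illustrative:

```json
{
  // receipt issued by router.storage for the storage-node/offer invocation
  "ran": { "/": "bafy...offer" },
  "out": { "ok": { "address": { "url": "https://..." } } },
  // forked effects link to the "internal" receipt chain
  "fx": { "fork": [{ "/": "bafy...internal" }] }
}
```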

@Gozala (Contributor) commented Apr 23, 2024:

> However, here we should be optimising for all of this flow to be fast, and therefore I am looking to understand what the advantages of opting for a full receipt chain solution would be. There may be merits to it, but so far I am failing to understand them.

My main arguments in favor are:

  1. Transparency - All the decisions are recorded and exposed to users for transparency.
  2. User agency - I don't think one size fits all; different users may have different preferences regarding whom they want to store data with. By making the process transparent and the decision-making programmable, users gain the authority to drive decisions.
    • ℹ️ I don't believe you can achieve the 2nd without the 1st; unless the provenance of a decision is captured in a signed chain, there is no way of knowing how the system chose a specific storage node and whether it really reflected user preferences.
  3. System introspection - Unless we log all the steps, it's really hard to introspect what has occurred. We could use conventional logging, but I think a signed log has the advantages above, and I believe the overhead should be negligible.
  4. Optimization opportunities - Each step is a computation on data; if inputs are the same we can reuse outputs and avoid recomputing the same thing, like choosing the best candidate storage node.

@Gozala (Contributor) commented Apr 23, 2024:

> For instance, billing capabilities will probably be opaque, unless we need some kind of on-chain primitive that makes it required to show a proof.

I don't think it should be opaque; I think it would be really nice if the receipt chain captured precisely when we charge the user, how much, and why. Maybe we mean different things by making it opaque. But I would expect the system to have something like:

```js
{
  cmd: '/balance/add',
  sub: 'did:web:web3.storage',
  args: {
    consumer: 'did:key:zAlice',
    product: 'storage',
    amount: 128,
    cause: { '/': 'bafy..inv' },
  }
}
```

> Here, I think it MAY actually make sense to treat this as 2 different systems, each with its own receipt chain that you can track. So web3.storage would be a kind of client of router.storage, like a user is a client of web3.storage. So, on blob/allocate, I would expect a well-known invocation to discover memory to allocate, with an audience called router.storage. The router.storage receipt could then have a CID where it links to all the "internal" receipts that were made.

Yeah, so it boils down to whether things should be encapsulated or exposed. If you expose the internal chains, there is no real difference, especially if progress tracking can be built into the low-level library (like ucanto). If we do want to hide internal details, that is a different matter, but I personally do not think we should: more transparency makes it easier to introspect, builds trust, and is likely more welcome by the w3 community.

It is also worth calling out that I think you are making an assumption that there is a single allocation per blob; however, if we were to support a higher replication factor, that may not be the case. In such a scenario we would probably want to allocate memory on several storage nodes, and that becomes a lot more difficult with a more opaque system, as we are no longer able to parallelize allocations and uploads: you'd have to allocate n locations and then upload to them, as opposed to doing things concurrently. Alternatively, you'd have to allocate one memory location, upload, and then replicate to other nodes, but that adds costs the user could cut by uploading to each memory location directly.

@hannahhoward (Member Author) left a comment:

Ok @Gozala,

I think the central question I'm struggling with is how much traffic goes through the router to the storage nodes, vs. the router simply selecting nodes and letting the client communicate directly.

In my design, everything goes through the router, with the usage of nodes largely an implementation detail.

I gather from a number of your comments that you feel the client should quickly start communicating with nodes directly, with the router's central task being to produce a priority list of nodes.

So, for /blob/select/store, if it produces a bunch of blob/allocates, do you imagine the client trying them one by one? In the current design, the router communicates with them one by one until it succeeds, albeit asynchronously and transparently to the client. How would all this be represented in terms of UCAN?

I think a good next step is a discussion between you, me, and Alan -- I'll try to find time on calendars for that. I do not want this design to spiral out too much, but also I'm pretty sure there are weaknesses in the way it's specified.

For me the ideal design is no change to the client. For this, I think the right approach is to make all of the logic of attempting allocations part of the router, and also synchronous (a change from the current design), with the client simply receiving the same UCAN chain it does currently, which it can process with identical logic.



```json
// 1. System attempts to allocate memory in user space for the blob.
{ // "/": "bafy...alloc",
  "cmd": "/service/blob/allocate",
  "sub": "did:web:web3.storage",
```
@hannahhoward (Member Author):

I think I agree; I am honestly a tad perplexed about subject DIDs for some of these operations.

I like the idea of a registration process where they delegate /service/blob/allocate.

"sub": "did:web:web3.storage",
"args": {
"space": "did:key:zAlice",
"blob": {
@hannahhoward (Member Author):

note space is included here?


```json
// 3. System will attempt to accept uploaded content that matches blob
// multihash and size.
"join": { // "/": "bafy...accept",
```
@hannahhoward (Member Author):

I think the current implementation does do a join, though I'm not sure -- just reading @vasco-santos's code.

"att": [
{
"can": "/web3.storage/blob/allocate",
"with": "did:web:web3.storage",
@hannahhoward (Member Author):

see lengthy comment above

```json
}
```

4. On success, the router creates a success receipt for the original user.
@hannahhoward (Member Author):

blob allocate is included


4. On failure, the router adds the failed nodes to a list of "alreadyFailed" and returns to step 2.

5. If all nodes fail, return a failure receipt to the user for `blob/allocate`.
@hannahhoward (Member Author):

This is a loop-until-success operation.




@Gozala (Contributor) commented May 8, 2024:

> Ok @Gozala,
>
> I think the central question I'm struggling with is how much traffic goes through the router to the storage nodes, vs. the router simply selecting nodes and letting the client communicate directly.
>
> In my design, everything goes through the router, with the usage of nodes largely an implementation detail.
>
> I gather from a number of your comments that you feel the client should quickly start communicating with nodes directly, with the router's central task being to produce a priority list of nodes.

Correct. I think we should get out of the way as much as possible, as it costs less and makes the role of the router less significant.

> So, for /blob/select/store, if it produces a bunch of blob/allocates, do you imagine the client trying them one by one? In the current design, the router communicates with them one by one until it succeeds, albeit asynchronously and transparently to the client. How would all this be represented in terms of UCAN?

What I'm suggesting is to work in tandem with the client in the process of node selection: once a candidate is selected, let the client interact with it; if that fails, authorize another allocation, and so on until a successful candidate storage node is found. I attempted to illustrate that in #25 (comment).

Note that I do not expect the user to send allocation requests to the storage nodes, as they will not have an authorization to do so. Instead we just describe a flow of tasks where we try an allocation on a candidate node: if it succeeds, we get out of the way, letting the user provide content and receive the commitment; if it fails, we run another task that attempts to choose another candidate and allocate there, and recur until all fail or we succeed.

The code I've put together in the linked comment is effectively an illustration of the UCAN task coordination system, a.k.a. promise pipelines (from the invocation spec), just utilizing s-expressions. I also demoed it last week, where I went into more detail. The general idea is that the UCAN scheduler can take care of coordinating all those tasks, given that the primitive ones are provided via regular handlers.

> I think a good next step is a discussion between you, me, and Alan -- I'll try to find time on calendars for that. I do not want this design to spiral out too much, but also I'm pretty sure there are weaknesses in the way it's specified.

👍

> For me the ideal design is no change to the client. For this, I think the right approach is to make all of the logic of attempting allocations part of the router, and also synchronous (a change from the current design), with the client simply receiving the same UCAN chain it does currently, which it can process with identical logic.

I think we're on the same page. It's more about how much we want to roll out as hand-written code hiding details vs. exposing primitives as UCAN capabilities and describing the decision-making process, a.k.a. routing, through task coordination (a.k.a. promise pipelines).
