
Proposed curio api #34

Open · wants to merge 2 commits into main

Conversation

hannahhoward (Member)

Proposes an HTTP API to curio that could be used by either Storacha or another hot storage market provider for PDP proofs

Preview


With the additional stipulation that these endpoints should probably ONLY accept v2 Piece CIDs: https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0069.md
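
For concreteness, here is a rough sketch of the kind of gate an endpoint could apply to reject anything that isn't a v2 piece CID. The multihash-code constant reflects my reading of FRC-0069 and the multicodec table and should be double-checked against the spec before relying on it.

```go
package main

import (
	"fmt"

	"github.com/ipfs/go-cid"
)

// Multihash code for fr32-sha2-256-trunc254-padded-binary-tree from the
// multicodec table -- the hash that FRC-0069 v2 piece CIDs use. Confirm the
// value against the spec before relying on it.
const fr32Sha256Trunc254PadBinTree = 0x1011

// isV2PieceCID reports whether c looks like an FRC-0069 v2 piece CID:
// the raw codec combined with the fr32 padded-binary-tree multihash.
func isV2PieceCID(c cid.Cid) bool {
	p := c.Prefix()
	return p.Codec == cid.Raw && p.MhType == fr32Sha256Trunc254PadBinTree
}

func main() {
	// Placeholder value; substitute a real v2 piece CID string to test.
	c, err := cid.Decode("bafy...")
	if err != nil {
		fmt.Println("reject: not a CID:", err)
		return
	}
	fmt.Println("accept as v2 piece CID:", isV2PieceCID(c))
}
```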

### PUT /piece/{piece cid v2}

Might require a UCAN header for auth


Also, for scalability it would be a good idea to have the main endpoint tell you where to go with the data.

Unfortunately, redirecting PUT/POST with a 3xx is really wonky (pretty much broken with the Go http.Client), so it might just require some two-endpoint setup.
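
For illustration, a sketch of how that two-endpoint setup might look from the client side, folding in the UCAN-as-bearer-token idea from the comment above. Every path, header, and field name here is a placeholder assumption rather than settled API:

```go
package pdpclient

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// allocateResponse is a hypothetical body returned by the "main" endpoint,
// telling the client where to send the actual bytes.
type allocateResponse struct {
	UploadURL string `json:"uploadUrl"`
}

// uploadPiece first asks the main endpoint where to put the data, then PUTs
// the bytes directly to that URL, sidestepping 3xx redirects on PUT entirely.
func uploadPiece(baseURL, pieceCIDv2, ucanToken string, piece []byte) error {
	// Step 1: announce the piece and ask for an upload location.
	body, err := json.Marshal(map[string]any{"pieceCid": pieceCIDv2, "size": len(piece)})
	if err != nil {
		return err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/piece", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+ucanToken) // hypothetical UCAN auth
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("allocation failed: %s", resp.Status)
	}
	var alloc allocateResponse
	if err := json.NewDecoder(resp.Body).Decode(&alloc); err != nil {
		return err
	}

	// Step 2: PUT the bytes to whichever node the server pointed us at.
	put, err := http.NewRequest(http.MethodPut, alloc.UploadURL, bytes.NewReader(piece))
	if err != nil {
		return err
	}
	put.Header.Set("Content-Type", "application/octet-stream")
	resp2, err := http.DefaultClient.Do(put)
	if err != nil {
		return err
	}
	defer resp2.Body.Close()
	if resp2.StatusCode/100 != 2 {
		return fmt.Errorf("upload failed: %s", resp2.Status)
	}
	return nil
}
```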

hannahhoward (Member Author)

ah ok. I get it -- our main endpoint is like this too! :)


*TODO: do we need an interim response given this is a chain transaction with a place to fetch the set-id later?*
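
One common way to answer that TODO is a 202 Accepted carrying a Location header that the client polls until the chain transaction lands and the set-id exists. Sketched below purely as an illustration; the status resource and its fields are assumptions, not part of the proposal.

```go
package pdpclient

import (
	"encoding/json"
	"errors"
	"net/http"
	"time"
)

// createStatus is a hypothetical body served by the status resource while
// the proof-set creation transaction confirms on chain.
type createStatus struct {
	Ready      bool   `json:"ready"`
	ProofSetID uint64 `json:"proofSetId"`
}

// awaitProofSet polls statusURL (e.g. the Location from a 202 Accepted
// response to the create call) until the chain transaction lands and a
// set-id is available.
func awaitProofSet(statusURL string) (uint64, error) {
	for i := 0; i < 60; i++ {
		resp, err := http.Get(statusURL)
		if err != nil {
			return 0, err
		}
		var st createStatus
		err = json.NewDecoder(resp.Body).Decode(&st)
		resp.Body.Close()
		if err != nil {
			return 0, err
		}
		if st.Ready {
			return st.ProofSetID, nil
		}
		// Transaction not confirmed yet; wait and retry.
		time.Sleep(10 * time.Second)
	}
	return 0, errors.New("timed out waiting for proof-set creation")
}
```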

### GET /proof-sets/{set-id}

Maybe link to some spec which lays out what those proof-sets look like / what they are

hannahhoward (Member Author)

Truth be told I'm just guessing based off the PDP service contract doc

There are additional considerations:
1. Authorization -- In storacha's network, it's important that the original end user maintain control of authorization for any action performed (including retrieval). We accomplish this through UCANs. We should discuss how we can maintain this without forcing curio to implement a full UCAN authorization process.
2. Aggregation -- storacha's data is at times extremely small (<1mb in certain cases). Our understanding is that economically, it makes more sense to do some light aggregation of data before adding it to the proof set. The proposal below outlines a facility for doing this. While storacha would store pieces as it receives them, we would add them to the proof set in a separate step, with a root that could optionally be an aggregate of several pieces (a rough sketch of such a request follows this list).
3. IPNI announcements -- we plan to use IPNI announcements in a specific way with our pieces. Our understanding is that the curio IPNI flow is in flux. We can try to integrate your IPNI api or just do it ourselves.
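
To make point 2 slightly more concrete (the sketch referenced above), here is one hypothetical shape for the request that adds a root to a proof set in that separate step. All field names and the path mentioned in the comments are invented for illustration, not proposed API.

```go
package pdpclient

// addRootRequest is a hypothetical payload for adding one root to an existing
// proof set. When storacha aggregates several small pieces, the root is the
// piece CID of the aggregate and Subpieces lists what it was built from, so
// retrieval and IPNI announcements can still address the original pieces.
type addRootRequest struct {
	// Piece CID (v2) of the root actually added to the proof set.
	RootCID string `json:"rootCid"`
	// Piece CIDs (v2) of the already-uploaded pieces the root aggregates.
	// For an unaggregated piece this is just the root itself.
	Subpieces []string `json:"subpieces,omitempty"`
}

// addRootsRequest would be POSTed to something like
// /proof-sets/{set-id}/roots, separately from the piece uploads themselves.
type addRootsRequest struct {
	Roots []addRootRequest `json:"roots"`
}
```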

IPNI is pretty much implemented now, with the following schema that coordinates it on the curio side:

```sql
-- Table for storing IPNI ads
CREATE TABLE ipni (
    order_number BIGSERIAL PRIMARY KEY, -- Unique increasing order number
    ad_cid TEXT NOT NULL,
    context_id BYTEA NOT NULL, -- abi.PieceInfo in Curio
    -- metadata column is not required as Curio only supports one type of metadata (HTTP)
    is_rm BOOLEAN NOT NULL,

    previous TEXT, -- previous ad will only be null for first ad in chain

    provider TEXT NOT NULL, -- peerID from libp2p; this is the main identifier on the IPNI side
    addresses TEXT NOT NULL, -- HTTP retrieval server addresses

    signature BYTEA NOT NULL,
    entries TEXT NOT NULL, -- CID of first link in entry chain

    unique (ad_cid)
);

CREATE TABLE ipni_head (
    provider TEXT NOT NULL PRIMARY KEY, -- PeerID from libp2p, this is the main identifier
    head TEXT NOT NULL, -- ad_cid from the ipni table, representing the head of the ad chain

    FOREIGN KEY (head) REFERENCES ipni(ad_cid) ON DELETE RESTRICT -- Prevents deletion if it's referenced
);

-- This table stores metadata for ipni ad entry chunks. This metadata is used to reconstruct the original ad entry from
-- on-disk .car block headers or from data in the piece index database.
CREATE TABLE ipni_chunks (
    cid TEXT PRIMARY KEY, -- CID of the chunk
    piece_cid TEXT NOT NULL, -- Related Piece CID
    chunk_num INTEGER NOT NULL, -- Chunk number within the piece. Chunk 0 has no "next" link.
    first_cid TEXT, -- In case of db-based chunks, the first CID in the chunk
    start_offset BIGINT, -- In case of .car-based chunks, the offset in the .car file where the chunk starts
    num_blocks BIGINT NOT NULL, -- Number of blocks in the chunk
    from_car BOOLEAN NOT NULL, -- Whether the chunk is from a .car file or from the database
    CHECK (
        (from_car = FALSE AND first_cid IS NOT NULL AND start_offset IS NULL) OR
        (from_car = TRUE AND first_cid IS NULL AND start_offset IS NOT NULL)
    ),

    UNIQUE (piece_cid, chunk_num)
);
```

Now, IPNI likes larger ads, so ideally storacha would create aggregate ads for multiple pieces; we can extend ipni_chunks to support reading from storacha-stored pieces (though technically just the piece CID works fine there)

## Piece Storage


### POST /piece

We definitely need to define how authorization works on this endpoint. This can't just be entirely open.

Also should define the lifecycle of the uploaded data somehow:

- How long is it expected to stick around in storage after upload before being included in a proof-set? When should the data be removed if not added to a proof set?
- Signalling for expected indexing with IPNI / ipfs-type (trustless gateway/bitswap) retrievals, and who can retrieve the piece?
- What is the contract for retrieval - is it retrievable atomically when the notify hook is called? After inclusion in a proof set?


Also what are the size bounds for pieces that you expect curio to support? We can support even very large pieces (100G+), but I don't think a client-push model is a good idea above 1GB, where managing short-term buffers becomes a real concern, and download retry becomes non-optional
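
To illustrate why download retry becomes non-optional at those sizes: in a pull-based flow, curio would fetch the piece from a URL supplied by the client and resume after transient failures with Range requests. A minimal sketch, assuming the source server supports Range and that a real implementation would verify the piece CID afterwards:

```go
package pdpclient

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// fetchPieceWithResume downloads a large piece from sourceURL into dst,
// resuming with Range requests after transient failures. Illustration only:
// the endpoint and retry policy are assumptions, not part of the proposal.
func fetchPieceWithResume(sourceURL string, dst *os.File, maxRetries int) error {
	var written int64
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		req, err := http.NewRequest(http.MethodGet, sourceURL, nil)
		if err != nil {
			return err
		}
		if written > 0 {
			// Resume from where the previous attempt stopped.
			req.Header.Set("Range", fmt.Sprintf("bytes=%d-", written))
		}

		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			lastErr = err
		} else if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusPartialContent {
			lastErr = fmt.Errorf("unexpected status: %s", resp.Status)
			resp.Body.Close()
		} else {
			if written > 0 && resp.StatusCode == http.StatusOK {
				// Server ignored the Range header; restart the file from scratch.
				written = 0
				if _, err := dst.Seek(0, io.SeekStart); err != nil {
					resp.Body.Close()
					return err
				}
				if err := dst.Truncate(0); err != nil {
					resp.Body.Close()
					return err
				}
			}
			n, copyErr := io.Copy(dst, resp.Body)
			resp.Body.Close()
			written += n
			if copyErr == nil {
				return nil // download complete
			}
			lastErr = copyErr
		}

		// Back off before retrying; keep the bytes already written.
		time.Sleep(time.Duration(attempt+1) * 2 * time.Second)
	}
	return fmt.Errorf("download failed after %d retries: %w", maxRetries, lastErr)
}
```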
