Dealing with partially valid claims #37
Comments
I think this is adjacent to, but not quite the same as, the issue that @vasco-santos is raising: "if we make it easy and permissionless to create claims, and we read all claims in an undifferentiated manner during a read, we allow for the possibility of folks slowing down or denying access to content". An example there would be "create a million invalid partition claims for a popular content CID". Vasco is offering us some ideas for mitigating this in the short term: make the system more restrictive about who can make claims about what, until we have a proposal for some kind of reputation system or per-claim scoring. To my ears, "you can only make claims that are about or include CIDs that you have in your space" sounds like a quick win to slow down the (potential) spammers. If we prefer claims signed by the system, we could get away with nothing more than listening out for complaints and having a CLI or CI job that we can invoke ourselves to make a valid claim for any CID that is getting spammed (until such time as claim spam becomes a thing).
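A minimal sketch of the "only claim about CIDs in your space" rule, to make the idea concrete. Everything here is hypothetical: the `Claim` shape, the `SpaceIndex` lookup, and the `mayPublish` name are illustrative stand-ins, and spaces are keyed by issuer for simplicity.

```typescript
// Hypothetical shapes for illustration only; not the real claim or space types.
type Claim = {
  issuer: string   // identifier of the party making the claim
  content: string  // CID the claim is about
  parts: string[]  // other CIDs the claim references (e.g. CAR CIDs)
}

// Assumed lookup: issuer -> set of CIDs stored in that issuer's space.
type SpaceIndex = Map<string, Set<string>>

// Accept a claim only if it is about, or includes, at least one CID
// that the issuer already has in their space.
function mayPublish(claim: Claim, spaces: SpaceIndex): boolean {
  const owned = spaces.get(claim.issuer)
  if (owned === undefined) return false
  return [claim.content, ...claim.parts].some((cid) => owned.has(cid))
}

const exampleSpaces: SpaceIndex = new Map([["alice", new Set(["car-1"])]])
const aliceClaim: Claim = { issuer: "alice", content: "root-1", parts: ["car-1"] }
console.log(mayPublish(aliceClaim, exampleSpaces))
```

This only raises the cost of spam; it does not say anything about whether an accepted claim is correct.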
I think our current problem is a misalignment between what we need on reads and what we put in claims. Specifically, if a claim does not include all the information we need for a specific read, it is incomplete yet not incorrect. If a claim contains more information than we need for a specific read, we are not able to determine whether the claim is correct. This tells me that our claims are too generic: they try to cover different reads, and consequently are valid for some reads and invalid (incomplete) for others. The solution here is probably to have different claims for the different kinds of reads we support, so they can't be incomplete. As far as I understand, we currently support the following types of reads: CAR read by CID, block read by CID, and DAG read by CID.
Let's consider each one and how we could satisfy it.
CAR read by CID
This is the most straightforward, using
Block read by CID
I don't think we currently have a claim that describes this well, so I'll suggest something like:
type AssertDigest = {
  digest: Uint8Array // multihash digest
  payload: CID // perhaps should be a multihash as well
  offset?: number // offset within the payload
  length?: number // number of bytes being hashed
}
We could possibly use an even more compact representation if we would like. Also, we don't care about the block CID, because the codec and CID version are irrelevant.
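A rough sketch of how a reader might check an `AssertDigest` claim against payload bytes it has fetched. Assumptions: `payload` is simplified to a string, the hash function is passed in (so the sketch does not depend on any particular hashing library) and is assumed to return bytes in the same encoding as `digest`, and `verifyDigest` and `toyHash` are illustrative names. `toyHash` is a non-cryptographic stand-in used only to make the example runnable.

```typescript
type AssertDigest = {
  digest: Uint8Array // multihash digest
  payload: string    // payload CID, simplified to a string here
  offset?: number    // offset within the payload
  length?: number    // number of bytes being hashed
}

type Hasher = (bytes: Uint8Array) => Uint8Array

function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false
  }
  return true
}

// Hash the claimed byte range of the payload and compare it with the claimed digest.
function verifyDigest(claim: AssertDigest, payload: Uint8Array, hash: Hasher): boolean {
  const start = claim.offset ?? 0
  const end = claim.length === undefined ? payload.length : start + claim.length
  // A range that falls outside the payload cannot be verified.
  if (start < 0 || end < start || end > payload.length) return false
  return bytesEqual(hash(payload.subarray(start, end)), claim.digest)
}

// Toy stand-in hasher (sum of bytes mod 256). NOT a real multihash function.
const toyHash: Hasher = (bytes) =>
  Uint8Array.of(bytes.reduce((sum, b) => (sum + b) % 256, 0))

const carBytes = Uint8Array.of(1, 2, 3, 4, 5)
const blockClaim: AssertDigest = { digest: toyHash(Uint8Array.of(2, 3, 4)), payload: "car-1", offset: 1, length: 3 }
console.log(verifyDigest(blockClaim, carBytes, toyHash))
```

Because the claim carries its own byte range, a reader can verify it with a single ranged read of the payload, without parsing the rest of the CAR.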
DAG read by CID
I think that is more or less what
type AssertLayout = {
  // Root node
  content: Link
  // Links across all the DAG nodes, keyed by the stringified parent link
  edges: Record<ToString<Link>, Link[]>
  // Reference to all the nodes required to build this DAG
  source: Record<ToString<Shard>, Record<ToString<Multihash>, {offset?: number, length?: number}>>
}
We could probably do even better for UnixFS stuff without storing all the intermediary nodes, as those could be assembled on the fly, but that would probably introduce enough complexity that it's best left out for now.
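To illustrate how a reader could consume an `AssertLayout` claim, here is a sketch that walks the DAG from the root and turns each node into a ranged read against a shard. Assumptions: links, shards, and multihashes are all collapsed to plain strings, a node's link and its multihash share the same string key, and `planReads` is a made-up name. Nodes that the claim references but does not locate are reported, which is exactly the "partially valid" case.

```typescript
type ByteRange = { offset?: number; length?: number }

// Simplified AssertLayout: every identifier is a plain string.
type AssertLayout = {
  content: string                                    // root node
  edges: Record<string, string[]>                    // node -> child nodes
  source: Record<string, Record<string, ByteRange>>  // shard -> node -> range
}

type Read = { shard: string; node: string; range: ByteRange }

function planReads(layout: AssertLayout): { reads: Read[]; missing: string[] } {
  // Invert `source` so each node can be looked up directly.
  const where = new Map<string, { shard: string; range: ByteRange }>()
  for (const [shard, nodes] of Object.entries(layout.source)) {
    for (const [node, range] of Object.entries(nodes)) {
      if (!where.has(node)) where.set(node, { shard, range })
    }
  }

  const reads: Read[] = []
  const missing: string[] = []
  const seen = new Set<string>()
  const queue: string[] = [layout.content]

  // Breadth-first walk from the root over the claimed edges.
  while (queue.length > 0) {
    const node = queue.shift()!
    if (seen.has(node)) continue
    seen.add(node)
    const location = where.get(node)
    if (location) reads.push({ shard: location.shard, node, range: location.range })
    else missing.push(node)
    queue.push(...(layout.edges[node] ?? []))
  }
  return { reads, missing }
}

const exampleLayout: AssertLayout = {
  content: "root",
  edges: { root: ["a", "b"], a: ["c"] },
  source: {
    "shard-1": { root: { offset: 0, length: 10 }, a: { offset: 10, length: 20 } },
    "shard-2": { b: { offset: 0, length: 5 } },
  },
}
console.log(planReads(exampleLayout))
```

An empty `missing` list means the claim is complete for a full DAG read; a non-empty one means it can still serve reads of the located nodes.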
We could probably fold block read by CID and DAG read by CID into the same claim: if there are no edges in the DAG, it effectively becomes the block read.
I agree that this is a "we need to iterate on the claims" problem. I don't want folks to have to send us a claim per block. For example, what if the existence of an inclusion claim (carCid, indexCid) expanded such that you could then query for any location in the index? The user has already provided block-level "this multihash is at this position in that CAR" assertions; we're just not exposing them. You can only reach them by already having the CAR CID. In this case we want to facilitate the reverse lookup.
Note that we share batches of multihashes in a single IPNI advert, but we can query it by any multihash in the batch. By using the carCID as the IPNI advert context ID, we can use IPNI to map from any multihash to the CAR it's in, which is probably how I'm gonna keep bitswap working.
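The reverse lookup discussed above can be sketched as an in-memory map from multihash to CAR location. This is only a stand-in for what IPNI would provide when the CAR CID is used as the context ID; it does not call IPNI, and the `InclusionClaim`, `IndexEntry`, and `ReverseIndex` names are hypothetical, with the index assumed to be already fetched and decoded.

```typescript
// Hypothetical, already-decoded view of an inclusion claim and its index.
type IndexEntry = { multihash: string; offset: number; length: number }
type InclusionClaim = { car: string; entries: IndexEntry[] }
type BlockLocation = { car: string; offset: number; length: number }

class ReverseIndex {
  private byMultihash = new Map<string, BlockLocation[]>()

  // Expand one inclusion claim into per-block locations.
  add(claim: InclusionClaim): void {
    for (const { multihash, offset, length } of claim.entries) {
      const known = this.byMultihash.get(multihash) ?? []
      known.push({ car: claim.car, offset, length })
      this.byMultihash.set(multihash, known)
    }
  }

  // Map from any block multihash to every CAR location claimed for it.
  lookup(multihash: string): BlockLocation[] {
    return this.byMultihash.get(multihash) ?? []
  }
}

const reverse = new ReverseIndex()
reverse.add({ car: "car-1", entries: [{ multihash: "mh-a", offset: 0, length: 100 }] })
console.log(reverse.lookup("mh-a"))
```

The point of the sketch is that users never send a claim per block: one inclusion claim per CAR is enough to answer block-level queries.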
@Gozala I like where these new claim shapes are going. For example, if the content is a single block in a CAR, we can make a location claim that says "this block lives at that URL at that byte range". Is there anything else we need there?
This is an interesting direction. @alanshaw and I have been talking about "top of tree" claims a lot, where all the non-leaf blocks are captured and shared in one round trip for the reader. However, that still implies a use case where folks are doing block-level reads rather than just reading CARs as slabs of bytes. /musing
Our existing claims allow for the possibility of partially valid claims to be created. For example, assert/partition (rootCid, [carCid,...]) maps a content CID to a set of CAR CIDs. It is therefore possible to make a partial claim that can be used to find some of the CARs the DAG is in, but not all of them. In the absence of a fully valid claim, its existence would be strictly better than nothing.
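One way to picture how partial partition claims can still be useful: a reader can take the union of every claim it sees for a root. The sketch below assumes a simplified `PartitionClaim` shape with string CIDs, and `mergePartitions` is an illustrative name rather than an existing function.

```typescript
// Simplified assert/partition claim: a root CID and some of the CARs it is in.
type PartitionClaim = { content: string; parts: string[] }

// Union the CAR sets of all claims, grouped by root CID.
function mergePartitions(claims: PartitionClaim[]): Map<string, Set<string>> {
  const merged = new Map<string, Set<string>>()
  for (const claim of claims) {
    const cars = merged.get(claim.content) ?? new Set<string>()
    for (const car of claim.parts) cars.add(car)
    merged.set(claim.content, cars)
  }
  return merged
}

const partialClaims: PartitionClaim[] = [
  { content: "root-1", parts: ["car-1"] },
  { content: "root-1", parts: ["car-2"] },
]
console.log(mergePartitions(partialClaims))
```

Note that the union tells the reader which CARs are known, not whether the set is complete; completeness still has to be established some other way.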
Aside: this is how our upload/add capability works today. Users send us a (rootCid, carCid) pair, and send the same rootCid with a different carCid each time to build up the CAR set, like a partition claim builder. Each upload/add call is a partial (or invalid) assert/partition claim.
Other examples:
an assert/inclusion (carCid, indexCid) claim where the CAR index is incomplete. Where we only want to read a given sub-DAG/file/entity from a superset, and the CAR index includes entries for those CIDs, then it's usable, even though it does not index every block in the CAR.
an assert/inclusion (carCid, indexCid) claim where the CAR is incomplete or contains a block with bytes that don't match its CID. (We assert that we will store CARs as slabs of bytes; we don't ensure the CARs are valid at the block level.) Where we only want to read a given sub-DAG/file/entity from a superset, and the CAR includes blocks for those CIDs, then it's usable, even though the CAR does not include every block listed in the index.
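The "usable for this read" idea in the inclusion examples can be sketched as a coverage check: given the blocks a particular read needs, see whether the (possibly incomplete) index locates all of them. The `CarIndex` shape and the `coverageFor` name are assumptions for illustration, with multihashes simplified to strings.

```typescript
// Simplified CAR index: block multihash -> position in the CAR.
type CarIndex = Map<string, { offset: number; length: number }>

// Decide whether an index is good enough for one specific read,
// regardless of whether it indexes every block in the CAR.
function coverageFor(index: CarIndex, wanted: string[]): { usable: boolean; missing: string[] } {
  const missing = wanted.filter((multihash) => !index.has(multihash))
  return { usable: missing.length === 0, missing }
}

const partialIndex: CarIndex = new Map([
  ["mh-a", { offset: 0, length: 100 }],
  ["mh-b", { offset: 100, length: 50 }],
])
console.log(coverageFor(partialIndex, ["mh-a", "mh-b"]))
```

Validity then becomes a property of a claim relative to a read, rather than of the claim alone. Detecting blocks whose bytes do not match their CID would additionally require hashing the fetched bytes, which this sketch does not do.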