Semantics of @context and @id #2

ikreymer · 2015-10-07T23:59:24Z

The @context and @id are borrowed from JSON-LD.
However, I'm not sure it makes sense to bring in the entire spec for these, given: http://www.w3.org/TR/json-ld/#the-context

Perhaps the values should be restricted to a single IRI/URI, or a very specific subset applicable to this use case.

Or should the full JSON-LD spec be supported?


ibnesayeed · 2015-10-08T00:26:15Z

I said it is JSON-LD "inspired", because there are a few differences that we will have to deal with. A JSON-LD @context applies to the underlying hierarchy in the object tree where it is present unless it's overridden, but it can't affect siblings and other higher nodes in the tree. However, in the case of ORS, we want to put it in the meta section and have it applied to the entire document like a mix-in. Another feature that I envision with the @context keyword in the ORS format is to use it like XML namespaces. There is a brief discussion on this topic in the blog post where I said:

@context provides context to the keywords used in the rest of the document. The value of this entry can be an array of contexts or an object with named keys. In the case of an array, all the term definitions from all the contexts will be merged in the global namespace (resolving name conflicts will be the responsibility of the document creator) while in the case of a named object it will serve like the XML Namespace.
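To make the two forms concrete, here is a minimal sketch of how a parser might treat an array-valued versus an object-valued @context. Everything named here is an assumption for illustration: the function `resolve_terms`, the preloaded `definitions` lookup (a real parser would have to fetch or bundle the context documents), the `prefix:term` notation for the namespaced form, and the choice that a later context silently overrides an earlier one on a name conflict.

```python
def resolve_terms(context, definitions):
    """Build a term -> IRI table from an @context value.

    context:     a list of context identifiers, or a dict of prefix -> identifier
    definitions: hypothetical preloaded table, identifier -> {term: IRI}
    """
    terms = {}
    if isinstance(context, list):
        # Array form: merge every context into one global namespace.
        # Assumption: on a name conflict the later context wins.
        for identifier in context:
            terms.update(definitions[identifier])
    elif isinstance(context, dict):
        # Named-object form: keep terms apart under their prefix,
        # similar in spirit to XML namespaces.
        for prefix, identifier in context.items():
            for term, iri in definitions[identifier].items():
                terms[f"{prefix}:{term}"] = iri
    else:
        raise TypeError("@context must be a list or a dict in this sketch")
    return terms


# Hypothetical context documents, for illustration only.
DEFINITIONS = {
    "http://a.example/ctx": {"title": "http://a.example/title"},
    "http://b.example/ctx": {
        "title": "http://b.example/title",
        "size": "http://b.example/size",
    },
}

merged = resolve_terms(["http://a.example/ctx", "http://b.example/ctx"], DEFINITIONS)
named = resolve_terms({"a": "http://a.example/ctx", "b": "http://b.example/ctx"}, DEFINITIONS)
```

In the array form the two `title` definitions collide and only one survives, which is why conflict resolution falls to the document creator; in the named form both remain reachable as `a:title` and `b:title`.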

ikreymer · 2015-10-08T02:06:40Z

Hm, I'm not sure about all of this.. If CDXJ is to be on the same level as JSON, and NDJSON, I think this may be extraneous..

I want the CDXJ parser to be very simple, and this sounds like we'll need namespace merging, schemas, and schema validators!

Maybe there's a use case for CDXJ-LD that adds these additional semantics..

For my use cases, I think a URN that identifies the format without any further semantics may be sufficient. Maybe (maybe) a list of required fields that must be present in the JSON value.

I think these can all be specified with a single @meta prefix (either on same or different lines).
Actually, why not just @

@ {"format": "urn:CDXJ:archive_index"}
@ {"keys": ["url", "timestamp"]}
@ {"values": ["offset", "length", "filename"]}

For parsing reasons, I would also like to restrict the value to always be a JSON {} dict, and not a [] list.

Validation, if needed, would be pretty trivial: ensure that each key has two fields, and ensure that the 3 values are found in the JSON dict of each line.
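A rough sketch of that validation, under several assumptions that are not settled in this thread: header lines start with `@ ` followed by a valid JSON dict (keys quoted), a data line consists of space-separated key fields followed by a JSON dict, and the key fields themselves never contain a `{`. The function names `parse_header` and `validate_record` and the sample record are made up for illustration.

```python
import json


def parse_header(lines):
    """Merge every '@ {...}' header line into a single dict."""
    header = {}
    for line in lines:
        if line.startswith("@ "):
            header.update(json.loads(line[2:]))
    return header


def validate_record(line, header):
    """Check one data line against the 'keys' and 'values' declared in the header."""
    brace = line.find("{")  # assumption: the first '{' starts the JSON value
    if brace == -1:
        return False
    key_fields = line[:brace].split()
    try:
        value = json.loads(line[brace:])
    except ValueError:
        return False
    if not isinstance(value, dict):
        return False
    # The number of key fields must match the declared keys.
    if len(key_fields) != len(header.get("keys", [])):
        return False
    # Every declared value name must be present in the JSON dict.
    return all(name in value for name in header.get("values", []))


header = parse_header([
    '@ {"format": "urn:CDXJ:archive_index"}',
    '@ {"keys": ["url", "timestamp"]}',
    '@ {"values": ["offset", "length", "filename"]}',
])
record = 'com,example)/ 20151007235924 {"offset": 0, "length": 10, "filename": "a.warc.gz"}'
is_valid = validate_record(record, header)
```

Restricting header values to dicts keeps `parse_header` to a single `update` call per line, which is the parsing simplicity being argued for here.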

ikreymer · 2015-10-08T03:09:45Z

Using the new semantics outlined in #3, one could support equivalent of @context and @id
for a stricter CDXJ-LD by including the header:

 {"context": "..."}
 {"id": "..."}
 {}

But this should not be part of the core CDXJ spec, I think..

ibnesayeed · 2015-10-08T04:20:10Z

Having linked-data semantics is an optional feature that allows interoperability and unambiguous naming. It is perfectly fine if a parser does not support it initially. However having the doors open for such possibilities is always a good idea. There is no XML-LD because it is possible to add semantics in the XML document itself while in the case of JSON-LD it was an after-thought, hence it was extended by a third-party.

I think these can all be specified with a single @meta prefix (either on same or different lines).

It is possible to crunch everything into a single line, but it might not be a pleasant way in cases where the data is split into arbitrary parts or more than one file is merged to make a large file. Allowing multiple metadata entries and merging them at run time reduces a lot of burden when the data is not aware of other pieces. On the other hand, apart from some special keys such as @id and @context, it is perfectly fine to merge the rest of the metadata into a single line.

For parsing reasons, I would also like to restrict the value to always be a JSON {} dict, and not a [] list.

I am not sure what parsing ease it will bring to support one JSON format but not the other. Additionally, it will limit the usefulness and variety of data we can store. Also, the list form implies a strict ordering, which is not the case with the dict format. Consequently, we will have to rely on some other means to encode some semantics that are otherwise achieved easily. For example, the @keys metadata in CDXJ has the value field as an array of keys, where the order of keys is the same as they appear in the prefixes.

ikreymer · 2015-10-08T04:39:53Z

It is possible to crunch everything into a single line, but it might not be a pleasant way in cases where the data is split into arbitrary parts or more than one file is merged to make a large file. Allowing multiple metadata entries and merging them at run time reduces a lot of burden when the data is not aware of other pieces. On the other hand, apart from some special keys such as @id and @context, it is perfectly fine to merge the rest of the metadata into a single line.

I did not mean a single line, but using only a single prefix, either @meta or @.
Instead of @keys ["a", "b"] we just have @meta {"keys": ["a", "b"]}.

Users can add whatever metadata they like, and no reason why @keys should be special. Same for the other special @ fields, I think. It's either metadata or part of the data.

Having linked-data semantics is an optional feature that allows interoperability and unambiguous naming. It is perfectly fine if a parser does not support it initially. However having the doors open for such possibilities is always a good idea. There is no XML-LD because it is possible to add semantics in the XML document itself while in the case of JSON-LD it was an after-thought, hence it was extended by a third-party.

But there's a reason we no longer use XML :) The neat thing about JSON is its simplicity and I'd like to preserve that here.. I am not saying it's not useful, I think that it should be treated as an add-on, not necessarily a core part of the spec..

ibnesayeed · 2015-10-08T05:21:28Z

I did not mean a single line, but using only a single prefix, either @meta or @.
Instead of @keys ["a", "b"] we just have @meta {"keys": ["a", "b"]}.

Users can add whatever metadata they like, and no reason why @keys should be special. Same for the other special @ fields, I think. It's either metadata or part of the data.

Conceptually it has simply two parts: the header/meta/special portion (let's call it the header for now) and the data portion. The prefix for the header is neither @ nor @meta, but @* or, more precisely, @\w+. However, there is a good reason to crack the outer shell of the header object and allow some top-level special header keys to appear separately. Metadata means data about the data, but the header section does not contain metadata only; it has some other things as well. For example, there is the ID; the context, which does not describe the data but the fields and keywords used in the file; the keys, which do not describe the data or content but how it is organised; and then comes the metadata, such as what the data is about, when it was created, who owns it, and whatnot. There is another practical, engineering-oriented reason to split them: one of the goals is to make the format friendly to text-processing tools, which often means finding the desired "data" without parsing the file in the first place. It is quite practical to grep for ^@id\b to read the identifier line, and the same is true for other similar header fields, but the metadata consumption may not be at the same level.
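As an illustration of the text-tool argument, a line-oriented reader can pick out header entries with one regular expression anchored at the start of the line and never touch the data lines. This is only a sketch: the function name `read_headers`, the sample `@id` value, and the decision to collect repeated header keys into lists (so that entries from merged files are all retained) are assumptions, not part of any spec.

```python
import json
import re

# A header line is '@' + a word, whitespace, then a JSON value (the @\w+ prefix).
HEADER_LINE = re.compile(r"^@(\w+)\s+(.*)$")


def read_headers(lines):
    """Collect header entries by name; data lines are skipped unparsed."""
    headers = {}
    for line in lines:
        match = HEADER_LINE.match(line)
        if match is None:
            continue  # not a header line, leave it alone
        name, raw = match.groups()
        # Repeated keys accumulate in a list instead of overwriting each other.
        headers.setdefault(name, []).append(json.loads(raw))
    return headers


sample = [
    '@id "urn:example:index-1"',  # hypothetical identifier
    '@keys ["url", "timestamp"]',
    '@meta {"owner": "example"}',
    'com,example)/ 20151007235924 {"offset": 0}',
]
headers = read_headers(sample)
```

The same selection works without any code by matching the start-anchored pattern with grep, which is the point about keeping special header keys on their own lines.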

But there's a reason we no longer use XML :) The neat thing about JSON is its simplicity and I'd like to preserve that here.. I am not saying it's not useful, I think that it should be treated as an add-on, not necessarily a core part of the spec..

You do know that the flexibility of XML is not the cause of its declining popularity, as it does not stop one from using it without namespaces. The reasons for the growing popularity of JSON over XML include its lower verbosity while still maintaining readability, the rise of JavaScript, which has native support for JSON, the natural tendency of change where nothing can be on the rise forever, and some other similar things.

JSON-LD is an extension, but it is still valid JSON; it is just an additional effort to bring linked data capabilities to a format that is gaining popularity. Linked data might not be of great interest to engineers, but it matters a lot to scientists. I will reiterate that initially not making a parser that resolves all the contexts is perfectly fine, but introducing the ability should not be an afterthought, and we should reserve the means to implement it later. Additionally, reusing the effort that was put into solving similar problems elsewhere is the smart move.
