Consider JSON Lines rather than JSON #48

Closed
timgdavies opened this issue Nov 30, 2017 · 2 comments

Comments

@timgdavies
Contributor

Dealing with lots of data in a single JSON object can bring a big overhead, since a typical parser has to read the whole document before any of it can be used.

We should consider using JSON Lines, which allows easier stream processing and the use of command line tools (e.g. simply running grep over files) to work with large collections of data.
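To illustrate the stream-processing point, here is a minimal sketch of reading a JSON Lines file one record at a time. The helper name `iter_json_lines` and the sample field names (`id`, `type`) are invented for illustration and are not taken from the BODS schema.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_json_lines(stream: TextIO) -> Iterator[Any]:
    """Yield one parsed JSON value per line, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if line:  # skip empty lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Hypothetical sample data; the field names are made up for this example.
sample = io.StringIO(
    '{"id": "a", "type": "entity"}\n'
    '{"id": "b", "type": "person"}\n'
)

# Filter while streaming: only one record is held in memory at a time.
person_ids = [r["id"] for r in iter_json_lines(sample) if r["type"] == "person"]
print(person_ids)
```

The same loop works on a real file handle from `open(...)`, so memory use stays roughly constant however large the file is.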

@timgdavies
Contributor Author

timgdavies commented Feb 21, 2018

There will be a couple of issues to address here:

(1) Do we use JSON Lines, JSON-L or NDJSON?

JSON-L appears to be an expired Internet-Draft.

NDJSON and JSONLines appear to be (almost?) the same, and there is discussion of whether they should be merged.

The difference ultimately looks very limited, perhaps coming down to whether comments or blank lines are accepted. If opinions differ on this, I assume there will be greater tooling support for files without comments or blank lines, so we should apply that constraint to BODS.
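As a rough sketch of what that constraint could look like in practice, the check below flags blank lines and any line that is not valid JSON on its own (which also catches comment lines). The function name `find_strict_violations` and the message wording are assumptions for illustration, not part of any of the specs mentioned above.

```python
import json
from typing import List


def find_strict_violations(text: str) -> List[str]:
    """Report blank lines and lines that do not parse as standalone JSON."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a single trailing newline at end of file is fine
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            # Comment lines such as "// note" end up here too.
            problems.append(f"line {number}: not valid JSON")
    return problems


print(find_strict_violations('{"a": 1}\n\n// note\n{"b": 2}\n'))
```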

I can't see that any of the above formats has made it into the IANA Media Types list, and at a quick glance it looks like tooling that reads one should read the others.

This might make it more a case of referring to the most stable and clear spec than of choosing between different formats.

(2) What exactly a line should contain

For example, would each line still require a statementGroup as its root element? Or could it even allow an array of statementGroups?

Would we still enforce the constraint in this case that a statementGroup should contain all relevant statements, or would we allow a statementGroup to be repeated on multiple lines, with the merged set of these containing all relevant statements?
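If repeated statementGroups were allowed, a consumer would need to merge them. A minimal sketch of that merge follows, assuming each group carries an `id` to match on and a `statements` array. Both field names are hypothetical here, as is the helper `merge_statement_groups`.

```python
import json
from typing import Dict, Iterable, List


def merge_statement_groups(lines: Iterable[str]) -> List[dict]:
    """Combine groups that share an id, concatenating their statements."""
    merged: Dict[str, dict] = {}
    for line in lines:
        group = json.loads(line)
        key = group["id"]  # hypothetical identifier field
        if key not in merged:
            merged[key] = {"id": key, "statements": []}
        # "statements" is also an assumed field name.
        merged[key]["statements"].extend(group.get("statements", []))
    # dicts keep insertion order, so groups come back in first-seen order.
    return list(merged.values())


example_lines = [
    '{"id": "g1", "statements": [{"s": 1}]}',
    '{"id": "g2", "statements": [{"s": 2}]}',
    '{"id": "g1", "statements": [{"s": 3}]}',
]
print(merge_statement_groups(example_lines))
```

Note the trade-off: a consumer has to keep every group in memory until the end of the file, which gives up some of the streaming benefit.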

(3) Do we have JSON Lines instead of, or in addition to, a full object serialisation?

E.g. we could say that publication can be either:

  • A JSON document, with a statementGroups array as the root element or
  • A JSON Lines document, where each line contains a statementGroup containing one or more statements.
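If both forms were allowed, converting between them should be cheap. The sketch below assumes the JSON document is simply an array of statementGroup objects and that each line of the JSON Lines form holds exactly one of them. The function names and the sample `note` field are made up for illustration.

```python
import json
from typing import List


def to_json_lines(statement_groups: List[dict]) -> str:
    """Serialise each group on its own line; json.dumps escapes any newlines."""
    return "".join(
        json.dumps(group, separators=(",", ":")) + "\n"
        for group in statement_groups
    )


def from_json_lines(text: str) -> List[dict]:
    """Rebuild the array form from one-group-per-line text."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Hypothetical content; a newline inside a string value stays on one line.
groups = [
    {"statements": [{"note": "line one\nline two"}]},
    {"statements": []},
]
as_lines = to_json_lines(groups)
print(as_lines, end="")
```

Round-tripping with `from_json_lines(as_lines)` gives back the original array, so neither form loses information under these assumptions.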

@timgdavies
Contributor Author

We allow for JSON Lines in our serialisation guidance. http://standard.openownership.org/en/0.2.0/schema/guidance/serialization.html
