Add generator metadata to cdxj files produced by ipwb replay #108

Closed
machawk1 opened this issue Feb 15, 2017 · 3 comments

@machawk1

@ibnesayeed suggested generator as the cdxj key. This will allow us to migrate the format as we evolve how ipwb interacts with archival indexes. For example, when we change the key per #41, old versions of ipwb will not be able to read cdxj files generated by newer versions, and newer versions will need to migrate cdxj files written by older versions to work with whatever scheme we use in the future.

A first step in accomplishing this would be to provide the ipwb indexer version as the value of a "generator" key in the metadata of the cdxj files it produces.

@ibnesayeed I recall there being discussion of using a leading character other than "@" in cdxj metadata. What is the current one? Where was that discussion? What is the current "standard" way of conveying cdxj metadata?

@ibnesayeed

The current metadata indicator prefix character is ! as noted in oduwsdl/ORS#6.
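
As a rough illustration of what the `!` prefix buys a reader, the sketch below separates metadata lines from index records by looking only at the first character. The function name `split_cdxj` and the sample record line are hypothetical and are not taken from ipwb or the ORS discussion.

```python
def split_cdxj(lines):
    """Split CDXJ lines into (metadata_lines, record_lines).

    Assumption: a line is metadata if and only if it starts with "!".
    Blank lines are skipped.
    """
    metadata, records = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("!"):
            metadata.append(line)
        else:
            records.append(line)
    return metadata, records


# Hypothetical input: one metadata line followed by one made-up record.
sample = [
    '!meta {"generator": "IPWB v0.1-alpha"}\n',
    'com,example)/ 20170215000000 {"mime_type": "text/html"}\n',
]
metadata, records = split_cdxj(sample)
```

Because `!` sorts before letters and digits in ASCII, metadata lines also stay ahead of records that begin with alphanumeric keys when the file is sorted bytewise.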

There is another specification draft for CDXJ, written mainly in the context of Open Wayback. It tries to standardize various keys, but it is still far from done. To track variations, it uses a version number specific to the CDXJ format itself, independent of how the file was generated.

@machawk1

machawk1 commented Feb 15, 2017

@ibnesayeed I had not even considered needing to specify the version of CDXJ that the document represents. Is there an example of how to define this in a reference CDXJ? As mentioned in other tickets, it would be good to have a CDXJ spec/EBNF that we could programmatically reference for validation (like XSLT?).

@ibnesayeed

ibnesayeed commented Feb 15, 2017

That document is still very controversial, and the version number is one of the things that has not been agreed upon yet. The biggest complaint was about its format, which does not conform to the CDXJ/ORS line grammar. The other issue concerned merging multiple documents with different minor version numbers.

For now, I would suggest you use something like this:

!context ["http://oduwsdl.github.io/contexts/cdxj"]
!meta {"generator": "IPWB v0.1-alpha", "created_at": "2017-01-15T13:15:52Z"}
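
A minimal sketch of how an indexer could emit these two lines and how a reader could recover the generator string later, for example to decide whether a file needs migration. The helper names `build_cdxj_header` and `read_generator` are hypothetical, not part of ipwb, and the sketch assumes that everything after `!context ` or `!meta ` on a line is a single JSON value.

```python
import json
from datetime import datetime, timezone

# Context URL copied from the suggestion above.
CDXJ_CONTEXT = "http://oduwsdl.github.io/contexts/cdxj"


def build_cdxj_header(generator, created_at=None):
    """Return the two metadata lines for the top of a CDXJ file.

    `generator` is a free-form string such as "IPWB v0.1-alpha".
    `created_at` is assumed to be a datetime already expressed in UTC;
    it defaults to the current time.
    """
    if created_at is None:
        created_at = datetime.now(timezone.utc)
    meta = {
        "generator": generator,
        "created_at": created_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return [
        "!context " + json.dumps([CDXJ_CONTEXT]),
        "!meta " + json.dumps(meta),
    ]


def read_generator(lines):
    """Return the "generator" value of the first !meta line, or None."""
    prefix = "!meta "
    for line in lines:
        if line.startswith(prefix):
            return json.loads(line[len(prefix):]).get("generator")
    return None


header = build_cdxj_header(
    "IPWB v0.1-alpha",
    datetime(2017, 1, 15, 13, 15, 52, tzinfo=timezone.utc),
)
print("\n".join(header))
```

Keeping the version inside a free-form generator string means a reader has to parse it before comparing versions; a separate version key would be easier to compare, but that is exactly the part that is not settled yet.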

@machawk1 machawk1 self-assigned this Feb 15, 2017