Generate BASE Compliant OAI-PMH XML files from Wikidata entries #56

Open
bootsa opened this issue Mar 21, 2022 · 2 comments
Labels: exploratory (First trials of software, system, processes), oai-pmh (things related to the OAI-PMH protocol)

Comments


bootsa commented Mar 21, 2022

With the simple OAI-PMH server (#52) now working on Toolforge (#53), the next step is generating BASE-compliant XML files from Wikidata entries.

Two possible paths seem worth exploring first:

  1. Use CitationJS to generate a JSON object from Wikidata, then produce the XML output with one of:
    1. a custom CitationJS output plugin
    2. XMLBuilder2
    3. xml-js
  2. Use XSLT to transform the RDF record of a Wikidata entity (encoded as XML) into a BASE-compliant XML file

An XSLT transform (2) would probably be more useful as a generalised solution for Wikibase (#14). However, it involves traversing multiple interlinked nodes, which could be tricky and might make it more brittle than (1).

As we have established that CitationJS generates a well-formed JSON object from Wikidata (including some clean-up of author names, etc.), (1) might be the quicker approach, though it is more specific to our project.

I shall therefore investigate (1) first.
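To make the shape of approach (1) concrete, here is a minimal sketch, written in Python with only the standard library rather than with CitationJS or XMLBuilder2. It assumes the input is a CSL-JSON-style object of the kind CitationJS produces, and it maps a handful of fields onto Dublin Core elements inside an `oai_dc:dc` wrapper. The function name `csl_to_oai_dc` and the choice of fields are hypothetical, and the mapping has not been checked against BASE's requirements.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def csl_to_oai_dc(item: dict) -> str:
    """Serialise one CSL-JSON-style item as an oai_dc:dc element.

    The field mapping below is an illustrative assumption, not a
    verified BASE profile.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(tag: str, text) -> None:
        # Skip empty values so no empty Dublin Core elements are emitted.
        if text:
            ET.SubElement(root, f"{{{DC_NS}}}{tag}").text = str(text)

    add("title", item.get("title"))

    for author in item.get("author", []):
        # Prefer "Family, Given"; fall back to a literal name if present.
        parts = (author.get("family"), author.get("given"))
        name = ", ".join(p for p in parts if p)
        add("creator", name or author.get("literal"))

    # CSL-JSON dates look like {"date-parts": [[2022, 3, 21]]}.
    date_parts = item.get("issued", {}).get("date-parts", [[]])
    if date_parts and date_parts[0]:
        add("date", "-".join(f"{int(part):02d}" for part in date_parts[0]))

    if item.get("DOI"):
        add("identifier", f"https://doi.org/{item['DOI']}")

    return ET.tostring(root, encoding="unicode")


# Made-up example item, for illustration only.
example = {
    "title": "An example title",
    "author": [{"family": "Doe", "given": "Jane"}],
    "issued": {"date-parts": [[2022, 3, 21]]},
    "DOI": "10.1234/example",
}
print(csl_to_oai_dc(example))
```

The same mapping step would sit between the CitationJS output and whichever XML builder is chosen, so the comparison of the three options is mostly about how the XML is emitted, not about what goes into it.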

bootsa added the oai-pmh and exploratory labels Mar 21, 2022
bootsa self-assigned this Mar 21, 2022
Daniel-Mietchen added this to the Q-03--2022-05-31 milestone Mar 22, 2022

bootsa commented Mar 23, 2022

Created a pipeline that automates the processing using GitHub Actions.

It works well when run locally and when the number of entities pulled from the Wikidata API is reduced to around 75.

Need to implement some kind of rate limiting.
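One simple form this could take, sketched in Python and not taken from the pipeline's code, is to split the entity IDs into batches and pause between requests. Everything here is an assumption for illustration: the helper names `batched` and `fetch_rate_limited`, the `fetch_batch` callable standing in for the real API call, and the default batch size and pause, which are placeholders and not tuned values.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def fetch_rate_limited(
    ids: Iterable[T],
    fetch_batch: Callable[[List[T]], List[R]],
    batch_size: int = 50,          # placeholder value
    pause_seconds: float = 1.0,    # placeholder value
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call `fetch_batch` once per batch, pausing between calls.

    `fetch_batch` is a hypothetical stand-in for whatever function
    actually requests the entities; `sleep` is injectable so the
    pacing can be tested without waiting.
    """
    results: List[R] = []
    for index, batch in enumerate(batched(ids, batch_size)):
        if index > 0:
            # No pause before the first request, one before each later one.
            sleep(pause_seconds)
        results.extend(fetch_batch(batch))
    return results
```

A fixed pause is the crudest option. Backing off when the API signals that requests are arriving too quickly would be more robust, but it needs knowledge of the actual responses, which this sketch does not assume.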


bootsa commented Mar 24, 2022

Rudimentary rate limiting implemented - InvasionBiologyHypotheses/enKORE-corpus-processor#1

Technical issues will be documented in the repo's issue tracker.

Leaving this open for discussion about the structure of the XML output.
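As a possible starting point for that discussion, the sketch below builds the envelope that OAI-PMH prescribes for a single record: a `header` carrying an `identifier` and a `datestamp`, followed by a `metadata` element that wraps the payload. It is written in Python with the standard library only. The function name `build_record`, the identifier `oai:example.org:Q42`, and the sample title are hypothetical, and which Dublin Core elements BASE expects inside the payload is left open.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai", OAI_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_record(identifier: str, datestamp: str, metadata: ET.Element) -> ET.Element:
    """Wrap a metadata payload in an OAI-PMH record element."""
    record = ET.Element(f"{{{OAI_NS}}}record")

    # The header identifies the item and says when it last changed.
    header = ET.SubElement(record, f"{{{OAI_NS}}}header")
    ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = identifier
    ET.SubElement(header, f"{{{OAI_NS}}}datestamp").text = datestamp

    # The metadata element holds exactly one payload, here oai_dc:dc.
    ET.SubElement(record, f"{{{OAI_NS}}}metadata").append(metadata)
    return record


# Hypothetical payload and identifier, for illustration only.
payload = ET.Element(f"{{{OAI_DC_NS}}}dc")
ET.SubElement(payload, f"{{{DC_NS}}}title").text = "An example title"

record = build_record("oai:example.org:Q42", "2022-03-24", payload)
print(ET.tostring(record, encoding="unicode"))
```

The envelope is fixed by the protocol, so the open questions are mainly about the payload: which elements to include, how to repeat them, and how to express identifiers and types.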
