extend to biorxiv #5

Open
yannickspill opened this issue Oct 24, 2017 · 3 comments

Comments

@yannickspill

How hard would it be to extend this work to data on bioRxiv, the archive for biology? And, even better, how hard would it be to connect the bioRxiv papers to the arXiv ones?

@rknegjens
Member

rknegjens commented Oct 25, 2017

The hardest part would be building the citation network. We did this for the arXiv by parsing the source files (mostly TeX) for their references and then matching this reference information to the referred-to arXiv papers, achieving a high match rate. In the many cases where no arXiv or DOI identifier is available, the next best thing is the journal information, which requires specialized regexes to parse well. High energy physics has the best representation in Paperscape, partly because we were both working in the field at the time and were most familiar with its journals (but also because arXiv usage and referencing are generally better in this field).
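As a rough illustration of the identifier-matching step described above, here is a minimal Python sketch that pulls arXiv identifiers and DOIs out of a raw reference string. The patterns and the `extract_identifiers` helper are assumptions made for illustration, not Paperscape's actual parser, and the journal-information fallback (the hard part) is not attempted here.

```python
import re

# New-style arXiv identifiers, e.g. "arXiv:1710.01234" or "arXiv:1710.01234v2".
ARXIV_NEW = re.compile(r"arXiv:\s*(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)
# Old-style arXiv identifiers, e.g. "hep-th/9901001" or "math.AG/0601001".
ARXIV_OLD = re.compile(r"\b([a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?\b")
# DOIs: "10.", a registrant code, "/", then a suffix up to whitespace or a brace.
DOI = re.compile(r"\b(10\.\d{4,9}/[^\s\"<>{}]+)")


def extract_identifiers(reference: str) -> dict:
    """Return the arXiv ids and DOIs found in one reference string.

    Hypothetical helper: illustrative patterns only, version suffixes dropped.
    """
    arxiv_ids = ARXIV_NEW.findall(reference) + ARXIV_OLD.findall(reference)
    # Trim punctuation that commonly trails a DOI at the end of a citation.
    dois = [d.rstrip(".,;)") for d in DOI.findall(reference)]
    return {"arxiv": arxiv_ids, "doi": dois}


if __name__ == "__main__":
    print(extract_identifiers("J. Doe, Some title, arXiv:1710.01234v2 [hep-ph]."))
    print(extract_identifiers("A. Author, Phys. Rev. D 60 (1999) 123, hep-th/9901001."))
    print(extract_identifiers("B. Writer, doi:10.1103/PhysRevD.60.123."))
```

References that yield neither kind of identifier are the ones that would need the journal-specific regexes mentioned above.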

Generating a paperscape-like map from another citation network is relatively straightforward,

@sfrosenb

sfrosenb commented Apr 9, 2019

Hey, I would help with the extension to biorxiv if you are interested. I am a CS PhD student, but I work with computational models of ecological systems, so I might be able to help for the same reason that high energy physics has the best representation in Paperscape.

@dpgeorge
Member

@sfrosenb if you want to help with biorxiv that would be great. But note that the maintainers of this project (@rknegjens and myself) are mostly busy with other things now so won't have much time to help out here.

As mentioned above, the main thing to do is to extract the citation network from biorxiv. Do they provide such a thing already? Do they provide source code or downloadable forms of their paper database?
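To make "citation network" concrete, here is a minimal Python sketch of the structure that would need to be extracted: a directed graph with an edge from each paper to every paper it cites within the same corpus. The input format (a dict of paper id to matched reference ids), the function name, and the ids in the usage example are all hypothetical; nothing here reflects what bioRxiv actually provides.

```python
from collections import defaultdict


def build_citation_network(references_by_paper):
    """Build forward and reverse citation maps for a corpus.

    references_by_paper: dict mapping a paper id to an iterable of the
    reference ids already matched for that paper (assumed input format).
    """
    known = set(references_by_paper)
    cites = {}
    cited_by = defaultdict(set)
    for paper, refs in references_by_paper.items():
        # Keep only references that resolve to papers inside the corpus,
        # and drop self-citations.
        matched = {r for r in refs if r in known and r != paper}
        cites[paper] = matched
        for r in matched:
            cited_by[r].add(paper)
    return cites, dict(cited_by)


if __name__ == "__main__":
    # Hypothetical ids; "X" stands for a reference outside the corpus.
    papers = {"A": ["B", "C", "X"], "B": ["C"], "C": []}
    cites, cited_by = build_citation_network(papers)
    print(cites)
    print(cited_by)
```

Papers that nothing in the corpus cites simply do not appear in the reverse map.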
