Ship citation info with our packages #3634
-
Thanks for the overview!
I assume this means users can access this information with something like the following?
>>> import scippneutron as scn
>>> scn.__citation__
'https://doi.org/10.5281/zenodo.7760280'
I do like the idea of keeping everything in a CITATION.cff, as we can have a single source of truth and the .cff file can be programmatically read to inject this information into |
-
I tried using the action https://github.com/marketplace/actions/zenodo-upload to upload a test project (https://github.com/jl-wynen/test_software_citation) to the Zenodo sandbox. However, it fails and I don't know why. The action also sees little activity and has few stars, so we may not be able to rely on it. While looking into this, I came across the CodeMeta Project, which defines an alternative |
-
If Zenodo doesn't work out for us, we could also look into the Software Heritage project for uploading our software. They have our code anyway: swh:1:dir:2143375f73b6417410f348a3251c9c6a062ae060. But they don't seem to mint DOIs. Instead, they have their own identifier, the SWHID (see the link above for an example). This makes it more difficult to cite the software unless and until publishers offer the option to cite SWHIDs. I also don't know how to keep a CITATION.cff or codemeta.json file up to date with Software Heritage. But at least Software Heritage supports codemeta.json.
-
Since we publish scientific packages, it makes sense to make those packages easily citable. We currently have DOIs for all packages and refer to them in our docs. This is good but requires users to search the docs manually. We can expose this information in more places to make it easier to find, ideally in a way that is also programmatically discoverable and introspectable.
Previous work
There is no standard or convention in Python.
Some (many) packages provide a __citation__ attribute: https://github.com/search?q=__citation__&type=Code. The contents of that attribute are diverse and range from a simple DOI to a full BibTeX entry. This generally seems to have grown out of astropy, which provides a BibTeX entry.
Various attempts have been made, but none got anywhere. It looks like they weren't rejected; in fact, responses are generally positive. They just seem to have run out of steam.
There are also some packages that aim to help with citation:
pyproject.toml file.
Citation data can be provided by a separate file. GitHub can handle many such files and display a link on a project's home page: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files.
There is a dedicated format for storing citation information that is both human readable and easy to parse by a program: https://citation-file-format.github.io/
Assessment
It is unfortunate that there is no consensus within the Python community. The closest we have is the __citation__ attribute. But its diverse content makes it difficult to handle. In particular, BibTeX entries are difficult to parse.
3rd-party packages can help, but adoption is low and relying on them presents a risk.
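To illustrate why the diverse content of __citation__ is awkward to handle, here is a minimal sketch that tries to tell a DOI apart from a BibTeX entry and to pull a DOI out of either. The function name `classify_citation` and the loose DOI pattern are assumptions made for this example; they are not part of any existing package, and the pattern does not cover every valid DOI.

```python
import re
from typing import Optional

# Loose DOI pattern (an assumption for this sketch); real DOIs may
# contain characters that this expression cuts off.
_DOI = re.compile(r"10\.\d{4,9}/[^\s\"'{},]+")


def classify_citation(value: str) -> tuple[str, Optional[str]]:
    """Guess what kind of content a __citation__ string holds.

    Returns a (kind, doi) pair where kind is 'bibtex', 'doi' or 'unknown'
    and doi is the first DOI found in the text, if any.
    """
    text = value.strip()
    match = _DOI.search(text)
    doi = match.group(0) if match else None
    if text.startswith("@"):
        # BibTeX entries start with '@type{...'; we do not parse them further.
        kind = "bibtex"
    elif doi is not None:
        kind = "doi"
    else:
        kind = "unknown"
    return kind, doi
```

Even this small helper has to guess, which is the core of the problem: a consumer cannot know in advance what a given package stores in the attribute.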
Providing a citation file in the repo is good. But it does not address the introspection issue. Even if we ship it with our packages, it is difficult to handle. Citing a package should be as easy as possible to not scare away users.
Ideas
We should definitely provide citation files in our projects. Even if for no other reason than to provide one more way for people to find that information.
Further, we can make our own in-house convention [1] for storing citation info in packages. We can start with storing a DOI in __citation__, as there is precedent for that. This would already cover some use cases.
If we want more, we can package a CITATION.cff file. These files are YAML (the Citation File Format is based on YAML 1.2), so they are easy to parse if we want to access the info programmatically.
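As a rough sketch of what accessing the info programmatically could look like, the snippet below derives a DOI URL from CFF text. It is deliberately limited: it reads only top-level scalar keys with the standard library and is not a YAML parser, so a real implementation would more likely use a proper YAML library. The names `EXAMPLE_CFF`, `top_level_scalars`, and `doi_url` are hypothetical, and the example content is an illustrative fragment, not the actual ScippNeutron file.

```python
from typing import Optional

# Hypothetical, incomplete CFF content used only for illustration.
EXAMPLE_CFF = """\
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: ScippNeutron
doi: 10.5281/zenodo.7760280
"""


def top_level_scalars(cff_text: str) -> dict[str, str]:
    """Collect top-level 'key: value' pairs, ignoring nested and list entries."""
    fields = {}
    for line in cff_text.splitlines():
        # Skip blank lines, comments, indented (nested) lines and list items.
        if not line.strip() or line[0] in " \t#-":
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip().strip("\"'")
    return fields


def doi_url(cff_text: str) -> Optional[str]:
    """Return a https://doi.org/ URL for the top-level 'doi' key, if present."""
    doi = top_level_scalars(cff_text).get("doi")
    return f"https://doi.org/{doi}" if doi else None
```

A package could compute its __citation__ value this way from a shipped CITATION.cff, which would keep the file as the single source of truth.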
Example CFF file for ScippNeutron
Here is an incomplete(!) CFF file for ScippNeutron for reference:
Footnotes
[1] By 'in-house' I mean ideally all of DMSC, not just Scipp.