This software tool is designed to enable the curatorial review of datasets that are deposited into the University of Arizona Research Data Repository (ReDATA). It follows a workflow that was developed by members of the Research Data Services Team at the University of Arizona Libraries. The software has a number of backend features, such as:
- Retrieving private datasets from the Figshare API that are undergoing curatorial review
- Constructing a README.txt file based on information from the deposit's metadata and information provided by the researchers using a Qualtrics form that walks the users through additional information
- Retrieving a Deposit Agreement Form from Qualtrics, which is a requirement for all ReDATA deposits
- Retrieving a copy of the Curatorial Review Report template (MS-Word) for ReDATA curators to complete
- Creating a hierarchical folder structure that supports library preservation and archiving
- Supporting ReDATA curators with access and workflow management through standard UNIX commands
These backend services ingest the datasets and accompanying files (described above) onto a curatorial "staging" server with attached storage to enable the full curatorial review procedure.
Although not available yet, a web application will serve as the front-end framework to allow for easy navigation through the curatorial review. Also, integration with the Trello REST API is another feature to further assist with the curatorial review process.
These instructions will get the code running on your local or virtual machine.
You will need the following to have a working copy of this software (see the installation steps). Note that some of the dependencies will be updated.
- Python (>=v3.7.9)
- figshare (ReDATA's forked copy of cognoma's figshare)
- pandas (1.2.3)
- requests (2.22.0)
- numpy (1.20.0)
- jinja2 (2.11.2)
- tabulate (0.8.3)
- html2text (2020.1.16)
First, install a working version of Python (>=3.7.9). We recommend using the Anaconda package installer.
After you have Anaconda installed, you will want to create a separate conda environment and activate it:
$ (sudo) conda create -n curation python=3.7
$ conda activate curation
With the conda environment activated, next clone the UA Libraries' forked copy of figshare and install it with the setup.py script:
(curation) $ cd /path/to/parent/folder
(curation) $ git clone https://github.com/UAL-RE/figshare.git
(curation) $ cd /path/to/parent/folder/figshare
(curation) $ (sudo) python setup.py develop
Then, clone this repository (LD-Cool-P) into the parent folder and install it with the setup.py script:
(curation) $ cd /path/to/parent/folder
(curation) $ git clone https://github.com/UAL-RE/LD-Cool-P.git
(curation) $ cd /path/to/parent/folder/LD-Cool-P
(curation) $ (sudo) python setup.py develop
This will automatically install the required pandas, requests, numpy, jinja2, tabulate, and html2text packages.
You can confirm installation via conda list:
(curation) $ conda list ldcoolp
You should see that the version is 1.2.0.
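The same check can be made from inside Python. The following is a minimal sketch, not part of LD-Cool-P: the helper names `installed_version` and `report` are hypothetical, and it assumes the distribution names match the package names listed in the requirements (with `ldcoolp` as used in the conda list command).

```python
# Sketch: report installed versions of the packages this README lists.
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7 needs the importlib_metadata backport instead
    import importlib_metadata as metadata

# Assumed distribution names, taken from the requirements list
PACKAGES = ["ldcoolp", "pandas", "requests", "numpy",
            "jinja2", "tabulate", "html2text"]


def installed_version(name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def report(packages=PACKAGES):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Run inside the activated curation environment, any package reported as NOT INSTALLED points to a step above that did not complete.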
Configuration settings are specified through the --config flag in the scripts described below. For example:
--config ldcoolp/config/myconfig.ini
Note that in the package's __init__.py, there is a default setting:
config_dir = path.join(co_path, 'config/')
main_config_file = 'default.ini'
config_file = path.join(config_dir, main_config_file)
All modules and functions that require settings fall back to this default when no configuration file is provided.
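The fallback can be pictured with a short sketch. This is an illustration only: the function `resolve_config_file` is hypothetical, and it assumes that `co_path` in the snippet above refers to the package directory (shown here as the relative path `ldcoolp`).

```python
from os import path


def resolve_config_file(config=None, package_dir="ldcoolp"):
    """Return the user-supplied config path, else the packaged default.

    `package_dir` stands in for `co_path`; that mapping is an assumption.
    """
    if config:
        # A value passed through --config takes precedence
        return config
    config_dir = path.join(package_dir, "config/")
    return path.join(config_dir, "default.ini")


print(resolve_config_file("ldcoolp/config/myconfig.ini"))
print(resolve_config_file())
```

The first call returns the custom file unchanged; the second returns the path to default.ini under the package's config folder.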
A template for this configuration file is provided.
There are a number of config sections, including figshare, curation, and qualtrics.
The most important settings to define are those populated with ***override***.
Additional settings to change are the stage flag in the figshare section and the source setting in the curation section.
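Before running any script, it can help to confirm that no ***override*** placeholder is left in your configuration file. The sketch below uses Python's standard configparser; the function `find_unset_options` is hypothetical, and the option names and values in the example text (such as `token`) are made up for illustration and may not match the real template.

```python
import configparser

PLACEHOLDER = "***override***"


def find_unset_options(config):
    """Return (section, option) pairs whose value is still the placeholder."""
    unset = []
    for section in config.sections():
        # raw=True skips interpolation so values are compared as written
        for option, value in config.items(section, raw=True):
            if value.strip() == PLACEHOLDER:
                unset.append((section, option))
    return unset


# Hypothetical configuration content, for illustration only
EXAMPLE = """
[figshare]
token = ***override***
stage = False

[curation]
source = local

[qualtrics]
token = ***override***
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE)
print(find_unset_options(config))
# → [('figshare', 'token'), ('qualtrics', 'token')]
```

To check a real file, replace `config.read_string(EXAMPLE)` with `config.read("ldcoolp/config/myconfig.ini")`.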
Since the configuration settings will continue to evolve, we refer users to the documented information provided.
These configurations are read in through the config sub-package.
This section is under construction
There are or will be a number of ways to execute the software.
There are two ways to execute the software from the command line. The first is to use ipython/python:
article_id = 13456789
from ldcoolp.curation import main
main.workflow(article_id)
Here, article_id is the unique ID that Figshare provides for any article.
The above script will perform the prerequisite steps of:
- Retrieving the data using the Figshare API
- Retrieving a copy of the curatorial review report
- Attempting to retrieve the deposit agreement form through the Qualtrics API, or providing a custom link to give to the depositor
- Generating a README.txt file
- Following our curation workflow by relocating the content from 1.ToDo to 2.UnderReview
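The final relocation step can be illustrated with a small, self-contained sketch. It is not the LD-Cool-P implementation: the function `move_to_stage` and the deposit folder name are hypothetical, and only the stage names 1.ToDo and 2.UnderReview come from this README.

```python
import shutil
import tempfile
from pathlib import Path


def move_to_stage(root, folder_name, from_stage, to_stage):
    """Move root/from_stage/folder_name to root/to_stage/folder_name."""
    source = Path(root) / from_stage / folder_name
    if not source.is_dir():
        raise FileNotFoundError(f"No deposit folder at {source}")

    target_dir = Path(root) / to_stage
    target_dir.mkdir(parents=True, exist_ok=True)  # create the stage folder if needed
    target = target_dir / folder_name
    if target.exists():
        # Refuse to overwrite a deposit that is already in the target stage
        raise FileExistsError(f"{target} already exists")

    shutil.move(str(source), str(target))
    return target


# Demonstration in a throwaway directory; the deposit folder name is made up
with tempfile.TemporaryDirectory() as root:
    deposit = Path(root) / "1.ToDo" / "example_deposit_12345678"
    deposit.mkdir(parents=True)
    (deposit / "README.txt").write_text("placeholder")

    moved = move_to_stage(root, deposit.name, "1.ToDo", "2.UnderReview")
    print(moved.relative_to(root))
```

Refusing to overwrite an existing target is a design choice of this sketch so that a repeated run cannot silently clobber a deposit already under review.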
Another command-line approach is to use the Python script called prereq_script:
(curation) $ ./ldcoolp/scripts/prereq_script \
--config ldcoolp/config/default.ini --article_id 12345678
Additional Python scripts are available to:
- Retrieve the list of pending curation items and their article_id:
  (curation) $ ./ldcoolp/scripts/get_curation_list \
  --config ldcoolp/config/default.ini
- Retrieve the Qualtrics URLs to provide to an author/depositor:
  (curation) $ ./ldcoolp/scripts/generate_qualtrics_link \
  --config ldcoolp/config/default.ini --article_id 12345678
- Update the README.txt file for changes to metadata information:
  (curation) $ ./ldcoolp/scripts/update_readme \
  --config ldcoolp/config/default.ini --article_id 12345678
- Move between curation stages (either next, back, or publish):
  (curation) $ ./ldcoolp/scripts/perform_move --direction next \
  --config ldcoolp/config/default.ini --article_id 12345678
  (curation) $ ./ldcoolp/scripts/perform_move --direction back \
  --config ldcoolp/config/default.ini --article_id 12345678
  (curation) $ ./ldcoolp/scripts/perform_move --direction publish \
  --config ldcoolp/config/default.ini --article_id 12345678
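To make the shape of these command-line interfaces concrete, here is a sketch of an argument parser that accepts the same flags as perform_move. It is an assumption-laden illustration, not the actual script: the function `build_parser`, the integer type for --article_id, the required flags, and the default for --config are all guesses.

```python
import argparse


def build_parser():
    """Build a parser mirroring the perform_move flags shown in this README."""
    parser = argparse.ArgumentParser(
        description="Sketch of a perform_move-style command-line interface")
    parser.add_argument("--config", default="ldcoolp/config/default.ini",
                        help="path to the configuration file (default is assumed)")
    parser.add_argument("--article_id", required=True, type=int,
                        help="Figshare article ID (integer type is assumed)")
    parser.add_argument("--direction", required=True,
                        choices=["next", "back", "publish"],
                        help="which way to move the deposit between stages")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["--direction", "next",
                                  "--article_id", "12345678"])
print(args.direction, args.article_id, args.config)
# → next 12345678 ldcoolp/config/default.ini
```

Restricting --direction with `choices` makes argparse reject any value other than next, back, or publish before any files are touched.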
We use SemVer for versioning. For the versions available, see the tags on this repository.
Releases are auto-generated using this GitHub Actions script following a git tag version.
See the CHANGELOG for all changes since project inception.
- UAL-RE University of Arizona Libraries, Research Engagement
- Current authors: Fernando Rios, Yan Han
- Past author: Chun Ly, Ph.D. (@astrochun)
See also the list of contributors who participated in this project.
This project is licensed under the MIT License - see the LICENSE file for details.