CRDC-H model in LinkML, developed by the Center for Cancer Data Harmonization (CCDH)

cancerDHC/ccdhmodel

CRDC-H Model in LinkML

This repository stores the LinkML representation of the CRDC Harmonized Data Model (CRDC-H) produced by the Center for Cancer Data Harmonization (CCDH).

This repository includes the LinkML model itself (in YAML format) as well as a number of artifacts produced automatically by LinkML, including a JSON Schema, JSON-LD context, a GraphQL description, a CSV description and ShEx validation shapes.

Model documentation in Markdown can also be generated from this repository, and is currently hosted on GitHub Pages at https://cancerdhc.github.io/ccdhmodel/. Python Data Classes can also be generated and are available for use. Examples of their use are available in the Example Data repository.

Setup

Generation of LinkML artifacts

All artifacts can be generated by running make in this repository. make clean will delete the existing generated artifacts, allowing them to be regenerated. The Makefile uses Poetry to manage dependencies.
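If you want to script a full rebuild, the clean-then-generate sequence described above could be wrapped as in the following minimal sketch. The function names rebuild_commands and rebuild are hypothetical and not part of this repository; the sketch only assumes that make is on your PATH and that the commands are run from the repository root.

```python
import subprocess
from pathlib import Path


def rebuild_commands(clean_first: bool = True) -> list[list[str]]:
    """Return the make invocations for a rebuild, in the order they should run.

    Hypothetical helper: only `make` and `make clean` come from this README.
    """
    commands = []
    if clean_first:
        commands.append(["make", "clean"])  # delete existing generated artifacts
    commands.append(["make"])  # regenerate all artifacts
    return commands


def rebuild(repo_root: Path, clean_first: bool = True) -> None:
    """Run the rebuild inside repo_root, stopping at the first failing command."""
    for command in rebuild_commands(clean_first):
        # check=True raises CalledProcessError if make exits with a non-zero status
        subprocess.run(command, cwd=repo_root, check=True)
```

Calling rebuild(Path(".")) from the repository root would then run make clean followed by make.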

We use mike to publish documentation to GitHub Pages. Use mike deploy [version] -p to push a new version of the documentation to GitHub Pages (via the gh-pages branch). Use mike deploy [version] latest -p -u to mark the uploaded version as the latest version, which is the one displayed by default.
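The two forms of the mike command differ only in the latest alias and the -u flag. As an illustration, the sketch below builds the argument list for either form; mike_deploy_args is a hypothetical helper, and the version string "v1.1" used in the comments is a made-up example rather than a real release of this model.

```python
def mike_deploy_args(version: str, make_latest: bool = False) -> list[str]:
    """Build the argument list for the mike deploy commands described above.

    Hypothetical helper: it mirrors `mike deploy [version] -p` and
    `mike deploy [version] latest -p -u` exactly as written in this README.
    """
    args = ["mike", "deploy", version]
    if make_latest:
        args.append("latest")  # alias pointing at the version shown by default
    args.append("-p")  # push the result to the gh-pages branch
    if make_latest:
        args.append("-u")  # move the existing "latest" alias to this version
    return args


# Example with a made-up version string:
#   mike_deploy_args("v1.1")                   -> mike deploy v1.1 -p
#   mike_deploy_args("v1.1", make_latest=True) -> mike deploy v1.1 latest -p -u
```

The resulting list can be passed to subprocess.run(..., check=True) from an environment where mike is installed.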

Automated generation of YAML

The CRDC-H model is currently in development on a Google Sheet, which is converted into a LinkML schema in ./model/schema/crdch_model.yaml. If you would like to use the latest, in-development version of the schema as described in Google Sheets, you will need to use the sheet2linkml package to regenerate this file by running make generate-model.

In order to read a Google Sheet, sheet2linkml will need credentials for the Google Sheets API, which are created in the Google Developers Console. Detailed instructions and screenshots are available from the documentation for pygsheets, the package sheet2linkml uses to access Google Sheets. Save the resulting credentials file as google_api_credentials.json in the root directory of this project. The first time you run make generate-model, a browser page will open asking you to log in. Follow the instructions. The script will download a token and store it locally, so you will not need to log in when rerunning this command.
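Before running make generate-model, it can help to confirm that the credentials file is where it is expected to be. The following is a minimal sketch of such a check, not something the Makefile or sheet2linkml does itself: find_credentials is a hypothetical helper, and it verifies only that the file exists in the project root and parses as JSON, not that the credentials are valid.

```python
import json
from pathlib import Path

# File name taken from the instructions above.
CREDENTIALS_FILENAME = "google_api_credentials.json"


def find_credentials(project_root: Path) -> Path:
    """Return the path to the credentials file, or raise if it is unusable.

    Hypothetical pre-flight check: existence and JSON syntax only.
    """
    path = project_root / CREDENTIALS_FILENAME
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected {CREDENTIALS_FILENAME} in {project_root}; "
            "see the pygsheets documentation for how to create it."
        )
    with path.open(encoding="utf-8") as handle:
        json.load(handle)  # raises json.JSONDecodeError if the file is malformed
    return path
```

Calling find_credentials(Path(".")) from the project root either returns the path or fails with a message pointing at the missing file.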