This is a collection of datasets created within the Collaborative Research Center 1451. The "superdataset" contains information about datasets stored in various locations. This information can be used to obtain programmatic, fine-grained data access down to the level of individual files. How much is accessible depends on the level of detail provided by dataset authors, the storage location, and access restrictions.
This repository provides dataset information in two formats:
- DataLad datasets (Git submodules)
- DataLad-tabby dataset description (tabular files with dataset- and file-level metadata)
This repository is a DataLad dataset. It provides fine-grained data access down to the level of individual files, and allows for tracking future updates. To use this repository for data retrieval, DataLad is required. DataLad is a free and open source command line tool, available for all major operating systems. It builds on Git and git-annex to allow sharing, synchronizing, and version controlling collections of large files. You can find information on how to install DataLad at handbook.datalad.org/en/latest/intro/installation.html.
A DataLad dataset can be cloned by running:
datalad clone <url>
Once a dataset is cloned, it is a lightweight directory on your local machine. At this point, it contains only a small amount of metadata and information on the identity of the files in the dataset, but not the actual content of the (sometimes large) data files.
DataLad datasets can contain other datasets, so-called subdatasets. If you clone the top-level dataset, its subdatasets do not yet contain metadata or information on the identity of files; they appear as empty directories.
The SFB 1451 superdataset contains one subdataset per project, and these can have further subdatasets. To retrieve file availability metadata for a subdataset, run:
datalad get -n <path/to/subdataset>
After cloning a (sub)dataset, you can retrieve file contents, if they are stored in a location available to you, by running:
datalad get <path/to/directory/or/file>
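The three steps above (clone, install a subdataset, get a file) can be chained in a small script. The following is a minimal sketch, not a recommended workflow: the function name `fetch_file`, the `run` helper, and the `DRY_RUN` switch are hypothetical and introduced only for illustration, and every URL and path is a placeholder supplied by the caller. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the clone -> install subdataset -> get file workflow.
# All URLs and paths passed to fetch_file are placeholders chosen by the caller.
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# fetch_file <url> <clone_dir> <subdataset> <file>
# The body runs in a subshell so that "cd" does not leak into the caller.
fetch_file() (
    url="$1"; clone_dir="$2"; subdataset="$3"; file="$4"
    run datalad clone "$url" "$clone_dir" || exit 1   # lightweight clone, no file content yet
    run cd "$clone_dir" || exit 1
    run datalad get -n "$subdataset" || exit 1        # install subdataset, metadata only
    run datalad get "$subdataset/$file"               # retrieve the actual file content
)

# Example call (placeholders, prints the four commands in dry-run mode):
# fetch_file <url> my-clone <path/to/subdataset> <path/to/file>
```

Setting `DRY_RUN=0` makes the same function execute the commands. Whether the final `datalad get` succeeds still depends on the file being stored in a location available to you.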
The DataLad-tabby dataset descriptions are stored in the .datalad/tabby directory. Each subfolder contains a collection of files with dataset- and file-level metadata. The latter can include information on file identity (checksum) and availability (URL). See the DataLad Tabby documentation and the CRC1451-specific format description.
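Because the descriptions are tabular files, they can be inspected with standard command line tools. The following is a rough sketch that prints selected columns from a tab-separated file, looking them up by header name. It rests on several assumptions: that the file is tab-separated with a single header row, that column names contain no spaces, and that the column names used in the example call (`path`, `url`) exist. None of these are confirmed here, so check the format description for the actual layout. The helper name `tsv_select` is hypothetical.

```shell
#!/bin/sh
# tsv_select <file> <column> [<column> ...]
# Print the named columns of a tab-separated file, one row per line.
# Assumes the first line is a header and column names contain no spaces.
tsv_select() {
    file="$1"; shift
    awk -F '\t' -v wanted="$*" '
        BEGIN { n = split(wanted, names, " ") }
        NR == 1 {
            # Map each header name to its column position.
            for (i = 1; i <= NF; i++) idx[$i] = i
            for (j = 1; j <= n; j++)
                if (!(names[j] in idx)) {
                    printf "missing column: %s\n", names[j] > "/dev/stderr"
                    exit 1
                }
            next
        }
        {
            line = $(idx[names[1]])
            for (j = 2; j <= n; j++) line = line "\t" $(idx[names[j]])
            print line
        }
    ' "$file"
}

# Example call (file name and column names are placeholders):
# tsv_select .datalad/tabby/<subfolder>/<file> path url
```

Looking columns up by name rather than by position keeps the sketch usable if the column order differs between files. It exits with a non-zero status when a requested column is absent.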