This is the NDE Dataset Register, a service that helps users find and discover datasets.
Institutions (such as cultural heritage organizations) register their dataset descriptions with the NDE Dataset Register using its HTTP API. The Dataset Register builds an index by fetching, validating and periodically crawling dataset descriptions.
The HTTP API is documented at https://datasetregister.netwerkdigitaalerfgoed.nl/api.
See the Dataset Register Demonstrator, a client application for this repository’s HTTP API, for more background information (in Dutch).
- The application follows modern standards and best practices.
- The application uses Linked Data Platform (LDP) for HTTP operations.
- The application prefers JSON-LD as the data exchange format.
- The application uses established Linked Data vocabularies, including Schema.org and DCAT.
Dataset descriptions must adhere to the Requirements for Datasets. You can check validity using the validate API call.
To submit your dataset descriptions to the Dataset Register, use the datasets API call. A URL must be on an allowed domain before it can be added to the Register.
You can retrieve dataset descriptions registered by yourself and others from our triple store’s web interface.
Alternatively, use the SPARQL endpoint at https://triplestore.netwerkdigitaalerfgoed.nl/repositories/registry directly.
For example, using Comunica:
comunica-sparql sparql@https://triplestore.netwerkdigitaalerfgoed.nl/repositories/registry 'select * {?s a <http://www.w3.org/ns/dcat#Dataset> . ?s ?p ?o . } limit 100'
Or curl:
curl -H Accept:application/sparql-results+json --data-urlencode 'query=select * {?s a <http://www.w3.org/ns/dcat#Dataset> . ?s ?p ?o . } limit 100' https://triplestore.netwerkdigitaalerfgoed.nl/repositories/registry
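The same query can also be sent from application code. The following is a minimal TypeScript sketch, not part of this repository: the helper names (`buildSparqlRequest`, `flattenBindings`, `selectDatasets`) are hypothetical, and it assumes a runtime with a global `fetch` (Node.js 18+ or a browser). It posts the query as a form-encoded body, as the curl call above does, and reads the standard SPARQL JSON results format.

```typescript
// Endpoint taken from the text above.
const ENDPOINT = 'https://triplestore.netwerkdigitaalerfgoed.nl/repositories/registry';

// Shape of a SPARQL JSON result, reduced to what this sketch reads.
interface SparqlJsonResults {
  results: { bindings: Array<Record<string, { type: string; value: string }>> };
}

// Build the request options for a form-encoded SPARQL query (hypothetical helper).
function buildSparqlRequest(query: string): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: 'POST',
    headers: {
      Accept: 'application/sparql-results+json',
      'Content-Type': 'application/x-www-form-urlencoded',
    },
    body: new URLSearchParams({ query }).toString(),
  };
}

// Flatten each result row into a plain "variable name -> value" object.
function flattenBindings(json: SparqlJsonResults): Array<Record<string, string>> {
  return json.results.bindings.map((row) => {
    const flat: Record<string, string> = {};
    for (const name of Object.keys(row)) {
      flat[name] = row[name].value;
    }
    return flat;
  });
}

// Run the example query against the endpoint; requires network access.
async function selectDatasets(limit: number): Promise<Array<Record<string, string>>> {
  const query = `select * {?s a <http://www.w3.org/ns/dcat#Dataset> . ?s ?p ?o . } limit ${limit}`;
  const response = await fetch(ENDPOINT, buildSparqlRequest(query));
  if (!response.ok) {
    throw new Error(`SPARQL request failed with status ${response.status}`);
  }
  return flattenBindings((await response.json()) as SparqlJsonResults);
}
```

Calling `selectDatasets(100)` mirrors the Comunica and curl examples; each returned row has the keys `s`, `p` and `o`.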
If you want to automate dataset description registrations by connecting your (collection management) application to the Dataset Register, please see the HTTP API documentation.
To run the application yourself (for instance to contribute, which you’re very welcome to do), follow these steps. The hosted version mentioned above is available at https://datasetregister.netwerkdigitaalerfgoed.nl/api.
This application stores data in a GraphDB RDF store, so you need to have that running locally:
docker run -p 7200:7200 docker-registry.ontotext.com/graphdb-free:9.6.0-adoptopenjdk11
Once GraphDB is running, you can start the application in development mode. Clone this repository and run:
npm install
npm run dev
To run the application in production, first compile and then run it. You may want to disable logging, which is enabled by default:
npm run compile
LOG=false npm start
You can configure the application through environment variables:
- GRAPHDB_URL: the URL at which your GraphDB instance runs (default: http://localhost:7200).
- GRAPHDB_USERNAME: if using authentication, your GraphDB username (default: empty).
- GRAPHDB_PASSWORD: if using authentication, your GraphDB password (default: empty).
- LOG: enable/disable logging (default: true).
- CRAWLER_SCHEDULE: a schedule in Cron format; for example 0 * * * * to crawl every hour (default: crawling disabled).
- REGISTRATION_URL_TTL: if crawling is enabled, a registered URL’s maximum age (in seconds) before it is fetched again (default: 86400, so one day).
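To illustrate how these variables and their defaults fit together, here is a minimal TypeScript sketch. It is not the application’s actual configuration code: the `Config` shape and the `loadConfig` helper are hypothetical, and treating only the literal string `false` as "logging off" is an assumption.

```typescript
// Hypothetical in-memory view of the configuration described above.
interface Config {
  graphDbUrl: string;
  graphDbUsername: string | undefined;
  graphDbPassword: string | undefined;
  log: boolean;
  crawlerSchedule: string | undefined; // undefined means crawling is disabled
  registrationUrlTtl: number; // in seconds
}

// Read the documented variables from an environment-like object, applying the documented defaults.
function loadConfig(env: Record<string, string | undefined>): Config {
  return {
    graphDbUrl: env.GRAPHDB_URL || 'http://localhost:7200',
    graphDbUsername: env.GRAPHDB_USERNAME || undefined,
    graphDbPassword: env.GRAPHDB_PASSWORD || undefined,
    // Assumption: only the literal string "false" turns logging off.
    log: env.LOG !== 'false',
    crawlerSchedule: env.CRAWLER_SCHEDULE || undefined,
    registrationUrlTtl: Number(env.REGISTRATION_URL_TTL || 86400),
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)`.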
The tests are run automatically on CI.
To run the tests locally, clone this repository, then:
npm install
npm test
The crawler will periodically fetch registration URLs (schema:EntryPoint) to update the dataset descriptions stored in the Dataset Register.
To enable the crawler, set the CRAWLER_SCHEDULE configuration variable.
The crawler will then check all registration URLs according to that schedule to see if any of the URLs have become outdated.
A registration URL is considered outdated if it was last read more than REGISTRATION_URL_TTL seconds ago (that is, its schema:dateRead is older than that).
If any outdated registration URLs are found, they are fetched and updated in the RDF Store.
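The outdated check described above can be sketched as follows. This is an illustration only: the `Registration` shape and both function names are hypothetical, and the real crawler reads these values from the RDF store rather than from an array.

```typescript
// A registration URL together with the moment it was last read (schema:dateRead).
interface Registration {
  url: string;
  dateRead: Date;
}

// True when the URL was last read more than ttlSeconds before `now`.
function isOutdated(registration: Registration, ttlSeconds: number, now: Date): boolean {
  const ageInSeconds = (now.getTime() - registration.dateRead.getTime()) / 1000;
  return ageInSeconds > ttlSeconds;
}

// Select the registrations that a crawler run at `now` would fetch again.
function findOutdated(registrations: Registration[], ttlSeconds: number, now: Date = new Date()): Registration[] {
  return registrations.filter((registration) => isOutdated(registration, ttlSeconds, now));
}
```

With the default REGISTRATION_URL_TTL of 86400, a URL read 12 hours ago is skipped and a URL read two days ago is fetched again.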
Any URL registered by clients is added as a schema:EntryPoint to the Registrations graph.
Datasets are fetched from this URL on registration and when crawling it.
Property | Description |
---|---|
schema:datePosted | When the URL was registered. |
schema:dateRead | When the URL was last read by the application. The crawler updates this value when fetching descriptions. |
schema:status | The HTTP status code last encountered when fetching the URL. |
schema:validUntil | If the URL has become invalid, the date at which it did so. |
schema:about | The set of schema:Dataset resources that the URL contains. The crawler updates this value when fetching descriptions. |
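As a rough illustration of how these properties change over time, the sketch below models a registration URL as a plain TypeScript object and updates it after a fetch. Everything here is hypothetical (`RegistrationRecord`, `FetchOutcome`, `recordFetch`). It also makes three assumptions the text above does not confirm: the caller decides whether a fetch counts as valid, a valid read clears an earlier validUntil, and an invalid read keeps the previously known datasets.

```typescript
// In-memory view of a registration URL (schema:EntryPoint); field names mirror the table above.
interface RegistrationRecord {
  url: string;
  datePosted: Date;
  dateRead?: Date;
  status?: number;
  validUntil?: Date;
  about: string[]; // IRIs of the schema:Dataset resources found at the URL
}

// Outcome of one fetch attempt (hypothetical shape; `valid` is decided by the caller).
interface FetchOutcome {
  status: number;
  valid: boolean;
  datasetIris: string[];
}

// Return an updated copy of the record after a fetch at time `now`.
function recordFetch(record: RegistrationRecord, outcome: FetchOutcome, now: Date): RegistrationRecord {
  if (outcome.valid) {
    // Assumption: a valid read clears any earlier validUntil and replaces the dataset list.
    return { ...record, dateRead: now, status: outcome.status, validUntil: undefined, about: outcome.datasetIris };
  }
  // Assumption: keep the first moment the URL was seen to be invalid and the last known datasets.
  return { ...record, dateRead: now, status: outcome.status, validUntil: record.validUntil ?? now };
}
```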
Each dataset that is found at the schema:EntryPoint registration URL gets added as a schema:Dataset to the Registrations graph.
Property | Description |
---|---|
schema:dateRead | When the dataset was last read by the application. |
schema:subjectOf | From which registration URL the dataset was read. |
When a dataset’s RDF description is fetched and validated, it is added as a dcat:Dataset to its own graph. The URL of the graph corresponds to the dataset’s IRI.
If the dataset’s description is provided in Schema.org rather than DCAT, the description is first converted to DCAT. The ‘Based on’ column shows the corresponding Schema.org property. See the Requirements for Datasets for more details.
Property | Description | Based on |
---|---|---|
dct:title | Dataset title. | schema:name |
dct:alternative | Dataset alternate title. | schema:alternateName |
dct:identifier | Dataset identifier. | schema:identifier |
dct:description | Dataset description. | schema:description |
dct:license | Dataset license. | schema:license |
dct:language | Language(s) in which the dataset is available. | schema:inLanguage |
dcat:keyword | Keywords or tags that describe the dataset. | schema:keywords |
dcat:landingPage | URL of a webpage where the dataset is described. | schema:mainEntityOfPage |
dct:source | URL(s) of datasets the dataset is based on. | schema:isBasedOn |
dct:created | Dataset creation date. | schema:dateCreated |
dct:issued | Dataset publication date. | schema:datePublished |
dct:modified | Dataset last modification date. | schema:dateModified |
owl:versionInfo | Dataset version. | schema:version |
dct:creator | Dataset creator. | schema:creator |
dct:publisher | Dataset publisher. | schema:publisher |
dcat:distribution | Dataset distributions. | schema:distribution |
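The dataset mapping in the table above can be expressed as a simple lookup. The sketch below is illustrative and not the converter this application uses: `SCHEMA_TO_DCAT` and `toDcat` are hypothetical names, descriptions are treated as flat objects keyed by prefixed property names, properties without a mapping are dropped (an assumption), and nested values such as creators and distributions are copied as-is rather than converted.

```typescript
// Schema.org -> DCAT property mapping for datasets, copied from the table above.
const SCHEMA_TO_DCAT = new Map<string, string>([
  ['schema:name', 'dct:title'],
  ['schema:alternateName', 'dct:alternative'],
  ['schema:identifier', 'dct:identifier'],
  ['schema:description', 'dct:description'],
  ['schema:license', 'dct:license'],
  ['schema:inLanguage', 'dct:language'],
  ['schema:keywords', 'dcat:keyword'],
  ['schema:mainEntityOfPage', 'dcat:landingPage'],
  ['schema:isBasedOn', 'dct:source'],
  ['schema:dateCreated', 'dct:created'],
  ['schema:datePublished', 'dct:issued'],
  ['schema:dateModified', 'dct:modified'],
  ['schema:version', 'owl:versionInfo'],
  ['schema:creator', 'dct:creator'],
  ['schema:publisher', 'dct:publisher'],
  ['schema:distribution', 'dcat:distribution'],
]);

type Description = Record<string, unknown>;

// Rename mapped properties; unmapped properties are dropped (an assumption of this sketch).
function toDcat(schemaDescription: Description): Description {
  const dcat: Description = {};
  for (const key of Object.keys(schemaDescription)) {
    const target = SCHEMA_TO_DCAT.get(key);
    if (target !== undefined) {
      dcat[target] = schemaDescription[key];
    }
  }
  return dcat;
}
```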
The objects of both the dct:creator and dct:publisher dataset properties have type foaf:Organization.
If the dataset’s organizations are provided in Schema.org rather than DCAT, the organizations are first converted to DCAT. The ‘Based on’ column shows the corresponding Schema.org property. See the Requirements for Datasets for more details.
Property | Description | Based on |
---|---|---|
foaf:name | Organization name. | schema:name |
The objects of dcat:distribution dataset properties have type dcat:Distribution.
If the dataset’s distributions are provided in Schema.org rather than DCAT, the distributions are first converted to DCAT. The ‘Based on’ column shows the corresponding Schema.org property. See the Requirements for Datasets for more details.
Property | Description | Based on |
---|---|---|
dcat:accessURL | Distribution URL. | schema:contentUrl |
dcat:mediaType | Distribution’s IANA media type. | schema:fileFormat |
dct:format | Distribution content type (e.g. text/turtle). | schema:encodingFormat |
dct:issued | Distribution publication date. | schema:datePublished |
dct:modified | Distribution last modification date. | schema:dateModified |
dct:description | Distribution description. | schema:description |
dct:language | Distribution language. | schema:inLanguage |
dct:license | Distribution license. | schema:license |
dct:title | Distribution title. | schema:name |
dcat:byteSize | Distribution’s download size in bytes. | schema:contentSize |
A registration URL must be on a domain that is allowed before it can be added to the Register. Allowed domains are administered in the https://data.netwerkdigitaalerfgoed.nl/registry/allowed_domain_names RDF graph.
To allow a domain name, insert it into that graph:
INSERT DATA {
  GRAPH <https://data.netwerkdigitaalerfgoed.nl/registry/allowed_domain_names> {
    [] <https://data.netwerkdigitaalerfgoed.nl/allowed_domain_names/def/domain_name> "your-domain.com" .
  }
}
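If you manage allowed domains from a script, the update above can be generated and a candidate URL checked as in the following TypeScript sketch. Both helpers are hypothetical. In particular, the matching rule in `isAllowed` (the host equals an allowed domain or is a subdomain of one) is an assumption and may differ from how the application itself compares domains.

```typescript
// Graph and predicate IRIs taken from the update above.
const ALLOWED_GRAPH = 'https://data.netwerkdigitaalerfgoed.nl/registry/allowed_domain_names';
const DOMAIN_NAME_PREDICATE = 'https://data.netwerkdigitaalerfgoed.nl/allowed_domain_names/def/domain_name';

// Build the INSERT DATA update for one domain name.
function buildAllowDomainUpdate(domainName: string): string {
  // Reject anything that could break out of the SPARQL string literal.
  if (!/^[a-z0-9.-]+$/i.test(domainName)) {
    throw new Error(`Not a plain domain name: ${domainName}`);
  }
  return [
    'INSERT DATA {',
    `  GRAPH <${ALLOWED_GRAPH}> {`,
    `    [] <${DOMAIN_NAME_PREDICATE}> "${domainName}" .`,
    '  }',
    '}',
  ].join('\n');
}

// Assumption: a URL is allowed when its host equals an allowed domain or is a subdomain of one.
function isAllowed(url: string, allowedDomains: string[]): boolean {
  const host = new URL(url).hostname.toLowerCase();
  return allowedDomains.some((domain) => {
    const allowed = domain.toLowerCase();
    return host === allowed || host.endsWith(`.${allowed}`);
  });
}
```

`buildAllowDomainUpdate('your-domain.com')` returns the same update text as shown above, ready to send to the store’s SPARQL update endpoint.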