Government of Canada CKAN Extension - Extension à CKAN du Gouvernement du Canada
Features:
- Forms and Validation for GoC Metadata Schema
Installation:
- Use open-data fork of CKAN, branch canada-v2.9
From a clean database, run the following command once, before loading any data, to create the organizations this extension requires:
ckanapi load organizations -I transitional_orgs.jsonl
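Because ckanapi load reads JSON Lines input (one record per line), it can help to confirm that the file parses cleanly before loading it. The following is a minimal sketch using only the Python standard library. The helper name check_jsonl is hypothetical and not part of ckanapi.

```python
import json


def check_jsonl(lines):
    """Return (record_count, errors) for an iterable of JSON Lines text.

    check_jsonl is an illustrative helper, not a ckanapi function.
    """
    count = 0
    errors = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines carry no record
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, str(exc)))
            continue
        if not isinstance(record, dict):
            errors.append((number, "not a JSON object"))
            continue
        count += 1
    return count, errors


# In-memory demonstration with made-up records.
sample = ['{"name": "org-a"}\n', '\n', 'not json\n', '[1, 2]\n']
valid, problems = check_jsonl(sample)
print(valid, [number for number, _ in problems])
```

To check the real file, pass an open file object, for example `check_jsonl(open("transitional_orgs.jsonl", encoding="utf-8"))`.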
- canada_forms: dataset forms for the Open Canada metadata schema
- canada_public: base and public-facing Open Canada templates (requires canada_forms)
- canada_internal: templates for the internal site and registration (requires canada_forms and canada_public)
- canada_datasets: package processing between CKAN and Solr (requires the Scheming extension, see below)
- canada_security: extra security processing (requires the Security extension, see below)
Project | GitHub group/repo | Our Contribution |
---|---|---|
CKAN | open-data/ckan | important contributor |
canada extension | open-data/ckanext-canada | sole maintainer |
Scheming extension | ckan/ckanext-scheming | primary |
Fluent extension | ckan/ckanext-fluent | primary |
ckanapi | ckan/ckanapi | primary |
ckanext-googleanalytics | okfn/ckanext-googleanalytics | user |
Recombinant extension | open-data/ckanext-recombinant | sole maintainer |
Cloudstorage extension | open-data/ckanext-cloudstorage | original author, user |
Security extension | open-data/ckanext-security | minor contributor |
Xloader extension | open-data/ckanext-xloader | user, minor customizations |
Validation extension | open-data/ckanext-validation | user, minor customization |
The CKAN ini file needs the following settings for the registry server:
ckan.plugins = dcat dcat_json_interface googleanalytics canada_forms canada_internal
    canada_public datastore recombinant
    canada_datasets scheming_organizations canada_security fluent
For the public server use only:
ckan.plugins = dcat dcat_json_interface googleanalytics canada_forms
    canada_public canada_datasets scheming_organizations canada_security fluent
canada.portal_url = http://myserver.com
adobe_analytics.js = //path to the js file needed to trigger Adobe Analytics
Both servers need:
licenses_group_url = file://<path to this extension>/ckanext/canada/public/static/licenses.json
ckan.auth.create_dataset_if_not_in_organization = false
ckan.activity_streams_email_notifications = false
ckan.datasets_per_page = 10
googleanalytics.id = UA-1010101-1 (your analytics account id)
googleanalytics.account = Account name (i.e. data.gov.uk, see top level item at https://www.google.com/analytics)
# Internationalisation Settings
ckan.locales_offered = en fr
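A quick way to confirm that an ini file enables the plugins listed above is to parse it and compare the sets. This is a minimal sketch, not part of the extension. It assumes the settings live in the [app:main] section and that continuation lines of ckan.plugins are indented. The function name missing_plugins is hypothetical.

```python
import configparser

# Plugin list for the public server, copied from the settings above.
PUBLIC_PLUGINS = [
    "dcat", "dcat_json_interface", "googleanalytics", "canada_forms",
    "canada_public", "canada_datasets", "scheming_organizations",
    "canada_security", "fluent",
]


def missing_plugins(ini_text, required, section="app:main"):
    """Return the required plugins that ckan.plugins does not list."""
    # Interpolation is disabled so values such as %(here)s are left untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    enabled = set(parser.get(section, "ckan.plugins", fallback="").split())
    return sorted(set(required) - enabled)


# Made-up ini fragment that forgets the fluent plugin.
SAMPLE_INI = """\
[app:main]
ckan.plugins = dcat dcat_json_interface googleanalytics canada_forms
    canada_public canada_datasets scheming_organizations canada_security
"""

print(missing_plugins(SAMPLE_INI, PUBLIC_PLUGINS))
```

This only checks which plugins are listed. It does not check their order or whether they are installed.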
The Portal and Registry sites no longer require the WET-BOEW theme extension, because the templates it provided are now included in the canada_public and canada_internal plugins. All that's needed is to add the resource files:
Set wet_boew.url (in your .ini file) to the root URL where the WET resources are hosted:
Example:
wet_boew.url = http://domain.com/wet-boew/v4.0.31
- Extract the WET 4.0.x core CDN and the desired themes CDN package to a folder:
export WET_VERSION=v4.0.31
export GCWEB_VERSION=v5.1
mkdir wet-boew && curl -L https://github.com/wet-boew/wet-boew-cdn/archive/$WET_VERSION.tar.gz | tar -zx --strip-components 1 --directory=wet-boew
mkdir GCWeb && curl -L https://github.com/wet-boew/themes-cdn/archive/$GCWEB_VERSION-gcweb.tar.gz | tar -zx --strip-components 1 --directory=GCWeb
- Set the extra_public_paths setting to the path where the files are extracted.
Example:
extra_public_paths = /home/user/wet-boew/v4.0.31
Set wet_theme.geo_map_type to indicate which style of WET Geomap widget to use, either 'static' or 'dynamic':
wet_theme.geo_map_type = static
This extension uses a custom Solr schema based on the CKAN 2.8 schema. You can find the schema in the root directory of the project. Overwrite the default CKAN Solr schema with this one to enable search faceting over custom metadata fields.
You will need to rebuild your search index using:
ckan search-index rebuild
To update strings in the translation files:
python setup.py extract_messages
extract_messages gathers gettext calls in Python, JS, and Jinja2 files. It also uses the custom PD extractor to get specific strings for the Recombinant YAML files.
To update the English and French catalog files:
python setup.py update_catalog
This will update both the English and French PO files. You will need to confirm that there are NO fuzzy translations in either of the PO files.
After updating the PO files and ensuring that there are no fuzzies, you may commit the two PO files along with the POT file.
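Checking for fuzzy entries by eye is error-prone in large catalogs. The following is a minimal sketch that lists entries carrying the fuzzy flag by scanning the "#," flag comments of a PO file. The function name fuzzy_entries is hypothetical, and the sample catalog text is made up for illustration.

```python
def fuzzy_entries(po_text):
    """Return the msgid values (still quoted) of entries flagged fuzzy."""
    flagged = []
    is_fuzzy = False
    for line in po_text.splitlines():
        if line.startswith("#,"):
            # Flag comments look like "#, fuzzy" or "#, fuzzy, python-format".
            flags = [flag.strip() for flag in line[2:].split(",")]
            is_fuzzy = is_fuzzy or "fuzzy" in flags
        elif line.startswith("msgid "):
            if is_fuzzy:
                flagged.append(line[len("msgid "):])
            is_fuzzy = False  # flags apply only to the entry that follows them
    return flagged


# Made-up catalog fragment with two fuzzy entries.
SAMPLE_PO = '''#, fuzzy
msgid "Dataset"
msgstr "Titre"

#, python-format
msgid "%d records"
msgstr "%d enregistrements"

#: example.py:1
#, fuzzy, python-format
msgid "%d errors"
msgstr "%d erreurs"
'''

for msgid in fuzzy_entries(SAMPLE_PO):
    print(msgid)
```

An empty result suggests the catalog is ready to commit. Note that a catalog header marked fuzzy would be reported as an empty msgid ("").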
Each time you install or update this extension you need to install the updated translations by running:
python setup.py compile_catalog
First, extract the current version of the data from the registry for each table. For example, a contracts migration needs contracts.csv and contracts-nil.csv:
mkdir migrate-contracts-2020-08-07 # a new working directory
paster --plugin=ckanext-recombinant recombinant combine contracts contracts-nil -d migrate-contracts-2020-08-07 -c $REGISTRY_INI
Remove the old tables from the database so that tables with the new schema can be created when the migrated data is loaded. Deleting contracts will delete both the contracts and contracts-nil tables:
paster --plugin=ckanext-recombinant recombinant delete contracts -c $REGISTRY_INI
Deploy the new version of the code. For our production registry site that would be:
fab pull registry
Migrate the data with the script deployed as part of the changes. The output csv files need to have the same names as the input files for loading to work in the next step.
cd migrate-contracts-2020-08-07
mkdir new err # for migrated data and error logs
.../ckanext-canada/bin/migrate/migrate_contracts_2019_11.py <contracts.csv >new/contracts.csv 2>err/contracts.err
.../ckanext-canada/bin/migrate/migrate_contracts_nil_2019_11.py <contracts-nil.csv >new/contracts-nil.csv 2>err/contracts-nil.err
ls -al err
Records removed in the data migration will appear in the error logs. Also check that the migrated data is comparable in size to the original data, just in case something interrupted the migration.
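One way to make the size comparison concrete is to count data rows in the original and migrated files. The following is a minimal sketch using the standard csv module. The function names count_rows and retained_fraction are hypothetical, and what counts as "comparable" is left to the operator.

```python
import csv
import io


def count_rows(csv_file):
    """Count data rows in an open CSV file, excluding the header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header if present
    return sum(1 for _ in reader)


def retained_fraction(original_file, migrated_file):
    """Return the share of original rows present in the migrated file."""
    original = count_rows(original_file)
    migrated = count_rows(migrated_file)
    if original == 0:
        return 1.0  # nothing to lose from an empty input
    return migrated / original


# In-memory demonstration with made-up data: one of four rows is dropped.
before = io.StringIO("id,value\n1,a\n2,b\n3,c\n4,d\n")
after = io.StringIO("id,value\n1,a\n2,b\n4,d\n")
print(retained_fraction(before, after))
```

With real files, open contracts.csv and new/contracts.csv with `newline=""` as the csv module documentation recommends. A fraction far below 1.0 suggests the migration was interrupted or dropped many records.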
Load the migrated data back into the registry, capturing any validation errors:
paster --plugin=ckanext-recombinant recombinant load-csv new/contracts.csv new/contracts-nil.csv -c $REGISTRY_INI 2>err/load-contracts.err
ls -al err/load-contracts.err
If there are validation errors or records removed during migration, consider revising the migration script to allow more records to be migrated without manual intervention.
Inform the business owner or source departments about all records that were removed as part of the final data migration or due to validation errors so that data can be corrected and re-imported.
- ckanext-canada (this repository)
  - PD YAML files are read by ckanext-recombinant and used to generate most of the pages, tables, triggers and metadata shown.
  - Add and edit forms use form snippets from ckanext-scheming, with validation enforced by datastore triggers. They are currently part of the ckanext-canada extension but should be moved into ckanext-recombinant or another reusable extension once the trigger-validation pattern becomes standardized.
  - The datatable preview is part of ckanext-canada because this code predates the datatables view feature that is now part of CKAN. It should be removed from here so we can use the CKAN datatable view instead.
  - Filter scripts cover all the business logic required to "clean" PD data before it is released to the public. A Makefile is used to extract raw CSV data, make backups, run these filters and publish CSV data.
- ckanext-recombinant
  - XLSX data dictionary
  - reference lists
  - API docs
  - schema json
  - delete form
  - XLSX template UL/DL
  - combine command
- ckan
  - datastore API
  - datastore triggers
  - datastore tables
  - dataset metadata
- CSV files
  - raw CSV data
  - nightly backups
  - published CSV data
- deplane
  - data element profile
- ogc_search
  - advanced search
The "Suggest a Dataset" feature integrates CKAN with both Drupal and Django Search.
- Submit Suggestion: The process begins when external public users submit the Suggest a Dataset form on Drupal. On submission, the webform handler creates a new node of content type suggested_dataset.
- Moderate Suggestion: A registered user with the role comment_moderator on Drupal updates the suggested dataset and/or adds a translation. The user may also delete the suggestion if it is not relevant or is a duplicate. Once the translation is added, the user publishes the suggestion.
- Nightly Cron: A number of cron jobs are responsible for cross-integration.
  - Suggested datasets are exported from Drupal as a CSV using a cron hook
  - The exported CSV is used to load suggested datasets into the Registry
  - Next, the status updates for all suggested datasets are exported from the Registry to a JSON file using ckanapi search datasets q=type:prop include_private=true
  - New status updates are compared with existing status updates in the Solr index, and emails are sent to external public users for any new status updates
  - New status updates exported as JSON are loaded into the Solr index
- Provide status updates: Registry users can view a list of suggested datasets for their organization and can provide an update on progress.
- Search suggested datasets: External public users can search a list of suggested datasets. Each suggested dataset has a details page which shows all the status updates for a suggestion. Users can vote on a suggestion or add comments. The votes and comments are stored in Drupal.
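The comparison step in the nightly cron (finding status updates that are not yet in the Solr index) could be sketched as a simple per-suggestion difference. This is an illustration only, not the project's actual code. The data shape (a mapping from suggestion id to a list of update records), the field names and the function name new_status_updates are all assumptions.

```python
def new_status_updates(existing, incoming):
    """Map each suggestion id to the updates found only in incoming.

    Both arguments map a suggestion id to a list of update records.
    This shape is assumed for illustration.
    """
    changes = {}
    for suggestion_id, updates in incoming.items():
        seen = existing.get(suggestion_id, [])
        fresh = [update for update in updates if update not in seen]
        if fresh:
            changes[suggestion_id] = fresh
    return changes


# Made-up records: one suggestion gains an update, one is entirely new.
indexed = {
    "s1": [{"date": "2020-01-01", "reason": "under_review"}],
}
exported = {
    "s1": [
        {"date": "2020-01-01", "reason": "under_review"},
        {"date": "2020-02-01", "reason": "released"},
    ],
    "s2": [{"date": "2020-02-02", "reason": "under_review"}],
}

for suggestion_id, updates in new_status_updates(indexed, exported).items():
    print(suggestion_id, len(updates))
```

Each key in the result identifies a suggestion whose submitter would be notified. Sending the email and reloading the index are outside this sketch.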