jbrown-xentity edited this page Apr 26, 2022 · 6 revisions

This service is deprecated: https://data.gov/meta/data-gov-to-discontinue-csw-service-on-may-1/

The CSW (Catalog Service for the Web) app is powered by pycsw and implements the OGC CSW standard for geospatial metadata. It serves the same datasets as catalog.data.gov, but in ISO formats (ISO 19115), which CKAN does not support well.

A nightly job imports all datasets from CKAN into CSW. Historically, CSW has been treated as a component of Catalog, but it should function as a standalone application that depends on Catalog.

CSW comes in two flavors: csw-all (/csw-all) and csw-collection (/csw). csw-all exposes all 3+ million datasets, while csw-collection exposes only the top-level Collections.

Environments

  • Production: catalog.data.gov/csw
  • Staging: catalog-datagov.dev-ocsit.bsp.gsa.gov/csw
  • Sandbox: catalog.sandbox.datagov.us/csw
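Since each environment serves both flavors, a small helper can keep the base URLs in one place. This is a hypothetical sketch: the function name, the https scheme, and the assumption that csw-all lives at /csw-all on every host (inferred from the flavor description) are not confirmed by this page. Because the service was discontinued, requests against these URLs will likely fail today.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CSW base URL for an environment and flavor.
# Hosts come from the Environments list; the /csw-all path on each host is
# an assumption based on the csw-all flavor description.
csw_base_url() {
  local env="$1" flavor="${2:-collection}" host path
  case "$env" in
    production) host="catalog.data.gov" ;;
    staging)    host="catalog-datagov.dev-ocsit.bsp.gsa.gov" ;;
    sandbox)    host="catalog.sandbox.datagov.us" ;;
    *) echo "unknown environment: $env" >&2; return 1 ;;
  esac
  case "$flavor" in
    collection) path="/csw" ;;      # top-level Collections only
    all)        path="/csw-all" ;;  # every dataset
    *) echo "unknown flavor: $flavor" >&2; return 1 ;;
  esac
  # The scheme is an assumption; the list above gives host and path only.
  printf 'https://%s%s\n' "$host" "$path"
}
```

For example, `curl "$(csw_base_url production all)?service=CSW&request=GetCapabilities"` would have requested the capabilities document of the production csw-all endpoint.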

Dependencies

Code:

  • pycsw

Services:

Working with CSW

Example queries

Query dataset by Id
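A single dataset can be fetched with a CSW 2.0.2 GetRecordById request. The sketch below builds one in KVP (query-string) form; the default endpoint, the function names, and the choice of the ISO 19139 output schema (which assumes pycsw's ISO profile is enabled) are assumptions, and identifiers are assumed to be URL-safe, such as a CKAN UUID.

```shell
#!/usr/bin/env bash
# Sketch: fetch one record by identifier with a CSW 2.0.2 GetRecordById
# KVP request. The endpoint default is an assumption; override CSW_ENDPOINT.
CSW_ENDPOINT="${CSW_ENDPOINT:-https://catalog.data.gov/csw}"
ISO_SCHEMA="http://www.isotc211.org/2005/gmd"   # ISO 19139 XML encoding

# Print the request URL; the id is assumed to need no percent-encoding.
csw_record_by_id_url() {
  local id="$1"
  printf '%s?service=CSW&version=2.0.2&request=GetRecordById&id=%s&elementSetName=full&outputSchema=%s\n' \
    "$CSW_ENDPOINT" "$id" "$ISO_SCHEMA"
}

# Fetch the record; the service is deprecated, so this may fail.
csw_record_by_id() {
  curl --silent --show-error --fail "$(csw_record_by_id_url "$1")"
}
```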

Query datasets by title
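A title search maps onto a CSW 2.0.2 GetRecords request with a CQL_TEXT constraint on dc:title. This is a minimal sketch, assuming the default endpoint and the hypothetical function names shown; the constraint must be percent-encoded because it contains spaces, quotes, and % wildcards, and the encoder below handles ASCII input only.

```shell
#!/usr/bin/env bash
# Sketch: search records whose title contains a string, using a CSW 2.0.2
# GetRecords KVP request with a CQL_TEXT constraint on dc:title.
# The endpoint default is an assumption; the service is deprecated.
CSW_ENDPOINT="${CSW_ENDPOINT:-https://catalog.data.gov/csw}"

# Percent-encode a string for use as a query parameter (ASCII input only).
urlencode() {
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;  # e.g. space -> %20
    esac
  done
  printf '%s' "$out"
}

# Print the request URL for a case-sensitive "title contains" search.
csw_records_by_title_url() {
  local cql="dc:title like '%$1%'"
  printf '%s?service=CSW&version=2.0.2&request=GetRecords&typeNames=csw:Record&resultType=results&elementSetName=brief&maxRecords=10&constraintLanguage=CQL_TEXT&constraint_language_version=1.1.0&constraint=%s\n' \
    "$CSW_ENDPOINT" "$(urlencode "$cql")"
}

csw_records_by_title() {
  curl --silent --show-error --fail "$(csw_records_by_title_url "$1")"
}
```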

Database initialization

These steps vary depending on the environment, so take them with a grain of salt.

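# Create a non-superuser "pycsw" role (prompts for its password)
createuser -h "$DB_HOST" -U "$DB_ADMIN_USER" -S -D -R -P pycsw
# Make the admin a member of pycsw so it can create a database owned by pycsw
psql -h "$DB_HOST" -U "$DB_ADMIN_USER" -c "GRANT pycsw TO $DB_ADMIN_USER;" postgres
# Create the pycsw database, owned by pycsw, with UTF-8 encoding
createdb -h "$DB_HOST" -U "$DB_ADMIN_USER" -O pycsw -E UTF8 pycsw
# Enable PostGIS in the new database
psql -h "$DB_HOST" -U "$DB_ADMIN_USER" -d pycsw -c 'CREATE EXTENSION postgis;'
# Switch to the pycsw system user; run the remaining commands in that shell
sudo su -l pycsw
cd current
. .venv/bin/activate
# Create the pycsw tables using the csw-collection config
pycsw-ckan.py -c setup_db -f /etc/pycsw/pycsw-collection.cfg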

Alerts

pycsw-collection not running

New Relic raises an alert when the pycsw-collection or pycsw-all process is not running. To fix it, run the pycsw.yml playbook from the jumpbox:

$ ansible-playbook pycsw.yml
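Before running the playbook, it can help to confirm which process is actually down. A minimal sketch, assuming it runs on the host that serves CSW and that each process's command line contains its name (pycsw-collection or pycsw-all); the function names are hypothetical, and the patterns may need adjusting to match how the processes are really started.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: report whether each pycsw process appears to run.
# Assumes the process command line contains its name; pgrep -f matches
# against the full command line and never matches pgrep itself.
check_pycsw_process() {
  local name="$1"
  if pgrep -f "$name" > /dev/null; then
    echo "$name: running"
  else
    echo "$name: NOT running; run 'ansible-playbook pycsw.yml' from the jumpbox"
    return 1
  fi
}

# Check both flavors; return non-zero if either one is down.
check_all_pycsw() {
  local status=0 name
  for name in pycsw-collection pycsw-all; do
    check_pycsw_process "$name" || status=1
  done
  return "$status"
}
```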