This repository contains docker-compose
files to build and deploy CDCS containers.
Install Docker first, then install Docker Compose.
Update the values in the `.env` file:

```shell
cd build
vim .env
```
Below is the list of environment variables to set and their description.
Variable | Description |
---|---|
PROJECT_NAME | Name of the CDCS/Django project to build (e.g. mdcs, nmrr) |
IMAGE_NAME | Name of the image to build (e.g. mdcs, nmrr) |
IMAGE_VERSION | Version of the image to build (e.g. latest, 2.10.0) |
CDCS_REPO | URL of the CDCS repository to clone to build the image (e.g. https://github.com/usnistgov/mdcs.git) |
BRANCH | Branch/Tag of the repository to pull to build the image (e.g. master, 2.10.0) |
PIP_CONF | Pip configuration file to use to build the image |
PYTHON_VERSION | Version of the Python image to use as a base image for the CDCS image |
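For example, a `.env` to build an MDCS image could look like this (a sketch with illustrative values; adjust them to your project):

```
# Example values only, adjust to your project
PROJECT_NAME=mdcs
IMAGE_NAME=mdcs
IMAGE_VERSION=latest
CDCS_REPO=https://github.com/usnistgov/mdcs.git
BRANCH=master
PIP_CONF=pip.conf
PYTHON_VERSION=3.9
```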
Then build the image:

```shell
cdcs-docker/build$ docker-compose build --no-cache
```
Different images may be needed for different deployment contexts (development/CI/production, docker-compose/K8s, ...).
The custom build configuration allows adding the following elements to an existing CDCS image, by editing the following files:
- `packages.txt`: install additional Linux packages (see `example_packages.txt`),
- `requirements.txt`: install additional Python packages (see `example_requirements.txt`),
- `settings.py`: add settings to the `settings.py` file (see `example_settings.py`).
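For instance, adding one Linux package and one Python package might look like this (the package names below are purely illustrative):

`packages.txt`:
```
libxml2-dev
```

`requirements.txt`:
```
requests==2.31.0
```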
To configure the image to build, edit the following variables in the `.env` file.
Variable | Description |
---|---|
BASE_IMAGE_NAME | Name of the base image (e.g. mdcs, nmrr) |
BASE_IMAGE_VERSION | Version of the base image (e.g. latest, 2.10.0) |
PROJECT_NAME | Name of the CDCS/Django project to build (e.g. mdcs, nmrr) |
IMAGE_NAME | Name of the image to build (e.g. mdcs) |
IMAGE_VERSION | Version of the image to build (e.g. latest, 2.10.0) |
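For example, to build a custom image on top of an existing MDCS image (the values below, including the `mdcs-custom` image name, are only illustrative):

```
# Example values only
BASE_IMAGE_NAME=mdcs
BASE_IMAGE_VERSION=latest
PROJECT_NAME=mdcs
IMAGE_NAME=mdcs-custom
IMAGE_VERSION=latest
```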
Then build the custom image:

```shell
cd build/custom
vim .env
vim packages.txt
vim requirements.txt
vim settings.py
docker-compose build --no-cache
```
Update the values in the `.env` file:

```shell
cd deploy
vim .env
```
Below is the list of environment variables that can be set and their description. Commented variables in the `.env` file need to be uncommented and filled.
Variable | Description |
---|---|
PROJECT_NAME | Name of the CDCS/Django project to deploy (e.g. mdcs, nmrr) |
IMAGE_NAME | Name of the CDCS image to deploy (e.g. mdcs, nmrr) |
IMAGE_VERSION | Version of the CDCS image to deploy (e.g. latest, 2.10.0) |
HOSTNAME | Hostname of the server (e.g. for local deployment, use the machine's IP address xxx.xxx.xxx.xxx) |
SERVER_URI | URI of server (e.g. for local deployment, http://xxx.xxx.xxx.xxx) |
ALLOWED_HOSTS | Comma-separated list of hosts (e.g. ALLOWED_HOSTS=127.0.0.1,localhost), see Allowed Hosts |
SERVER_NAME | Name of the server (e.g. MDCS) |
SETTINGS | Settings file to use during deployment (more info in the Settings section) |
SERVER_CONF | Mount appropriate nginx file (e.g. default for http deployment using a uWSGI UNIX socket, https to enable SSL, or gunicorn_http[s] . The protocol of the SERVER_URI should be updated accordingly) |
MONGO_PORT | MongoDB Port (default: 27017) |
MONGO_ADMIN_USER | Admin user for MongoDB (should be different from MONGO_USER ) |
MONGO_ADMIN_PASS | Admin password for MongoDB |
MONGO_USER | User for MongoDB (should be different from MONGO_ADMIN_USER ) |
MONGO_PASS | User password for MongoDB |
MONGO_DB | Name of the Mongo database (e.g. cdcs) |
POSTGRES_PORT | Postgres Port (default: 5432) |
POSTGRES_USER | User for Postgres |
POSTGRES_PASS | User password for Postgres |
POSTGRES_DB | Name of the Postgres database (e.g. cdcs) |
REDIS_PORT | Redis Port (default: 6379) |
REDIS_PASS | Password for Redis |
DJANGO_SECRET_KEY | Secret Key for Django (should be a "large random value") |
NGINX_PORT_80 | Expose port 80 on host machine for NGINX |
NGINX_PORT_443 | Expose port 443 on host machine for NGINX |
MONGO_VERSION | Version of the MongoDB image |
REDIS_VERSION | Version of the Redis image |
POSTGRES_VERSION | Version of the Postgres image |
NGINX_VERSION | Version of the NGINX image |
WEB_SERVER | Web server for the CDCS (e.g. uwsgi , gunicorn ) |
PROCESSES | Number of uWSGI processes (default --processes=8 ) / Gunicorn workers to start (default workers=cpu_count() * 2 + 1 ) |
THREADS | Number of uWSGI/Gunicorn threads per process/worker (default 8) |
MONITORING_SERVER_URI | (optional) URI of an APM server for monitoring |
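As an illustration, a minimal local HTTP deployment could use values along these lines (all values are placeholders; secrets and database credentials are omitted here and must also be filled in):

```
# Example values only, adjust to your deployment
PROJECT_NAME=mdcs
IMAGE_NAME=mdcs
IMAGE_VERSION=latest
HOSTNAME=xxx.xxx.xxx.xxx
SERVER_URI=http://xxx.xxx.xxx.xxx
ALLOWED_HOSTS=xxx.xxx.xxx.xxx
SERVER_NAME=MDCS
SETTINGS=settings
SERVER_CONF=default
WEB_SERVER=uwsgi
```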
A few additional environment variables are provided to the CDCS
container. The variables below are computed based on the values of
other variables. If changed, some portions of the docker-compose.yml
might need to be updated to stay consistent.
Variable | Description |
---|---|
DJANGO_SETTINGS_MODULE | DJANGO_SETTINGS_MODULE (set using the values of PROJECT_NAME and SETTINGS ) |
MONGO_HOST | Mongodb hostname (set to ${PROJECT_NAME}_cdcs_mongo ) |
POSTGRES_HOST | Postgres hostname (set to ${PROJECT_NAME}_cdcs_postgres ) |
REDIS_HOST | REDIS hostname (set to ${PROJECT_NAME}_cdcs_redis ) |
Configure SAML2 authentication by providing values for the following environment variables in the saml2/.env
file.
See saml2/.env.example
for an example of SAML2 configuration with a Keycloak server.
Variable | Description |
---|---|
ENABLE_SAML2_SSO_AUTH | Enable SAML2 authentication (e.g. ENABLE_SAML2_SSO_AUTH=True ) |
SAML_ATTRIBUTE_MAP_DIR | Points to a directory which has the attribute maps in Python modules (see attribute_map_dir) |
SAML_ATTRIBUTES_MAP_IDENTIFIER | SAML attribute map supported name-format (see attribute_map_dir) |
SAML_ATTRIBUTES_MAP_UID | SAML attribute mapping to uid |
SAML_ATTRIBUTES_MAP_UID_FIELD | SAML attribute mapping uid field name |
SAML_ATTRIBUTES_MAP_EMAIL | SAML attribute mapping to email |
SAML_ATTRIBUTES_MAP_EMAIL_FIELD | SAML attribute mapping email field name |
SAML_ATTRIBUTES_MAP_CN | SAML attribute mapping to common name |
SAML_ATTRIBUTES_MAP_CN_FIELD | SAML attribute mapping common name field name |
SAML_ATTRIBUTES_MAP_SN | SAML attribute mapping to surname |
SAML_ATTRIBUTES_MAP_SN_FIELD | SAML attribute mapping surname field name |
SAML_DJANGO_USER_MAIN_ATTRIBUTE | Django field to use to find user and create session (see user attributes and account linking) |
SAML_USE_NAME_ID_AS_USERNAME | Use SAML2 name id as username (see user attributes and account linking) |
SAML_CREATE_UNKNOWN_USER | Create user if not found in Django database (see user attributes and account linking) |
SAML_KEY_FILE | Path to private key (see key_file) |
SAML_CERT_FILE | Path to the public key (see cert_file) |
SAML_METADATA_REMOTE_URL | Url to remote SAML metadata file (see metadata) |
SAML_METADATA_REMOTE_CERT | (Optional) Certificate for the remote (see metadata) |
SAML_METADATA_LOCAL | Path to local SAML metadata file (see metadata) |
SAML_XMLSEC_BIN_PATH | Full path to xmlsec1 binary program (see xmlsec_binary) |
SAML_WANT_RESPONSE_SIGNED | Set to True if responses must be signed (see want_response_signed) |
SAML_WANT_ASSERTIONS_SIGNED | Set to True if assertions must be signed (see want_assertions_signed) |
SAML_LOGOUT_REQUESTS_SIGNED | Set to True if logout requests must be signed (see logout_requests_signed) |
SAML_LOGOUT_RESPONSES_SIGNED | Set to True if logout responses must be signed (see logout_responses_signed) |
SAML_SIGNING_ALGORITHM | Signing algorithm (see signing_algorithm) |
SAML_DIGEST_ALGORITHM | Digest algorithm (see digest_algorithm) |
CONTACT_PERSON_N | Contact information for person N (see contact_person) |
ORGANIZATION_NAME_N | Organization name N (see organization) |
ORGANIZATION_DISPLAY_NAME_N | Organization display name N (see organization) |
ORGANIZATION_URL_N | Organization url N (see organization) |
Environment variables ending with the suffix `_N` expect `N` to be a sequence of integers starting at 1.
For example, if two contact persons need to be added to the SAML configuration, the following variables should be set:

```
CONTACT_PERSON_1=
CONTACT_PERSON_2=
```
- Contact Person

A contact person environment variable expects a comma-separated list of values in the following order:
- given name,
- surname,
- company,
- email address,
- type (technical, support, administrative, billing or other).

For example:

```
CONTACT_PERSON_1=Firstname1,Lastname1,Example Co.,[email protected],technical
```
- Organization

Each section of the SAML organization configuration is stored in a separate environment variable. Each variable expects a comma-separated pair composed of:
- label,
- language code.

Below is an example from the Pysaml2 documentation and how to represent it in the CDCS using environment variables.
Example from the documentation:

```python
"organization": {
    "name": [
        ("Example Company", "en"),
        ("Exempel AB", "se")
    ],
    "display_name": ["Exempel AB"],
    "url": [
        ("http://example.com", "en"),
        ("http://exempel.se", "se"),
    ],
}
```
Equivalent CDCS configuration using environment variables:

```
ORGANIZATION_NAME_1=Example Company,en
ORGANIZATION_NAME_2=Exempel AB,se
ORGANIZATION_DISPLAY_NAME_1=Exempel AB,se
ORGANIZATION_URL_1=http://example.com,en
ORGANIZATION_URL_2=http://exempel.se,se
```
CDCS supports integration with a hdl.net server to issue and resolve PIDs for data and blobs. This requires setting some environment variables at deploy time. Please see the file `./deploy/handle/.env.example` for more details.
Variable | Description |
---|---|
ENABLE_HANDLE_PID | Controls whether CDCS is configured to use a handle server for PIDs. If so, all the values below should be set to values specific to your handle server configuration (contact your handle server administrator for any help). If you enable handle integration but don't set these values correctly, it's very likely things won't work. |
HANDLE_NET_LOOKUP_URL | The URL of the handle server used to display links (e.g. https://hdl.handle.net) |
HANDLE_NET_REGISTRATION_URL | The URL of the handle server used to register records (e.g. https://my-handle-net.domain) |
ID_PROVIDER_PREFIXES | Comma-separated list of prefixes to use when creating handles for data and blobs in CDCS |
HANDLE_NET_USER | Handle server authentication for a user that has admin rights to list and create handles on the provided prefix. The value provided here will be encoded as "300:{HANDLE_NET_PREFIX}/{HANDLE_NET_USER}" when it is sent to the handle server, so this value should be just the suffix of the admin handle |
HANDLE_NET_SECRET_KEY | The "secret key" for the admin user specified above. This should be provided as plain text and not encoded in any way. This value corresponds to the secret key that would be used if you were creating a handle via batch file |
PID_PATH | The location in the default schema in which to store and search for PID values. Should be provided in "dot" notation, with attributes indicated using the "@" character. For example, if your PIDs are stored in an attribute named "pid" on the root element named "Resource", the PID_PATH value should be "Resource.@pid" |
AUTO_SET_PID | Whether to auto-create PIDs for records that are curated or uploaded without them. Should likely be True if you're using PIDs at all |
HANDLE_NET_RECORD_INDEX | Starting index for records when minting handles |
The following are admin settings for the handle configuration. The default values are probably fine, but they should match any example batch files you have for creating handles on your handle server. | |
HANDLE_NET_ADMIN_INDEX | The admin index value (default: 100) |
HANDLE_NET_ADMIN_TYPE | The admin type (default: HS_ADMIN) |
HANDLE_NET_ADMIN_DATA_FORMAT | The admin data format (default: admin) |
HANDLE_NET_ADMIN_DATA_INDEX | The admin data index value (default: 200) |
HANDLE_NET_ADMIN_DATA_PERMISSIONS | The admin data permissions (default: 011111110011) |
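As a rough illustration only (the prefix, user, and key below are placeholders; real values must come from your handle server administrator and `./deploy/handle/.env.example`):

```
# Placeholder values only
ENABLE_HANDLE_PID=True
HANDLE_NET_LOOKUP_URL=https://hdl.handle.net
HANDLE_NET_REGISTRATION_URL=https://my-handle-net.domain
ID_PROVIDER_PREFIXES=20.500.12345
HANDLE_NET_USER=ADMIN
HANDLE_NET_SECRET_KEY=changeme
PID_PATH=Resource.@pid
AUTO_SET_PID=True
```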
Starting from MDCS/NMRR 2.14, the repositories of these two projects include settings ready for deployment (not production).
The deployment can be further customized by mounting additional settings to the deployed containers:
- Option 1: Use settings from the image. This option is recommended if the settings in your image are already well formatted for deployment.
  - Update the `docker-compose.yml` file and comment the line that mounts the settings:
    ```yaml
    # - ./cdcs/${SETTINGS}.py:/srv/curator/${PROJECT_NAME}/${SETTINGS}.py
    ```
  - Set the `SETTINGS` variable to `settings`.
- Option 2: Use default settings from the CDCS image and customize them. Custom settings can be used to provide CI or production configurations. For example:
  - Create a `custom_settings.py` file (see `ci_settings.py` or `test_settings.py` as examples),
  - Set the `SETTINGS` variable to `custom_settings`.
- Option 3: Override settings from the image. This will ignore settings already present in the CDCS image. This option is recommended for MDCS/NMRR 2.14 and below.
  - Update the `docker-compose.yml` file and change the line that mounts the settings to:
    ```yaml
    - ./cdcs/${PROJECT_NAME}.settings.py:/srv/curator/${PROJECT_NAME}/settings.py
    ```
  - Set the `SETTINGS` variable to `settings`.
The `DJANGO_SETTINGS_MODULE` environment variable can be set to select which settings to use. By default, the docker-compose file sets it using the values of the `PROJECT_NAME` and `SETTINGS` variables.
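For example, assuming the compose file combines them as `${PROJECT_NAME}.${SETTINGS}`, a deployment with `PROJECT_NAME=mdcs` and `SETTINGS=ci_settings` would result in:

```
DJANGO_SETTINGS_MODULE=mdcs.ci_settings
```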
For more information about production deployment of a Django project, please check the Deployment Checklist.
The following web servers are available for the CDCS: uWSGI and Gunicorn.
The CDCS image contains a default configuration for each:
- The default uWSGI configuration writes to a UNIX socket that Nginx reads from. A socket file is mounted in both containers (`cdcs_socket`).
- Gunicorn, on the other hand, communicates with Nginx via a port (8000).
You can switch from one web server to the other by setting `WEB_SERVER` in the `.env` file to either `uwsgi` or `gunicorn`.
The Nginx configuration is slightly different depending on the web server, so `SERVER_CONF` needs to be updated accordingly:
use `default` (HTTP deployment using a uWSGI UNIX socket) or `https` (HTTPS deployment) for uWSGI, and `gunicorn_http` or `gunicorn_https` for Gunicorn.
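For instance, to switch an HTTP deployment to Gunicorn, the relevant `.env` values would be:

```
WEB_SERVER=gunicorn
SERVER_CONF=gunicorn_http
```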
Then start the CDCS stack:

```shell
docker-compose up -d
```
(Optional) For testing purposes, when using the HTTPS protocol, you can then run the following script to generate self-signed certificates and copy them to the container:

```shell
./docker_set_ssl.sh
```
The superuser is the first user that will be added to the CDCS. This is the main administrator on the platform. Once it has been created, more users can be added using the web interface. Wait for the CDCS server to start, then run:

```shell
cdcs-docker/deploy$ ./docker_createsuperuser.sh ${username} ${password} ${email}
```
Starting with CDCS 2.9, to prevent concurrency issues and avoid running database operations multiple times, some database initialization commands have been added. These commands need to be run once, after the initial deployment of the application.
- To load the modules, run the following command:
  ```shell
  ./docker_loadmodules.sh
  ```
  NOTE: If modules are added to or removed from the project's `INSTALLED_APPS`, the command needs to be run again.
- To load the exporters, run the following command:
  ```shell
  ./docker_loadexporters.sh
  ```
The CDCS is now available at the `SERVER_URI` set at deployment.
Please read the important deployment information in the troubleshooting section below.
DO NOT set `HOSTNAME`, `SERVER_URI` and `ALLOWED_HOSTS` to localhost or 127.0.0.1.
Even if the system starts properly, some features may not work
(e.g. the search page may show an error instead of returning data).
When deploying locally, use the computer's IP address to set those
variables, and use the same IP address when accessing the CDCS via a web browser:
if your machine's IP address is xxx.xxx.xxx.xxx, and the default server configuration was
used to deploy the system, access it by typing http://xxx.xxx.xxx.xxx in the address bar of the browser.
Find the IP of the local machine:
- On Linux and macOS: `ifconfig`
- On Windows: `ipconfig`
Then update the `.env` file:

```
HOSTNAME=xxx.xxx.xxx.xxx
SERVER_URI=http://xxx.xxx.xxx.xxx
ALLOWED_HOSTS=xxx.xxx.xxx.xxx
```
NOTE: For testing purposes, `ALLOWED_HOSTS` can be set to `*`:

```
ALLOWED_HOSTS=*
```
- Set `SERVER_CONF` to `https`
- Update the file `nginx/https.conf` if necessary
- Add the HTTPS configuration to the mounted `settings.py` file
- Have a look at the deployment checklist
Make sure every component is running properly by checking the logs.
For example, to check the logs of an MDCS instance (`PROJECT_NAME=mdcs`), use the following commands:

```shell
docker logs -f mdcs_cdcs
docker logs -f mdcs_cdcs_nginx
docker logs -f mdcs_cdcs_mongo
docker logs -f mdcs_cdcs_postgres
docker logs -f mdcs_cdcs_redis
```
From https://hub.docker.com/_/mongo:

> By default Mongo will set the wiredTigerCacheSizeGB to a value proportional to the host's total memory regardless of memory limits you may have imposed on the container. In such an instance you will want to set the cache size to something appropriate, taking into account any other processes you may be running in the container which would also utilize memory.
Having multiple MongoDB containers on the same machine could be an issue, as each of them will try to use the same amount of RAM from the host without taking into account the amount used by other containers. This could lead to the server running out of memory.
The amount of RAM used by MongoDB can be restricted by adding the `--wiredTigerCacheSizeGB` option to the MongoDB command, for example:

```yaml
command: "--auth --wiredTigerCacheSizeGB 8"
```
More information on MongoDB RAM usage can be found in the documentation.
Additional components can be added to the CDCS stack by providing `docker-compose.yml` files for them.
To do so, update the `COMPOSE_FILE` variable in the `.env` file. More information about this option can be found in the docker-compose documentation.
In preparation for the release of CDCS 3.x, MongoDB becomes an optional component and will not be part of the default stack. It will need to be added for any CDCS 2.x deployment.
To add MongoDB to the CDCS stack, update the `.env` file to deploy MongoDB:

```
COMPOSE_FILE=docker-compose.yml:mongo/docker-compose.yml
```
By default, CDCS images have been running the Django web server as well as the Celery worker and Celery beat. It is now possible to change this default behavior and run these services separately by selecting one of the following entrypoints:
- `docker-entrypoint.sh`: starts the Django server, Celery worker and Celery beat (default)
- `docker-entrypoint-django.sh`: starts the Django server only
- `docker-entrypoint-celery-worker.sh`: starts the Celery worker only
- `docker-entrypoint-celery-beat.sh`: starts the Celery beat only
The default behavior will continue to run these 3 services within the same container. To deploy the 3 services separately in a docker-compose deployment, you can do the following:
- Update the `docker-compose.yml` file and set the default entrypoint of the cdcs service to only start the Django server:
  ```yaml
  cdcs:
    entrypoint: /docker-entrypoint-django.sh
  ```
- Then add the Celery worker and Celery beat services to the CDCS stack, by updating the `COMPOSE_FILE` variable in the `.env` file:
  ```
  COMPOSE_FILE=docker-compose.yml:celery/docker-compose.yml
  ```
⚠️ Concurrency Issue in CDCS < 2.9: Some CDCS applications make database modifications during their initialization. Starting Django and Celery services in parallel can make these scripts run multiple times, causing inconsistencies in the database. The issue needs to be resolved in the code of the CDCS apps. In the meantime, a startup delay has been implemented for celery services.
📄 The default entrypoint runs the scripts synchronously and does not have this issue.
Ongoing developments on the CDCS make use of Elasticsearch. To add Elasticsearch to the CDCS stack, update the `.env` file to deploy Elasticsearch:

```
COMPOSE_FILE=docker-compose.yml:elasticsearch/docker-compose.yml
```
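Several optional components can be combined by chaining their compose files. For example (on Linux/macOS the default path separator is `:`; it can be changed with `COMPOSE_PATH_SEPARATOR`):

```
COMPOSE_FILE=docker-compose.yml:mongo/docker-compose.yml:celery/docker-compose.yml:elasticsearch/docker-compose.yml
```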
Add and fill the following environment variables:
Variable | Description |
---|---|
ELASTIC_VERSION | Version of the Elasticsearch image (e.g. 7.16.2) |
On Linux, you will need to increase the available virtual memory.
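For example, the `vm.max_map_count` kernel setting can be raised to the minimum value required by Elasticsearch (262144):

```shell
# Apply for the current session
sudo sysctl -w vm.max_map_count=262144
# To persist across reboots, add "vm.max_map_count=262144" to /etc/sysctl.conf
```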
To delete all the containers of the CDCS stack, run:

```shell
docker-compose down
```

To delete all containers and all the data, run:

```shell
docker-compose down -v
```
When a new version of a CDCS image becomes available, the system can be upgraded with the following steps:
- Stop the stack:
  ```shell
  docker-compose stop
  ```
- Update the version of the image in `deploy/.env`:
  ```
  IMAGE_VERSION=3.6.0 # set the version of the new image
  ```
- Restart the stack with the new image:
  ```shell
  docker-compose up -d
  ```
- Run the migration script (it will update static files and apply database migrations):
  ```shell
  ./docker_migrate.sh
  ```

NOTE: the script will do dry runs and ask for confirmation before applying the changes, but it is recommended to create a backup of the databases before starting the migration.
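For example, database dumps could be created with commands along these lines (a rough sketch assuming `PROJECT_NAME=mdcs` and the default container names; `<POSTGRES_USER>`, `<POSTGRES_DB>`, `<MONGO_ADMIN_USER>` and `<MONGO_ADMIN_PASS>` are placeholders for the values in your `.env` file):

```shell
# Dump the Postgres database to a file on the host
docker exec -t mdcs_cdcs_postgres pg_dump -U <POSTGRES_USER> <POSTGRES_DB> > postgres_backup.sql
# Dump the MongoDB database inside the container, then copy it out with `docker cp`
docker exec -t mdcs_cdcs_mongo mongodump -u <MONGO_ADMIN_USER> -p <MONGO_ADMIN_PASS> --authenticationDatabase admin --out /tmp/mongo_backup
```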