1.3 Multi-tenant CKAN [Very Optional] #10
True multi-tenancy is very nice to have, but non-trivial, speaking from experience, as it often entails architectural changes. Perhaps we can just document some multi-tenant deployment patterns that members of the community are already using: https://lists.okfn.org/pipermail/ckan-dev/2014-December/008454.html
Hi @rossjones - apologies for the late answer! (update July 2015 - WIP: docker-based installation docs here) At the Western Australian Department of Parks and Wildlife we're running three CKAN sites off one shared CKAN installation on a single Ubuntu 14.04 AWS EC2 t2.medium with currently 100 GB of disk: a private, a public, and a test instance.
Having a shared installation with separate configs and databases makes maintaining the CKAN installation easier, and thanks to AWS's nightly snapshotting we can restore the installation to any point in the last 30 days, plus we can snapshot manually as required. Compared to this easily accessible deployment, maintaining our previous Docker install was much more indirect and time-intensive - we had to rebuild the image with every change to CKAN. Once CKAN matures to a stable version we will consider going back to Docker.

The following sections illustrate our sharing/separation rather than aim to be comprehensive installation instructions. The setup is also drawn up here - apologies for the link bait :-)

### File system

The VM has another 100 GB btrfs volume mounted at …

### Installation process

The installation is a CKAN source install as of November 2014, using our fork of the CKAN 2.3a master. Multi-tenancy caused no problems at all; it only required planning the file locations / paths depending on which components were shared and which were separated out. The trickiest bit of the standard source install was understanding folder permissions and keeping in mind which user would access which files (the webserver www-data, the database superuser postgres, the CKAN database user ckan_default, the virtualenv owner root?), as well as the grey zone between the VM's ports (CKAN ports 5000, 5001, 5002; Datapusher port 8800; local SolR port 8983) and the reverse proxy and firewall settings outside of my control and visibility.

### Shared CKAN installation

We run the latest CKAN master (likewise for some extensions), as it provides some critical bug fixes the latest stable CKAN doesn't have. Also, this enables us to send the fixes back as pull requests. Contents of …
The …
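Roughly, the sharing/separation looks like this; all paths below are illustrative assumptions rather than the actual listing:

```bash
# one shared virtualenv holds CKAN and all extensions; each site gets its own
# config, wsgi and filestore (paths are assumptions)
ls -d /usr/lib/ckan/default                            # shared virtualenv: CKAN source + extensions
ls /etc/ckan/default/ckan_{private,public,test}.ini    # one ini (and one wsgi) per site
ls -d /var/lib/ckan/{private,public,test}              # separate filestore per site
```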
### Disaster recovery

Using btrfs for snapshotting, we can create a manual snapshot of the …
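A minimal sketch of such a manual snapshot, assuming the btrfs volume is mounted at /mnt/ckan and is itself a subvolume (both are assumptions):

```bash
# read-only, timestamped snapshot of the data volume before risky maintenance
sudo mkdir -p /mnt/ckan/.snapshots
sudo btrfs subvolume snapshot -r /mnt/ckan /mnt/ckan/.snapshots/pre-upgrade-$(date +%F)
```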
Additionally, AWS provides us with a 30 day rolling nightly snapshot of the whole VM (including the manual btrfs snapshots, which might be older than 30 days). Finally, the db can be dumped, and the db dumps as well as the contents of the filestore/datastore/template etc. folders can be rsynced to a safe location off site.

### Logs

Most logs live in …

### Database

One Postgres 9.3 cluster runs as a system service with one database for each CKAN instance, creatively called ckan_private, ckan_public and ckan_test. Having separate dbs within the same db cluster was unproblematic and simplified (cluster-based) access management. A few relevant code snippets for your convenience:

#### Moving the Postgres data folder to a custom location
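A sketch of that move on Ubuntu 14.04 with Postgres 9.3; the target path on the btrfs volume is an assumption:

```bash
# stop the cluster, copy the data directory onto the btrfs volume, repoint the
# cluster at the new location, and start it again
sudo service postgresql stop
sudo mkdir -p /mnt/ckan/postgresql/9.3
sudo rsync -a /var/lib/postgresql/9.3/main/ /mnt/ckan/postgresql/9.3/main/
sudo chown -R postgres:postgres /mnt/ckan/postgresql
sudo sed -i "s|data_directory = '/var/lib/postgresql/9.3/main'|data_directory = '/mnt/ckan/postgresql/9.3/main'|" \
    /etc/postgresql/9.3/main/postgresql.conf
sudo service postgresql start
```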
#### Setup the databases from scratch

Shown here: ckan_private and ckan_public; not shown: ckan_test.
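Roughly, following the standard CKAN source-install docs; the virtualenv and ini paths are assumptions:

```bash
# one database role shared by all sites, one database per site
sudo -u postgres createuser -S -D -R -P ckan_default
sudo -u postgres createdb -O ckan_default ckan_private -E utf-8
sudo -u postgres createdb -O ckan_default ckan_public -E utf-8
# initialise each instance's tables against its own ini
sudo /usr/lib/ckan/default/bin/paster --plugin=ckan db init -c /etc/ckan/default/ckan_private.ini
sudo /usr/lib/ckan/default/bin/paster --plugin=ckan db init -c /etc/ckan/default/ckan_public.ini
```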
#### Migrating data from one CKAN instance on a separate server to another

Shown: the old server is a CKAN docker container I'm attached to, with the virtualenv activated and me being root. There's one CKAN site per docker container, hence the database is called …
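A sketch of the export side, run inside the old container; host names and paths are assumptions:

```bash
# dump the container's database and ship it, together with the filestore,
# to the new host for the restore steps below
pg_dump -h localhost -U ckan_default -Fc ckan_default > /tmp/ckan_default.dump
rsync -avz /tmp/ckan_default.dump /var/lib/ckan/ user@new-host:/backups/old_ckan/
```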
#### Load production data from another system

Use the rsynced .dump files from the previous section. Dump the current db as a backup first.
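Something along these lines, with database and file names assumed:

```bash
# keep a backup of the current database, then replace it with the imported dump
sudo -u postgres pg_dump -Fc ckan_public > /backups/ckan_public_$(date +%F).dump
sudo -u postgres dropdb ckan_public
sudo -u postgres createdb -O ckan_default ckan_public -E utf-8
sudo -u postgres pg_restore -d ckan_public /backups/old_ckan/ckan_default.dump
```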
#### Load production data from one CKAN to a new CKAN

This step shows how a completely installed CKAN site (here: …
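The gist, assuming the public site's content is cloned into the test site within the same cluster; names and paths are assumptions:

```bash
# copy database and filestore across, then rebuild the target's search index
sudo -u postgres pg_dump -Fc ckan_public > /tmp/ckan_public.dump
sudo -u postgres dropdb ckan_test
sudo -u postgres createdb -O ckan_default ckan_test -E utf-8
sudo -u postgres pg_restore -d ckan_test /tmp/ckan_public.dump
sudo rsync -a /var/lib/ckan/public/ /var/lib/ckan/test/
sudo /usr/lib/ckan/default/bin/paster --plugin=ckan search-index rebuild -c /etc/ckan/default/ckan_test.ini
```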
### SolR

The SolR setup needed tough love, strong words, and brisk walks in fresh air. In our setup, one core serves all three instances, but as @rossjones said, one dedicated core per instance might be better. I'll go into more detail here in the hope of preventing further suffering. We use the solr-spatial-field backend because bounding polygons are useful for georeferencing marine datasets along our curved coastline. Following DigitalOcean's tutorial:
#### Download SolR
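For example (the exact version and layout are assumptions; late 2014 points at the Solr 4.x line, whose bundled Jetty example directory becomes /opt/solr):

```bash
cd /tmp
wget https://archive.apache.org/dist/lucene/solr/4.10.2/solr-4.10.2.tgz
tar xzf solr-4.10.2.tgz
# the example directory carries Jetty (start.jar, etc/) and the collection1 core
sudo mv solr-4.10.2/example /opt/solr
```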
#### Create Jetty service that runs SolR

Create /etc/default/jetty:
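A minimal sketch of that file, with values assumed from the usual Jetty-based Solr walkthroughs:

```bash
sudo tee /etc/default/jetty > /dev/null <<'EOF'
NO_START=0                     # allow the init script to start Jetty
JETTY_HOST=127.0.0.1           # listen on localhost only; SolR is never exposed externally
JETTY_PORT=8983
JETTY_HOME=/opt/solr
JETTY_USER=solr
JETTY_LOGS=/var/log/solr
JAVA_HOME=/usr/lib/jvm/default-java
EOF
```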
Create /opt/solr/etc/jetty-logging.xml:
Create Solr user and jetty service:
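A rough sketch of that step; where jetty.sh comes from depends on the Jetty bundled with your SolR download, so treat the cp source as a placeholder:

```bash
# unprivileged owner for /opt/solr, plus Jetty registered as an init.d service
sudo useradd -d /opt/solr -s /bin/false solr
sudo mkdir -p /var/log/solr
sudo chown -R solr:solr /opt/solr /var/log/solr
sudo cp /opt/solr/bin/jetty.sh /etc/init.d/jetty
sudo chmod 755 /etc/init.d/jetty
sudo update-rc.d jetty defaults
sudo service jetty start
```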
#### Customise SolR to CKAN schema
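The essence is pointing the core at CKAN's schema; a minimal sketch, assuming the core still lives at its default path and CKAN sits in the usual virtualenv location:

```bash
# back up the stock schema and symlink CKAN's schema into the core
sudo mv /opt/solr/solr/collection1/conf/schema.xml \
        /opt/solr/solr/collection1/conf/schema.xml.bak
sudo ln -s /usr/lib/ckan/default/src/ckan/ckan/config/solr/schema.xml \
           /opt/solr/solr/collection1/conf/schema.xml
# for solr-spatial-field, ckanext-spatial's docs additionally require a JTS-backed
# location_rpt field type and a spatial_geom field in this schema
```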
Download jts-topo-suite JTS-1.13, unpack it, and copy the jars into …

#### Test setup

Test JTS at …

#### CKAN ini

With the SolR core "collection1" renamed to "ckan", and the solr admin GUI at …
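The relevant ini keys per site, plus a quick smoke test from the VM itself (values are assumptions; the second key is what switches ckanext-spatial to the solr-spatial-field backend):

```bash
# relevant keys in each site's ini:
#   solr_url = http://127.0.0.1:8983/solr/ckan
#   ckanext.spatial.search_backend = solr-spatial-field
# smoke test the renamed core locally:
curl -s 'http://127.0.0.1:8983/solr/ckan/select?q=*:*&rows=0'
```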
It is not necessary to open port 8983 in the firewall, as requests to SolR never leave the local machine.

### Redis

One redis instance serves as the shared message queue for ckanext-harvesting and -archiver; separate supervisord configs run …
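As an illustration, a per-site supervisord config for the harvest consumers could look like this; program names, paths and the user are assumptions:

```bash
sudo tee /etc/supervisor/conf.d/ckan_harvest_public.conf > /dev/null <<'EOF'
[program:ckan_gather_consumer_public]
command=/usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester gather_consumer --config=/etc/ckan/default/ckan_public.ini
user=www-data
autostart=true
autorestart=true

[program:ckan_fetch_consumer_public]
command=/usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester fetch_consumer --config=/etc/ckan/default/ckan_public.ini
user=www-data
autostart=true
autorestart=true
EOF
sudo supervisorctl reread && sudo supervisorctl update
```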
### Configs

Each CKAN instance runs off the same CKAN codebase and plugins, but of course with separate configs:
To facilitate maintaining one config per CKAN site, let's parameterise the …
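One hypothetical way to do that - a template ini with a SITE_ID placeholder, stamped out per site; the template file and placeholder are assumptions, not the author's actual mechanism:

```bash
for SITE_ID in private public test; do
    sed "s/{{SITE_ID}}/${SITE_ID}/g" /etc/ckan/default/template.ini \
        | sudo tee /etc/ckan/default/ckan_${SITE_ID}.ini > /dev/null
done
```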
### Maintenance

Adding a new plugin and enabling it only in the test instance has had no negative effects on the other instances so far.

### Hosting

The VM runs an Apache 2.4 server (remember "Require all granted") with /etc/apache2/ports.conf:
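A sketch of what that file could contain - one Listen per CKAN instance plus the datapusher (the datapusher's own virtualhost is not shown):

```bash
sudo tee /etc/apache2/ports.conf > /dev/null <<'EOF'
Listen 5000
Listen 5001
Listen 5002
Listen 8800
EOF
sudo service apache2 reload
```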
Separate virtualhost configs: /etc/apache2/sites-available/ckan_SITE_ID.conf
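A minimal sketch of one such virtualhost, here for SITE_ID=public on port 5001; ports and paths are assumptions:

```bash
sudo tee /etc/apache2/sites-available/ckan_public.conf > /dev/null <<'EOF'
<VirtualHost *:5001>
    WSGIScriptAlias / /etc/ckan/default/public.wsgi
    WSGIDaemonProcess ckan_public display-name=ckan_public processes=2 threads=15
    WSGIProcessGroup ckan_public
    ErrorLog /var/log/apache2/ckan_public.error.log
    CustomLog /var/log/apache2/ckan_public.custom.log combined
    <Directory /etc/ckan/default>
        # Apache 2.4 syntax - without this the wsgi script is not served
        Require all granted
    </Directory>
</VirtualHost>
EOF
sudo a2ensite ckan_public && sudo service apache2 reload
```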
Separate wsgis: /etc/ckan/default/SITE_ID.wsgi
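And a sketch of the matching wsgi wrapper, following the stock CKAN 2.x deployment docs; the site name and paths are assumptions:

```bash
sudo tee /etc/ckan/default/public.wsgi > /dev/null <<'EOF'
import os
# activate the shared virtualenv, then load this site's own ini
activate_this = os.path.join('/usr/lib/ckan/default/bin/activate_this.py')
execfile(activate_this, dict(__file__=activate_this))

from paste.deploy import loadapp
config_filepath = '/etc/ckan/default/ckan_public.ini'
from paste.script.util.logging_config import fileConfig
fileConfig(config_filepath)
application = loadapp('config:%s' % config_filepath)
EOF
```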
### Outcome

The illustrated setup results in one AWS VM serving three CKAN instances on ports 5000, 5001 and 5002, plus one datapusher (remember to open that port in the firewall) at 8800. SolR listens only locally on localhost:8983 at /solr/ckan, so those requests never leave the firewall. So far we haven't had any dramas (apart from me inadvertently chowning the entire /var/ folder, which broke a lot - don't do this at home), our penetration testing suite hammers the CKANs without findings, and the Google spider pings our external instance every 2 seconds.
@florianm, this is really useful information. No doubt many future developers will be grateful to find this at the top of their Google results. :) We've just tweeted about it, which I hope will help to spread the word. I would have thought that multi-tenancy would require significant modifications to CKAN, but you did it! :)
This is fantastic
I've put up a simple diagram and description of the way ckan-multisite will share the same ckan code, config and extensions for each ckan instance created: https://github.com/boxkite/ckan-multisite/ We're not building anything to share user accounts at the moment, but that would be a really nice auth plugin for ckan. Maybe there's an existing one we could use that's based on ldap or windows domain auth.