Description
Similar to #1932. Checklist:
- Work with @nickatnceas to copy production data for testing:
  - Time how long it takes to...
    - Copy the production postgres data (arcticdata.io:/var/lib/postgresql) to the PROD ceph volume at /mnt/ceph/repos/arctic/postgresql (treat it like a hot backup).
      - NOTE: we do not need the /var/lib/postgresql/10 directory
    - Copy the following subset of production data from arcticdata.io:/var/metacat to the PROD ceph volume at /mnt/ceph/repos/arctic/metacat:
        # /var/metacat/...
        16K   ./certs
        63T   ./data
        8.0K  ./dataone
        3.9G  ./documents
        0     ./inline-data
        500K  ./logs
      - Actual times taken for /var/metacat/data:
        - initial rsync
            root@arctica:/var/metacat# time rsync -aHAX --delete /var/metacat/data/ /mnt/pdg/repos/arctic/metacat/data/
            real 14286m43.628s
            user 1131m15.740s
            sys  3907m38.871s
            ## -> 9.92 days
        - subsequent repeat rsync
            brooke@arctica:~$ time sudo rsync -rltDHX /var/metacat/data/ /mnt/pdg/repos/arctic/metacat/data/
            [sudo] password for brooke:
            real 4m19.047s
            user 0m15.747s
            sys  0m34.912s
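The "9.92 days" figure for the initial rsync comes from converting the "real" time (14286m43.628s) to days. A minimal sketch of that arithmetic follows; the helper name to_days is hypothetical and is not part of any existing tooling.

```shell
# to_days (hypothetical helper): convert the "real" field of `time` output,
# in the form <minutes>m<seconds>s, into days with two decimal places.
to_days() {
  # Split on "m" and "s": field 1 is minutes, field 2 is seconds.
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.2f\n", ($1 * 60 + $2) / 86400 }'
}

# Initial rsync of /var/metacat/data:
to_days 14286m43.628s   # prints 9.92
```

The same helper can be reused for the hashstore conversion and reindex-all timings once those are measured.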
Follow the Quick Reference: Metacat K8s Installation Steps. Supplementary TODOs below...
Persistent Volumes
- Set up a PV to point to PROD cephfs .../repos/arctic/metacat for metacat
- Set up a PV to point to PROD cephfs .../repos/arctic/postgres for postgres
- Create a PVC for Postgresql; see prod_cluster/metacatarctic/pvc--metacatarctic-postgres.yaml
- "csi-cephfs-sc-ephemeral" storageClass missing. Ask @nickatnceas to add, like he did for dev cluster:
    storageclass.storage.k8s.io "csi-cephfs-sc-ephemeral" not found
MetacatUI setup
- Copy config (tokens) from adc server
Metacat Config
- Add values.yaml overrides for non-default 2.19 settings (diff arcticdata.io $TOMCAT_HOME/webapps/metacat/WEB-INF/metacat.properties with default metacat.properties from 2.19 release)
- Add values.yaml overrides for newly-introduced 3.0 settings (diff default metacat.properties from 3.0.0 release with default metacat.properties from 2.19 release)
- Compare with test.arcticdata.io values overrides as a sanity check
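A plain diff of two metacat.properties files tends to be noisy because of comments and key ordering. The sketch below is one way to reduce that noise. It assumes simple key=value lines and does not handle ":" separators or backslash line continuations. The helper names normalize_props and props_diff are hypothetical.

```shell
# normalize_props (hypothetical helper): drop comment and blank lines, trim
# whitespace around the first "=", and sort so that key order does not matter.
normalize_props() {
  grep -Ev '^[[:space:]]*(#|!|$)' "$1" \
    | sed -E 's/[[:space:]]*=[[:space:]]*/=/' \
    | sort
}

# props_diff (hypothetical helper): diff two properties files after normalizing.
# Lines starting with "<" exist only in the first file, ">" only in the second.
props_diff() {
  a=$(mktemp)
  b=$(mktemp)
  normalize_props "$1" > "$a"
  normalize_props "$2" > "$b"
  diff "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}

# Example usage (file names are placeholders for the files being compared):
#   props_diff metacat.properties.2.19-default metacat.properties.arcticdata
```

Each ">" line in the output is a candidate for a values.yaml override. The result still needs a manual review, since some properties may have been renamed or removed between releases.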
First Deployment
- Complete steps in "First Install - IMPORTANT IF MOVING DATA FROM AN EXISTING LEGACY DEPLOYMENT" BEFORE first startup!
- solr pods not starting. Root cause from logs:
    $ kc logs pod/metacatarctic-solr-1
    /scripts/setup.sh: line 8: /opt/bitnami/scripts/solr/entrypoint.sh: Permission denied
  SOLVED - was overriding extraVolumes values, and the override didn't include the permissions line
- https://arctic-prod.test.dataone.org/catalog/ (trailing slash) works, but https://arctic-prod.test.dataone.org/catalog gives a 404 (nginx)
- Ensure all data and documents files are group writeable (otherwise, hashstore upgrader can't create hard links):
    sudo find /mnt/ceph/repos/arctic/metacat/data/ -type f ! -perm -g=w -exec chmod g+w {} +
- chown -R 59997:59997 the ceph dir corresponding to /var/metacat, and update values.yaml to use this uid:gid
    brooke@datateam:/mnt/ceph/repos/arctic$ time sudo chown -R 59997:59997 metacat
    real 4m7.026s
    user 0m0.004s
    sys  0m0.027s
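Before and after running the group-write fix on a tree of this size, it may help to count how many files still lack the group-write bit. The sketch below reuses the same find predicate as the chmod command. The helper name count_not_group_writable is hypothetical, and the count assumes no file names contain newlines.

```shell
# count_not_group_writable (hypothetical helper): count regular files under
# the given directory that do NOT have the group-write permission bit set.
count_not_group_writable() {
  find "$1" -type f ! -perm -g=w | wc -l | tr -d ' '
}

# Example usage against the data directory (likely needs sudo on the ceph volume):
#   count_not_group_writable /mnt/ceph/repos/arctic/metacat/data/
# A result of 0 after the chmod suggests the fix reached every file.
```

The same check can be pointed at the documents directory, which also has to be group writeable.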
Hostname aliases and rewrite rules
- Figure out how to do these with ingress; see all-sites-enabled.conf. Lots of complexity, e.g. http://aoncadis.org is aliased to adc.io, and the site conf has RewriteMaps, each with >3700 entries.
  - EXPLANATION: aoncadis.org was the predecessor to the ADC site. These rewrite rules map old dataset URLs to their new locations on ADC, so the rewrites need to be maintained somewhere.
- Leave all the redirects/other sites on the current Apache host, and move only arcticdata.io.
ATTENTION: Still To Do Before Final Deployment
- Time hashstore conversion
- Time reindex-all
- MetacatUI + WordPress setup. How do we host it and link to k8s metacat?
  - ACTION: use a wordpress image/bitnami chart, deployed separately from the metacat helm chart
- ACTION: Ask @nickatnceas for help with letsencrypt certs - do we need to remove arcticdata.io from the wildcard cert on arctica? NOTE: we still need subdomain certs there (i.e. status.adc, beta.adc).
- Skip 3.0.0 and deploy 3.1.0, but only after it's been running on less-trafficked hosts for a while. See the proposed release plan in issue #1984 (Metacat 3.1.0 Release Plan).