
Move arcticdata.io (Production) to Kubernetes #1954

Closed
3 of 3 issues completed
@artntek

Description

Similar to #1932.

Checklist:

  • Work with @nickatnceas to copy production data for testing:
    • Time how long it takes to...
      • Copy the production postgres data (arcticdata.io:/var/lib/postgresql) to the PROD ceph volume at /mnt/ceph/repos/arctic/postgresql (treat it like a hot backup).

        • NOTE: we do not need the /var/lib/postgresql/10 directory
      • Copy the following subset of production data from arcticdata.io:/var/metacat to the PROD ceph volume at /mnt/ceph/repos/arctic/metacat:

        # /var/metacat/...
        16K     ./certs
        63T     ./data
        8.0K    ./dataone
        3.9G    ./documents
        0       ./inline-data
        500K    ./logs
        • Actual times taken for /var/metacat/data:
          • initial rsync
            root@arctica:/var/metacat# time rsync -aHAX --delete /var/metacat/data/ /mnt/pdg/repos/arctic/metacat/data/
            
            real    14286m43.628s
            user    1131m15.740s
            sys     3907m38.871s
            ## -> 9.92 days
          • subsequent repeat rsync
            brooke@arctica:~$ time sudo rsync -rltDHX  /var/metacat/data/ /mnt/pdg/repos/arctic/metacat/data/
            [sudo] password for brooke:
            
            real	4m19.047s
            user	0m15.747s
            sys	0m34.912s
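The "9.92 days" figure above can be reproduced from the `real` value reported by `time`. The following is a minimal sketch; the helper name `real_to_days` is hypothetical and not part of the original notes, and it assumes the `NmS.SSSs` format shown in the transcripts.

```shell
# Convert a `time` "real" value such as 14286m43.628s into days.
# real_to_days is a hypothetical helper name, used for illustration only.
real_to_days() {
  # Split on "m" and "s": field 1 is minutes, field 2 is seconds.
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.2f\n", ($1 * 60 + $2) / 86400 }'
}

real_to_days 14286m43.628s   # initial rsync; prints 9.92
```

The repeat rsync (4m19.047s) is under five minutes, so a final sync at cutover should be short if the data has not changed much in the meantime.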

Follow the Quick Reference: Metacat K8s Installation Steps. Supplementary TODOs follow.

Persistent Volumes

  • Set up a PV to point to PROD cephfs .../repos/arctic/metacat for metacat
  • Set up a PV to point to PROD cephfs .../repos/arctic/postgres for postgres
  • Create a PVC for Postgresql; see prod_cluster/metacatarctic/pvc--metacatarctic-postgres.yaml
  • The "csi-cephfs-sc-ephemeral" storageClass is missing. Ask @nickatnceas to add it, as he did for the dev cluster. Error seen:
  storageclass.storage.k8s.io "csi-cephfs-sc-ephemeral" not found

MetacatUI setup

  • Copy config (tokens) from adc server

Metacat Config

  • Add values.yaml overrides for non-default 2.19 settings (diff arcticdata.io $TOMCAT_HOME/webapps/metacat/WEB-INF/metacat.properties with default metacat.properties from 2.19 release)
  • Add values.yaml overrides for newly-introduced 3.0 settings (diff default metacat.properties from 3.0.0 release with default metacat.properties from 2.19 release)
  • Compare with test.arcticdata.io values overrides as a sanity check
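The properties comparisons described above can be sketched as a small shell helper that strips comments and blank lines, sorts both files, and diffs the result. The function names and the example paths in the comments are assumptions for illustration; they are not taken from the Metacat tooling.

```shell
# Strip comment and blank lines from a .properties file and sort the rest,
# so that ordering and comment changes do not show up as differences.
props_normalize() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | sort
}

# Diff two .properties files after normalizing them.
# Lines starting with "<" come from the first file, ">" from the second.
props_diff() {
  a=$(mktemp)
  b=$(mktemp)
  props_normalize "$1" > "$a"
  props_normalize "$2" > "$b"
  diff "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}

# Example (hypothetical paths):
#   props_diff metacat-2.19-default.properties \
#     "$TOMCAT_HOME/webapps/metacat/WEB-INF/metacat.properties"
```

Each `>` line in the output is a candidate for a values.yaml override. Multi-line property values (backslash continuations) are not handled by this sketch and would need checking by hand.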

First Deployment

  • Complete steps in "First Install - IMPORTANT IF MOVING DATA FROM AN EXISTING LEGACY DEPLOYMENT" BEFORE first startup!

  • Solr pods were not starting. Root cause from the logs:

    $ kc logs pod/metacatarctic-solr-1
      /scripts/setup.sh: line 8: /opt/bitnami/scripts/solr/entrypoint.sh: Permission denied

    SOLVED: we were overriding the extraVolumes values, and the override did not include the permissions line.

  • https://arctic-prod.test.dataone.org/catalog/ (trailing slash) works, but https://arctic-prod.test.dataone.org/catalog gives a 404 (nginx)

  • Ensure all data and documents files are group-writable (otherwise, the hashstore upgrader can't create hard links):

    sudo find /mnt/ceph/repos/arctic/metacat/data/ -type f ! -perm -g=w -exec chmod g+w {} +
  • chown -R 59997:59997 the ceph dir corresponding to /var/metacat, and update values.yaml to use this uid:gid

    brooke@datateam:/mnt/ceph/repos/arctic$ time sudo chown -R 59997:59997 metacat
    
    real	4m7.026s
    user	0m0.004s
    sys	0m0.027s
  • Hostname aliases and rewrite rules

    • Figure out how to do these with ingress; see all-sites-enabled.conf. There is a lot of complexity: e.g. http://aoncadis.org is aliased to adc.io, and the site conf has RewriteMaps, each with >3700 entries.
    • EXPLANATION: aoncadis.org was the predecessor to the ADC site. These rewrite rules map old dataset URLs to their new locations on ADC, so the rewrites need to be maintained somewhere.
    • Leave all the redirects/other sites on the current Apache host, and move only arcticdata.io.
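For the group-write requirement earlier in this list, it may help to count the affected files before and after running the `chmod`, as a quick check that nothing was missed. This is a minimal sketch using the same `find` predicate as the command above; the function name `count_not_group_writable` is hypothetical.

```shell
# Count regular files under a directory that lack group write permission.
# Uses the same predicate as the chmod command in the checklist.
# Note: filenames containing newlines would be over-counted by wc -l.
count_not_group_writable() {
  find "$1" -type f ! -perm -g=w | wc -l | tr -d ' '
}

# Example (path taken from the checklist):
#   count_not_group_writable /mnt/ceph/repos/arctic/metacat/data/
```

A result of 0 after the `chmod` pass suggests the hashstore upgrader should not hit permission problems on these files, although directory permissions are not covered by this check.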

ATTENTION: Still To Do Before Final Deployment

  • Time hashstore conversion

  • Time reindex-all

  • MetacatUI + WordPress setup. How do we host it and link to k8s metacat?

    • ACTION: use a wordpress image/bitnami chart, deployed separately from the metacat helm chart
  • ACTION: Ask @nickatnceas for help with Let's Encrypt certs. Do we need to remove arcticdata.io from the wildcard cert on arctica? NOTE: we still need subdomain certs there (i.e. status.adc, beta.adc).

  • Skip 3.0.0 and deploy 3.1.0, but only after it has been running on less-trafficked hosts for a while. See the proposed release plan in issue #1984 (Metacat 3.1.0 Release Plan).

Testing - see Matt's comment below

Labels: Epic, k8s (Kubernetes/Helm Related)
