= = = THIS IS A TEMPLATE - MAKE YOUR OWN COPY BEFORE CHECKING BOXES! = = =
- PURPOSE: This ordered checklist is for upgrading an existing K8s Metacat v3.0.0 instance to Metacat v3.1.0
- Very Important: Before starting a migration, you must have a fully-functioning k8s installation of Metacat version 3.0.0/helm chart 1.1.x. Upgrade from other versions is not supported.
- Some references below are specific to NCEAS infrastructure (e.g. CephFS storage); adjust as needed for your own installation.
- Assumptions: you have a working knowledge of Kubernetes deployment, including working with yaml files, helm and kubectl commands, and your kubectl context is set for the target deployment location
e.g. see the values-dev-cluster-example.yaml file.
- Remove any uid or gid overrides, since we're now adopting the defaults of 59996 for postgres and 59997 for metacat
- Temporarily disable probes until hashstore conversion is done
- Set `storage.hashstore.disableConversion: true`, so the hashstore converter won't run yet
- In the metacat database, verify that all the `systemmetadata.checksum_algorithm` entries are on the list of supported algorithms (NOTE: syntax matters! E.g. `sha-1` is OK, but `sha1` isn't):
kubectl exec ${RELEASE_NAME}-postgresql-0 -- bash -c "psql -U metacat << EOF
SELECT DISTINCT checksum_algorithm FROM systemmetadata WHERE checksum_algorithm NOT IN
('MD2','MD5','SHA-1','SHA-256','SHA-384','SHA-512','SHA-512/224','SHA-512/256');
EOF"
# then manually update each to the correct syntax; e.g.:
kubectl exec ${RELEASE_NAME}-postgresql-0 -- bash -c "psql -U metacat << EOF
UPDATE systemmetadata SET checksum_algorithm='SHA-1' WHERE checksum_algorithm='SHA1';
EOF"
# ...etc
- Change ownership ON CEPHFS as follows:
## postgres (59996:59996) in postgresql data directory
sudo chown -R 59996:59996 /mnt/ceph/repos/REPO-NAME/postgresql
## tomcat (59997:59997) in metacat directory
sudo chown -R 59997:59997 data dataone documents logs
- ...then ensure all metacat `data` and `documents` files have `g+rw` permissions; otherwise, the hashstore converter can't create hard links:
sudo chmod -R g+rw data documents dataone
- `helm upgrade`, and debug any startup and configuration issues
- Delete or comment out the `storage.hashstore.disableConversion:` setting, so the hashstore converter will run, and `helm upgrade` again. Allow the hashstore upgrade to finish. (Production machines took approx. 0.16 seconds per object, but it will likely take longer on the dev cluster.)
NOTE: while hashstore conversion is still in progress, it is expected for metacatUI to display `Oops! It looks like there was a problem retrieving your search results.`, and for `/metacat/d1/mn/v2/` API calls to display `Metacat has not been configured`.
See Tips, below, for how to detect when hashstore conversion finishes.
- When hashstore conversion has finished, re-enable probes and helm upgrade to apply changes
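Two of the pre-conversion checks in this checklist (supported checksum algorithm names, and `g+rw` permissions on files) can be scripted as a quick sanity check. The following is a minimal sketch, not part of Metacat: the function names are hypothetical, the algorithm list is copied from the `NOT IN (...)` query in this checklist, and the comparison is exact and case-sensitive, as in that query.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Metacat.

# Algorithm names copied from the NOT IN (...) list in the checklist query.
SUPPORTED_ALGORITHMS="MD2 MD5 SHA-1 SHA-256 SHA-384 SHA-512 SHA-512/224 SHA-512/256"

# Succeeds (exit 0) only if $1 exactly matches one of the listed names.
is_supported_algorithm() {
    for _alg in $SUPPORTED_ALGORITHMS; do
        [ "$1" = "$_alg" ] && return 0
    done
    return 1
}

# Prints regular files under the given directories that lack group read
# and/or group write permission (octal 060 = g+rw).
files_missing_group_rw() {
    find "$@" -type f ! -perm -060
}
```

For example, `is_supported_algorithm "SHA1" || echo "needs fixing"` flags the misspelled name, and `files_missing_group_rw data documents` should print nothing once the `chmod` step has been applied.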
- To monitor progress: check the number of rows in the `checksums` table. The total number of rows should be approximately `5 * (total objects)` (not accounting for conversion errors), where the total object count can be found from `https://HOSTNAME/CONTEXT/d1/mn/v2/object`
# get number of entries in `checksums` table -- should be approx 5*(total objects)
kubectl exec ${RELEASE_NAME}-postgresql-0 -- bash -c "psql -U metacat << EOF
select count(*) from checksums;
EOF"
- To detect when hashstore conversion finishes:
# EITHER CHECK STATUS FROM DATABASE...
kubectl exec ${RELEASE_NAME}-postgresql-0 -- bash -c "psql -U metacat << EOF
select storage_upgrade_status from version_history where status='1';
EOF"
# ...OR CHECK LOGS
# If log4j root level is INFO
egrep "\[INFO\]: The conversion took [0-9]+ minutes.*HashStoreUpgrader:upgrade"
# If log4j root level is WARN, can also grep for this, if errors:
egrep "\[WARN\]: The conversion is complete"
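The arithmetic behind these tips can be wrapped in a few small helpers. This is a rough sketch under stated assumptions: the function names are hypothetical, the factor of 5 rows per object and the 0.16 seconds per object figure come from this checklist (the latter was measured on production machines, so treat it as a lower bound elsewhere), and `extract_total_objects` assumes the object-list response carries the count in a `total="N"` attribute.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Metacat.

# Reads an object-list XML response on stdin and prints the first total="N" value.
# ASSUMPTION: the response exposes the object count in a `total` attribute.
extract_total_objects() {
    sed -n 's/.*total="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Prints the integer percentage complete.
# $1 = rows in the checksums table, $2 = total objects (expects ~5 rows per object).
conversion_progress_percent() {
    if [ "$2" -le 0 ]; then
        echo "total object count must be positive" >&2
        return 1
    fi
    echo $(( $1 * 100 / (5 * $2) ))
}

# Prints a rough duration in seconds for $1 objects at 0.16 s per object.
estimate_conversion_seconds() {
    awk -v n="$1" 'BEGIN { printf "%.0f\n", n * 0.16 }'
}

# Succeeds if the log text on stdin contains either completion message
# quoted in the tips (INFO-level or WARN-level).
conversion_finished_in_log() {
    grep -E -q '\[INFO\]: The conversion took [0-9]+ minutes.*HashStoreUpgrader:upgrade|\[WARN\]: The conversion is complete'
}
```

One way to use these: pipe `curl -s https://HOSTNAME/CONTEXT/d1/mn/v2/object` into `extract_total_objects`, pass the result with the `checksums` row count to `conversion_progress_percent`, and pipe the metacat pod's `kubectl logs` output into `conversion_finished_in_log`.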
# If you see this in the metacat logs:
Pid <autogen pid> is missing system metadata. Since the pid starts with autogen and looks like to be
created by DataONE api, it should have the systemmetadata. Please look at the systemmetadata and
identifier table to figure out the real pid.
Steps to resolve:
1. Given the docid, get all revisions:
select * from identifier where docid='<docid>';
2. Look for the pid beginning with 'autogen', and note its revision number
3. The pid should be the `obsoleted_by` from the previous revision's system metadata:
select obsoleted_by from systemmetadata where guid='<previous revision pid>';
4. Check by looking at `obsoletes` from the following revision, if one exists:
select obsoletes from systemmetadata where guid='<following revision pid>';
5. Check if the systemmetadata table has an entry for the autogen pid:
select checksum from systemmetadata where guid='<autogen pid>';
...and that the checksum matches that of the original file, found in:
/var/metacat/(data or documents)/<'autogen' docid>.<revision number>
6. If an autogen-pid entry was found, update it with the new pid:
update systemmetadata set guid='<pid from steps 3 & 4>' where guid='<autogen pid>';
7. Replace the 'autogen' pid with the real pid in the 'identifier' table:
update identifier set guid='<pid from steps 3 & 4>' where guid='<autogen pid>';
8. Set the hashstore conversion status back to `pending`:
update version_history set storage_upgrade_status='pending' where status='1';
...and restart the metacat pod to re-run the hashstore conversion and generate the correct sysmeta file in hashstore
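Because the three `update` statements above differ only in the two pids, it can help to generate them for review before running anything. The sketch below is illustrative only: `sql_quote` and `autogen_fix_sql` are hypothetical helpers, they only print SQL (nothing is executed), and the quoting assumes PostgreSQL's default standard-conforming strings, where doubling a single quote is enough to escape it.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of Metacat.

# Doubles single quotes so a value can sit inside a SQL string literal.
sql_quote() {
    printf '%s' "$1" | sed "s/'/''/g"
}

# Prints (does not run) the update statements from the steps above.
# $1 = autogen pid, $2 = real pid determined from the obsoleted_by/obsoletes checks.
# The systemmetadata update matches no rows if no autogen entry exists.
autogen_fix_sql() {
    _old=$(sql_quote "$1")
    _new=$(sql_quote "$2")
    printf "UPDATE systemmetadata SET guid='%s' WHERE guid='%s';\n" "$_new" "$_old"
    printf "UPDATE identifier SET guid='%s' WHERE guid='%s';\n" "$_new" "$_old"
    printf "UPDATE version_history SET storage_upgrade_status='pending' WHERE status='1';\n"
}
```

After checking the printed statements by hand, they can be pasted into a `psql` session; the metacat pod still needs to be restarted afterwards so the conversion runs again.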
- Enable port forwarding:
kubectl port-forward service/${RELEASE_NAME}-rabbitmq-headless 15672:15672
- then browse http://localhost:15672. Use the username `metacat-rmq-guest` and the RabbitMQ password from metacat Secrets, or from:
secret_name=$(kubectl get secrets | egrep ".*\-metacat-secrets" | awk '{print $1}')
rmq_pwd=$(kubectl get secret "$secret_name" \
    -o jsonpath="{.data.rabbitmq-password}" | base64 -d)
echo "rmq_pwd: $rmq_pwd"