Checklist: Metacat K8s 3.0.0 to 3.1.0 Upgrade Steps

= = = THIS IS A TEMPLATE - MAKE YOUR OWN COPY BEFORE CHECKING BOXES! = = =

IMPORTANT NOTES - before you begin...

PURPOSE: This ordered checklist is for upgrading an existing K8s Metacat v3.0.0 instance to Metacat v3.1.0

Very Important: Before starting a migration, you must have a fully-functioning k8s installation of Metacat version 3.0.0/helm chart 1.1.x. Upgrade from other versions is not supported.

Some references below are specific to NCEAS infrastructure (e.g. CephFS storage); adjust as needed for your own installation.

Assumptions: you have a working knowledge of Kubernetes deployment, including working with yaml files, helm and kubectl commands, and your kubectl context is set for the target deployment location

1. Values: Edit your existing values override file

e.g. see the values-dev-cluster-example.yaml file.

Remove any uid or gid overrides, since we're now adopting the defaults of 59996 for postgres and 59997 for metacat
Temporarily disable probes until hashstore conversion is done
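
One way to double-check the first item is to scan the override file for leftover ID settings. The helper below is only a sketch: `find_id_overrides` is a hypothetical name, and the keys it searches for (`runAsUser`, `runAsGroup`, `fsGroup`, plus anything ending in `uid` or `gid`) are standard Kubernetes securityContext fields and a guess at chart-specific names, so adjust the pattern to match your own values file.

```shell
# Hypothetical helper: list lines in a values file that look like uid/gid overrides.
# The key names in the pattern are assumptions; edit them to match your chart.
find_id_overrides() {
  values_file="$1"
  # -n prints line numbers, -i ignores case, -E enables extended regex.
  # '^[^#]*' skips keys that only appear after a '#' (i.e. commented out).
  grep -n -i -E '^[^#]*(runAsUser|runAsGroup|fsGroup|uid|gid)[[:space:]]*:' "$values_file" || true
}
```

Empty output means no candidate overrides were found. Any line that is printed should be reviewed by hand before it is removed.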

2. Prep Before Upgrading

Set `storage.hashstore.disableConversion: true` in your values override file, so the hashstore converter won't run yet.

In the metacat database, verify that all the systemmetadata.checksum_algorithm entries are on the list of supported algorithms (NOTE: syntax matters! E.g. sha-1 is OK, but sha1 isn't):

kubectl exec ${RELEASE_NAME}-postgresql-0 -- bash -c "psql -U metacat << EOF
  SELECT DISTINCT checksum_algorithm FROM systemmetadata WHERE checksum_algorithm NOT IN
        ('MD2','MD5','SHA-1','SHA-256','SHA-384','SHA-512','SHA-512/224','SHA-512/256');
EOF"

# then manually update each to the correct syntax; e.g.:
kubectl exec ${RELEASE_NAME}-postgresql-0 -- bash -c "psql -U metacat << EOF
  UPDATE systemmetadata SET checksum_algorithm='SHA-1' WHERE checksum_algorithm='SHA1';
EOF"
# ...etc
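
If the query returns several non-standard spellings, the UPDATE statements can be drafted with a small helper. The following is a sketch, not part of Metacat: `suggest_checksum_fix` is a hypothetical function that maps a spelling to one of the supported names listed above and prints the statement for review. It does not touch the database.

```shell
# Hypothetical helper: map a non-standard checksum algorithm spelling to the
# supported name and print the matching UPDATE statement for manual review.
suggest_checksum_fix() {
  bad="$1"
  # Normalize: uppercase, then drop spaces, underscores and hyphens.
  key=$(printf '%s' "$bad" | tr '[:lower:]' '[:upper:]' | tr -d ' _-')
  case "$key" in
    MD2)        good='MD2' ;;
    MD5)        good='MD5' ;;
    SHA1)       good='SHA-1' ;;
    SHA256)     good='SHA-256' ;;
    SHA384)     good='SHA-384' ;;
    SHA512)     good='SHA-512' ;;
    SHA512/224) good='SHA-512/224' ;;
    SHA512/256) good='SHA-512/256' ;;
    *) echo "-- no suggestion for '$bad'; fix manually"; return 1 ;;
  esac
  # The value is interpolated as-is, so inspect the output before running it.
  echo "UPDATE systemmetadata SET checksum_algorithm='$good' WHERE checksum_algorithm='$bad';"
}
```

For example, `suggest_checksum_fix SHA1` prints the same statement shown in the example above.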

3. Upgrading - NOTE THERE WILL BE DOWNTIME!

= = = = = = = = Downtime starts here = = = = = = = =

Change ownership ON CEPHFS as follows:

## postgres (59996:59996) in postgresql data directory
sudo chown -R 59996:59996 /mnt/ceph/repos/REPO-NAME/postgresql

## tomcat (59997:59997) in metacat directory
sudo chown -R 59997:59997 data dataone documents logs

...then ensure all metacat data and documents files have g+rw permissions; otherwise, the hashstore converter can't create hard links:
```
sudo chmod -R g+rw data documents dataone
```
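
Before moving on, it may be worth confirming that the ownership and permission changes took effect everywhere. The two helpers below are a sketch with hypothetical names (`find_missing_group_rw`, `find_wrong_owner`); they assume a `find` that supports `-uid` and `-gid` (GNU and BSD versions do), and they only report problems without changing anything.

```shell
# Hypothetical helper: print regular files under the given directories that
# lack group read+write permission (mode bits 060). No output means all is well.
find_missing_group_rw() {
  # ! -perm -060 matches files where g+r and g+w are not both set.
  find "$@" -type f ! -perm -060 -print
}

# Hypothetical helper: print anything not owned by the expected uid and gid.
find_wrong_owner() {
  expected_uid="$1"; expected_gid="$2"; shift 2
  find "$@" \( ! -uid "$expected_uid" -o ! -gid "$expected_gid" \) -print
}
```

Run from the metacat directory, `find_missing_group_rw data documents dataone` and `find_wrong_owner 59997 59997 data dataone documents logs` should both print nothing once the steps above have succeeded.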
Run helm upgrade, and debug any startup and configuration issues.
Delete or comment out the storage.hashstore.disableConversion: setting so the hashstore converter will run, then helm upgrade again. Allow the hashstore conversion to finish. (Production machines took approximately 0.16 seconds per object; the dev cluster will likely be slower.)

NOTE: while hashstore conversion is still in progress, metacatUI is expected to display "Oops! It looks like there was a problem retrieving your search results.", and /metacat/d1/mn/v2/ API calls are expected to return "Metacat has not been configured".

See Tips, below, for how to detect when hashstore conversion finishes.
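
To plan the downtime window, the per-object figure quoted above can be turned into a rough estimate. This is a back-of-the-envelope sketch: `estimate_conversion_minutes` is a hypothetical helper, and the 0.16 seconds per object default is the production figure from this checklist, which may not hold on other storage.

```shell
# Rough estimate of hashstore conversion time, in minutes.
# Default rate (0.16 s/object) is the production figure quoted in the checklist;
# pass a different rate as the second argument for slower storage.
estimate_conversion_minutes() {
  objects="$1"
  secs_per_object="${2:-0.16}"
  awk -v n="$objects" -v r="$secs_per_object" \
    'BEGIN { printf "%.1f\n", n * r / 60 }'
}
```

For example, `estimate_conversion_minutes 100000` prints `266.7`, i.e. about four and a half hours for 100,000 objects at the production rate.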

= = = = = = = = Downtime ends here = = = = = = = =

When hashstore conversion has finished, re-enable probes and helm upgrade to apply changes

Tips:

Monitor Hashstore Conversion Progress and Completion

To monitor progress, check the number of rows in the checksums table. The total number of rows should be approximately 5 * (total objects), not accounting for conversion errors. The total object count can be found from https://HOSTNAME/CONTEXT/d1/mn/v2/object
```
# get number of entries in `checksums` table -- should be approx 5*(total objects)
kubectl exec ${RELEASE_NAME}-postgresql-0 -- bash -c "psql -U metacat << EOF
  select count(*) from checksums;
EOF"
```
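
The row count can be converted into an approximate percentage. The helper below is a sketch with a hypothetical name (`conversion_progress_pct`); it relies only on the 5-rows-per-object rule of thumb above, so the result can be off when there are conversion errors.

```shell
# Approximate conversion progress as a whole-number percentage.
# Assumes roughly 5 rows in the checksums table per object.
conversion_progress_pct() {
  checksum_rows="$1"
  total_objects="$2"
  if [ "$total_objects" -le 0 ]; then
    echo "total_objects must be positive" >&2
    return 1
  fi
  # Integer arithmetic; the result is rounded down.
  echo $(( checksum_rows * 100 / (total_objects * 5) ))
}
```

For example, `conversion_progress_pct 250 100` prints `50`. To get a bare number for the first argument, the count query could be run with `psql -U metacat -At -c 'select count(*) from checksums;'` inside the same kind of `kubectl exec` call.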

To detect when hashstore conversion finishes:

# EITHER CHECK STATUS FROM DATABASE...
kubectl exec ${RELEASE_NAME}-postgresql-0 -- bash -c "psql -U metacat << EOF
  select storage_upgrade_status from version_history where status='1';
EOF"

# ...OR CHECK LOGS
# If log4j root level is INFO
egrep "\[INFO\]: The conversion took [0-9]+ minutes.*HashStoreUpgrader:upgrade"

# If log4j root level is WARN, you can also grep for this message, which appears if there were errors:
egrep "\[WARN\]: The conversion is complete"
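
The two log patterns can be combined into one check that reads log text from standard input. This is a sketch: `conversion_finished_in_log` is a hypothetical helper, and how the logs are obtained (for example `kubectl logs` against your metacat pod) depends on your deployment.

```shell
# Reads log text on stdin; exits 0 if either completion message from the
# checklist is present, non-zero otherwise.
conversion_finished_in_log() {
  grep -E -q \
    -e '\[INFO\]: The conversion took [0-9]+ minutes.*HashStoreUpgrader:upgrade' \
    -e '\[WARN\]: The conversion is complete'
}
```

A possible use, with `<metacat-pod>` standing in for the real pod name: `kubectl logs <metacat-pod> | conversion_finished_in_log && echo "conversion finished"`.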

Fix Hashstore Error - PID Doesn't Exist in `identifier` Table:

# If you see this in the metacat logs:
Pid <autogen pid> is missing system metadata. Since the pid starts with autogen and looks like to be
created by DataONE api, it should have the systemmetadata. Please look at the systemmetadata and
identifier table to figure out the real pid.

Steps to resolve:

Given the docid, get all revisions:

select * from identifier where docid='<docid>';

Look for pid beginning 'autogen', and note its revision number

pid should be the obsoleted_by from the previous revision's system metadata:

select obsoleted_by from systemmetadata where guid='<previous revision pid>';

Check by looking at obsoletes from the following revision, if one exists:

select obsoletes from systemmetadata where guid='<following revision pid>';

Check if systemmetadata table has an entry for autogen pid

select checksum from systemmetadata where guid='<autogen pid>';

...and that its checksum matches that of the original file, found in:

/var/metacat/(data or documents)/<'autogen' docid>.<revision number>

= = = If these exist and do not match, STOP HERE AND INVESTIGATE FURTHER! = = =
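
The checksum comparison can be scripted once the stored checksum and its algorithm are known (the algorithm is in the `checksum_algorithm` column of `systemmetadata`). The helper below is a sketch: `checksum_matches` is a hypothetical name, it assumes the GNU coreutils digest tools are installed where the file is readable, and it covers only the algorithms those tools commonly provide.

```shell
# Hypothetical helper: compute a file's checksum with the coreutils tool that
# matches the algorithm name, and compare it to the value stored in the database.
# Exit status: 0 = match, 1 = mismatch, 2 = unsupported algorithm.
checksum_matches() {
  algorithm="$1"; expected="$2"; file="$3"
  case "$algorithm" in
    MD5)     tool=md5sum ;;
    SHA-1)   tool=sha1sum ;;
    SHA-256) tool=sha256sum ;;
    SHA-384) tool=sha384sum ;;
    SHA-512) tool=sha512sum ;;
    *) echo "unsupported algorithm: $algorithm" >&2; return 2 ;;
  esac
  # The digest is the first whitespace-separated field of the tool's output.
  actual=$("$tool" "$file" | awk '{print $1}')
  # Compare case-insensitively, since hex digests may be stored in either case.
  [ "$(printf '%s' "$actual" | tr 'A-F' 'a-f')" = "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]
}
```

Call it as `checksum_matches SHA-256 <checksum from systemmetadata> <path to file>`. A non-zero exit status is the "do not match" case described in the warning above.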

If an autogen-pid entry was found, update it with the new pid:

update systemmetadata set guid='<pid from steps 3 & 4>' where guid='<autogen pid>';

Replace the 'autogen' pid with the real pid in the 'identifier' table:

update identifier set guid='<pid from steps 3 & 4>' where guid='<autogen pid>';

Set the hashstore conversion status back to pending:
```
update version_history set storage_upgrade_status='pending' where status='1';
```
...and restart the metacat pod to re-run the hashstore conversion and generate the correct sysmeta file in hashstore

Monitor Indexing Progress via RabbitMQ Dashboard:

Enable port forwarding:

kubectl port-forward service/${RELEASE_NAME}-rabbitmq-headless 15672:15672

then browse to http://localhost:15672. Log in with username metacat-rmq-guest and the RabbitMQ password, which is stored in the metacat Secrets and can also be retrieved with:

secret_name=$(kubectl get secrets | egrep ".*\-metacat-secrets" | awk '{print $1}')
rmq_pwd=$(kubectl get secret "$secret_name" \
        -o jsonpath="{.data.rabbitmq-password}" | base64 -d)
echo "rmq_pwd: $rmq_pwd"
