Move arcticdata.io (Production) to Kubernetes #1954
Comments
For the Testing section, here's a quick rundown: Get the R package
|
hashstore conversion notes
First conversion (with errors) took almost exactly 48 hours
11/19/24: Second conversion (comprising only the failed objects from last time) took 42 minutes; see #1964 (comment) for error analysis
11/19/24: Did another rsync and clean hashstore conversion
brooke@arctica:~$ time sudo rsync -aHAX --delete /var/lib/postgresql/ /mnt/ceph/repos/$NAME/postgresql/
real 60m57.679s
user 1m5.106s
sys 4m33.743s
time sudo rsync -rltDHX --stats --human-readable /var/metacat/data/ /mnt/ceph/repos/$NAME/metacat/data/
real 29m29.133s
user 1m56.979s
sys 6m45.844s
time sudo rsync -rltDHX --stats --human-readable /var/metacat/dataone/ /mnt/ceph/repos/$NAME/metacat/dataone/
real 0m10.742s
user 0m0.037s
sys 0m0.018s
time sudo rsync -rltDHX --stats --human-readable /var/metacat/documents/ /mnt/ceph/repos/$NAME/metacat/documents/
real 0m16.327s
user 0m0.490s
sys 0m1.014s
time sudo rsync -rltDHX --stats --human-readable /var/metacat/logs/ /mnt/ceph/repos/$NAME/metacat/logs/
real 0m0.101s
user 0m0.025s
sys 0m0.016s
hashstore conversion started: Wed Nov 20 22:55:15 UTC 2024
Total 1116383 objects =>
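The four `/var/metacat` rsync runs above differ only in the subdirectory name, so they could be generated from a loop. The following is a minimal sketch, not the procedure actually used: the function name `print_rsync_cmds` and the `SRC_ROOT`/`DEST_ROOT` variables are assumptions made for illustration, and the function only prints the commands so they can be reviewed before anything is copied.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not run) one rsync command per Metacat subdirectory.
# SRC_ROOT, DEST_ROOT and print_rsync_cmds are illustrative names, not part of
# the original notes. NAME is the repo directory name used in the notes above;
# a literal placeholder is printed if it is unset.
SRC_ROOT="${SRC_ROOT:-/var/metacat}"
DEST_ROOT="${DEST_ROOT:-/mnt/ceph/repos/${NAME:-NAME}/metacat}"

print_rsync_cmds() {
    local dir
    for dir in "$@"; do
        # Trailing slashes make rsync copy the directory contents,
        # matching the commands recorded above.
        printf 'time sudo rsync -rltDHX --stats --human-readable %s/%s/ %s/%s/\n' \
            "$SRC_ROOT" "$dir" "$DEST_ROOT" "$dir"
    done
}

print_rsync_cmds data dataone documents logs
```

Printing rather than executing keeps the sketch safe to run anywhere; the printed lines can be pasted into a shell once the paths look right.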
|
Unexplained log entries
|
Hashstore Conversion Errors to Follow up on:
nonMatchingChecksum_2024-11-20_22-55-18.txt
Can't find the object [..] in the Metacat legacy store.
Previously investigated: missing system metadata
see these steps to fix. Applies to these 6 pids:
Docid not found in the identifier table: urn:uuid:2de418e3-d9bb-4b7f-82af-ef5885da6b9b
|
Initial index-all
|
indexer log errors to Investigate
$ cat * | grep -c "\[ERROR\]"
15448
15448 total errors across all 50 index workers.
4557 errors containing: "Cannot index the task for identifier", which is the general top-level indexer error and has more than one root cause. Of these...
The following are all due to existing bad data/metadata, and are currently not indexed on adc, so we're no worse off. Can fix in future if/when there's time
10,891 start with
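The counts above come from grepping the combined worker logs. A small helper along these lines could make the per-category breakdown repeatable; it is a sketch under assumptions: the function name `count_errors` and the example log directory are hypothetical, and it matches fixed strings rather than regular expressions.

```shell
#!/usr/bin/env bash
# Sketch only: count [ERROR] lines across every file in a log directory,
# optionally restricted to lines that also contain a fixed substring.
# count_errors is an illustrative name, not part of any indexer tooling.
count_errors() {
    local dir="$1" pattern="${2:-}"
    if [ -n "$pattern" ]; then
        # -F treats both the [ERROR] tag and the pattern as literal strings.
        cat "$dir"/* | grep -F '[ERROR]' | grep -cF -- "$pattern" || true
    else
        cat "$dir"/* | grep -cF '[ERROR]' || true
    fi
}

# Example (the directory is hypothetical):
#   count_errors /path/to/indexer-logs
#   count_errors /path/to/indexer-logs "Cannot index the task for identifier"
```

The `|| true` is there because `grep -c` prints `0` but exits non-zero when nothing matches, which would otherwise abort a script running under `set -e`.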
|
Similar to #1932;
checklist:
Copy the production postgres data (arcticdata.io:/var/lib/postgresql) to the PROD ceph volume at /mnt/ceph/repos/arctic/postgresql (treat it like a hot backup).
/var/lib/postgresql/10 directory
copy the following subset of production data from arcticdata.io:/var/metacat to the PROD ceph volume at /mnt/ceph/repos/arctic/metacat:
# /var/metacat/...
16K ./certs
63T ./data
8.0K ./dataone
3.9G ./documents
0 ./inline-data
500K ./logs
Follow the Quick Reference: Metacat K8s Installation Steps. Supplementary TODOs below...
Persistent Volumes
.../repos/arctic/metacat for metacat
.../repos/arctic/postgres for postgres
prod_cluster/metacatarctic/pvc--metacatarctic-postgres.yaml
storageclass.storage.k8s.io "csi-cephfs-sc-ephemeral" not found
MetacatUI setup
Metacat Config
values.yaml overrides for non-default 2.19 settings (diff arcticdata.io $TOMCAT_HOME/webapps/metacat/WEB-INF/metacat.properties with default metacat.properties from 2.19 release)
values.yaml overrides for newly-introduced 3.0 settings (diff default metacat.properties from 3.0.0 release with default metacat.properties from 2.19 release)
First Deployment
Complete steps in "First Install - IMPORTANT IF MOVING DATA FROM AN EXISTING LEGACY DEPLOYMENT" BEFORE first startup!
solr pods not starting. root cause from logs:
SOLVED - was overriding extraVolumes values, and the override didn't include the permissions line
https://arctic-prod.test.dataone.org/catalog/ (trailing slash) works, but https://arctic-prod.test.dataone.org/catalog gives a 404 (nginx)
ensure all data and documents files are group writeable (otherwise, hashstore upgrader can't create hard links):
sudo find /mnt/ceph/repos/arctic/metacat/data/ -type f ! -perm -g=w -exec chmod g+w {} +
chown -R 59997:59997 the ceph dir corresponding to /var/metacat, and update values.yaml to use this uid:gid
brooke@datateam:/mnt/ceph/repos/arctic$ time sudo chown -R 59997:59997 metacat
real 4m7.026s
user 0m0.004s
sys 0m0.027s
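Before running the group-write fix from the earlier step on 63T of data, it may help to see how many files it would touch. The sketch below wraps the same `find` test in two helpers; the function names are assumptions made for illustration, and on the real volume the calls would likely need `sudo` as in the original command.

```shell
#!/usr/bin/env bash
# Sketch only: inspect first, then change. Function names are illustrative.

# Print regular files under DIR that lack the group-write bit.
list_not_group_writable() {
    find "$1" -type f ! -perm -g=w
}

# Add the group-write bit to exactly those files (same test as above).
make_group_writable() {
    find "$1" -type f ! -perm -g=w -exec chmod g+w {} +
}

# Example, using the path from this checklist:
#   list_not_group_writable /mnt/ceph/repos/arctic/metacat/data/ | wc -l
#   make_group_writable /mnt/ceph/repos/arctic/metacat/data/
```

Running the listing again afterwards should print nothing, which gives a quick check that the hashstore upgrader will be able to create its hard links.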
Hostname aliases and rewrite rules
ATTENTION: Still To Do Before Final Deployment
Time hashstore conversion
Time reindex-all
MetacatUI + WordPress setup. How do we host it and link to k8s metacat?
ACTION: Ask @nickatnceas for help with letsencrypt certs - do we need to remove arcticdata.io from wildcard cert on arctica? NOTE: we still need subdomain certs there (i.e. status.adc, beta.adc).
Skip 3.0.0 and deploy 3.1.0, but only after it's been running on less-trafficked hosts for a while. See proposed release plan in Issue Metacat 3.1.0 Release Plan #1984.
Testing - see Matt's comment below