Move ESA to K8s #2062

Open
artntek opened this issue Feb 5, 2025 · 5 comments

artntek commented Feb 5, 2025

Tracking progress for moving https://data.esa.org/ from mn-ucsb-2.dataone.org to k8s prod cluster.

Add any notes to this issue, and follow the checklist in sub-issue #2063.


artntek commented Feb 5, 2025

Notes on rsync

  • Ceph is not mounted on the ESA host (mn-ucsb-2.dataone.org), so I'm rsyncing across hosts to datateam using the brooke login (see commands below).
  • Therefore, before running the rsync on mn-ucsb-2, log into datateam and chown -R brooke:brooke the destination ceph directories /mnt/ceph/repos/esa/metacat and /mnt/ceph/repos/esa/postgresql.
  • After the rsync completes, log into datateam again and chown the directories back to 59997 and 59996.
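
The ownership steps in the notes above could be wrapped roughly as follows. This is only a sketch: the `run`, `pre_rsync_chown` and `post_rsync_chown` helpers and the `DRY_RUN` variable are hypothetical names, and because the notes don't say which of 59997 and 59996 belongs to which directory, the owners are passed in as arguments rather than guessed. By default it only prints the commands.

```shell
# Dry-run by default: print each command instead of running it.
# Set DRY_RUN=0 to actually execute (hypothetical convention for this sketch).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Destination directories named in the notes above.
ESA_DIRS="/mnt/ceph/repos/esa/metacat /mnt/ceph/repos/esa/postgresql"

# On datateam, before the rsync: let the rsync login write to the destinations.
pre_rsync_chown() {
  # $ESA_DIRS is intentionally unquoted so it splits into two paths.
  run sudo chown -R brooke:brooke $ESA_DIRS
}

# On datateam, after the rsync: hand ownership back.
# ASSUMPTION: the caller knows which owner goes with which directory;
# that mapping is not stated in this issue.
#   $1 = owner for the metacat directory, $2 = owner for the postgresql directory
post_rsync_chown() {
  run sudo chown -R "$1" /mnt/ceph/repos/esa/metacat
  run sudo chown -R "$2" /mnt/ceph/repos/esa/postgresql
}
```

Calling `pre_rsync_chown` before the rsync and `post_rsync_chown <metacat-owner> <postgresql-owner>` afterwards prints the commands for review; rerun with `DRY_RUN=0` once they look right.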

Commands

brooke@mn-ucsb-2:~$ time sudo rsync -aHAX /var/esa/data/ [email protected]:/mnt/ceph/repos/esa/metacat/data/
real	1m14.854s
user	0m0.159s
sys	0m0.104s

brooke@mn-ucsb-2:~$ time sudo rsync -aHAX /var/esa/documents/ [email protected]:/mnt/ceph/repos/esa/metacat/documents/
real	0m5.177s
user	0m0.144s
sys	0m0.081s

brooke@mn-ucsb-2:~$ time sudo rsync -aHAX /var/esa/logs/ [email protected]:/mnt/ceph/repos/esa/metacat/logs/
real	0m2.261s
user	0m0.114s
sys	0m0.052s

brooke@mn-ucsb-2:~$ time sudo rsync -aHAX /var/lib/postgresql/14/ [email protected]:/mnt/ceph/repos/esa/postgresql/14/
real	1m8.735s
user	0m10.161s
sys	0m24.387s


artntek commented Feb 7, 2025

Indexer startup issue: on a fresh install, HashStore is not yet initialized, and the indexers come up before metacat. The indexer's HashStore library therefore tries to initialize the store itself, but it doesn't have write access:

(This should self-resolve once the metacat pod is up and running and has initialized HashStore.)

dataone-indexer 20250207-14:44:57: [ERROR]: Dataone-indexer cannot initialize the Storage class since HashStoreFactory - Error creating 'FileHashStore' instance: java.nio.file.FileSystemException: /var/metacat/hashstore: Read-only file system [org.dataone.indexer.storage.Storage:<clinit>:28]
	at org.dataone.cn.indexer.IndexWorker.<init>(IndexWorker.java:225) [dataone-index-worker-3.1.1-shaded.jar:?]
org.dataone.hashstore.exceptions.HashStoreFactoryException: HashStoreFactory - Error creating 'FileHashStore' instance: java.nio.file.FileSystemException: /var/metacat/hashstore: Read-only file system
	at org.dataone.cn.indexer.IndexWorker.<init>(IndexWorker.java:209) [dataone-index-worker-3.1.1-shaded.jar:?]
	at org.dataone.hashstore.HashStoreFactory.getHashStore(HashStoreFactory.java:84) ~[dataone-index-worker-3.1.1-shaded.jar:?]
	at org.dataone.indexer.storage.Storage.<init>(Storage.java:61) ~[dataone-index-worker-3.1.1-shaded.jar:?]
	at org.dataone.cn.indexer.IndexWorker.main(IndexWorker.java:103) [dataone-index-worker-3.1.1-shaded.jar:?]
	at org.dataone.indexer.storage.Storage.<clinit>(Storage.java:26) [dataone-index-worker-3.1.1-shaded.jar:?]
	at org.dataone.cn.indexer.object.ObjectManager.<clinit>(ObjectManager.java:57) [dataone-index-worker-3.1.1-shaded.jar:?]
	at org.dataone.cn.indexer.IndexWorker.<init>(IndexWorker.java:225) [dataone-index-worker-3.1.1-shaded.jar:?]
	at org.dataone.cn.indexer.IndexWorker.<init>(IndexWorker.java:209) [dataone-index-worker-3.1.1-shaded.jar:?]
	at org.dataone.cn.indexer.IndexWorker.main(IndexWorker.java:103) [dataone-index-worker-3.1.1-shaded.jar:?]
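
If waiting for the self-resolve turns out to be too noisy, one possible workaround is to have the indexer wait until metacat has initialized the store before it starts. The sketch below is not what the chart does today; `wait_for_path` is a hypothetical helper, and the marker file name `hashstore.yaml` under /var/metacat/hashstore is an assumption about what HashStore writes on initialization.

```shell
# wait_for_path PATH [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls until PATH exists; returns 1 if the timeout elapses first.
wait_for_path() {
  local path="$1" timeout="${2:-300}" interval="${3:-5}" waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Hypothetical usage, e.g. in an init container or entrypoint wrapper:
#   wait_for_path /var/metacat/hashstore/hashstore.yaml 600 10
# (the hashstore.yaml marker is an assumption, see above)
```

Polling a read-only mount like this needs no write access, which is the constraint that causes the error above.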


artntek commented Feb 7, 2025

Database related issues

Metacat startup error: the connection fails looking for a database named "metacat", but the ESA database name is "esa". postgresql.auth.database is correctly set to esa, and the configmap contains the correct value:

database.connectionURI=jdbc:postgresql://metacatesa-postgresql-hl/esa

Next step - check debug output for correct props init

Error:

 [edu.ucsb.nceas.metacat.startup.StartupRequirementsChecker:abort:351]
org.postgresql.util.PSQLException: FATAL: database "metacat" does not exist
	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2733) ~[postgresql-42.7.4.jar:42.7.4]
	at org.postgresql.core.v3.QueryExecutorImpl.readStartupMessages(QueryExecutorImpl.java:2845) ~[postgresql-42.7.4.jar:42.7.4]
	at org.postgresql.core.v3.QueryExecutorImpl.<init>(QueryExecutorImpl.java:176) ~[postgresql-42.7.4.jar:42.7.4]
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:323) ~[postgresql-42.7.4.jar:42.7.4]
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:54) ~[postgresql-42.7.4.jar:42.7.4]
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:273) ~[postgresql-42.7.4.jar:42.7.4]
	at org.postgresql.Driver.makeConnection(Driver.java:446) ~[postgresql-42.7.4.jar:42.7.4]
	at org.postgresql.Driver.connect(Driver.java:298) ~[postgresql-42.7.4.jar:42.7.4]
	at java.sql.DriverManager.getConnection(Unknown Source) ~[java.sql:?]
	at java.sql.DriverManager.getConnection(Unknown Source) ~[java.sql:?]
	at edu.ucsb.nceas.metacat.database.DBConnection.openConnection(DBConnection.java:372) ~[metacat.jar:?]
	at edu.ucsb.nceas.metacat.database.DBConnection.openConnection(DBConnection.java:345) ~[metacat.jar:?]
	at edu.ucsb.nceas.metacat.database.DBConnection.<init>(DBConnection.java:83) ~[metacat.jar:?]
	at edu.ucsb.nceas.metacat.database.DBConnectionPool.initialDBConnectionPool(DBConnectionPool.java:187) ~[metacat.jar:?]
	at edu.ucsb.nceas.metacat.database.DBConnectionPool.<init>(DBConnectionPool.java:156) ~[metacat.jar:?]
	at edu.ucsb.nceas.metacat.database.DBConnectionPool.getInstance(DBConnectionPool.java:134) ~[metacat.jar:?]
	at edu.ucsb.nceas.metacat.startup.MetacatInitializer.initAfterMetacatConfig(MetacatInitializer.java:156) ~[metacat.jar:?]
	at edu.ucsb.nceas.metacat.startup.MetacatInitializer.contextInitialized(MetacatInitializer.java:103) [metacat.jar:?]

This was because pg_hba.conf didn't have the right permissions (it expected the db name to be metacat instead of esa). Fixed, and metacat can now connect.
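
A quick way to catch this kind of mismatch before starting metacat is to check which database names pg_hba.conf actually covers. The helper below is a rough sketch: `hba_covers_db` is a hypothetical name, and it only understands plain names, comma-separated lists and `all` in the DATABASE column (no `sameuser`, `@file` or regex entries).

```shell
# hba_covers_db HBA_FILE DBNAME
# Returns 0 if some non-comment pg_hba.conf record names DBNAME (or "all")
# in its DATABASE field (the second column), 1 otherwise.
hba_covers_db() {
  awk -v db="$2" '
    $1 ~ /^#/ { next }   # skip comment lines
    NF < 2    { next }   # skip blank or incomplete lines
    {
      n = split($2, names, ",")
      for (i = 1; i <= n; i++)
        if (names[i] == db || names[i] == "all") found = 1
    }
    END { exit !found }
  ' "$1"
}

# Hypothetical usage (the path to pg_hba.conf depends on the deployment):
#   hba_covers_db /path/to/pg_hba.conf esa || echo "no pg_hba.conf entry covers esa"
```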

Metacat ran the 2.19.0 -> 2.19.1 DB script and the 2.19.1 -> 3.0.0 script successfully, but then failed on the 3.0.0 -> 3.1.0 script:

metacat 20250210-17:21:51: [ERROR]: initializeContainerisedDBConfiguration(): error getting
metacat version (3.1.0) or database version (2.19.0). Error was: DBAdmin.upgradeDatabase -
SQL error when running upgrade scripts: ERROR: relation "db_version_id_seq" does not exist
[edu.ucsb.nceas.metacat.startup.K8sAdminInitializer:initK8sDBConfig:109]

Discovered this is because, in ESA, the default value for db_version_id refers to the sequence by its text name (db_version_id_seq) instead of treating it as an object reference:

esa=> \d db_version;
                                             Table "public.db_version"
    Column     |            Type             | Collation | Nullable |                   Default
---------------+-----------------------------+-----------+----------+----------------------------------------------
 db_version_id | bigint                      |           | not null | nextval('db_version_id_seq'::text::regclass)

(Note the text cast in nextval('db_version_id_seq'::text::regclass).) Because the default stores the sequence name as text, this value remains unchanged when the sequence is renamed. For comparison, here is the same query in GOA:

evos=> \d db_version;
                                          Table "public.db_version"
    Column     |            Type             | Collation | Nullable |                Default
---------------+-----------------------------+-----------+----------+----------------------------------------
 db_version_id | bigint                      |           | not null | nextval('db_version_id_seq'::regclass)

(note nextval('db_version_id_seq'::regclass))

Fixed this by doing:

ALTER TABLE db_version 
ALTER COLUMN db_version_id 
SET DEFAULT nextval('db_version_id_seq'::regclass);

and then the conversions ran as expected
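
Other tables migrated from the same era may carry the same kind of default, so it may be worth scanning for them before the next upgrade. A sketch, assuming `psql` can connect to the database with the current environment; `is_text_bound_default` and `list_text_bound_defaults` are hypothetical helper names. The check relies on PostgreSQL behavior: a plain `'name'::regclass` constant is stored as an object reference and follows renames, while a `::text` cast forces a lookup by name at run time.

```shell
# Returns 0 if a column default refers to its sequence through a text cast
# (e.g. nextval('db_version_id_seq'::text::regclass)), 1 otherwise.
is_text_bound_default() {
  case "$1" in
    *"::text"*"::regclass"*) return 0 ;;
    *) return 1 ;;
  esac
}

# list_text_bound_defaults DBNAME
# Prints table.column and the default for every nextval() default in the
# public schema that is bound by text name. Needs a working psql connection.
list_text_bound_defaults() {
  psql -At -F '|' -d "$1" -c "SELECT table_name, column_name, column_default
      FROM information_schema.columns
      WHERE table_schema = 'public' AND column_default LIKE 'nextval(%'" |
  while IFS='|' read -r tbl col def; do
    if is_text_bound_default "$def"; then
      echo "$tbl.$col: $def"
    fi
  done
}

# Hypothetical usage:
#   list_text_bound_defaults esa
```

Each column it reports could then be fixed with the same ALTER TABLE ... SET DEFAULT pattern used for db_version above.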


artntek commented Feb 7, 2025

metacatui startup error - can't mount PVC due to permissions:

  Warning  FailedMount  27s    kubelet            MountVolume.MountDevice failed for volume "cephfs-metacatesa-metacatui-theme" : rpc error: code = Internal desc = an error (exit status 32) occurred while running mount args: [-t ceph 10.0.3.131:6789,10.0.3.132:6789,10.0.3.133:6789:/volumes/k8ssubvolgroup/k8ssubvol/58cda964-ce10-4ff9-8242-983da0fd0da3/repos/esa/metacatui /var/lib/kubelet/plugins/kubernetes.io/csi/pv/cephfs-metacatesa-metacatui-theme/globalmount -o name=pdg-subvol-user,secretfile=/tmp/csi/keys/keyfile-620537753,mds_namespace=cephfs,_netdev] stderr: mount error 13 = Permission denied
  Warning  FailedMount  19s    kubelet            Unable to attach or mount volumes: unmounted volumes=[metacatesa-mcui-custom-theme-files], unattached volumes=[metacatesa-mcui-source-files kube-api-access-w6fpj metacatesa-mcui-custom-theme-files metacatesa-mcui-config-js metacatesa-mcui-config-all]: timed out waiting for the condition

Solved. The cause was an incorrect rootPath: in the PV definition, and then an incorrect subPath.

@artntek artntek added this to the 3.1.0-deployment milestone Feb 10, 2025
@artntek artntek self-assigned this Feb 10, 2025

artntek commented Feb 11, 2025

Still need to look at and resolve the hashstore conversion errors.
