diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md index d36ebb921..faa8c1c73 100755 --- a/RELEASE-NOTES.md +++ b/RELEASE-NOTES.md @@ -10,7 +10,7 @@ Release date: 2024-04-30 This major release introduces breaking changes: - A valid admin (auth) token is required for a Metacat 3.0.0 installation to function correctly (i.e. to handle private datasets). - - Please [contact DataONE](https://www.dataone.org/contact/) to obtain a long-term token (valid for 1 year) + - Please [contact DataONE](https://www.dataone.org/contact/) to obtain a long-term token (valid for 1 year) - If you are already part of the DataONE network and have a member node, we will issue you a token linked to your DataONE Node identity. - If you're not a DataONE member node, we [encourage you to join](https://www.dataone.org/jointhenetwork/) (it's free!) so that your data can partake in DataONE's goal of the preservation of scientific data for future use. - Otherwise, we can issue a token linked to your Metacat administrator's ORCID iD. @@ -30,6 +30,12 @@ This major release introduces breaking changes: - An ORCID iD is now required in order to log in as a Metacat administrator. - Please [sign up for an ORCID iD](https://orcid.org/register) if you do not already have one. +Also in this release: +- You can now deploy Metacat on Kubernetes, using a Helm chart. Note this is a beta feature. It has + been tested, and we believe it to be working well, but it has not yet been used in production - so + we recommend caution with this early release. If you try it, [we'd love to hear your + feedback](https://www.dataone.org/contact/)! + ### Upgrade Notes (2.19.0 to 3.0.0): - Starting Requirements: @@ -56,7 +62,7 @@ This major release introduces breaking changes: # where $TOKEN is an environment variable containing your indexer token (see overview of major changes above). 
# example: curl -X PUT -H "Authorization: Bearer $TOKEN" https://knb.ecoinformatics.org/knb/d1/mn/v2/index?all=true - ``` + ``` - Ensure that `/etc/default/solr.in.sh` is group writable - ex. `sudo chmod g+w /etc/default/solr.in.sh` - In `solr.in.sh`, be sure to delete the old solr home add a new solr path: @@ -72,7 +78,7 @@ This major release introduces breaking changes: - You are now ready to install Metacat 3.0.0 - Additional notes: - `metacat.properties` no longer contains custom settings, and should not be edited. - - Please first re-configure Metacat through the Metacat Admin UI after upgrading. + - Please first re-configure Metacat through the Metacat Admin UI after upgrading. - If you have custom properties that are not available for configuration in the Metacat Admin UI, these can be added to `metacat-site.properties`. - The database upgrade process may require several minutes or longer to complete. @@ -84,7 +90,7 @@ This major release introduces breaking changes: - LDAP and Password-based login is no longer supported - Metacat admin users now have member node admin privileges [I-1816](https://github.com/NCEAS/metacat/issues/1816) - Storage and Indexing Enhancements [PR-1695](https://github.com/NCEAS/metacat/pull/1695) - - Revised use of port numbers to determine https vs. http [I-1697](https://github.com/NCEAS/metacat/issues/1697) + - Revised use of port numbers to determine https vs. 
http [I-1697](https://github.com/NCEAS/metacat/issues/1697) - Re-implemented mechanism to handle index tasks that failed to be put into RabbitMQ [I-1603](https://github.com/NCEAS/metacat/issues/1603) - Increased indexing speeds by replacing hazelcast with dataone-indexer - Removed hazelcast dependency and added new SystemMetadataManager class to handle system metadata @@ -92,7 +98,7 @@ This major release introduces breaking changes: - Moved reindexing action from old MetacatAPI to new DataONE MN.admin API [PR-1738](https://github.com/NCEAS/metacat/pull/1738) - Upgrade from Java 8 to 17 [I-1481](https://github.com/NCEAS/metacat/issues/1481), [PR-1731](https://github.com/NCEAS/metacat/pull/1731), [PR-1735](https://github.com/NCEAS/metacat/pull/1735) - Disabled old MetacatAPI and significant code clean-up (removed obsolete legacy code) [PR-1713](https://github.com/NCEAS/metacat/pull/1713), [PR-1725](https://github.com/NCEAS/metacat/pull/1725), [PR-1726](https://github.com/NCEAS/metacat/pull/1726), [PR-1744](https://github.com/NCEAS/metacat/pull/1744) - - Removed obsolete skin configurations [PR-1767](https://github.com/NCEAS/metacat/pull/1767) + - Removed obsolete skin configurations [PR-1767](https://github.com/NCEAS/metacat/pull/1767) - MNAdmin API Enhancements - Added 'reindex' and 'reindexall' methods [I-1716](https://github.com/NCEAS/metacat/issues/1716) - Added 'updateIdMetadata' method [I-1766](https://github.com/NCEAS/metacat/issues/1766) @@ -167,11 +173,11 @@ New features and bugs fixed in this release: * Deleting objects failed to remove solr doc * Sampling citation not showing up in view service * Mis-Formatting of Data Package Contents -* Unhelpful error message when trying to create as a denied submitter +* Unhelpful error message when trying to create as a denied submitter * Data objects missing after a package was published * Multiple updates on a single DOI happen when users use the metacat admin page to update DOIs * Metacat updated the DOI metadata 
(datacite) when the system metadata of an obsoleted object was updated if the obsolescent chain has a DOI Sid -* getPackage fails to include system metadata +* getPackage fails to include system metadata * OSTI DOI Plugin Notifications need more information ## Release Notes for 2.18.0 @@ -211,7 +217,7 @@ New features and bugs fixed in this release: * Refactor the DOI service to use the plug-in architecture * CN subjects cannot query private objects * Users with the write permission cannot update system metadata -* Metacat should return a not-found error rather than the internal error when there is a typo in the old Metacat API url +* Metacat should return a not-found error rather than the internal error when there is a typo in the old Metacat API url ## Release Notes for 2.15.1 New features and bugs fixed in this release: @@ -223,7 +229,7 @@ New features and bugs fixed in this release: * Expand elements covered by EML's attribute index fields beyond just dataTable * EML to HTML/PDF is broken in the getPackage() method * Have MNCore.getCapabilities() report on auth.allowSubmitters parameter setting -* GetPackage API doesn't work from R on Windows +* GetPackage API doesn't work from R on Windows * Add new indexes for column archvied and object_format in the systemmetadata table ## Release Notes for 2.15.0 @@ -234,21 +240,21 @@ New features and bugs fixed in this release: * Fix the bug of incorrect geohash * Fix the bug of incorrect collectionQuery * Change the default order to dateModified for the listObject method -* Remove extra logged event in the MN.update method +* Remove extra logged event in the MN.update method * Upgrade some library jar files to fix security threats ## Release Notes for 2.14.1 New features and bugs fixed in this release: * Support new XML schemas for collections-1.1.0 and portals-1.1.0 -* Metacat is creating too many Timer objects which leads to out of memory issues +* Metacat is creating too many Timer objects which leads to out of memory 
issues and excessive numbers of threads under high request loads -* Users with only write permission can change the access policy on update() requests, +* Users with only write permission can change the access policy on update() requests, when this operation is reserved only for users with changePermission permission * Close OutputStream objects after fulfilling DataONE API requests * Fix how exclude filters are translated in the collectionQuery * Update Apache setup docs to match current practices * Update documentation build to Python3 -* Fix an issue where a client editor sees a "Nothing was found" error despite +* Fix an issue where a client editor sees a "Nothing was found" error despite having all permissions * Project abstract displays oddly * View service rendering EML project abstract incorrectly @@ -280,7 +286,7 @@ New features and bugs fixed in this release: ## Release Notes for 2.12.2 Bugs fixed in this release: -* Modify the schema files of the format ids of portal and collections +* Modify the schema files of the format ids of portal and collections ## Release Notes for 2.12.1: Bugs fixed in this release: @@ -337,7 +343,7 @@ New features and bugs fixed in this release: After installing this release, you need to issue the "reindexall" command since a new SOLR field has been added. 
New features and bugs fixed in this release: * Exclude EcoGrid on Metacat -* Do not allow restrictive access control change to content with a DOI +* Do not allow restrictive access control change to content with a DOI * MN/CN.updateSystemMetadata doesn't check the field - authoritativeMemberNode * Integrate the fixed SeriesIdResolver class which gets SystemMetadata locally * EZID metadata registration doesn't seem to work with SIDs @@ -455,7 +461,7 @@ New features and bugs fixed in this release: * Metacat creates an Invalid Content-Disposition value for some filenames * External links in the registry should open in new tab * Remove the support of Oracle on documentation -* Changing a Metacat member node's synchronization value on the d1 admin page doesn't work +* Changing a Metacat member node's synchronization value on the d1 admin page doesn't work * Add the feature to support the noNamespaceSchemaLocation attribute in xml objects * Provide clear messages to clients if the namespaces/formatids of the schemas of xml objects are not registered in Metacat * Disable the feature of downloading external schemas for unregistered namespaces @@ -468,7 +474,7 @@ New features and bugs fixed in this release: ## Release Notes for 2.7.0: * Use different format ids to identity variants of the schema with same namespace -* Add EML 2.1.1 to Darwin Core supporting for OAI-PMH provider +* Add EML 2.1.1 to Darwin Core supporting for OAI-PMH provider * Bugs fixed include: - Series head resolution should use obsoletes field as part of determination(7020) - The InputStream (parameter) in the CN/MN.create and MN.update method is not closed(7005) @@ -667,7 +673,7 @@ Concern workflow functionality (TPC). The following issues were addressed: * Metacat should run against Tomcat 6 (Bug 4716) ## Release Notes for 1.9.1: -The 1.9.1 release holds the bug fixes found after releasing 1.9.0 beta. +The 1.9.1 release holds the bug fixes found after releasing 1.9.0 beta. 
These bugs were primarily replication issues. There is no difference in functionality between 1.9.0 and 1.9.1 @@ -683,7 +689,7 @@ were made to the code: - Database schema version detection and install/upgrade utilities were added to the application. Also, this release includes several enhancements: -- it supports the new EML 2.1.0 version. +- it supports the new EML 2.1.0 version. - Documents are now stored on the local filesystem as well as in the database in order to preserve document integrity. - Metacat verifies new schemas when they are added. - Additional access is propegated with documents during replication. @@ -865,7 +871,7 @@ New Features: * Added a new skin for Ecological Society of America. * Created an Advanced search servlet which can be used from the web. * Various connections have been modified to be secure. e.g. connection between -ldaps is made secure now, replication is done over secure channels. +ldaps is made secure now, replication is done over secure channels. Performance: * Reduced size of xml_nodes by creating a new table for holding nodes from @@ -1083,4 +1089,3 @@ Fixes in 1.3.0: were failed in replication. 4) Decrease the time to create access rules during insert or update a package. - diff --git a/helm/README.md b/helm/README.md index 785c9ae33..43416bf96 100644 --- a/helm/README.md +++ b/helm/README.md @@ -4,8 +4,50 @@ Metacat is repository software for preserving data and metadata (documentation a helps scientists find, understand and effectively use data sets they manage or that have been created by others. For more details, see https://github.com/NCEAS/metacat -> **Warning**: this deployment does not currently work on Apple Silicon machines (e.g. in Rancher -> Desktop), because at least one of the dependencies (RabbitMQ) doesn't work in that environment. +> ### Before You Start: +> 1. **This Metacat Helm chart is a beta feature**. 
It has been tested, and we believe it to be +> working well, but it has not yet been used in production - so we recommend caution with this +> early release. If you try it, [we'd love to hear your +> feedback](https://www.dataone.org/contact/)! +> +> +> 2. If you are considering **migrating an existing Metacat installation to Kubernetes**, see +> [Appendix 5](#appendix-5-migrating-to-kubernetes-from-an-existing-metacat-219-installation) +> for important information +> +> +> 3. For non-public dataset support, see: [Setting up a Token and Optional CA certificate for +> Indexer Access](#setting-up-a-token-and-optional-ca-certificate-for-indexer-access) +> +> +> 4. This deployment does not currently work on Apple Silicon machines (e.g. in Rancher Desktop), +> because the official Docker image for at least one of the dependencies (RabbitMQ) doesn't yet +> work in that environment. + +--- + +- [Metacat Helm Chart](#metacat-helm-chart) + * [TL;DR](#tldr) + * [Introduction](#introduction) + * [Prerequisites](#prerequisites) + * [Installing the Chart](#installing-the-chart) + * [Uninstalling the Chart](#uninstalling-the-chart) + * [Parameters](#parameters) + * [Configuration and installation details](#configuration-and-installation-details) + + [Metacat Application-Specific Properties](#metacat-application-specific-properties-1) + + [Secrets](#secrets) + * [Persistence](#persistence) + * [Networking, Certificates, and Auth Tokens](#networking-certificates-and-auth-tokens) + * [Setting up a Token and Optional CA certificate for Indexer Access](#setting-up-a-token-and-optional-ca-certificate-for-indexer-access) + * [Setting up a TLS Certificate for HTTPS Traffic](#setting-up-a-tls-certificate-for-https-traffic) + * [Setting up Certificates for DataONE Replication](#setting-up-certificates-for-dataone-replication) + * [Appendix 1: Self-Signing TLS Certificates for HTTPS Traffic](#appendix-1-self-signing-tls-certificates-for-https-traffic) + * [Appendix 2: Self-Signing 
Certificates for Testing Mutual Authentication](#appendix-2-self-signing-certificates-for-testing-mutual-authentication) + * [Appendix 3: Troubleshooting Mutual Authentication](#appendix-3-troubleshooting-mutual-authentication) + * [Appendix 4: Debugging and Logging](#appendix-4-debugging-and-logging) + * [Appendix 5: Migrating to Kubernetes from an Existing Metacat 2.19 Installation](#appendix-5-migrating-to-kubernetes-from-an-existing-metacat-219-installation) + +--- ## TL;DR Starting in the root directory of the `metacat` repo: @@ -14,24 +56,30 @@ Starting in the root directory of the `metacat` repo: contents of the values overlay files (like [./values-dev-cluster.yaml](./values-dev-cluster.yaml) , for example), to see which settings typically need to be changed. + 2. Add your credentials to [./admin/secrets.yaml](./admin/secrets.yaml), and add to cluster: ```shell $ vim helm/admin/secrets.yaml ## follow the instructions in this file ``` -3. Deploy and enjoy! +3. Deploy + + (*Note: Your k8s service account must have the necessary permissions to get information about the + resource `roles` in the API group `rbac.authorization.k8s.io`*). ```shell $ ./helm-install.sh myreleasename mynamespace ./helm ``` -You should then be able to access the application via http://your-host-name/metacat! +To access Metacat, you'll need to create a mapping between your ingress IP address (found by: +`kubectl describe ingress | grep "Address:"`) and your metacat hostname. Do this either by adding a +permanent DNS record for everyone to use, or by adding a line to the `/etc/hosts` file on your +local machine, providing temporary local access for your own testing. You should then be able to +access the application via http://your-host-name/metacat. 
-> ### Note: -> If you are considering **migrating an existing Metacat installation to Kubernetes**, see -> [Appendix 5](#appendix-5-migrating-to-kubernetes-from-an-existing-metacat-219-installation) -> for important information +Read on for more in-depth information about the various installation and configuration options that +are available... ## Introduction @@ -1025,6 +1073,67 @@ https://knb.ecoinformatics.org, to run on our development Kubernetes cluster, he the README section [Setting up a Token and Optional CA certificate for Indexer Access](#setting-up-a-token-and-optional-ca-certificate-for-indexer-access). +11. Finally, you can now re-index all your datasets, so they will show up in Metacat search: + + > **Caution:** If you deploy large numbers of index workers, they can overwhelm Metacat with API + requests when doing a large re-index. This can lead to errors and indexing failures. A future + release will fix this, but in the meantime, we recommend starting with a low number of + indexers (3 - 5), and finding the optimal number for your own installation. + + 1. (*Beta workaround*) After deploying Metacat, but before starting the re-index, check that + all the deployed indexer pods have started up cleanly. This can only be determined by + inspecting the logs for each indexer pod (e.g. + `kubectl logs -f -l app.kubernetes.io/name=d1index`), to ensure there are no exceptions. + If any indexers did not start correctly, use `kubectl delete pod ` to delete + them, and k8s will then recreate them. + + 2. Re-indexing can take anywhere from seconds to hours or even days, depending on how much + data you have, and how many index workers you choose to deploy. 
You can override the + number of index workers in the dataone-indexer sub-chart by adding the following to your + metacat values.yaml: + + ```yaml + dataone-indexer: + # increase minReplicas from default 3 + autoscaling: + minReplicas: 5 + # set max to the same value, so we don't + # overwhelm Metacat (see "Caution" note, above): + maxReplicas: 5 + ``` + + 3. When you are ready to reindex, issue the following command (`$TOKEN` should contain your + administrator auth token -- [see this + section](#setting-up-a-token-and-optional-ca-certificate-for-indexer-access)). Replace + `myHostName.org` and `myContext` with your own: + + ```shell + $ curl -X PUT -H "Authorization: Bearer $TOKEN" \ + "https://myHostName.org/myContext/d1/mn/v2/index?all=true" + + # expected output: + # + # true + ``` + + 4. You can monitor indexing progress via the RabbitMQ dashboard. Enable port forwarding: + + ```shell + $ kubectl port-forward service/-rabbitmq-headless 15672:15672 + ``` + + ...and then point your browser at http://localhost:15672, and log in with the username + `metacat-rmq-guest` and the RabbitMQ password you set in your metacat Secrets, or obtain it by: + + ```shell + secret_name=$(kubectl get secrets | egrep ".*\-metacat-secrets" | awk '{print $1}') + rmq_pwd=$(kubectl get secret "$secret_name" \ + -o jsonpath="{.data.rabbitmq-password}" | base64 -d) + echo "rmq_pwd: $rmq_pwd" + ``` + +--- + > ### Tips: > > 1. If you need to change the database user's password for your existing database, `kubectl exec`