Add support for snapshots with zstd compression #1523


Open · wants to merge 20 commits into main

Conversation

jugal-chauhan (Collaborator)

Description

This PR adds a no-op implementation that ignores Bloom filters and a fallback that skips the Elasticsearch816 codec, both of which are needed because of certain default index settings in ES 8.x. It also registers the new codecs with our Lucene 9 readers so that they can read zstd-compressed Lucene segments.

Issues Resolved

MIGRATIONS-2536

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

@peternied (Member) left a comment:
Could we rename this PR to be oriented towards the customer problem it solves? Seems like this one would be "Add support for snapshots with zstd compression"?

Let's also add test cases that exercise this behavior. What do you think about adding a new index to the existing end-to-end test cases that enables this level of compression?

@jugal-chauhan changed the title from "Register SPI stub class and add codec for zstd" to "Add support for snapshots with zstd compression" on May 20, 2025
@@ -20,6 +20,7 @@ public class Elasticsearch816CodecFallback extends Codec {

public Elasticsearch816CodecFallback() {
super("Elasticsearch816");
System.out.println(">>>>> Loading stub Elasticsearch816CodecFallback class");
Member:

Can we remove all these prints? We can use logger.debug if you want to retain them.

@peternied (Member) left a comment:

Did I miss where we've added zstd support? I don't see any new libraries or updated compression configuration for our Lucene readers.

def run_test_benchmarks(self, cluster: Cluster):
run_test_benchmarks(cluster=cluster)

def disable_bloom(self, cluster: Cluster, index_name: str):
Member:

Aren't indices with Bloom filters supported via IgnoreBloomFilter? Why would we need to disable them?

Collaborator Author:

So that we do not carry this setting forward to OpenSearch versions during the metadata migration.

Member:

Overall, the metadata process should not move unsupported features from ES8 to OpenSearch. That seems to be the case for this bloom feature, no?

@@ -28,6 +28,7 @@ public class IgnorePsmPostings extends PostingsFormat {

public IgnorePsmPostings() {
super("ES812Postings");
System.out.println(">>>>> Loading stub IgnorePsmPostings class");
Member:

Please remove all usage of System.out.println(...). Does it make sense to include debug-level logging statements for these?

Collaborator Author:

Yes, I will remove the usage of System.out.println and use Log4j wherever I want temporary logging.
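A minimal sketch of the change being agreed on here. The real stub classes extend Lucene's Codec and would use a Log4j logger; java.util.logging stands in below so the sketch is self-contained, and the class body is a hypothetical stand-in, not the PR's actual code:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the stub codec classes in this PR; the real ones
// extend org.apache.lucene.codecs.Codec and would use a Log4j logger instead.
class Elasticsearch816CodecFallback {
    private static final Logger logger =
        Logger.getLogger(Elasticsearch816CodecFallback.class.getName());

    Elasticsearch816CodecFallback() {
        // Replaces System.out.println(">>>>> Loading stub ..."): emitted only
        // when the logger is configured at FINE (debug) level, silent otherwise.
        logger.log(Level.FINE, "Loading stub Elasticsearch816CodecFallback class");
    }
}
```

The point of the pattern is that debug output stays available when needed but never pollutes stdout in production.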

System.out.println(">>>>> Injecting missing mode BEST_SPEED into segment info");
si.putAttribute("Lucene90StoredFieldsFormat.mode", "BEST_SPEED");
}
return new Lucene90StoredFieldsFormat(Mode.BEST_SPEED).fieldsReader(directory, si, fn, context);
Member:

Why choose BEST_SPEED? Can we move this into a static field? This 'why' is a good comment to include on the field.
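The reviewer's suggestion might look something like the sketch below. The class and field names are hypothetical, and the rationale comment is a placeholder for whatever reason is actually settled on:

```java
// Hypothetical sketch: hoist the injected mode into documented constants
// instead of repeating string literals at the call site.
final class SegmentAttributeDefaults {
    /**
     * Attribute key checked on SegmentInfo before reading stored fields.
     */
    static final String STORED_FIELDS_MODE_KEY = "Lucene90StoredFieldsFormat.mode";

    /**
     * Mode injected when the attribute is missing from the segment. Per the
     * review feedback, the reason BEST_SPEED was chosen belongs here.
     */
    static final String DEFAULT_STORED_FIELDS_MODE = "BEST_SPEED";

    private SegmentAttributeDefaults() {}
}
```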

Collaborator Author:

I do not plan on continuing with this file.

Collaborator Author:

Hence it has not been referenced anywhere else.

@@ -29,6 +29,28 @@
"sonested": {"count": "1000"},
"nyc_taxis": {"count": "1000"}
}
empty_indices_no_taxi = {
Member:

These do not appear to be referenced?

Collaborator Author:

I will refine it. I included them only for testing purposes, as a checkpoint to see whether the test passes without nyc_taxis. I agree that many things in this PR are not clean, and I will clean them up one by one. Thank you for bringing them to my attention.


logger.info(f"Successfully disabled bloom filter on index: {index_name}. Response: {response}")

def refresh(self, cluster: Cluster, index_name: str):
Member:

Thanks for adding this. Please update check_doc_counts_match to use this refresh definition.

Collaborator Author:

Will do.
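The suggested refactor might look like the sketch below. The class layout and the cluster interface (call_api, doc_count) are assumptions for illustration; only refresh and check_doc_counts_match appear in the diff:

```python
# Sketch of reusing the shared refresh helper inside check_doc_counts_match.
# The Cluster methods used here are hypothetical stand-ins.
class ClusterOps:
    def refresh(self, cluster, index_name):
        """Force a refresh so subsequent counts see all indexed documents."""
        return cluster.call_api(f"/{index_name}/_refresh", method="POST")

    def check_doc_counts_match(self, source, target, index_name):
        # Reuse the shared refresh definition on both clusters before counting,
        # instead of duplicating the refresh call inline.
        self.refresh(source, index_name)
        self.refresh(target, index_name)
        return source.doc_count(index_name) == target.doc_count(index_name)
```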

class Test0007EndToEndTestForES8WithOSBenchmarks(MATestBase):
def __init__(self, console_config_path: str, console_link_env: Environment, unique_id: str):
allow_combinations = [
(ElasticsearchV8_X, OpensearchV2_X)
Member:

Can we avoid creating a new test and instead add this matrix value to an existing test case?
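What that might look like, sketched with placeholder version markers. Only ElasticsearchV8_X and OpensearchV2_X come from the snippet above; the existing entries and marker classes are assumptions:

```python
# Hypothetical sketch: extend an existing end-to-end test's version matrix
# rather than adding a new test class. Placeholder classes stand in for the
# real version marker types.
class ElasticsearchV7_X: pass
class ElasticsearchV8_X: pass
class OpensearchV2_X: pass

allow_combinations = [
    (ElasticsearchV7_X, OpensearchV2_X),  # assumed existing entry
    (ElasticsearchV8_X, OpensearchV2_X),  # new pair exercising zstd snapshots
]
```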

public StoredFieldsReader fieldsReader(Directory directory, SegmentInfo si, FieldInfos fn, IOContext context) throws IOException {
System.out.println(">>> Attempting to decode stored fields using ZstdStoredFields814Format for segment: " + si.name);
// TODO: Replace with real reader implementation
return new Lucene90CompressingStoredFieldsFormat(
Member:

I think importing the custom-codecs implementation would be a better path than rolling our own. It looks like Lucene912CustomStoredFieldsFormat might be the file we are looking for (link).

Here it is on Maven: https://mvnrepository.com/artifact/org.opensearch.plugin/opensearch-custom-codecs

Since we are using Lucene 9, make sure to use their 2.19.2 artifact.
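Pulling that artifact in might look like the following Gradle fragment. The configuration name and build-file placement are assumptions; the coordinates and version come from the comment above:

```groovy
dependencies {
    // opensearch-custom-codecs 2.19.2 builds against Lucene 9, matching our readers.
    implementation 'org.opensearch.plugin:opensearch-custom-codecs:2.19.2'
}
```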

Member:

@jugal-chauhan Why aren't we importing the custom codecs?

@peternied (Member):

@jugal-chauhan I'm not sure what the end goal of this pull request is. Can you update the description and describe what you want to accomplish by merging these changes?
