
dylanwong250 (Contributor) commented Sep 2, 2025

What changes were proposed in this pull request?

This is the last PR in the series (#52047, #52148, #52202) for adding State Data Source support with checkpoint v2.

The key enhancement is the ability to use snapshotStartBatchId and related options when the state uses checkpoint v2.
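The State Data Source documentation describes snapshotStartBatchId as forcing the reader to load the snapshot at that batch ID and then replay changelogs up to batchId (the option must be combined with snapshotPartitionId). Below is a minimal in-memory sketch of that snapshot-plus-replay step, assuming plain dicts for snapshots and lists of (key, value) updates for changelogs, where a value of None means the key was removed. The function `reconstruct_state` and this data layout are hypothetical and do not reflect Spark's internal classes; with checkpoint v2 the real reader must also locate each snapshot and changelog file by its unique identifier.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory model: a changelog is a list of (key, value) updates,
# where a value of None means the key was removed in that batch.
Changelog = List[Tuple[str, Optional[int]]]


def reconstruct_state(
    snapshots: Dict[int, Dict[str, int]],
    changelogs: Dict[int, Changelog],
    snapshot_start_batch_id: int,
    batch_id: int,
) -> Dict[str, int]:
    """Load the snapshot for snapshot_start_batch_id, then replay the
    changelogs of every later batch up to and including batch_id."""
    if batch_id < snapshot_start_batch_id:
        raise ValueError("batch_id must be >= snapshot_start_batch_id")
    if snapshot_start_batch_id not in snapshots:
        raise KeyError(f"no snapshot for batch {snapshot_start_batch_id}")
    # Copy so that replaying changelogs never mutates the stored snapshot.
    state = dict(snapshots[snapshot_start_batch_id])
    for batch in range(snapshot_start_batch_id + 1, batch_id + 1):
        # A missing changelog raises KeyError instead of being skipped silently.
        for key, value in changelogs[batch]:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


snapshots = {0: {"a": 1}}
changelogs = {1: [("b", 2)], 2: [("a", None), ("b", 3)]}
print(reconstruct_state(snapshots, changelogs, 0, 2))  # {'b': 3}
```

Starting from an earlier snapshot and replaying more changelogs must give the same result as starting from a later one, which is what makes this option useful for checking snapshot and changelog consistency.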

NOTE: Reading state written with checkpoint v2 requires "spark.sql.streaming.stateStore.checkpointFormatVersion" to be set to 2. Relaxing the assertion checks would make it possible to read any state data source based on what is recorded in the CommitLog, but this is left as a future change.
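A minimal sketch of a snapshot read that respects this requirement, written as a hypothetical helper `build_snapshot_read_options` that checks the session conf before producing reader options. The option names batchId, snapshotStartBatchId, snapshotPartitionId and operatorId are State Data Source options; the helper itself, and passing the session conf as a plain dict, are assumptions for illustration only.

```python
from typing import Dict, Optional

CHECKPOINT_FORMAT_CONF = "spark.sql.streaming.stateStore.checkpointFormatVersion"


def build_snapshot_read_options(
    session_conf: Dict[str, str],
    batch_id: int,
    snapshot_start_batch_id: int,
    snapshot_partition_id: int,
    operator_id: Optional[int] = None,
) -> Dict[str, str]:
    """Hypothetical helper: validate inputs and return string options for a
    State Data Source snapshot read of checkpoint v2 state."""
    # Reading checkpoint v2 state requires the session conf to be set to 2.
    if session_conf.get(CHECKPOINT_FORMAT_CONF) != "2":
        raise ValueError(f"{CHECKPOINT_FORMAT_CONF} must be set to 2")
    # The snapshot being replayed from cannot be newer than the target batch.
    if snapshot_start_batch_id > batch_id:
        raise ValueError("snapshotStartBatchId must not exceed batchId")
    options = {
        "batchId": str(batch_id),
        "snapshotStartBatchId": str(snapshot_start_batch_id),
        # snapshotStartBatchId is only valid together with snapshotPartitionId.
        "snapshotPartitionId": str(snapshot_partition_id),
    }
    if operator_id is not None:
        options["operatorId"] = str(operator_id)
    return options
```

In PySpark, after `spark.conf.set("spark.sql.streaming.stateStore.checkpointFormatVersion", "2")`, the returned dict could be passed as `spark.read.format("statestore").options(**opts).load(checkpoint_dir)`, where `checkpoint_dir` is a placeholder for the streaming query's checkpoint location.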

Why are the changes needed?

State checkpoint v2 ("spark.sql.streaming.stateStore.checkpointFormatVersion" set to 2) introduces a new format for storing state metadata that includes a unique identifier in the file path of each state store. The existing StateDataSource implementation worked only with the checkpoint v1 format, so it could not read state from streaming queries that use the newer format. #52047 implemented only the batchId option, and #52148 implemented only readChangeFeed.
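To illustrate why the file path needs the unique identifier, here is a minimal sketch assuming a hypothetical naming scheme `<version>_<uniqueId>.zip` and a hypothetical map of committed unique IDs per version. When more than one file exists for the same version, the version number alone is ambiguous, so the reader has to select the file whose unique ID was committed. The function names, file names and layout are illustrative, not Spark's actual on-disk format.

```python
from typing import Dict, List


def snapshot_file_name(version: int, unique_id: str) -> str:
    """Hypothetical checkpoint v2 style name: version plus a unique ID."""
    return f"{version}_{unique_id}.zip"


def resolve_snapshot_file(
    files: List[str], version: int, committed_ids: Dict[int, str]
) -> str:
    """Return the snapshot file for `version` whose unique ID was committed.

    `committed_ids` is a hypothetical stand-in for the per-version unique IDs
    recorded when each batch commits.
    """
    if version not in committed_ids:
        raise KeyError(f"no committed unique ID for version {version}")
    expected = snapshot_file_name(version, committed_ids[version])
    if expected not in files:
        raise FileNotFoundError(expected)
    return expected


# Two files exist for version 3 (e.g. from two attempts); only one was committed.
files = ["2_a1.zip", "3_b7.zip", "3_c9.zip"]
committed = {2: "a1", 3: "c9"}
print(resolve_snapshot_file(files, 3, committed))  # 3_c9.zip
```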

Does this PR introduce any user-facing change?

Yes.

The State Data Source now works with checkpoint v2 when snapshotStartBatchId and related options are used.

How was this patch tested?

The previous PRs added test suites that parameterize the existing tests with checkpoint v2, and all of those tests are now added back. Every test that was previously written to exercise a State Data Source Reader feature with checkpoint v1 should now also be parameterized with checkpoint v2, including the Python tests.
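As a rough sketch of this parameterization pattern, assuming Python's `unittest`: a hypothetical base suite derives its session conf from a class attribute, and a subclass that overrides only that attribute reruns every inherited test with checkpoint v2. The class names and the placeholder test body are illustrative; the actual Spark suites (Scala and Python) may be organized differently.

```python
import unittest

CHECKPOINT_FORMAT_CONF = "spark.sql.streaming.stateStore.checkpointFormatVersion"


class StateDataSourceReaderTestBase(unittest.TestCase):
    """Hypothetical base suite: every test builds its session conf from
    checkpoint_format_version, so subclasses can rerun all of them."""

    checkpoint_format_version = 1

    def session_conf(self):
        return {CHECKPOINT_FORMAT_CONF: str(self.checkpoint_format_version)}

    def test_session_conf_uses_format_version(self):
        # Placeholder body; a real test would run a streaming query and read
        # its state back through the State Data Source using this conf.
        conf = self.session_conf()
        self.assertEqual(
            conf[CHECKPOINT_FORMAT_CONF], str(self.checkpoint_format_version)
        )


class StateDataSourceReaderCheckpointV2Test(StateDataSourceReaderTestBase):
    """Reruns every inherited test with checkpoint v2 enabled."""

    checkpoint_format_version = 2
```

Subclassing keeps a single copy of each test body; a `subTest` loop over both versions inside each test is an alternative when only a few tests need both formats.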

This PR adds RocksDBWithCheckpointV2StateDataSourceReaderSnapshotSuite, which uses a golden file approach similar to #46944, where snapshotStartBatchId was first added.

Was this patch authored or co-authored using generative AI tooling?

No
