Skip to content

CNDB-13192: Bump the default cassandra.sai.latest.version from DC to EC so we support BM25 by default. #1737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ekaterinadimitrova2
Copy link

@ekaterinadimitrova2 ekaterinadimitrova2 commented May 15, 2025

What is the issue

...
Bump the default cassandra.sai.latest.version from DC to EC so we support BM25 by default.
Update jvector_version to 4.

What does this PR fix and why was it fixed

...
Bump the default cassandra.sai.latest.version from DC to EC so we support BM25 by default.
Update jvector_version to 4.

Copy link

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the DataStax copyright header instead of the Apache License one

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR bumps the default SAI version configuration to support BM25 by default, fixes Version.LATEST to point to ED, and improves test coverage by moving and adding tests.

  • Update default SAI version property in CassandraRelevantProperties.java.
  • Change Version.LATEST from EC to ED in Version.java.
  • Update and refactor tests across BM25Test, FeaturesVersionSupportTest, SAITester, and add FeaturesVersionSupportEDTest.

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/unit/org/apache/cassandra/index/sai/cql/FeaturesVersionSupportTest.java Added new testIndexMetaForNumRows and supporting helper methods for SAI index metadata validation.
test/unit/org/apache/cassandra/index/sai/cql/BM25Test.java Removed duplicate SAI index metadata test code.
test/unit/org/apache/cassandra/index/sai/SAITester.java Moved common collection data utilities to SAITester.
test/distributed/org/apache/cassandra/distributed/test/sai/features/FeaturesVersionSupportEDTest.java Added distributed test for ED version support.
src/java/org/apache/cassandra/index/sai/disk/format/Version.java Updated Version.LATEST to point to ED.
src/java/org/apache/cassandra/config/CassandraRelevantProperties.java Updated default property value for SAI version from "dc" to "ec".

for (var memtable : getCurrentColumnFamilyStore().getAllMemtables())
{
MemtableIndex memIndex = getIndexContext(indexName).getLiveMemtables().get(memtable);
assert memIndex instanceof TrieMemtableIndex;
Copy link
Preview

Copilot AI May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider replacing the use of the Java assert statement with an explicit JUnit assertion (such as assertTrue) to ensure the check is enforced even when JVM assertions are disabled.

Suggested change
assert memIndex instanceof TrieMemtableIndex;
Assert.assertTrue(memIndex instanceof TrieMemtableIndex);

Copilot uses AI. Check for mistakes.

Comment on lines 300 to 302
rowCount = Arrays.stream(((TrieMemtableIndex) memIndex).getRangeIndexes())
.map(index -> ((TrieMemoryIndex) index).getDocLengths().size())
.mapToInt(Integer::intValue).sum();
Copy link
Preview

Copilot AI May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If multiple memtables exist, the current implementation overwrites the rowCount per iteration instead of accumulating values. Consider summing the row counts from all memtables to correctly reflect the total number of rows.

Suggested change
rowCount = Arrays.stream(((TrieMemtableIndex) memIndex).getRangeIndexes())
.map(index -> ((TrieMemoryIndex) index).getDocLengths().size())
.mapToInt(Integer::intValue).sum();
rowCount += Arrays.stream(((TrieMemtableIndex) memIndex).getRangeIndexes())
.map(index -> ((TrieMemoryIndex) index).getDocLengths().size())
.mapToInt(Integer::intValue).sum();

Copilot uses AI. Check for mistakes.

@ekaterinadimitrova2 ekaterinadimitrova2 requested review from k-rus and adelapena and removed request for k-rus and adelapena May 15, 2025 20:25
@ekaterinadimitrova2 ekaterinadimitrova2 marked this pull request as draft May 15, 2025 20:27
@ekaterinadimitrova2
Copy link
Author

The switch to testing on the latest version ED led to two tests failing:

  • testFallbackToAnotherIndex - I haven't had the chance to look in detail at why, I will take a look tomorrow.
  • testSegmentsMetadata - some output needs to be changed to account for the new metadata. I will take care tomorrow.

This one is not a regression, as marked by Butler, there is also a ticket for it - testKDTreePostingsQueryMetricsWithSingleIndex - CNDB-13830

The others seem to be failures with previous versions, where we did not touch anything, so they seem unrelated.
testUnsupportedOperator {p0=db} and testTermsIteration {p0=dc}

@ekaterinadimitrova2 ekaterinadimitrova2 changed the title CNDB-13192: Bump the default cassandra.sai.latest.version from DC to EC so we support BM25 by default. Fix Version.LATEST to point to recently added ED and fix its test coverage. CNDB-13192: Bump the default cassandra.sai.latest.version from DC to EC so we support BM25 by default. May 19, 2025
@ekaterinadimitrova2
Copy link
Author

Updated the PR with only the default version update as per the original intention.
I will take care of the ED testing and fixing some bugs that were discovered in a separate ticket.

@ekaterinadimitrova2 ekaterinadimitrova2 marked this pull request as ready for review May 19, 2025 14:34
@ekaterinadimitrova2 ekaterinadimitrova2 marked this pull request as draft May 19, 2025 14:37
@ekaterinadimitrova2 ekaterinadimitrova2 marked this pull request as ready for review June 4, 2025 19:22
@ekaterinadimitrova2
Copy link
Author

We fixed 3 issues related to EB+ versions that were caught by CI testing in other tickets.
So I rebased the branch and also updated jvector_version to 4.
Let's see what the new CI run will show.

Copy link

sonarqubecloud bot commented Jun 4, 2025

@cassci-bot
Copy link

❌ Build ds-cassandra-pr-gate/PR-1737 rejected by Butler


4 new test failure(s) in 1 builds
See build details here


Found 4 new test failures

Test Explanation Branch history Upstream history
...dexWhileQueryingTest.testFallbackToAnotherIndex regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...herTest.testEqQueriesAgainstStringIndex {p0=ec} regression 🔴 🔵🔵🔵🔵🔵🔵🔵
...testTermQueriesAgainstShortPostingLists {p0=ed} regression 🔴 🔵🔵🔵🔵🔵🔵🔵
o.a.c.u.b.BinLogTest.testTruncationReleasesLogS... regression 🔴 🔵🔵🔵🔵🔵🔵🔵

Found 2 known test failures

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants