Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCORE-2064 eq neq string compression #7575

Closed
wants to merge 2 commits into from

Conversation

nicola-cab
Copy link
Member

What, How & Why?

String interning introduces a new compressor for strings, which basically implements a function that maps the string to some unique integer.
For compressed strings and in particular for EQ and NEQ, we can leverage this in order to quickly compare integers against integers.
In all the other cases, we need to decompress the string.

TODO:

  1. Perf the searching function
  2. Implement NEQ

Copy link
Contributor

@finnschiermer finnschiermer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We need to also handle the situation where a string has been interned, but not all leaves have been compressed with the interned value.

  2. I think there is something wrong with the insensitive search here. It should be able to handle mixtures of upper/lower case.


size_t _find_first_local_interned_string(const StringData& data, size_t start, size_t end) const
{
REALM_ASSERT_DEBUG(m_leaf->is_interned());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not hold.

Even though a string has been interned, we may have leafs with strings which are not interned (yet).

Consequently for every leaf a check has to be made and
A. if the leaf has been interned, we can compare integers
B. if the leaf has not been interned, we have to compare the strings, even though it is slower.

tv = q.find_all();
CHECK_EQUAL(tv.size(), 1);

q = table->where().not_equal(col_str, StringData("TEST"), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When testing case insensitive search, we should test with a mix of upper/lower case. I think this will reveal a bug in the implementation in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants