Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RCORE-2094 Compressing Integer Arrays #7668

Open
wants to merge 420 commits into
base: next-major
Choose a base branch
from
Open

Conversation

nicola-cab
Copy link
Member

@nicola-cab nicola-cab commented May 1, 2024

What, How & Why?

Main Logic for compressing integers. it includes:

  1. https://jira.mongodb.org/browse/RCORE-2051 ==> Packed Format.
  2. https://jira.mongodb.org/browse/RCORE-2052 ==> Flex Format
  3. https://jira.mongodb.org/browse/RCORE-2053 ==> Adding the integer compressor class for compressing in Packed or Flex
  4. https://jira.mongodb.org/browse/RCORE-2054 ==> Main Logic for compressing and decompressing Arrays
  5. https://jira.mongodb.org/browse/RCORE-2056 ==> Some query optimizations and faster decompression of Arrays

nicola-cab and others added 30 commits December 21, 2023 12:24
Copy link

coveralls-official bot commented May 1, 2024

Pull Request Test Coverage Report for Build nicola.cabiddu_1715

Details

  • 1479 of 1721 (85.94%) changed or added relevant lines in 35 files are covered.
  • 103 unchanged lines in 24 files lost coverage.
  • Overall coverage increased (+0.002%) to 90.611%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/realm/array_mixed.cpp 1 3 33.33%
src/realm/array_unsigned.cpp 11 13 84.62%
src/realm/integer_flex_compressor.cpp 31 40 77.5%
src/realm/integer_packed_compressor.cpp 18 27 66.67%
src/realm/query_engine.hpp 3 12 25.0%
src/realm/array.cpp 110 125 88.0%
src/realm/integer_packed_compressor.hpp 106 127 83.46%
src/realm/integer_compressor.hpp 78 106 73.58%
src/realm/array_aggregate_optimizations.cpp 141 176 80.11%
src/realm/integer_compressor.cpp 188 224 83.93%
Files with Coverage Reduction New Missed Lines %
src/realm/array.hpp 1 95.73%
src/realm/array_unsigned.cpp 1 85.5%
src/realm/sync/noinst/server/server_history.cpp 1 63.7%
src/realm/util/compression.cpp 1 89.79%
src/realm/util/file.cpp 1 78.84%
test/test_table.cpp 1 99.52%
src/realm/array_with_find.hpp 2 97.92%
src/realm/object-store/shared_realm.cpp 2 91.89%
src/realm/sync/network/http.hpp 2 82.27%
src/realm/sync/transform.cpp 2 60.99%
Totals Coverage Status
Change from base Build nicola.cabiddu_1713: 0.002%
Covered Lines: 216702
Relevant Lines: 239157

💛 - Coveralls

Base automatically changed from nc/rcore-2057 to next-major May 21, 2024 14:55
@nicola-cab nicola-cab force-pushed the nc/RCORE-2094 branch 2 times, most recently from 4004852 to 2b9969d Compare May 21, 2024 16:02
Copy link
Contributor

@jedelbo jedelbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments

}
}
start += sizeof(int64_t) * 8 / no0(w) * chunks;
if (!is_compressed() && m_integer_compressor.get_encoding() == NodeHeader::Encoding::WTypBits) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the first condition is redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, It is too defensive.. You are right…

for (size_t i = 0; i < sz; ++i) {
auto item = arr.get(i);
values.push_back(item);
REALM_ASSERT_DEBUG(values.back() == item);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps an assertion too much. Are you in doubt that push_back works?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep too conservative, originally I was in doubt whether our integer compression algo worked :-) , but there is also an assertion (only in debug mode) that checks the whole array after it gets committed, so I can remove this.

REALM_ASSERT_DEBUG(values[indices[i]] == arr.get(i));
}

#if REALM_DEBUG
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not tested above?

auto last = std::unique(values.begin(), values.end());
values.erase(last, values.end());

for (size_t i = 0; i < arr.size(); ++i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use sz

@@ -13,6 +13,7 @@ set(REALM_SOURCES
array_blobs_big.cpp
array_decimal128.cpp
array_fixed_bytes.cpp
array_aggregate_optimizations.cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The files should be added to Package.swift file too.

@@ -190,38 +189,108 @@ using namespace realm::util;

void QueryStateBase::dyncast() {}

size_t Array::bit_width(int64_t v)
Array::Array(Allocator& allocator) noexcept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not inlined anymore?

const auto mask = c.v_mask();
BfIterator ndx_iterator{data, offset, ndx_w, ndx_w, start};
BfIterator data_iterator{data, 0, v_w, v_w, static_cast<size_t>(*ndx_iterator)};
while (start < end) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the same approach as in the parallel find, where we first search the values and then search for the relevant indices afterwards.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, interesting.. I am not sure we are going to become faster or slower.. I'll try..

@@ -176,19 +176,21 @@ class Array : public Node, public ArrayParent {
template <size_t w>
void set(size_t ndx, int64_t value);

int64_t get(size_t ndx) const noexcept;
inline int64_t get(size_t ndx) const noexcept;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sene to add inline to a function definition? It happens a lot in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, because the getter is defined below, it is a left over, I'll delete it ..

Copy link
Contributor

@jedelbo jedelbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am missing more basic testing of the compressed arrays, Group_ArrayCompression_Correctness is hardly enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants