[DYOD] Add variable string segment #2593

Open: wants to merge 82 commits into base: master

Commits (82)
c27b42f
Add String Encoding Benchmark Script
ClFeSc Jun 21, 2023
439a28b
Remove leftover files before running benchmarks
phkeese Jun 21, 2023
9f310b5
Fix incorrect timeout value
ClFeSc Jun 21, 2023
1dc9f76
Apply feedback by @23mafi regarding code duplication
ClFeSc Jun 21, 2023
5c73197
Add verbosity levels
ClFeSc Jun 21, 2023
2bb38ba
Use longer timeout for multithreading
ClFeSc Jun 22, 2023
474d615
Add metrics to script and improve output
ClFeSc Jun 26, 2023
6883969
Add Segment
ClFeSc Jun 27, 2023
b1d3bc8
fix build errors and started fixing tests (WIP)
23mafi Jun 28, 2023
65d37b1
WIP: Write test for functionality of segment alone to decouple segmen…
phkeese Jul 4, 2023
a655ead
Hide offsets, instead surface only value ids in variable string segment
ClFeSc Jul 4, 2023
bc86009
WIP: Some work in the script
ClFeSc Jul 4, 2023
c17acef
Add skip benchmarks option
ClFeSc Jul 5, 2023
2aba648
WIP: Fix some issues
ClFeSc Jul 5, 2023
508c431
WIP: Try to fix tests
ClFeSc Jul 5, 2023
aa65277
New Fix: No Compiler Error! New Feature: The Linker fails!
ClFeSc Jul 5, 2023
b078280
Instantiate VariableStringDictionary explicitly.
phkeese Jul 6, 2023
2fc3f29
Construct entire strings and pass many tests :D
phkeese Jul 6, 2023
25ef70f
Comment-in tests
ClFeSc Jul 6, 2023
1d6ccbe
Move test file to correct CMakeList list
ClFeSc Jul 6, 2023
468ae60
Only use encoding for string segments, use `Dictionary` for all other…
ClFeSc Jul 8, 2023
f88b6f4
Re-enable multi-threaded execution
ClFeSc Jul 9, 2023
ecca86d
Add plotting of median runtime vs. median memory consumption of strin…
ClFeSc Jul 10, 2023
9fb3ebc
format and lint script
23mafi Jul 12, 2023
aae0987
Fix lint errors
phkeese Jul 12, 2023
ceb2068
write binary parser and change binary writer for VariableStringDictio…
23mafi Jul 13, 2023
4d08c0e
Restructure evaluation script
ClFeSc Jul 18, 2023
6979634
Fix no distinction between brnaches in plots
ClFeSc Jul 19, 2023
8528ecd
Merge remote-tracking branch 'phkeese/feature/benchmark-script' into …
ClFeSc Jul 19, 2023
3b75303
Fix linter errors for benchmark script
phkeese Jul 19, 2023
32bc1bb
Merge remote-tracking branch 'phkeese/feature/benchmark-script' into …
ClFeSc Jul 19, 2023
0282e07
Add binary writer/parser tests
ClFeSc Jul 20, 2023
a4113db
add VariableStringDictionary in table scan (not working)
23mafi Jul 25, 2023
9c29b0f
Fix incorrect scanning of `VariableStringDictionarySegment`
ClFeSc Jul 25, 2023
2ed683a
Improve benchmark script:
ClFeSc Jul 31, 2023
79cee88
Merge remote-tracking branch 'phkeese/feature/benchmark-script' into …
ClFeSc Jul 31, 2023
2ad06af
Add axis sharing as an option & sort by encoding name
ClFeSc Aug 1, 2023
c293adb
Merge remote-tracking branch 'phkeese/feature/benchmark-script' into …
ClFeSc Aug 1, 2023
668e2ac
Cherry-pick iterator tests and fix.
phkeese Jul 26, 2023
a1d65de
run formater
23mafi Aug 2, 2023
b204407
delete unnecessary todos
23mafi Aug 2, 2023
56055bf
add comments
23mafi Aug 2, 2023
9f6910c
Fix Binary I/O
phkeese Aug 1, 2023
a3cd36f
Commit to trigger pipeline
ClFeSc Aug 2, 2023
050644c
Improve script
ClFeSc Aug 3, 2023
4c96b91
use offset_vector for VariableStringVector and Iterable
23mafi Aug 9, 2023
d9299a0
remove null byte in klotz after each string
23mafi Aug 9, 2023
f09b327
Fix binary files and order of member initialization under GCC
phkeese Aug 9, 2023
d04ad1a
Use `std::ranges::iota_view` instead of creating vectors
ClFeSc Aug 10, 2023
742ea2b
apply review comments
23mafi Aug 18, 2023
4068fb3
Implement some changes from review comments.
phkeese Aug 18, 2023
93fb587
Add query comparison graphs
ClFeSc Aug 22, 2023
b795a46
Merge remote-tracking branch 'phkeese/feature/benchmark-script' into …
ClFeSc Aug 22, 2023
7bdf3c1
Reformat CMakeLists.txt
phkeese Aug 22, 2023
d792984
Merge remote-tracking branch 'upstream/master' into feature/variable-…
phkeese Aug 22, 2023
ec5a2d0
Constrain `VariableStringDictionarySegment<T>` to `T = pmr_string`
ClFeSc Aug 22, 2023
cf589e8
Improve script code quality
ClFeSc Aug 22, 2023
116cdd1
apply review feedback and fix AccessCounters
23mafi Aug 22, 2023
6b773c9
Fix missing `requires`
ClFeSc Aug 22, 2023
6e3e608
run format.sh
23mafi Aug 22, 2023
035feea
Fix linter
phkeese Aug 22, 2023
bb6666f
Merge remote-tracking branch 'phkeese/feature/benchmark-script' into …
ClFeSc Aug 22, 2023
53190d8
Assert that dictionary offsets stay below 4GB.
phkeese Aug 23, 2023
adc3d88
Merge branch 'feature/variable-string-length-segment-three-layers' of…
phkeese Aug 23, 2023
4ab5267
Fix dictionary size assert
ClFeSc Aug 23, 2023
e588400
Update required GCC and G++ version to 10 zo support C++20.
phkeese Aug 23, 2023
4747e92
Please linter
ClFeSc Aug 23, 2023
8ffb228
Trigger CI
phkeese Aug 23, 2023
afc6e7e
Trigger Full CI
ClFeSc Aug 23, 2023
b16129a
WIP
phkeese Aug 24, 2023
fab1239
Merge branch 'feature/variable-string-length-segment-three-layers' of…
phkeese Aug 24, 2023
1554422
Remove ranges as it is not supported on Clang 11
ClFeSc Aug 24, 2023
94833f1
Include header in test to fix error during clangDebugDisablePrecompil…
phkeese Aug 24, 2023
1826bef
Merge remote-tracking branch 'upstream/master' into feature/variable-…
phkeese Aug 25, 2023
1d957b4
Replace `--metrics` with `--system_metrics` in call to benchmarks due…
ClFeSc Sep 16, 2023
8a4d8fe
Merge, merge, merge
Bouncner May 6, 2024
66f91e3
Lint and format
Bouncner May 6, 2024
34f15a2
Fix some clang tidy issues
Bouncner May 6, 2024
bb62bcf
Format
Bouncner May 6, 2024
05b6c31
Fix some further clang tidy issues
Bouncner May 6, 2024
08b1db7
Test removal of includes
Bouncner May 7, 2024
db0344f
Fix clang tidy issue
Bouncner May 7, 2024
Binary file not shown.
Binary file not shown.
Binary file not shown.
837 changes: 837 additions & 0 deletions scripts/evaluate_string_segments.py

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions src/benchmark/operators/table_scan_sorted_benchmark.cpp
@@ -208,6 +208,7 @@ void registerTableScanSortedBenchmarks() {
{"None", EncodingAndSupportedDataTypes(EncodingType::Unencoded, {"Int", "String"})},
{"Dictionary", EncodingAndSupportedDataTypes(EncodingType::Dictionary, {"Int", "String"})},
{"FixedStringDictionary", EncodingAndSupportedDataTypes(EncodingType::FixedStringDictionary, {"String"})},
+ {"VariableStringDictionary", EncodingAndSupportedDataTypes(EncodingType::VariableStringDictionary, {"String"})},
{"FrameOfReference", EncodingAndSupportedDataTypes(EncodingType::FrameOfReference, {"Int"})},
{"RunLength", EncodingAndSupportedDataTypes(EncodingType::RunLength, {"Int", "String"})},
{"LZ4", EncodingAndSupportedDataTypes(EncodingType::LZ4, {"Int", "String"})}};
35 changes: 21 additions & 14 deletions src/lib/CMakeLists.txt
@@ -209,12 +209,12 @@ set(
operators/index_scan.hpp
operators/insert.cpp
operators/insert.hpp
- operators/join_helper/join_output_writing.cpp
- operators/join_helper/join_output_writing.hpp
operators/join_hash.cpp
operators/join_hash.hpp
operators/join_hash/join_hash_steps.hpp
operators/join_hash/join_hash_traits.hpp
+ operators/join_helper/join_output_writing.cpp
+ operators/join_helper/join_output_writing.hpp
operators/join_index.cpp
operators/join_index.hpp
operators/join_nested_loop.cpp
@@ -484,6 +484,9 @@ set(
storage/index/adaptive_radix_tree/adaptive_radix_tree_index.hpp
storage/index/adaptive_radix_tree/adaptive_radix_tree_nodes.cpp
storage/index/adaptive_radix_tree/adaptive_radix_tree_nodes.hpp
+ storage/index/chunk_index_statistics.cpp
+ storage/index/chunk_index_statistics.hpp
+ storage/index/chunk_index_type.hpp
storage/index/group_key/composite_group_key_index.cpp
storage/index/group_key/composite_group_key_index.hpp
storage/index/group_key/group_key_index.cpp
@@ -496,10 +499,6 @@ set(
storage/index/group_key/variable_length_key_proxy.hpp
storage/index/group_key/variable_length_key_store.cpp
storage/index/group_key/variable_length_key_store.hpp
- storage/index/chunk_index_statistics.cpp
- storage/index/chunk_index_statistics.hpp
- storage/index/table_index_statistics.cpp
- storage/index/table_index_statistics.hpp
storage/index/partial_hash/flat_map_iterator.cpp
storage/index/partial_hash/flat_map_iterator.hpp
storage/index/partial_hash/flat_map_iterator_impl.cpp
@@ -508,7 +507,8 @@ set(
storage/index/partial_hash/partial_hash_index.hpp
storage/index/partial_hash/partial_hash_index_impl.cpp
storage/index/partial_hash/partial_hash_index_impl.hpp
- storage/index/chunk_index_type.hpp
+ storage/index/table_index_statistics.cpp
+ storage/index/table_index_statistics.hpp
storage/lqp_view.cpp
storage/lqp_view.hpp
storage/lz4_segment.cpp
@@ -560,9 +560,23 @@ set(
storage/value_segment.hpp
storage/value_segment/null_value_vector_iterable.hpp
storage/value_segment/value_segment_iterable.hpp
+ storage/variable_string_dictionary/variable_string_dictionary_encoder.hpp
+ storage/variable_string_dictionary/variable_string_dictionary_iterable.hpp
+ storage/variable_string_dictionary/variable_string_vector.cpp
+ storage/variable_string_dictionary/variable_string_vector.hpp
+ storage/variable_string_dictionary/variable_string_vector_iterator.hpp
+ storage/variable_string_dictionary_segment.cpp
+ storage/variable_string_dictionary_segment.hpp
storage/vector_compression/base_compressed_vector.hpp
storage/vector_compression/base_vector_compressor.hpp
storage/vector_compression/base_vector_decompressor.hpp
+ storage/vector_compression/bitpacking/bitpacking_compressor.cpp
+ storage/vector_compression/bitpacking/bitpacking_compressor.hpp
+ storage/vector_compression/bitpacking/bitpacking_decompressor.hpp
+ storage/vector_compression/bitpacking/bitpacking_iterator.hpp
+ storage/vector_compression/bitpacking/bitpacking_vector.cpp
+ storage/vector_compression/bitpacking/bitpacking_vector.hpp
+ storage/vector_compression/bitpacking/bitpacking_vector_type.hpp
storage/vector_compression/compressed_vector_type.cpp
storage/vector_compression/compressed_vector_type.hpp
storage/vector_compression/fixed_width_integer/fixed_width_integer_compressor.cpp
@@ -571,13 +585,6 @@ set(
storage/vector_compression/fixed_width_integer/fixed_width_integer_utils.hpp
storage/vector_compression/fixed_width_integer/fixed_width_integer_vector.hpp
storage/vector_compression/resolve_compressed_vector_type.hpp
- storage/vector_compression/bitpacking/bitpacking_compressor.cpp
- storage/vector_compression/bitpacking/bitpacking_compressor.hpp
- storage/vector_compression/bitpacking/bitpacking_iterator.hpp
- storage/vector_compression/bitpacking/bitpacking_decompressor.hpp
- storage/vector_compression/bitpacking/bitpacking_vector.hpp
- storage/vector_compression/bitpacking/bitpacking_vector.cpp
- storage/vector_compression/bitpacking/bitpacking_vector_type.hpp
storage/vector_compression/vector_compression.cpp
storage/vector_compression/vector_compression.hpp
strong_typedef.hpp
21 changes: 21 additions & 0 deletions src/lib/import_export/binary/binary_parser.cpp
@@ -26,6 +26,7 @@
#include "storage/table.hpp"
#include "storage/table_column_definition.hpp"
#include "storage/value_segment.hpp"
+ #include "storage/variable_string_dictionary_segment.hpp"
#include "storage/vector_compression/bitpacking/bitpacking_vector.hpp"
#include "storage/vector_compression/bitpacking/bitpacking_vector_type.hpp"
#include "storage/vector_compression/compressed_vector_type.hpp"
@@ -175,6 +176,8 @@ std::shared_ptr<AbstractSegment> BinaryParser::_import_segment(std::ifstream& fi
} else {
Fail("Unsupported data type for FixedStringDictionary encoding");
}
+ case EncodingType::VariableStringDictionary:
+ return _import_variable_string_length_segment<pmr_string>(file, row_count);
case EncodingType::RunLength:
return _import_run_length_segment<ColumnDataType>(file, row_count);
case EncodingType::FrameOfReference:
@@ -219,6 +222,24 @@ std::shared_ptr<DictionarySegment<T>> BinaryParser::_import_dictionary_segment(s
return std::make_shared<DictionarySegment<T>>(dictionary, attribute_vector);
}

+ template <typename T>
+ std::shared_ptr<VariableStringDictionarySegment<T>> BinaryParser::_import_variable_string_length_segment(
+ std::ifstream& file, ChunkOffset row_count) {
+ // Read attribute vector compression type and use it to decompress.
+ const auto compressed_vector_type_id = _read_value<CompressedVectorTypeID>(file);
+ const auto attribute_vector = _import_attribute_vector(file, row_count, compressed_vector_type_id);
+
+ // Read offset vector.
+ const auto offset_vector_size = _read_value<uint32_t>(file);
+ const auto offset_vector = std::make_shared<pmr_vector<uint32_t>>(_read_values<uint32_t>(file, offset_vector_size));
+
+ // Read dictionary.
+ const auto dictionary_size = _read_value<uint32_t>(file);
+ const auto dictionary = std::make_shared<pmr_vector<char>>(_read_values<char>(file, dictionary_size));
+
+ return std::make_shared<VariableStringDictionarySegment<pmr_string>>(dictionary, attribute_vector, offset_vector);
+ }
+
std::shared_ptr<FixedStringDictionarySegment<pmr_string>> BinaryParser::_import_fixed_string_dictionary_segment(
std::ifstream& file, ChunkOffset row_count) {
const auto compressed_vector_type_id = _read_value<CompressedVectorTypeID>(file);
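
Reading note on the import above: the segment is rebuilt from three parts, an attribute vector of value ids, an offset vector, and one character buffer holding the dictionary. The sketch below shows one way such a layout can be decoded. It is not the PR's implementation: `VariableStringLayout`, `string_for_value_id`, `string_for_row`, and `make_example_layout` are hypothetical names, and it assumes that strings are stored back to back without terminators, that `offsets[i]` marks the start of the i-th distinct string, and that a string ends where the next one starts. NULL handling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the three parts read by the parser. Names are illustrative only.
struct VariableStringLayout {
  std::vector<char> dictionary;                 // All distinct strings, concatenated (assumed: no terminators).
  std::vector<std::uint32_t> offsets;           // Assumed: offsets[i] is the start of the i-th distinct string.
  std::vector<std::uint32_t> attribute_vector;  // One value id per row, used as an index into offsets.
};

// Returns the distinct string that belongs to a value id.
// Assumption: a string ends at the next offset, or at the end of the buffer for the last entry.
std::string string_for_value_id(const VariableStringLayout& layout, std::uint32_t value_id) {
  assert(value_id < layout.offsets.size());
  const std::uint32_t begin = layout.offsets[value_id];
  const std::uint32_t end = value_id + 1 < layout.offsets.size()
                                ? layout.offsets[value_id + 1]
                                : static_cast<std::uint32_t>(layout.dictionary.size());
  return std::string(layout.dictionary.begin() + begin, layout.dictionary.begin() + end);
}

// Resolves a row to its string via the attribute vector.
std::string string_for_row(const VariableStringLayout& layout, std::size_t row) {
  return string_for_value_id(layout, layout.attribute_vector.at(row));
}

// Builds a tiny example with the rows "pear", "apple", "pear".
VariableStringLayout make_example_layout() {
  VariableStringLayout layout;
  const std::string blob = "applepear";
  layout.dictionary.assign(blob.begin(), blob.end());
  layout.offsets = {0, 5};
  layout.attribute_vector = {1, 0, 1};
  return layout;
}
```

With 32-bit offsets, the character buffer has to stay below 4 GB, which matches the assertion mentioned in the commit list.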
5 changes: 5 additions & 0 deletions src/lib/import_export/binary/binary_parser.hpp
@@ -16,6 +16,7 @@
#include "storage/run_length_segment.hpp"
#include "storage/table.hpp"
#include "storage/value_segment.hpp"
+ #include "storage/variable_string_dictionary_segment.hpp"
#include "storage/vector_compression/bitpacking/bitpacking_vector_type.hpp"

namespace hyrise {
@@ -76,6 +77,10 @@ class BinaryParser {
template <typename T>
static std::shared_ptr<DictionarySegment<T>> _import_dictionary_segment(std::ifstream& file, ChunkOffset row_count);

+ template <typename T>
+ static std::shared_ptr<VariableStringDictionarySegment<T>> _import_variable_string_length_segment(
+ std::ifstream& file, ChunkOffset row_count);
+
static std::shared_ptr<FixedStringDictionarySegment<pmr_string>> _import_fixed_string_dictionary_segment(
std::ifstream& file, ChunkOffset row_count);

21 changes: 21 additions & 0 deletions src/lib/import_export/binary/binary_writer.cpp
@@ -23,6 +23,7 @@
#include "storage/segment_iterate.hpp"
#include "storage/table.hpp"
#include "storage/value_segment.hpp"
+ #include "storage/variable_string_dictionary_segment.hpp"
#include "storage/vector_compression/bitpacking/bitpacking_vector.hpp"
#include "storage/vector_compression/bitpacking/bitpacking_vector_type.hpp"
#include "storage/vector_compression/compressed_vector_type.hpp"
@@ -350,6 +351,26 @@ void BinaryWriter::_write_segment(const LZ4Segment<T>& lz4_segment, bool /*colum
}
}

+ template <typename T>
+ void BinaryWriter::_write_segment(const VariableStringDictionarySegment<T>& dictionary_segment,
+ bool /*column_is_nullable*/, std::ofstream& ofstream) {
+ export_value(ofstream, EncodingType::VariableStringDictionary);
+
+ // Write attribute vector compression type and data.
+ const auto compressed_vector_type_id = _compressed_vector_type_id<T>(dictionary_segment);
+ export_value(ofstream, compressed_vector_type_id);
+ _export_compressed_vector(ofstream, *dictionary_segment.compressed_vector_type(),
+ *dictionary_segment.attribute_vector());
+
+ // Write offset vector.
+ export_value(ofstream, static_cast<uint32_t>(dictionary_segment.offset_vector()->size()));
+ export_values(ofstream, *dictionary_segment.offset_vector());
+
+ // Write the dictionary size and dictionary
+ export_value(ofstream, static_cast<uint32_t>(dictionary_segment.dictionary()->size()));
+ export_values(ofstream, *dictionary_segment.dictionary());
+ }
+
template <typename T>
CompressedVectorTypeID BinaryWriter::_compressed_vector_type_id(
const AbstractEncodedSegment& abstract_encoded_segment) {
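
Reading note on the writer above: the offset vector and the dictionary are each written as a 32-bit element count followed by the raw values, and the parser reads them back in the same order. The following sketch shows that length-prefixed pattern on plain standard streams. It is only an illustration: `write_length_prefixed` and `read_length_prefixed` are hypothetical helpers and not the `export_value`/`export_values` functions of `BinaryWriter`, and the sketch assumes trivially copyable element types written in host byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes a 32-bit element count followed by the raw element bytes.
// Assumption: T is trivially copyable; bytes are written in host byte order.
template <typename T>
void write_length_prefixed(std::ostream& stream, const std::vector<T>& values) {
  // The count is stored in 32 bits, so larger vectors cannot be represented.
  assert(values.size() <= std::numeric_limits<std::uint32_t>::max());
  const std::uint32_t size = static_cast<std::uint32_t>(values.size());
  stream.write(reinterpret_cast<const char*>(&size), sizeof(size));
  stream.write(reinterpret_cast<const char*>(values.data()),
               static_cast<std::streamsize>(values.size() * sizeof(T)));
}

// Hypothetical helper: reads back what write_length_prefixed wrote.
template <typename T>
std::vector<T> read_length_prefixed(std::istream& stream) {
  std::uint32_t size = 0;
  stream.read(reinterpret_cast<char*>(&size), sizeof(size));
  if (!stream) {
    throw std::runtime_error("Could not read the element count.");
  }
  std::vector<T> values(size);
  stream.read(reinterpret_cast<char*>(values.data()), static_cast<std::streamsize>(size * sizeof(T)));
  if (!stream) {
    throw std::runtime_error("Unexpected end of stream.");
  }
  return values;
}
```

Reader and writer must agree on the order of the parts. In the diff, both sides use attribute vector, then offset vector, then dictionary.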
5 changes: 5 additions & 0 deletions src/lib/import_export/binary/binary_writer.hpp
@@ -11,6 +11,7 @@
#include "storage/reference_segment.hpp"
#include "storage/run_length_segment.hpp"
#include "storage/value_segment.hpp"
+ #include "storage/variable_string_dictionary_segment.hpp"

namespace hyrise {

@@ -222,6 +223,10 @@ class BinaryWriter {
template <typename T>
static void _write_segment(const LZ4Segment<T>& lz4_segment, bool /*column_is_nullable*/, std::ofstream& ofstream);

+ template <typename T>
+ static void _write_segment(const VariableStringDictionarySegment<T>& dictionary_segment, bool /*column_is_nullable*/,
+ std::ofstream& ofstream);
+
template <typename T>
static CompressedVectorTypeID _compressed_vector_type_id(const AbstractEncodedSegment& abstract_encoded_segment);

1 change: 1 addition & 0 deletions src/lib/operators/aggregate_hash.cpp
@@ -15,6 +15,7 @@
#include <vector>

#include <boost/container/pmr/monotonic_buffer_resource.hpp>
+ #include <boost/unordered/unordered_flat_map.hpp>

#include "aggregate/window_function_traits.hpp"
#include "all_type_variant.hpp"
4 changes: 4 additions & 0 deletions src/lib/operators/print.cpp
@@ -249,6 +249,10 @@ std::string Print::_segment_type(const std::shared_ptr<AbstractSegment>& segment
segment_type += "FSD";
break;
}
+ case EncodingType::VariableStringDictionary: {
+ segment_type += "VSD";
+ break;
+ }
case EncodingType::FrameOfReference: {
segment_type += "FoR";
break;
61 changes: 40 additions & 21 deletions src/lib/operators/table_scan/column_like_table_scan_impl.cpp
@@ -15,6 +15,10 @@
#include "storage/pos_lists/row_id_pos_list.hpp"
#include "storage/segment_iterables/create_iterable_from_attribute_vector.hpp"
#include "storage/segment_iterate.hpp"
+ // NOLINTBEGIN(misc-include-cleaner): VariableStringDictionary is accessed in _find_matches_in_dictionary.
+ #include "storage/variable_string_dictionary/variable_string_vector.hpp"
+ #include "storage/variable_string_dictionary/variable_string_vector_iterator.hpp"
+ // NOLINTEND(misc-include-cleaner)
#include "types.hpp"
#include "utils/assert.hpp"

@@ -49,8 +53,8 @@ void ColumnLikeTableScanImpl::_scan_generic_segment(
const AbstractSegment& segment, const ChunkID chunk_id, RowIDPosList& matches,
const std::shared_ptr<const AbstractPosList>& position_filter) const {
segment_with_iterators_filtered(segment, position_filter, [&](auto iter, [[maybe_unused]] const auto end) {
- // Don't instantiate this for ReferenceSegments to save compile time as ReferenceSegments are handled
- // via position_filter
+ // Do not instantiate this for ReferenceSegments to save compile time as ReferenceSegments are handled via
+ // position_filter.
if constexpr (!is_reference_segment_iterable_v<typename decltype(iter)::IterableType>) {
using ColumnDataType = typename decltype(iter)::ValueType;

@@ -62,10 +66,10 @@ void ColumnLikeTableScanImpl::_scan_generic_segment(
_scan_with_iterators<true>(functor, iter, end, chunk_id, matches);
});
} else {
- Fail("Can only handle strings");
+ Fail("Can only handle strings.");
}
} else {
- Fail("ReferenceSegments have their own code paths and should be handled there");
+ Fail("ReferenceSegments have their own code paths and should be handled there.");
}
});
}
@@ -76,14 +80,27 @@ void ColumnLikeTableScanImpl::_scan_dictionary_segment(const BaseDictionarySegme
// First, build a bitmap containing 1s/0s for matching/non-matching dictionary values. Second, iterate over the
// attribute vector and check against the bitmap. If too many input rows have already been removed (are not part of
// position_filter), this optimization is detrimental. See caller for that case.
- std::pair<size_t, std::vector<bool>> result;
+ auto result = std::pair<size_t, std::vector<bool>>{};

- if (segment.encoding_type() == EncodingType::Dictionary) {
- const auto& typed_segment = static_cast<const DictionarySegment<pmr_string>&>(segment);
- result = _find_matches_in_dictionary(*typed_segment.dictionary());
- } else {
- const auto& typed_segment = static_cast<const FixedStringDictionarySegment<pmr_string>&>(segment);
- result = _find_matches_in_dictionary(*typed_segment.fixed_string_dictionary());
+ switch (segment.encoding_type()) {
+ case EncodingType::Dictionary: {
+ const auto& typed_segment = static_cast<const DictionarySegment<pmr_string>&>(segment);
+ result = _find_matches_in_dictionary(*typed_segment.dictionary());
+ break;
+ }
+ case EncodingType::FixedStringDictionary: {
+ const auto& typed_segment = static_cast<const FixedStringDictionarySegment<pmr_string>&>(segment);
+ result = _find_matches_in_dictionary(*typed_segment.fixed_string_dictionary());
+ break;
+ }
+ case EncodingType::VariableStringDictionary: {
+ const auto& typed_segment = static_cast<const VariableStringDictionarySegment<pmr_string>&>(segment);
+ result = _find_matches_in_dictionary(*typed_segment.variable_string_dictionary());
+ break;
+ }
+ default: {
+ Fail("Segment is either not dictionary-encoded or encoding specialization is not implemented.");
+ }
}

const auto& match_count = result.first;
@@ -103,8 +120,8 @@ void ColumnLikeTableScanImpl::_scan_dictionary_segment(const BaseDictionarySegme
return;
}

- // LIKE matches no rows
- if (match_count == 0u) {
+ // LIKE matches no rows.
+ if (match_count == 0) {
++num_chunks_with_early_out;
return;
}
@@ -125,21 +142,23 @@ std::pair<size_t, std::vector<bool>> ColumnLikeTableScanImpl::_find_matches_in_d
auto& count = result.first;
auto& dictionary_matches = result.second;

- count = 0u;
- dictionary_matches.reserve(dictionary.size());
+ count = 0;
+ dictionary_matches.resize(dictionary.size());

_matcher.resolve(_invert_results, [&](const auto& matcher) {
#ifdef __clang__
- // For the loop through the dictionary, we want to use const auto& for DictionarySegments. However,
- // FixedStringVector iterators return an std::string_view value. Thus, we disable clang's -Wrange-loop-analysis
- // error about a potential copy for the loop value.
+ // For the loop through the dictionary, we want to use const auto& for DictionarySegments. However, FixedStringVector
+ // iterators return an std::string_view value. Thus, we disable clang's -Wrange-loop-analysis error about a potential
+ // copy for the loop value.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wrange-loop-analysis"
#endif
+ auto index = size_t{0};
for (const auto& value : dictionary) {
- const auto matches = matcher(value);
- count += static_cast<size_t>(matches);
- dictionary_matches.push_back(matches);
+ const auto match_result = matcher(value);
+ count += static_cast<size_t>(match_result);
+ dictionary_matches[index] = match_result;
+ ++index;
}

#ifdef __clang__
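
Reading note on the dictionary scan above: the predicate is evaluated once per distinct dictionary entry, the results are stored in a bitmap, and the attribute vector is then checked against that bitmap. The sketch below shows this two-step pattern in isolation. It is a simplified illustration and not the Hyrise code path: `find_matches_in_dictionary`, `scan_rows`, and `starts_with_a` are hypothetical free functions, the dictionary is a plain `std::vector<std::string>`, and NULL values, position filters, and the early-out for "all entries match" are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Step 1: evaluate the predicate once per distinct value. Returns the match count and a bitmap.
std::pair<std::size_t, std::vector<bool>> find_matches_in_dictionary(
    const std::vector<std::string>& dictionary, const std::function<bool(const std::string&)>& matcher) {
  std::pair<std::size_t, std::vector<bool>> result;
  result.first = 0;
  // Size the bitmap up front and assign by index, as the updated loop in the diff does.
  result.second.resize(dictionary.size());
  std::size_t index = 0;
  for (const auto& value : dictionary) {
    const bool match_result = matcher(value);
    result.first += static_cast<std::size_t>(match_result);
    result.second[index] = match_result;
    ++index;
  }
  return result;
}

// Step 2: look up each row's value id in the bitmap instead of matching the string again.
std::vector<std::size_t> scan_rows(const std::vector<std::string>& dictionary,
                                   const std::vector<std::uint32_t>& attribute_vector,
                                   const std::function<bool(const std::string&)>& matcher) {
  const auto result = find_matches_in_dictionary(dictionary, matcher);
  std::vector<std::size_t> matching_rows;
  if (result.first == 0) {
    return matching_rows;  // Early out: no dictionary entry matches, so no row can match.
  }
  for (std::size_t row = 0; row < attribute_vector.size(); ++row) {
    const std::uint32_t value_id = attribute_vector[row];
    assert(value_id < result.second.size());  // This sketch has no NULL value id.
    if (result.second[value_id]) {
      matching_rows.push_back(row);
    }
  }
  return matching_rows;
}

// Example predicate, standing in for a LIKE 'a%' matcher.
bool starts_with_a(const std::string& value) {
  return !value.empty() && value[0] == 'a';
}
```

The pattern pays off when the dictionary is much smaller than the number of rows, since the expensive string match runs once per distinct value rather than once per row.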
2 changes: 0 additions & 2 deletions src/lib/operators/table_scan/column_like_table_scan_impl.hpp
@@ -7,8 +7,6 @@
#include <utility>
#include <vector>

- #include <boost/variant.hpp>
-
#include "abstract_dereferenced_column_table_scan_impl.hpp"
#include "expression/evaluation/like_matcher.hpp"
#include "types.hpp"
23 changes: 14 additions & 9 deletions src/lib/storage/create_iterable_from_segment.hpp
@@ -26,20 +26,22 @@ class ReferenceSegment;
template <typename T, EraseReferencedSegmentType>
class ReferenceSegmentIterable;

+ template <typename T>
+ requires(std::is_same_v<T, pmr_string>)
+ class VariableStringDictionarySegment;
+
/**
* @defgroup Uniform interface to create an iterable from a segment
*
- * These methods cannot be part of the segments' interfaces because
- * reference segment are not templated and thus don’t know their type.
+ * These methods cannot be part of the segments' interfaces because reference segment are not templated and thus do not
+ * know their type.
*
- * All iterables implement the same interface using static polymorphism
- * (i.e. the CRTP pattern, see segment_iterables/.hpp).
+ * All iterables implement the same interface using static polymorphism (i.e. the CRTP pattern, see
+ * segment_iterables/.hpp).
*
- * In debug mode, create_iterable_from_segment returns a type erased
- * iterable, i.e., all iterators have the same type
+ * In debug mode, create_iterable_from_segment returns a type erased iterable, i.e., all iterators have the same type.
*
- * Functions must be forward-declared because otherwise, we run into
- * circular include dependencies.
+ * Functions must be forward-declared because otherwise, we run into circular include dependencies.
*
* @{
*/
@@ -73,10 +75,13 @@ template <typename T, bool EraseSegmentType = HYRISE_DEBUG,
: EraseReferencedSegmentType::No)>
auto create_iterable_from_segment(const ReferenceSegment& segment);

+ template <typename T, bool EraseSegmentType = HYRISE_DEBUG>
+ auto create_iterable_from_segment(const VariableStringDictionarySegment<T>& segment);
+
/**@}*/

} // namespace hyrise

- // Include these only now to break up include dependencies
+ // Include these only now to break up include dependencies.
#include "create_iterable_from_reference_segment.ipp"
#include "create_iterable_from_segment.ipp"