-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Querier: optimize moves of querier_opt, partition_range and partition_slice #18633
base: master
Are you sure you want to change the base?
Querier: optimize moves of querier_opt, partition_range and partition_slice #18633
Conversation
60a8b91
to
d2e2b09
Compare
perf results obtained with #18628 |
replica/database.cc
Outdated
querier_opt = _querier_cache.lookup_data_querier(cmd.query_uuid, *s, ranges.front(), cmd.slice, semaphore, trace_state, timeout); | ||
if (auto found = _querier_cache.lookup_data_querier(cmd.query_uuid, *s, ranges.front(), cmd.slice, semaphore, trace_state, timeout)) { | ||
querier_opt.emplace(std::move(*found)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this improve anything? Usually lookup_data_querier returns nullopt and operator= will do almost nothing. Moreover, the optimizer should remember that querier_opt is disengaged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this just makes the code slightly harder to read, without any apparent gain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, it has very little effect.
I needed it at some point in development when the copy constructor was deleted.
But it's not a must to have.
} | ||
|
||
while (!qs.done()) { | ||
auto&& range = *qs.current_partition_range++; | ||
|
||
if (!querier_opt) { | ||
query::querier_base::querier_config conf(_config.tombstone_warn_threshold); | ||
querier_opt = query::querier(as_mutation_source(), s, permit, range, qs.cmd.slice, trace_state, conf); | ||
querier_opt.emplace(as_mutation_source(), s, permit, range, qs.cmd.slice, trace_state, conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This I understand.
replica/table.cc
Outdated
if (saved_querier) { | ||
*saved_querier = std::move(querier_opt); | ||
if (saved_querier && querier_opt) { | ||
saved_querier->emplace(std::move(*querier_opt)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative approach: make querier a unique_ptr and (and use optimized_optional, or just unique_ptr's optionalness). De-pointerize the contents of queriers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's that simple. Maybe with same massage to the multishard querier.
The root problem is the separation of the readers that may be kept as inactive reads by the reader concurrency semaphore, and state they rely on, like the partition range and slice. Those are kept external to the reader using mechanisms like lifecycle policy. Some readers do keep some of the state (see the evictable_reader_v2) and in the case of the shard_mutation_querier, the range and slice are kept long term in reader_meta::remote_parts so they can be used for recreating the reader in case it was evicted.
Maybe the querier impl can be kept as a whole in remote_parts
, but that might conflict with the querier cache.
I wish we had one mechanism for saving the query state that would work for all use cases.
Ah, and we also have query_state
in table::query that keep partition ranges, but it is not kept long term.
querier.cc
Outdated
@@ -121,7 +121,7 @@ static bool ranges_match(const schema& s, const dht::partition_range& original_r | |||
return bound_eq(original_range.end(), new_range.end()); | |||
} | |||
|
|||
static bool ranges_match(const schema& s, dht::partition_ranges_view original_ranges, dht::partition_ranges_view new_ranges) { | |||
static bool ranges_match(const schema& s, const dht::partition_ranges_view& original_ranges, const dht::partition_ranges_view& new_ranges) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partition_ranges_view is just two words (btw - can be replaced with std::span now). Passing it is cheap while dereferencing it is more expensive. I don't think this is an optimization (unless - the compiler refuses to pass it in registers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll measure perf-simple-query on this change alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, it would be up to 30 instruction/op give or take a few as the next patch shows...
so I'll drop both
querier.cc
Outdated
} | ||
|
||
return std::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe this complication is worth 30 instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
@@ -65,29 +72,24 @@ public: | |||
protected: | |||
schema_ptr _schema; | |||
reader_permit _permit; | |||
lw_shared_ptr<const dht::partition_range> _range; | |||
std::unique_ptr<const query::partition_slice> _slice; | |||
querier_reader_context_ptr _ctx; | |||
std::variant<flat_mutation_reader_v2, reader_concurrency_semaphore::inactive_read_handle> _reader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could move all of the querier into an allocated object. Then moving queriers around is just moving unique_ptrs.
We could be even crazier and move all the contents into reader_permit (and then querier becomes a small wrapper around it). But we'll have many more reader_permit:s than queriers.
🔴 CI State: FAILURE✅ - Build Failed Tests (1/30735):Build Details:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the changes are good, but others has gone too far, in particular patch 3 (and to some extent, parts of patch 1). The readability and maintainability of the code is also something we should also optimize for. On the hot path we are willing to sacrifice some of the readability and maintainability for performance but the tradeoff should still be reasonable.
replica/database.cc
Outdated
querier_opt = _querier_cache.lookup_data_querier(cmd.query_uuid, *s, ranges.front(), cmd.slice, semaphore, trace_state, timeout); | ||
if (auto found = _querier_cache.lookup_data_querier(cmd.query_uuid, *s, ranges.front(), cmd.slice, semaphore, trace_state, timeout)) { | ||
querier_opt.emplace(std::move(*found)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this just makes the code slightly harder to read, without any apparent gain.
querier.cc
Outdated
} | ||
|
||
return std::nullopt; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree.
Save a move of the querier object. Suprisingly, this yields a saving of about 100 instructions/op: ``` base: 94575.94 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42288 insns/op, 0 errors) post: 93323.42 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42181 insns/op, 0 errors) ``` Signed-off-by: Benny Halevy <[email protected]>
Keep both partition_range and partition_slice in a single allocated structure to reduce the number of allocations and clean up the code a bit in this area, since the mutlishard mutation query always keeps both in the reader_meta::remote_parts as a pair. This change saves an allocation call and it reduces copies of range and slice, yielding close to 300 insns/op improvement in perf-simple-query: ``` base: 93323.42 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42181 insns/op, 0 errors) post: 96538.66 tps ( 62.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41895 insns/op, 0 errors) ``` Signed-off-by: Benny Halevy <[email protected]>
d2e2b09
to
37839ab
Compare
v2: simplified series |
🔴 CI State: FAILURE✅ - Build Failed Tests (4/30696):
Build Details:
|
This series contains a couple of optimizations for the querier interface
that optimize
optional<querier>
moves and packs the querierdht::partition_range
andquery::partition_alice
into a new
querier_reader_context
structure that uses a single allocation.Total improvement as measured with perf-simple-query is of about 400 instructions/op (~1%) and 1 fewer allocs/op: