Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querier: optimize moves of querier_opt, partition_range and partition_slice #18633

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bhalevy
Copy link
Member

@bhalevy bhalevy commented May 12, 2024

This series contains a couple of optimizations for the querier interface
that optimize optional<querier> moves and packs the querier dht::partition_range and query::partition_alice
into a new querier_reader_context structure that uses a single allocation.

Total improvement as measured with perf-simple-query is of about 400 instructions/op (~1%) and 1 fewer allocs/op:

base: 94575.94 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42288 insns/op,        0 errors)
post: 96538.66 tps ( 62.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41895 insns/op,        0 errors)
  • No backport required, this PR contains only optimizations

@bhalevy bhalevy added the backport/none Backport is not required label May 12, 2024
@bhalevy bhalevy force-pushed the querier-optimize-partition_range_view branch from 60a8b91 to d2e2b09 Compare May 12, 2024 19:44
@bhalevy
Copy link
Member Author

bhalevy commented May 12, 2024

perf results obtained with #18628

querier_opt = _querier_cache.lookup_data_querier(cmd.query_uuid, *s, ranges.front(), cmd.slice, semaphore, trace_state, timeout);
if (auto found = _querier_cache.lookup_data_querier(cmd.query_uuid, *s, ranges.front(), cmd.slice, semaphore, trace_state, timeout)) {
querier_opt.emplace(std::move(*found));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this improve anything? Usually lookup_data_querier returns nullopt and operator= will do almost nothing. Moreover, the optimizer should remember that querier_opt is disengaged.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this just makes the code slightly harder to read, without any apparent gain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, it has very little effect.
I needed it at some point in development when the copy constructor was deleted.
But it's not a must to have.

}

while (!qs.done()) {
auto&& range = *qs.current_partition_range++;

if (!querier_opt) {
query::querier_base::querier_config conf(_config.tombstone_warn_threshold);
querier_opt = query::querier(as_mutation_source(), s, permit, range, qs.cmd.slice, trace_state, conf);
querier_opt.emplace(as_mutation_source(), s, permit, range, qs.cmd.slice, trace_state, conf);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I understand.

replica/table.cc Outdated
if (saved_querier) {
*saved_querier = std::move(querier_opt);
if (saved_querier && querier_opt) {
saved_querier->emplace(std::move(*querier_opt));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative approach: make querier a unique_ptr and (and use optimized_optional, or just unique_ptr's optionalness). De-pointerize the contents of queriers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's that simple. Maybe with same massage to the multishard querier.
The root problem is the separation of the readers that may be kept as inactive reads by the reader concurrency semaphore, and state they rely on, like the partition range and slice. Those are kept external to the reader using mechanisms like lifecycle policy. Some readers do keep some of the state (see the evictable_reader_v2) and in the case of the shard_mutation_querier, the range and slice are kept long term in reader_meta::remote_parts so they can be used for recreating the reader in case it was evicted.
Maybe the querier impl can be kept as a whole in remote_parts, but that might conflict with the querier cache.
I wish we had one mechanism for saving the query state that would work for all use cases.

Ah, and we also have query_state in table::query that keep partition ranges, but it is not kept long term.

querier.cc Outdated
@@ -121,7 +121,7 @@ static bool ranges_match(const schema& s, const dht::partition_range& original_r
return bound_eq(original_range.end(), new_range.end());
}

static bool ranges_match(const schema& s, dht::partition_ranges_view original_ranges, dht::partition_ranges_view new_ranges) {
static bool ranges_match(const schema& s, const dht::partition_ranges_view& original_ranges, const dht::partition_ranges_view& new_ranges) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partition_ranges_view is just two words (btw - can be replaced with std::span now). Passing it is cheap while dereferencing it is more expensive. I don't think this is an optimization (unless - the compiler refuses to pass it in registers).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll measure perf-simple-query on this change alone.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, it would be up to 30 instruction/op give or take a few as the next patch shows...
so I'll drop both

querier.cc Outdated
}

return std::nullopt;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this complication is worth 30 instructions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

@@ -65,29 +72,24 @@ public:
protected:
schema_ptr _schema;
reader_permit _permit;
lw_shared_ptr<const dht::partition_range> _range;
std::unique_ptr<const query::partition_slice> _slice;
querier_reader_context_ptr _ctx;
std::variant<flat_mutation_reader_v2, reader_concurrency_semaphore::inactive_read_handle> _reader;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could move all of the querier into an allocated object. Then moving queriers around is just moving unique_ptrs.

We could be even crazier and move all the contents into reader_permit (and then querier becomes a small wrapper around it). But we'll have many more reader_permit:s than queriers.

@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Container Test
✅ - dtest with topology changes
✅ - dtest
❌ - Unit Tests

Failed Tests (1/30735):

Build Details:

  • Duration: 3 hr 26 min
  • Builder: i-08d76d2e3e5421cc2 (m6id.8xlarge)

Copy link
Contributor

@denesb denesb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the changes are good, but others has gone too far, in particular patch 3 (and to some extent, parts of patch 1). The readability and maintainability of the code is also something we should also optimize for. On the hot path we are willing to sacrifice some of the readability and maintainability for performance but the tradeoff should still be reasonable.

querier_opt = _querier_cache.lookup_data_querier(cmd.query_uuid, *s, ranges.front(), cmd.slice, semaphore, trace_state, timeout);
if (auto found = _querier_cache.lookup_data_querier(cmd.query_uuid, *s, ranges.front(), cmd.slice, semaphore, trace_state, timeout)) {
querier_opt.emplace(std::move(*found));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this just makes the code slightly harder to read, without any apparent gain.

replica/table.cc Outdated Show resolved Hide resolved
querier.cc Outdated
}

return std::nullopt;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree.

Save a move of the querier object.
Suprisingly, this yields a saving of about 100 instructions/op:

```
base: 94575.94 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42288 insns/op,        0 errors)
post: 93323.42 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42181 insns/op,        0 errors)
```

Signed-off-by: Benny Halevy <[email protected]>
Keep both partition_range and partition_slice in a
single allocated structure to reduce the number of allocations
and clean up the code a bit in this area, since
the mutlishard mutation query always keeps both
in the reader_meta::remote_parts as a pair.

This change saves an allocation call and it reduces copies of range and slice,
yielding close to 300 insns/op improvement in perf-simple-query:
```
base: 93323.42 tps ( 63.1 allocs/op,   0.0 logallocs/op,  14.2 tasks/op,   42181 insns/op,        0 errors)
post: 96538.66 tps ( 62.1 allocs/op,   0.0 logallocs/op,  14.1 tasks/op,   41895 insns/op,        0 errors)
```

Signed-off-by: Benny Halevy <[email protected]>
@bhalevy bhalevy force-pushed the querier-optimize-partition_range_view branch from d2e2b09 to 37839ab Compare May 13, 2024 11:42
@bhalevy
Copy link
Member Author

bhalevy commented May 13, 2024

v2: simplified series

@bhalevy bhalevy changed the title [RFC] Querier: optimize for single range queries Querier: optimize moves of querier_opt, partition_range and partition_slice May 13, 2024
@bhalevy bhalevy requested review from denesb and avikivity May 13, 2024 11:47
@scylladb-promoter
Copy link
Contributor

🔴 CI State: FAILURE

✅ - Build
✅ - Container Test
❌ - dtest with topology changes
✅ - dtest
❌ - Unit Tests

Failed Tests (4/30696):

Build Details:

  • Duration: 6 hr 15 min
  • Builder: i-09815ad4dc3c21d12 (r5ad.8xlarge)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/none Backport is not required
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants