-
Notifications
You must be signed in to change notification settings - Fork 191
perf: [sc-60446] Enumerations loaded from schema execute generate_value_map
in a background task
#5501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
perf: [sc-60446] Enumerations loaded from schema execute generate_value_map
in a background task
#5501
Conversation
…meration-slow-value-map-future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure launching background tasks without the user knowing when they would finish is the right solution to performance problems like this.
@@ -268,7 +291,18 @@ class Enumeration { | |||
* value map of the enumeration. | |||
*/ | |||
const tdb::pmr::unordered_map<std::string_view, uint64_t>& value_map() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of launching a background thread upon creation of the enumeration, can we compute value_map_
upon the first call to value_map()
? This would alleviate most of my concerns.
@@ -122,6 +124,7 @@ shared_ptr<const Enumeration> enumeration_from_capnp( | |||
data_size, | |||
offsets, | |||
offsets_size, | |||
true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect a capnp deserialization function to completely finish its work once it returns.
tiledb::common::ThreadPool::Task value_map_future = | ||
resources.compute_tp().async( | ||
[](std::reference_wrapper<Enumeration> enumeration) { | ||
enumeration.get().generate_value_map(); | ||
return Status::Ok(); | ||
}, | ||
std::reference_wrapper<Enumeration>(*this)); | ||
value_map_future_ = std::move(value_map_future); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiledb::common::ThreadPool::Task value_map_future = | |
resources.compute_tp().async( | |
[](std::reference_wrapper<Enumeration> enumeration) { | |
enumeration.get().generate_value_map(); | |
return Status::Ok(); | |
}, | |
std::reference_wrapper<Enumeration>(*this)); | |
value_map_future_ = std::move(value_map_future); | |
value_map_future_ = | |
resources.compute_tp().async( | |
[](std::reference_wrapper<Enumeration> enumeration) { | |
enumeration.get().generate_value_map(); | |
return Status::Ok(); | |
}, | |
std::reference_wrapper<Enumeration>(*this)); |
(and format)
Enumeration::~Enumeration() { | ||
if (value_map_future_.valid()) { | ||
auto st = value_map_future_.wait(); | ||
if (!st.ok()) { | ||
tiledb::common::global_logger().warn( | ||
"Enumeration::~Enumeration(): Error computing value map: " + | ||
st.to_string()); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't log the warning; it's not important to tell the user that an operation they did not care for failed.
@@ -49,6 +50,7 @@ class EnumerationException : public StatusException { | |||
}; | |||
|
|||
Enumeration::Enumeration( | |||
const ContextResources& resources, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like a "data class" like Enumeration
using a "behavior class" like ContextResources
; it will make removing the Context
dependency on the C API harder.
We can generate |
Story details: https://app.shortcut.com/tiledb-inc/story/60446
In the story we have observed that
generate_value_map
can take a non-trivial amount of time to execute. This function builds a map from enumeration variant to key, so that we can detect duplicates and do fast lookups.However,
generate_value_map
is always called when we construct an Enumeration. However:Enumeration::index_of
API #5482 which has not appeared in a release as of this writing.Hence all users of Enumeration are paying the potentially expensive cost of an operation whose result is never used.
This pull request adds an option to migrate the execution of
generate_value_map
to a background task. If thebool async
flag is true thengenerate_value_map
will be called from a thread pool task, andEnumeration::value_map()
will wait for that computation to finish. We construct Enumerations withasync = true
when loading them for an ArraySchema andasync = false
when they are created by the user (in which case we want the duplicate validation to happen immediately).If an exception is thrown due to a corrupted Enumeration for some reason, then we will only see it upon calling
value_map()
in which case the exception thrown while executing the future will be thrown. This is tested in the added test caseEnumeration Creation Error - Async
.This requires threading a
ContextResources
through all of the call stacks which find their way toEnumeration::create
. This represents most of the changes in this PR, which are largely encapsulated in the single commit 28f97ce, so I recommend reviewing commit by commit.I have also added timers to measure time spent in
generate_value_map
and time spent waiting for the it to finish.TYPE: IMPROVEMENT
DESC: async
Enumeration::generate_value_map