-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read weight quantiles #252
Conversation
# Conflicts: # src/HealthGPS/static_linear_model.cpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
clang-tidy review says "All clean, LGTM! 👍" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found one small mistake, but otherwise it's all good except for a few queries. I think we need to rework the DataTypeTableColumn
API a bit, because it's not v nice to use atm.
std::map<hgps::core::Gender, std::vector<double>> weight_quantiles = { | ||
{hgps::core::Gender::female, std::vector<double>(quantiles_female.num_rows())}, | ||
{hgps::core::Gender::male, std::vector<double>(quantiles_female.num_columns())}}; | ||
for (size_t j = 0; j < quantiles_female.num_rows(); j++) { | ||
weight_quantiles[hgps::core::Gender::female][j] = | ||
std::any_cast<double>(quantiles_female.column(0).value(j)); | ||
} | ||
for (size_t j = 0; j < quantiles_male.num_rows(); j++) { | ||
weight_quantiles[hgps::core::Gender::male][j] = | ||
std::any_cast<double>(quantiles_male.column(0).value(j)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you intended the male vector to be quantiles_male.num_rows()
in size?
That aside, it's better not to resize your vectors upfront, but to resize them as you go along. If you want to hint to the compiler that you know the final size you can use the reserve()
method.
I think this should work:
std::map<hgps::core::Gender, std::vector<double>> weight_quantiles = { | |
{hgps::core::Gender::female, std::vector<double>(quantiles_female.num_rows())}, | |
{hgps::core::Gender::male, std::vector<double>(quantiles_female.num_columns())}}; | |
for (size_t j = 0; j < quantiles_female.num_rows(); j++) { | |
weight_quantiles[hgps::core::Gender::female][j] = | |
std::any_cast<double>(quantiles_female.column(0).value(j)); | |
} | |
for (size_t j = 0; j < quantiles_male.num_rows(); j++) { | |
weight_quantiles[hgps::core::Gender::male][j] = | |
std::any_cast<double>(quantiles_male.column(0).value(j)); | |
} | |
std::map<hgps::core::Gender, std::vector<double>> weight_quantiles = { | |
{hgps::core::Gender::female, {}}, | |
{hgps::core::Gender::male, {}}; | |
weight_quantiles[hgps::core::Gender::female].reserve(quantiles_female.num_rows()); | |
weight_quantiles[hgps::core::Gender::male].reserve(quantiles_male.num_rows()); | |
for (size_t j = 0; j < quantiles_female.num_rows(); j++) { | |
weight_quantiles[hgps::core::Gender::female].push_back(std::any_cast<double>(quantiles_female.column(0).value(j))); | |
} | |
for (size_t j = 0; j < quantiles_female.num_rows(); j++) { | |
weight_quantiles[hgps::core::Gender::male].push_back(std::any_cast<double>(quantiles_male.column(0).value(j))); | |
} |
That said, this is rather ugly. I see this is where you wanted to use #253.
The problem is we get a DataTypeColumn
here (the abstract base class), whereas your fix applies to PrimitiveDataTypeColumn
-- I can't think of a better way to reorganise this API without digging into it a bit more though (which I'm happy to do, @jamesturner246!).
One option would be to use dynamic_cast
to convert the DataTypeColumn
s into PrimitiveDataTypeColumn<double>
s, then we can use your fix in #253 to get at the underlying std::vector
s. What do you reckon, @jamesturner246?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PS -- The way you've done it is also fine (aside from the sizing problem) and if we're going to rework the API anyway, there's not much point in mucking about with it now if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is where I was thinking on using that patch.
I thought on using push_back
, but for some reason I thought it was going to be less efficient that just allocating the space upfront, given that I knew it in advance, and just populating the elements.
Thanks for pointing the male/female issue. I'l fix it asap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not using something similar to #253, the only other option I can think of is implementing my own parser to read those files, but it is a pity given that there is already perfectly fine functionality to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's likely to be a really minor performance difference in this case, but the reason not to resize containers when creating them is because the contents then have to be default-initialised (in this case that just means they'll be zeroed). If you use reserve()
instead, you still allocate memory but there's no initialisation required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not using something similar to #253, the only other option I can think of is implementing my own parser to read those files, but it is a pity given that there is already perfectly fine functionality to do that.
I think we could rework the API so that you could get the values out by doing something like:
std::vector<double> my_values = column.values<double>();
If column
contains int
s or something else, you would get an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keeping aside mentioning the specific type
, this was my intention, but it seems I did it in the wrong place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would need to update PrimitiveDataColumnType
as you've done as well, but we also need to figure out how to change the base class to make use of your new values()
method. Python makes this so much easier 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can wait until afterwards. Lots of changes going on there at the moment.
Co-authored-by: Alex Dewar <[email protected]>
clang-tidy review says "All clean, LGTM! 👍" |
…EPI/healthgps into read_weight_quantiles
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
clang-tidy review says "All clean, LGTM! 👍" |
@jamesturner246 , this seems to work after the last change. So if we are moving forward with the merging, I'd suggest we first merge this PR and then #250 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I'd just add the same vector checking to ModelDefinition
ctor as well, and good to go.
//} | ||
if (weight_quantiles_.empty()) { | ||
throw core::HgpsException("Weight quantiles dictionary is empty"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd add the same check to *ModelDefinition
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
…EPI/healthgps into read_weight_quantiles
clang-tidy review says "All clean, LGTM! 👍" |
Implements the logic to read the CSV files with the weight quantiles. The relevant files are added and the
StaitcLinear.json
file is modified to actually use that data. It also updates the static linear model and its definition to actually handle this information, removing the placeholders used before.Close #251