Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read weight quantiles #252

Merged
merged 16 commits into from
Nov 10, 2023
Merged

Conversation

dalonsoa
Copy link
Contributor

@dalonsoa dalonsoa commented Nov 9, 2023

Implements the logic to read the CSV files with the weight quantiles. The relevant files are added and the StaitcLinear.json file is modified to actually use that data. It also updates the static linear model and its definition to actually handle this information, removing the placeholders used before.

Close #251

@dalonsoa dalonsoa requested review from alexdewar and jamesturner246 and removed request for alexdewar November 9, 2023 12:39
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/HealthGPS/static_linear_model.cpp Outdated Show resolved Hide resolved
src/HealthGPS/static_linear_model.cpp Show resolved Hide resolved
src/HealthGPS/static_linear_model.h Outdated Show resolved Hide resolved
Copy link
Contributor

github-actions bot commented Nov 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one small mistake, but otherwise it's all good except for a few queries. I think we need to rework the DataTypeTableColumn API a bit, because it's not v nice to use atm.

example_new/new_data/weight_quantiles_NCDRisk_female.csv Outdated Show resolved Hide resolved
Comment on lines 154 to 164
std::map<hgps::core::Gender, std::vector<double>> weight_quantiles = {
{hgps::core::Gender::female, std::vector<double>(quantiles_female.num_rows())},
{hgps::core::Gender::male, std::vector<double>(quantiles_female.num_columns())}};
for (size_t j = 0; j < quantiles_female.num_rows(); j++) {
weight_quantiles[hgps::core::Gender::female][j] =
std::any_cast<double>(quantiles_female.column(0).value(j));
}
for (size_t j = 0; j < quantiles_male.num_rows(); j++) {
weight_quantiles[hgps::core::Gender::male][j] =
std::any_cast<double>(quantiles_male.column(0).value(j));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you intended the male vector to be quantiles_male.num_rows() in size?

That aside, it's better not to resize your vectors upfront, but to resize them as you go along. If you want to hint to the compiler that you know the final size you can use the reserve() method.

I think this should work:

Suggested change
std::map<hgps::core::Gender, std::vector<double>> weight_quantiles = {
{hgps::core::Gender::female, std::vector<double>(quantiles_female.num_rows())},
{hgps::core::Gender::male, std::vector<double>(quantiles_female.num_columns())}};
for (size_t j = 0; j < quantiles_female.num_rows(); j++) {
weight_quantiles[hgps::core::Gender::female][j] =
std::any_cast<double>(quantiles_female.column(0).value(j));
}
for (size_t j = 0; j < quantiles_male.num_rows(); j++) {
weight_quantiles[hgps::core::Gender::male][j] =
std::any_cast<double>(quantiles_male.column(0).value(j));
}
std::map<hgps::core::Gender, std::vector<double>> weight_quantiles = {
{hgps::core::Gender::female, {}},
{hgps::core::Gender::male, {}};
weight_quantiles[hgps::core::Gender::female].reserve(quantiles_female.num_rows());
weight_quantiles[hgps::core::Gender::male].reserve(quantiles_male.num_rows());
for (size_t j = 0; j < quantiles_female.num_rows(); j++) {
weight_quantiles[hgps::core::Gender::female].push_back(std::any_cast<double>(quantiles_female.column(0).value(j)));
}
for (size_t j = 0; j < quantiles_female.num_rows(); j++) {
weight_quantiles[hgps::core::Gender::male].push_back(std::any_cast<double>(quantiles_male.column(0).value(j)));
}

That said, this is rather ugly. I see this is where you wanted to use #253.

The problem is we get a DataTypeColumn here (the abstract base class), whereas your fix applies to PrimitiveDataTypeColumn -- I can't think of a better way to reorganise this API without digging into it a bit more though (which I'm happy to do, @jamesturner246!).

One option would be to use dynamic_cast to convert the DataTypeColumns into PrimitiveDataTypeColumn<double>s, then we can use your fix in #253 to get at the underlying std::vectors. What do you reckon, @jamesturner246?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS -- The way you've done it is also fine (aside from the sizing problem) and if we're going to rework the API anyway, there's not much point in mucking about with it now if it works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this is where I was thinking on using that patch.

I thought on using push_back, but for some reason I thought it was going to be less efficient that just allocating the space upfront, given that I knew it in advance, and just populating the elements.

Thanks for pointing the male/female issue. I'l fix it asap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not using something similar to #253, the only other option I can think of is implementing my own parser to read those files, but it is a pity given that there is already perfectly fine functionality to do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely to be a really minor performance difference in this case, but the reason not to resize containers when creating them is because the contents then have to be default-initialised (in this case that just means they'll be zeroed). If you use reserve() instead, you still allocate memory but there's no initialisation required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not using something similar to #253, the only other option I can think of is implementing my own parser to read those files, but it is a pity given that there is already perfectly fine functionality to do that.

I think we could rework the API so that you could get the values out by doing something like:

std::vector<double> my_values = column.values<double>();

If column contains ints or something else, you would get an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping aside mentioning the specific type, this was my intention, but it seems I did it in the wrong place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would need to update PrimitiveDataColumnType as you've done as well, but we also need to figure out how to change the base class to make use of your new values() method. Python makes this so much easier 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can wait until afterwards. Lots of changes going on there at the moment.

src/HealthGPS/static_linear_model.cpp Outdated Show resolved Hide resolved
src/HealthGPS/static_linear_model.cpp Show resolved Hide resolved
src/HealthGPS/static_linear_model.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

github-actions bot commented Nov 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

github-actions bot commented Nov 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
Copy link
Contributor

github-actions bot commented Nov 9, 2023

clang-tidy review says "All clean, LGTM! 👍"

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@dalonsoa dalonsoa linked an issue Nov 10, 2023 that may be closed by this pull request
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@dalonsoa
Copy link
Contributor Author

@jamesturner246 , this seems to work after the last change. So if we are moving forward with the merging, I'd suggest we first merge this PR and then #250 .

Copy link
Contributor

@jamesturner246 jamesturner246 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'd just add the same vector checking to ModelDefinition ctor as well, and good to go.

//}
if (weight_quantiles_.empty()) {
throw core::HgpsException("Weight quantiles dictionary is empty");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add the same check to *ModelDefinition below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@jamesturner246 jamesturner246 merged commit d3d9a2b into initialise_weights Nov 10, 2023
4 of 5 checks passed
@jamesturner246 jamesturner246 deleted the read_weight_quantiles branch November 10, 2023 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Read weight quantiles
3 participants