Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct calculation of size of calculated_stats_ vector #466

Merged
merged 5 commits into from
Jul 2, 2024

Conversation

TinyMarsh
Copy link
Contributor

@TinyMarsh TinyMarsh commented Jul 1, 2024

In #453 I introduced the calculated_factors_ vector to store the calculated stats for each bin for each factor the user wants to calculate stats for. The size of this vector is calculated and set in initialise_vector method, but initially I only allowed one "channel" per factor. This PR fixes this by including the other "channels" in the vector resize calculation, namely those specified in the already-existing initialise_output_channels method:

void AnalysisModule::initialise_output_channels(RuntimeContext &context) {
if (!channels_.empty()) {
return;
}
channels_.emplace_back("count");
channels_.emplace_back("deaths");
channels_.emplace_back("emigrations");
for (const auto &factor : context.mapping().entries()) {
channels_.emplace_back("mean_" + factor.key().to_string());
channels_.emplace_back("std_" + factor.key().to_string());
}
for (const auto &disease : context.diseases()) {
channels_.emplace_back("prevalence_" + disease.code.to_string());
channels_.emplace_back("incidence_" + disease.code.to_string());
}
channels_.emplace_back("normal_weight");
channels_.emplace_back("over_weight");
channels_.emplace_back("obese_weight");
channels_.emplace_back("above_weight");
channels_.emplace_back("mean_yll");
channels_.emplace_back("std_yll");
channels_.emplace_back("mean_yld");
channels_.emplace_back("std_yld");
channels_.emplace_back("mean_daly");
channels_.emplace_back("std_daly");
}

I have also renamed the vector to calculated_stats_ as it seems more appropriate.

Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced by all the hardcoded numbers, but if that's the best way to do it then it's ok.

Other than that, it all seems good.

Comment on lines 62 to 79
// The number of factors to calculate stats for is the number of factors minus the length of the
// `factors` vector.
size_t num_stats_to_calc = context.mapping().entries().size() - factors_to_calculate_.size();

// But we calculate the mean and standard deviation of each factor, so we need to multiply by 2
num_stats_to_calc *= 2;

// We also want to calculate the prevalence and incidence of each disease
num_stats_to_calc += 2 * context.diseases().size();

// We also want to keep the count, deaths, and emigrations
num_stats_to_calc += 3;

// We also want to calculate the normal, overweight, obese, and above weight prevalence
num_stats_to_calc += 4;

// Finally, we want to calculate the mean and standard deviation of YLL, YLD, and DALY
num_stats_to_calc += 6;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like there are a lot of hardcoded numbers here... Is there a way we could pass them in as various arguments so we don't have to remember to manually change them if we add/remove factors etc.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right it's a bit horrible isn't it. I'll have a think about a nicer way to do this. Eventually the other "channels" stuff will be removed and with it the reference to what the calculated stats (column headings) actually are, so it might be good to have a human-readable list somewhere perhaps stored as a std::vector<std::string> as a data member in the class or something.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
@TinyMarsh
Copy link
Contributor Author

So I did some bits, but then realised that all I was doing was simple duplicating what the initialise_output_channels method was doing with the channels_ vector, so I have decided to just reuse that. @alexdewar you might want to take another look because this changes the changes in the PR a bit.

Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 😄

num_stats_to_calc += 6;
// And for each factor, we calculate the stats described in `channels_`, so we
// multiply the size of `channels_` by the number of factors to calculate stats for.
num_stats_to_calc *= channels_.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much cleaner!

@TinyMarsh TinyMarsh merged commit 0b0ce7a into main Jul 2, 2024
5 checks passed
@TinyMarsh TinyMarsh deleted the extend_analysis branch July 2, 2024 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants