Risk factor refactor #195

jamesturner246 · 2023-08-02T17:30:29Z

Fixes #166
(not really)

This has been somewhat of a lacklustre attempt that fell short of my ambition... All that was really achieved is the removal of duplicate risk factor listings from France.EBHLM.json, and changing a couple of data manager methods to fail on missing data, rather than warn and continue. As such, it should be quite quick to review. The main issue still seems to be the presence of code for handling Hierarchical static and dynamic models in the wider codebase, which is resisting refactor efforts.

In a moment of catharsis, I've now convinced myself that the best solution going forward is to merge static and dynamic RF model into a single model, which encapsulates all this extra HLM-specific code within it.

See #194. I'd appreciate your input, as I'm itching to get started merging them, rather than doing more piecemeal refactors that aren't solving the underlying problem.

github-actions · 2023-08-02T17:38:14Z

clang-tidy review says "All clean, LGTM! 👍"

israel-vieira · 2023-08-02T19:48:09Z

example/France.EBHLM.json

This file is autogenerated from a folder with 50+ files and the country data file, not sure we gain much by hand editing this, perhaps changing the script that generate this file would be more beneficial. Alternatively, you could remove this section from the config file and read from here instead.

Agreed, the respective healthgps-tools script should also be updated. I'm considering a Python rewrite of the scripts, to sidestep the Windows R binding dependency.

Good point @israel-vieira.

Though we do want the config files in this repo to be up to date and working with the current version of the code for testing purposes. Whether we edit it directly or just the script for generating it isn't so important I think, as long as it's correct.

I've made an issue for later.

israel-vieira · 2023-08-02T20:01:28Z

src/HealthGPS.Console/configuration.cpp

The list of diseases can be long, 30+. The current validation will list all the invalid diseases key, if any, before exiting. The exception alternative will break at the first invalid key, the use will need to keep on trying until all are correct, in the end, both methods are accomplishing the required validation, it is just a matter of style.

Mapping the config with the back-end in a case-sensitive world is never user friend, the identifier type tries to minimize the pain by making all keys lowercase.

Case-sensitivity: fair point. In which case, the task then becomes removing the parts of the codebase that are case-sensitive. I've spotted a few.

Diseases: agree, a style choice. My concern in these cases has always been that by warning and continuing for dodgy input, that it makes the problem slightly less obvious to diagnose when it fails later on in the program, than if it were to immediately fail on loading. Perhaps this can be replaced with a function which reads all diseases, detects missing ones with printed debug info, before throwing an exception for any missing inputs, as opposed to a case-by-case exception.

I'm with @israel-vieira on this - collecting all that is invalid and providing a full list to the user (within meaningful indication of what is wrong) results on a better user experience than having to try again and again until all issues are sorted. We have another project where this is the approach and is indeed much better.

All in all, having a strong validation and input pre-processing steps where things can fail early and inconsistencies fixed (casing, proxis, etc.) is a good thing that will result in simpler, easier to understand (and diagnose) code.

Personally, I don't consider users input validation as an exceptional circumstance, the validation should focus on helping the users to fix the wrong or missing information provided, perhaps the system should show valid options before exiting gracefully.

In general, users don't read or understand stack traces with a message hidden inside a large block of text, this is kind of information is more relevant for developers.

I think there are two issues here:

Whether to "fast fail" when loading the config or provide a list of failures at the end

Whether to implement this using exceptions or some other mechanism (e.g. std::optional)

My view is:

Yes, if the validation fails, we should continue attempting to load the file and provide a list of errors at the end, rather than aborting immediately (for the most part)

@jamesturner246's refactor of get_country and get_disease_info to use exceptions is sensible and makes things clearer

The problem with at least some of the config loading code atm is that if validation fails, it doesn't abort immediately, but it also doesn't abort after the validation step either, so while you may get a warning on the console, the program will blithely plough on using default values, which is not good (see #161).

I'm a big believer in using exceptions to propagate errors in C++; it can make code much cleaner and safer. I'll give an example of how you might do "lazy failure" when loading disease info using exceptions:

std::vector<core::DiseaseInfo> get_diseases_info(core::Datastore &data_api, Configuration &config) { const auto diseases = data_api.get_diseases(); fmt::print("\nThere are {} diseases in storage, {} selected.\n", diseases.size(), config.diseases.size()); std::vector<core::DiseaseInfo> result; std::set<std::string> missing_diseases; for (const auto &code : config.diseases) { try { result.emplace_back(data_api.get_disease_info(code)); } catch (const std::invalid_argument &) { missing_diseases.emplace(code); } } if (!missing_diseases.empty()) { std::cerr << "ERROR: The following diseases were not found:\n"; for (const auto &disease : missing_diseases) { std::cerr << "\t" << disease << "\n"; } throw std::runtime_error{"Some invalid diseases were given"}; } return result; }

(There might be things we could tweak -- e.g. I haven't used the fmt library -- but this is just to give an idea.)

This code assumes that the caller will appropriately handle any thrown std::runtime_error, which it absolutely should do (we might want to use a bespoke error type rather than std::runtime_error though). So it means you a) tell the user what errors have occurred and b) propagate the errors up the call stack. Ultimately, the main config loading function should be wrapped in a try...catch and if an error occurs, it just tells the user and exits the program (the user will already have been informed of the specific issues with the config loading by this point).

Note that this code is also safer, because a std::runtime_error will also be thrown if the "disease" value is missing, meaning that the caller knows that something has gone wrong, as opposed to the current state of affairs where just a warning message is displayed.

Agree with this ^. The user gets helpful human readable messages noting which keys were missing, and execution stops before there's a chance of accidental continuation on default values. I'm happy to merge, and happy to leave it with you if you already have the idea in mind.

alexdewar

I think this is the right approach on the whole. I've made some minor suggestions.

I agree with @dalonsoa and @israel-vieira that we do probably want to do "lazy failure" when loading the config and print errors as we go along, rather than aborting as soon as an error is found. I think what you've done is an improvement though, so I'd be inclined to merge this as is for now and worry about that later. As I was planning on tackling this myself (see #161), I'm happy for you to leave this task for me, if you like.

alexdewar · 2023-08-03T10:48:15Z

example/France.EBHLM.json

Good point @israel-vieira.

Though we do want the config files in this repo to be up to date and working with the current version of the code for testing purposes. Whether we edit it directly or just the script for generating it isn't so important I think, as long as it's correct.

src/HealthGPS.Console/configuration.cpp

src/HealthGPS.Datastore/datamanager.cpp

alexdewar · 2023-08-03T11:30:49Z

src/HealthGPS.Console/configuration.cpp

I think there are two issues here:

Whether to "fast fail" when loading the config or provide a list of failures at the end

Whether to implement this using exceptions or some other mechanism (e.g. std::optional)

My view is:

Yes, if the validation fails, we should continue attempting to load the file and provide a list of errors at the end, rather than aborting immediately (for the most part)

@jamesturner246's refactor of get_country and get_disease_info to use exceptions is sensible and makes things clearer

The problem with at least some of the config loading code atm is that if validation fails, it doesn't abort immediately, but it also doesn't abort after the validation step either, so while you may get a warning on the console, the program will blithely plough on using default values, which is not good (see #161).

I'm a big believer in using exceptions to propagate errors in C++; it can make code much cleaner and safer. I'll give an example of how you might do "lazy failure" when loading disease info using exceptions:

std::vector<core::DiseaseInfo> get_diseases_info(core::Datastore &data_api, Configuration &config) { const auto diseases = data_api.get_diseases(); fmt::print("\nThere are {} diseases in storage, {} selected.\n", diseases.size(), config.diseases.size()); std::vector<core::DiseaseInfo> result; std::set<std::string> missing_diseases; for (const auto &code : config.diseases) { try { result.emplace_back(data_api.get_disease_info(code)); } catch (const std::invalid_argument &) { missing_diseases.emplace(code); } } if (!missing_diseases.empty()) { std::cerr << "ERROR: The following diseases were not found:\n"; for (const auto &disease : missing_diseases) { std::cerr << "\t" << disease << "\n"; } throw std::runtime_error{"Some invalid diseases were given"}; } return result; }

(There might be things we could tweak -- e.g. I haven't used the fmt library -- but this is just to give an idea.)

This code assumes that the caller will appropriately handle any thrown std::runtime_error, which it absolutely should do (we might want to use a bespoke error type rather than std::runtime_error though). So it means you a) tell the user what errors have occurred and b) propagate the errors up the call stack. Ultimately, the main config loading function should be wrapped in a try...catch and if an error occurs, it just tells the user and exits the program (the user will already have been informed of the specific issues with the config loading by this point).

Note that this code is also safer, because a std::runtime_error will also be thrown if the "disease" value is missing, meaning that the caller knows that something has gone wrong, as opposed to the current state of affairs where just a warning message is displayed.

src/HealthGPS.Tests/Datastore.Test.cpp

Co-authored-by: Alex Dewar <[email protected]>

github-actions · 2023-08-03T17:35:03Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-08-03T17:35:52Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-08-03T18:00:14Z

clang-tidy review says "All clean, LGTM! 👍"

dalonsoa

To be honest, nothing to add that @alexdewar has not explained much, much better - and with examples.

alexdewar

LGTM!

jamesturner246 added 3 commits August 2, 2023 13:24

get_country should throw exception if country is missing.

161e70e

get_disease_info should throw exception if disease is missing.

8c1bcf4

Remove duplicate risk factors from France.EBHLM.json.

c6eec51

jamesturner246 requested review from dalonsoa, alexdewar and israel-vieira August 2, 2023 17:30

israel-vieira approved these changes Aug 2, 2023

View reviewed changes

alexdewar requested changes Aug 3, 2023

View reviewed changes

jamesturner246 and others added 2 commits August 3, 2023 18:26

Use EXPECT_*THROW not ASSERT

c1e237d

Co-authored-by: Alex Dewar <[email protected]>

Use a string instead of an Identifier where we can

ad428bb

Co-authored-by: Alex Dewar <[email protected]>

jamesturner246 added 2 commits August 3, 2023 18:45

Fix broken GetDiseaseInfoMatchesGetDisases test.

f71d951

Small tidy up.

b685faa

jamesturner246 requested a review from alexdewar August 3, 2023 17:56

dalonsoa approved these changes Aug 4, 2023

View reviewed changes

alexdewar approved these changes Aug 7, 2023

View reviewed changes

jamesturner246 merged commit 19b4519 into main Aug 7, 2023
5 checks passed

jamesturner246 deleted the rf_refactor_166 branch August 7, 2023 11:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Risk factor refactor #195

Risk factor refactor #195

jamesturner246 commented Aug 2, 2023

github-actions bot commented Aug 2, 2023

israel-vieira Aug 2, 2023

jamesturner246 Aug 2, 2023

alexdewar Aug 3, 2023

jamesturner246 Aug 3, 2023

israel-vieira Aug 2, 2023

jamesturner246 Aug 2, 2023

dalonsoa Aug 3, 2023

israel-vieira Aug 3, 2023

alexdewar Aug 3, 2023

jamesturner246 Aug 3, 2023

alexdewar left a comment

alexdewar Aug 3, 2023

alexdewar Aug 3, 2023

github-actions bot commented Aug 3, 2023

github-actions bot commented Aug 3, 2023

github-actions bot commented Aug 3, 2023

dalonsoa left a comment

alexdewar left a comment

Risk factor refactor #195

Risk factor refactor #195

Conversation

jamesturner246 commented Aug 2, 2023

github-actions bot commented Aug 2, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexdewar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Aug 3, 2023

github-actions bot commented Aug 3, 2023

github-actions bot commented Aug 3, 2023

dalonsoa left a comment

Choose a reason for hiding this comment

alexdewar left a comment

Choose a reason for hiding this comment