-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Model exploration metrics #177
Conversation
… Doesn't work yet.
… threshold matrix entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
The _aggregate_per_threshold_results()
function feels a little messy, but at the same time it's now much clearer what's going on. So I think it's overall an improvement.
for i in range(len(threshold_matrix)): | ||
results_dfs[i] = _create_results_df() | ||
|
||
prediction_results: dict[int, ThresholdTestResult] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a dict: index -> ThresholdTestResult
, would a list[ThresholdTestResult]
be simpler?
@@ -479,14 +488,21 @@ def _run(self) -> None: | |||
) | |||
|
|||
# Stores suspicious data | |||
suspicious_data = self._create_suspicious_data(id_a, id_b) | |||
# suspicious_data = self._create_suspicious_data(id_a, id_b) | |||
suspicious_data = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this makes sense for now, and we can remove suspicious_data soon when we work on #176.
…lumns and nnot saving suspicious data.
Yeah I agree the aggregate function is messy. Also the function to invert the threshold results "combine..." or whatever. It's the kind of thing you could do more concisely with Pandas but it would be unreadable. |
Restructures the collection of threshold testing results so we save one row per tested threshold combination. The data in each row is aggregated over the number of inner folds used to test on the thresholds.
There is some special code in the aggregation function to deal with tiny test data cases.
I commented out most code used to save the threshold testing against the training data and saving and testing "suspicious data" that we intend to remove soon. The suspicious data is set to None as a test for this next step, and tests pass.
Some no-longer relevant tests were commented out and some values changed to reflect the new shapes of the final threshold metrics tables in the tests.