Model exploration metrics #177

ccdavis · 2024-12-10T17:36:09Z

Restructures the collection of threshold testing results so that we save one row per tested threshold combination. The data in each row is aggregated over the inner folds used to test that threshold combination.

There is some special code in the aggregation function to deal with tiny test data cases.
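As a rough illustration of the "one row per threshold combination" shape, here is a minimal sketch, not the code in this PR. The `FoldResult` record, its field names, and the `aggregate_per_threshold` helper are all hypothetical. The PR does not spell out what the tiny-test-data handling is, so the sketch assumes one plausible case: a metric is undefined (NaN) for a fold that is too small, and the aggregation skips it.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FoldResult:
    """Hypothetical per-fold record; the PR's real result type may differ."""

    alpha_threshold: float
    threshold_ratio: float
    precision: float  # assumed NaN when a tiny fold has no predicted matches
    recall: float  # assumed NaN when a tiny fold has no true matches


def _mean_ignoring_nan(values):
    """Average the defined values; return NaN if every value is undefined."""
    defined = [v for v in values if not math.isnan(v)]
    return mean(defined) if defined else math.nan


def aggregate_per_threshold(results):
    """Collapse per-fold results into one row per threshold combination."""
    grouped = defaultdict(list)
    for r in results:
        grouped[(r.alpha_threshold, r.threshold_ratio)].append(r)

    rows = []
    for (alpha, ratio), folds in sorted(grouped.items()):
        rows.append(
            {
                "alpha_threshold": alpha,
                "threshold_ratio": ratio,
                "folds": len(folds),
                "precision_mean": _mean_ignoring_nan([f.precision for f in folds]),
                "recall_mean": _mean_ignoring_nan([f.recall for f in folds]),
            }
        )
    return rows
```

Each returned dict corresponds to one row of the final metrics table; the `folds` column records how many inner folds contributed to that row.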

I commented out most of the code that saves threshold test results against the training data, along with the code that saves and tests "suspicious data", which we intend to remove soon. As a trial run for that next step, the suspicious data is set to None, and tests pass.

Some no-longer-relevant tests were commented out, and some expected values in the tests were changed to reflect the new shapes of the final threshold metrics tables.

riley-harper

Looks good to me.

The _aggregate_per_threshold_results() function feels a little messy, but at the same time it's now much clearer what's going on. So I think it's overall an improvement.

riley-harper · 2024-12-10T17:57:11Z

hlink/linking/model_exploration/link_step_train_test_models.py

-        for i in range(len(threshold_matrix)):
-            results_dfs[i] = _create_results_df()
+
+        prediction_results: dict[int, ThresholdTestResult] = {}


If this is a dict: index -> ThresholdTestResult, would a list[ThresholdTestResult] be simpler?
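To make the question concrete, here is a small sketch under the assumption that the keys are always the contiguous indices 0..n-1 of the threshold matrix. The `ThresholdTestResult` below is a stand-in with hypothetical fields, not the class from this PR, and the sample `threshold_matrix` values are made up.

```python
from dataclasses import dataclass


@dataclass
class ThresholdTestResult:
    """Stand-in with hypothetical fields; the real class lives in the PR."""

    alpha_threshold: float
    threshold_ratio: float


# Made-up threshold combinations, for illustration only.
threshold_matrix = [(0.8, 1.0), (0.8, 1.2), (0.9, 1.0)]

# Dict keyed by position in the threshold matrix, as in the diff.
as_dict: dict[int, ThresholdTestResult] = {}
for i, (alpha, ratio) in enumerate(threshold_matrix):
    as_dict[i] = ThresholdTestResult(alpha, ratio)

# Equivalent list: the position in the list plays the role of the key.
as_list: list[ThresholdTestResult] = [
    ThresholdTestResult(alpha, ratio) for alpha, ratio in threshold_matrix
]

# Both containers give the same result for every index.
same = all(as_dict[i] == as_list[i] for i in range(len(threshold_matrix)))
```

The dict would still earn its keep if results were filled in out of order or only for some indices; with dense, in-order keys the list carries the same information.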

riley-harper · 2024-12-10T17:57:50Z

hlink/linking/model_exploration/link_step_train_test_models.py

@@ -479,14 +488,21 @@ def _run(self) -> None:
        )

        # Stores suspicious data
-        suspicious_data = self._create_suspicious_data(id_a, id_b)
+        # suspicious_data = self._create_suspicious_data(id_a, id_b)
+        suspicious_data = None


Nice, this makes sense for now, and we can remove suspicious_data soon when we work on #176.

ccdavis · 2024-12-10T18:46:48Z

Yeah, I agree the aggregate function is messy. So is the function that inverts the threshold results, "combine..." or whatever. It's the kind of thing you could do more concisely with Pandas, but it would be unreadable.

ccdavis and others added 4 commits December 6, 2024 17:31

WIP: refactor to combine threshold test results from all outer folds.…

93a5c4e

… Doesn't work yet.

WIP on correct metrics output; some tests break because of not enough…

dd49937

… threshold matrix entries

Cleaning up metrics

a041274

Tests pass

f083378

ccdavis requested a review from riley-harper December 10, 2024 17:36

riley-harper approved these changes Dec 10, 2024

Adjust hh model exploration test for new column names, no training co…

1f162dc

…lumns and not saving suspicious data.

ccdavis merged commit bde173d into v4-dev Dec 10, 2024
3 of 6 checks passed

riley-harper deleted the model-exploration-metrics branch December 16, 2024 21:07
