calculate_roc_values wrong function #24

Open
zzak00 opened this issue Apr 18, 2024 · 0 comments
zzak00 commented Apr 18, 2024

Hello @tamerthamoqa,
I think there is an error in the calculate_roc_values function when it comes to the true_positive_rate and false_positive_rate calculation: the mean should be computed outside the fold loop, since we average across all folds.

The new code:

from sklearn.model_selection import KFold
import numpy as np


def calculate_roc_values(thresholds, distances, labels, num_folds=10):
    num_pairs = min(len(labels), len(distances))
    num_thresholds = len(thresholds)
    k_fold = KFold(n_splits=num_folds, shuffle=False)

    true_positive_rates = np.zeros((num_folds, num_thresholds))
    false_positive_rates = np.zeros((num_folds, num_thresholds))
    precision = np.zeros(num_folds)
    recall = np.zeros(num_folds)
    accuracy = np.zeros(num_folds)
    best_distances = np.zeros(num_folds)

    indices = np.arange(num_pairs)

    for fold_index, (train_set, test_set) in enumerate(k_fold.split(indices)):
        # Find the best distance threshold for the k-fold cross validation using the train set
        accuracies_trainset = np.zeros(num_thresholds)
        for threshold_index, threshold in enumerate(thresholds):
            # calculate_metrics is the existing helper defined elsewhere in the repository
            _, _, _, _, accuracies_trainset[threshold_index] = calculate_metrics(
                threshold=threshold,
                dist=distances[train_set],
                actual_issame=labels[train_set],
            )
        best_threshold_index = np.argmax(accuracies_trainset)

        # Evaluate TPR and FPR on the test set at every threshold to build the per-fold ROC curve
        for threshold_index, threshold in enumerate(thresholds):
            (
                true_positive_rates[fold_index, threshold_index],
                false_positive_rates[fold_index, threshold_index],
                _,
                _,
                _,
            ) = calculate_metrics(
                threshold=threshold,
                dist=distances[test_set],
                actual_issame=labels[test_set],
            )

        # Evaluate precision, recall, and accuracy on the test set using the best distance threshold
        (
            _,
            _,
            precision[fold_index],
            recall[fold_index],
            accuracy[fold_index],
        ) = calculate_metrics(
            threshold=thresholds[best_threshold_index],
            dist=distances[test_set],
            actual_issame=labels[test_set],
        )

        best_distances[fold_index] = thresholds[best_threshold_index]

    # Calculate mean values of TPR and FPR across all folds (outside the fold loop)
    true_positive_rate = np.mean(true_positive_rates, 0)
    false_positive_rate = np.mean(false_positive_rates, 0)

    return (
        true_positive_rate,
        false_positive_rate,
        precision,
        recall,
        accuracy,
        best_distances,
    )
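
For reference, a minimal sketch of how the corrected function could be exercised on synthetic data. The pair counts, distance values, and labels below are made up for illustration only, and calculate_metrics must already be defined as in the repository:

import numpy as np

# Synthetic example: 1000 pairs with random distances and ground-truth labels
rng = np.random.default_rng(0)
distances = rng.uniform(0.0, 2.0, size=1000)
labels = rng.integers(0, 2, size=1000).astype(bool)
thresholds = np.arange(0.0, 2.0, 0.01)  # 200 candidate thresholds

tpr, fpr, precision, recall, accuracy, best_distances = calculate_roc_values(
    thresholds=thresholds, distances=distances, labels=labels, num_folds=10
)

# tpr and fpr are now averaged across the 10 folds, one value per threshold
print(tpr.shape, fpr.shape)              # (200,) (200,)
print(accuracy.mean(), accuracy.std())   # mean/std accuracy over the folds

With the mean taken outside the loop, tpr and fpr keep one averaged value per threshold, which is what the downstream ROC/AUC plotting expects.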