calculate_roc_values wrong function #24

Open
zzak00 opened this issue Apr 18, 2024 · 0 comments
zzak00 commented Apr 18, 2024

Hello @tamerthamoqa,
I think there is an error in the calculate_roc_values function when it comes to the true_positive_rate and false_positive_rate calculation: the mean should be computed outside the fold loop, since we average across all folds.

The new code:

from sklearn.model_selection import KFold
import numpy as np


def calculate_roc_values(thresholds, distances, labels, num_folds=10):
    num_pairs = min(len(labels), len(distances))
    num_thresholds = len(thresholds)
    k_fold = KFold(n_splits=num_folds, shuffle=False)

    true_positive_rates = np.zeros((num_folds, num_thresholds))
    false_positive_rates = np.zeros((num_folds, num_thresholds))
    precision = np.zeros(num_folds)
    recall = np.zeros(num_folds)
    accuracy = np.zeros(num_folds)
    best_distances = np.zeros(num_folds)

    indices = np.arange(num_pairs)

    for fold_index, (train_set, test_set) in enumerate(k_fold.split(indices)):
        # Find the best distance threshold for the k-fold cross validation using the train set
        accuracies_trainset = np.zeros(num_thresholds)
        for threshold_index, threshold in enumerate(thresholds):
            # calculate_metrics is the existing helper defined elsewhere in the repository
            _, _, _, _, accuracies_trainset[threshold_index] = calculate_metrics(
                threshold=threshold,
                dist=distances[train_set],
                actual_issame=labels[train_set],
            )
        best_threshold_index = np.argmax(accuracies_trainset)

        # Evaluate TPR and FPR on the test set at every threshold to build the per-fold ROC curve
        for threshold_index, threshold in enumerate(thresholds):
            (
                true_positive_rates[fold_index, threshold_index],
                false_positive_rates[fold_index, threshold_index],
                _,
                _,
                _,
            ) = calculate_metrics(
                threshold=threshold,
                dist=distances[test_set],
                actual_issame=labels[test_set],
            )

        # Evaluate precision, recall, and accuracy on the test set using the best distance threshold
        (
            _,
            _,
            precision[fold_index],
            recall[fold_index],
            accuracy[fold_index],
        ) = calculate_metrics(
            threshold=thresholds[best_threshold_index],
            dist=distances[test_set],
            actual_issame=labels[test_set],
        )

        best_distances[fold_index] = thresholds[best_threshold_index]

    # Calculate mean values of TPR and FPR across all folds (outside the fold loop)
    true_positive_rate = np.mean(true_positive_rates, 0)
    false_positive_rate = np.mean(false_positive_rates, 0)

    return (
        true_positive_rate,
        false_positive_rate,
        precision,
        recall,
        accuracy,
        best_distances,
    )
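
For reference, a minimal sketch of how the corrected function could be exercised on synthetic data. The pair counts, distance values, and labels below are made up for illustration only, and calculate_metrics must already be defined as in the repository:

import numpy as np

# Synthetic example: 1000 pairs with random distances and ground-truth labels
rng = np.random.default_rng(0)
distances = rng.uniform(0.0, 2.0, size=1000)
labels = rng.integers(0, 2, size=1000).astype(bool)
thresholds = np.arange(0.0, 2.0, 0.01)  # 200 candidate thresholds

tpr, fpr, precision, recall, accuracy, best_distances = calculate_roc_values(
    thresholds=thresholds, distances=distances, labels=labels, num_folds=10
)

# tpr and fpr are now averaged across the 10 folds, one value per threshold
print(tpr.shape, fpr.shape)              # (200,) (200,)
print(accuracy.mean(), accuracy.std())   # mean/std accuracy over the folds

With the mean taken outside the loop, tpr and fpr keep one averaged value per threshold, which is what the downstream ROC/AUC plotting expects.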