I think DEEP should be Euclidean distance? #574

Open
@mageirakos

Thank you for providing a common format and benchmark suite for many standard datasets.

Issue:

I believe the original DEEP dataset uses Euclidean distance, not angular as you have it.
Since the vectors are L2-normalized, the two distances are highly correlated but not the same, so the difference may not be immediately noticeable in the QPS-Recall plots.
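
To make the relationship concrete, here is a minimal sketch in plain Python. It uses random vectors, not the DEEP data; the dimension, sample size, and helper names are arbitrary choices for illustration only. For exactly unit-length vectors, squared Euclidean distance equals 2 - 2·cos, so the two metrics produce the same neighbor ordering; once vectors are not exactly normalized, the nearest neighbor can differ, which is when the choice of metric would affect the ground truth.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Synthetic stand-in data (assumption: NOT the real DEEP vectors).
rng = random.Random(0)
dim = 16
base = [l2_normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(200)]
query = l2_normalize([rng.gauss(0, 1) for _ in range(dim)])

sq_dists = [squared_euclidean(query, v) for v in base]
cos_sims = [cosine_similarity(query, v) for v in base]

# For unit vectors: ||q - v||^2 = 2 - 2 * cos(q, v), up to rounding error.
max_identity_error = max(abs(d - (2 - 2 * c)) for d, c in zip(sq_dists, cos_sims))

# Top-10 neighbors under each metric (smaller distance / larger similarity first).
top10_euclidean = sorted(range(len(base)), key=lambda i: sq_dists[i])[:10]
top10_angular = sorted(range(len(base)), key=lambda i: -cos_sims[i])[:10]

# Without normalization the two metrics can disagree on the nearest neighbor.
raw_query = [1.0, 0.0]
raw_base = {"a": [3.0, 0.0], "b": [1.0, 0.5]}
nearest_euclidean = min(raw_base, key=lambda k: squared_euclidean(raw_query, raw_base[k]))
nearest_angular = max(raw_base, key=lambda k: cosine_similarity(raw_query, raw_base[k]))

print("max identity error:", max_identity_error)
print("top-10 agree on normalized data:", top10_euclidean == top10_angular)
print("unnormalized nearest (Euclidean, angular):", nearest_euclidean, nearest_angular)
```

If the vectors in the benchmark file are exactly unit-norm, the computed neighbors should match under either metric; checking the norms of the stored vectors would show whether that holds in practice.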

The only reason I am not certain, and have put a question mark in the title, is that, based on #145, your download source differs from the sources below and uses a different format (.fvecs vs .ibin).

Sources:

I'm looking at big-ann-benchmarks regarding this issue, since the author of the original DEEP paper (Artem Babenko) is listed as one of the organizers of the original '21 challenge. I've also consistently seen DEEP used with Euclidean distance in research papers. That makes sense because, to the best of my knowledge, Euclidean distance is more common for image data, while IP/angular is more common for text data.
