
about make Triplet dataset #1

Open
Kim-yonguk opened this issue Dec 11, 2019 · 2 comments

Comments

@Kim-yonguk

Is this a way of making hard triplets online, or is it offline?

@tamerthamoqa
Owner

tamerthamoqa commented Dec 11, 2019

I would say it is online, since you are only selecting the triplets within a batch that pass the hard-negative triplet selection condition, rather than pre-computing the set of qualifying triplets you want to train on by doing a full pass over the training set at the start of each epoch.
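
For illustration, here is a rough PyTorch-style sketch of that kind of in-batch filtering. The function name, the use of plain L2 distances, and the "hard" condition d(a, n) < d(a, p) are my own assumptions for the example, not necessarily the exact code in this repository:

```python
import torch

def select_hard_triplets(anchor_emb, pos_emb, neg_emb):
    # L2 distances per triplet in the batch
    d_ap = torch.norm(anchor_emb - pos_emb, p=2, dim=1)  # anchor-positive
    d_an = torch.norm(anchor_emb - neg_emb, p=2, dim=1)  # anchor-negative
    # "Hard" negatives: the negative is closer to the anchor than the positive
    hard = d_an < d_ap
    # Keep only the triplets that pass the condition and train on those
    return anchor_emb[hard], pos_emb[hard], neg_emb[hard]
```

Only the surviving triplets contribute to the loss for that batch, which is what makes the selection "online".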

Please keep in mind that my understanding may be wrong. I think the triplet generation step before training would not count as offline mining, since it only generates triplets at random and does not pre-compute any embeddings to check against the triplet selection condition. I used the triplet selection method from tbmoon's 'facenet' repository and edited it to write the generated triplets to a numpy file, to provide some 'reproducibility' across experiments; but the general way I know of generating triplets is to randomly pick anchors, positives, and negatives on the fly to prevent selection bias.
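
As a sketch of what "randomly picking triplets on the fly" could look like, assuming a hypothetical dict mapping each identity to its list of image paths (a structure I made up for the example; it also assumes every identity has at least two images):

```python
import random

def sample_random_triplet(images_by_identity):
    # images_by_identity: dict mapping identity -> list of image paths
    ids = list(images_by_identity.keys())
    anchor_id = random.choice(ids)
    negative_id = random.choice([i for i in ids if i != anchor_id])
    # Anchor and positive come from the same identity, negative from another;
    # no embeddings are computed here, so nothing is "mined" yet
    anchor, positive = random.sample(images_by_identity[anchor_id], 2)
    negative = random.choice(images_by_identity[negative_id])
    return anchor, positive, negative
```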

It seems you will need a large batch size to get good performance with the triplet loss method, so you will need a GPU with a lot of VRAM (preferably 24 GB or more) or multiple GPUs in parallel. I think the original FaceNet paper used a batch size of 1800, enforced a certain number of face images per identity in each batch (around 40 per identity), trained on a dataset containing hundreds of millions of face images, and used a semi-hard negative triplet selection method.

It seems that plain cross-entropy classification on the VGGFace2 dataset with an Inception-ResNet-V1 architecture, as in David Sandberg's 'facenet' repository, yields better results with less instability during training, so giving that a shot wouldn't hurt.

If you find any more information please let me know.

tamerthamoqa pushed a commit that referenced this issue Jan 2, 2021
@AGenchev
Contributor

Before we compute the embeddings, it is not known whether the negative in a selected triplet is hard, semi-hard, or easy. The random generation before a pass might yield many "easy" triplets. When these are fed into a large mini-batch, evaluated during training, and only the hard/semi-hard ones are kept, we call the selection "online".
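
To make the three categories concrete, here is a small sketch of the usual distance-based definitions; the margin value 0.2 and the tensor formulation are my own illustration:

```python
import torch

def classify_triplets(d_ap, d_an, margin=0.2):
    # d_ap, d_an: per-triplet anchor-positive / anchor-negative L2 distances
    easy = d_an > d_ap + margin           # triplet loss is already zero
    hard = d_an < d_ap                    # negative closer than the positive
    semi_hard = (d_an >= d_ap) & (d_an <= d_ap + margin)  # inside the margin
    return easy, semi_hard, hard
```

Only the hard and semi-hard triplets produce a non-zero triplet loss, so dropping the easy ones from each mini-batch is exactly the online selection described above.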
