-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add convenience wrapper for quick similarity computation #1
base: master
Are you sure you want to change the base?
Conversation
Hello Miraç, I am very excited you have found the package useful! And I have looked at your pull request. But I am confused, because the addition is contradictory to the purpose of imagehash package: (1) The main purpose of imagehash package is to avoid each-to-each image comparison, which happens, when using images4.Similar or any (!) Similar function. Hashes in this package are not meant for each-to-each image comparison. func Similar implies each-to-each comparison. You propose a func Similar, which merges image4.Similar with imagehash similarity. But it is faster to use image4.Similar directly. That is imagehash is not needed at all for this scenario. And it should not impact precision of comparison. (2) You propose the Image struct, which is a very large object to keep it in memory for every image. The purpose of hashes and icons was to avoid keeping images in memory. Additional purpose of icon is generalization for comparison, because pixel-by-pixel comparison of full images does not generalize well. Actually hashes are an additional memory-saving step to avoid keeping icons in memory. RECOMMENDATION Try to use images4 directly, unless you work with billions images. If you do work with billions of images, then imagehash(es) are used to compare by those hashes only - not any Similar func. One huge hash table is necessary - not the one like in the example on the package README. It is probably the confusion of terms, because I use "hash" as precise term related to an address in hash table, unlike many people use it for, say, Hamming distance calculations. In imagehash, no distances are calculated between hashes. But for icons - yes, distances are calculated. Am I missing something? Let me know what you think. Thank you for the contribution! It helps clarify package descriptions. |
Hello Vitali, Thanks for getting back to me! IIUC, you have two concerns: 1. PerformancePerhaps my PR got a bit confusing with all the images and similarities: My PR doesn't actually run return foundSimilarImage && images4.Similar(img.Icon, img2.Icon) When foundSimilarImage is false,i.e. when imagehash comparison doesn't yield a match or in other words, when images are dissimilar, images4.Similar() is not called because the first part of the expression ( This PR indeed improves performance as opposed to using I ran some informal experiments: 2. MemoryWhen I wrote this snippet for my own project before creating this PR, I didn't have type Image struct {
Icon images4.IconT
CentralHash uint64
HashSet []uint64
} I added I didn't consider the memory impact. You are right, when working with billions of images that would be prohibitive. I removed it now. Just as an example, after this change, in my experiments an average I hope that makes sense! Cheers, |
Miraç, got it, thanks! At this point I am thinking:
Hopefully your benchmarks reflect purely the comparison step with all icons/hash structs ready, without including image reads time. I am going slowly on this to make sure we are getting an elegant solution. Best, |
Thanks for working on this, it works like a charm!
As I was using your libraries, I thought it was a low-hanging fruit to add a simple convenience wrapper around imagehash & images4, to simplify user experience. It seems to me like new
Image.Similar()
is the only thing users need from the imagehash library, so we can hide away the complexity of centralHash and hashSet.Additionally, users could directly use the
imagehash.Image
struct to create a hashtable for quick lookup, e.g.:LMK if you think this is a good idea and what tests you'd like me to add for them.