MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Updated Mar 26, 2024 - Python
Tika-Similarity uses the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features.
Compare HTML similarity using structural and style metrics
Golang metrics for calculating string similarity and other string utility functions
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
The aim is to build a job recommender system that takes skills from LinkedIn and jobs from Indeed and returns the best available jobs for your skills.
Easy-to-use Java similarity algorithms for text and numeric-series
A package to compute medical segmentation metrics.
This is an implementation of the paper written by Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett
Exploring the performance of Jaccard and cosine similarities, then visualising their output using k-means and k-means with PCA. Additional material on time series analysis, web scraping, and Twitter scraping.
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
Spark functions to run popular phonetic and string matching algorithms
SetSketch: Filling the Gap between MinHash and HyperLogLog
A collection of string comparisons algorithms
Text similarity computation using MinHash and Jaccard distance on the Reuters dataset
ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity
Package provides a Java implementation of a big-data recommender using Apache Spark
TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation
MinMax Circular Sector Arc for External Plagiarism’s Heuristic Retrieval Stage code
A Clojure library for querying large data-sets on similarity
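For readers new to the topic, the measure shared by the projects above can be sketched in a few lines. This is a minimal illustration, not code from any listed repository: the function names `jaccard`, `minhash_signature`, and `estimate_jaccard` are hypothetical, and the salted BLAKE2b hashing is just one assumed way to simulate independent hash functions.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(a | b)


def minhash_signature(items, num_perm=128):
    """MinHash signature: the minimum hash value under each of num_perm salted hashes."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt per simulated hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(str(x).encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for x in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    a = {"min", "hash", "lsh", "forest"}
    b = {"hash", "lsh", "ensemble", "hnsw"}
    print("exact:   ", jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The exact computation needs both full sets, while the signature-based estimate only needs two fixed-length lists, which is why sketching approaches are used on large collections. More slots in the signature give a tighter estimate at the cost of more hashing.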