Soenneker.Utils.String.JaccardSimilarity

A utility library for comparing strings via the Jaccard similarity algorithm

Installation

dotnet add package Soenneker.Utils.String.JaccardSimilarity

Why?

Jaccard similarity measures how much two sets of items overlap, and it's often used for tasks like detecting similar documents or recommending content. It's useful because:

Set-Focused:

It works well when you care about what elements are present, not their order.

Scale Doesn't Matter:

The score is the share of distinct elements that both sets have in common, so it always falls between 0% and 100% no matter how big the sets are.

Efficient:

It's quick to calculate, making it suitable for large datasets.

Handles Noise Well:

It stays reliable even if there's extra, less important information in the sets: a few unshared elements lower the score gradually instead of breaking the comparison.
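
For reference, this is the standard textbook definition of Jaccard similarity for two sets. The symbols $A$ and $B$ are our own notation, not something taken from this library:

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
$$

Two identical non-empty sets give $J = 1$, and two sets with nothing in common give $J = 0$. Multiplying by 100 turns the value into a percentage.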

Usage

var text1 = "This is a test";
var text2 = "This is another test";

double result = JaccardSimilarityStringUtil.CalculateSimilarityPercentage(text1, text2); // 60
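
The result of 60 above is consistent with comparing the two strings as sets of whitespace-separated words: 3 shared words ("This", "is", "test") out of 5 distinct words. Below is a minimal sketch of that calculation, assuming case-sensitive, word-level tokenization. `WordJaccardPercentage` is a hypothetical helper written for illustration. It is not the library's implementation, which may tokenize differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Soenneker.Utils.String.JaccardSimilarity).
// Assumes words are separated by spaces and compared case-sensitively.
static double WordJaccardPercentage(string a, string b)
{
    var setA = new HashSet<string>(a.Split(' ', StringSplitOptions.RemoveEmptyEntries));
    var setB = new HashSet<string>(b.Split(' ', StringSplitOptions.RemoveEmptyEntries));

    // Union: every distinct word that appears in either string.
    var union = new HashSet<string>(setA);
    union.UnionWith(setB);

    // Two empty inputs have no words to compare; this sketch returns 0 for that case.
    if (union.Count == 0)
        return 0;

    // Intersection: words that appear in both strings.
    var intersection = new HashSet<string>(setA);
    intersection.IntersectWith(setB);

    return 100.0 * intersection.Count / union.Count;
}

Console.WriteLine(WordJaccardPercentage("This is a test", "This is another test")); // 60
```

Because the inputs are treated as sets, repeated words and word order have no effect on the score in this sketch.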