Python wrapper around strsim, a Rust implementation of string similarity metrics:
- Hamming
- Levenshtein - distance & normalized
- Optimal string alignment
- Damerau-Levenshtein - distance & normalized
- Jaro and Jaro-Winkler - this implementation of Jaro-Winkler does not limit the common prefix length
The normalized versions return values between 0.0 and 1.0, where 1.0 means an exact match.
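As a rough illustration of what "normalized" means here, the sketch below computes a Levenshtein distance in pure Python and rescales it. It assumes the normalization is 1 - distance / (length of the longer string, counted in characters), which is consistent with the kitten/sitting result shown in this README; the function names are local to the sketch and are not the library's internals.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a matching char
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as an exact match
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(normalized_levenshtein("kitten", "sitting"))
```

With "kitten" and "sitting" the distance is 3 and the longer string has 7 characters, so the score is 1 - 3/7.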
pip install xdistances
Go to https://xdistances.readthedocs.io for the full documentation.
>>> import xdistances
>>> xdistances.hamming("hamming", "hammers")
3
>>> xdistances.hamming("hamming", "hammer")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Lenght mismatch
>>> xdistances.levenshtein("kitten", "sitting")
3
>>> xdistances.normalized_levenshtein("kitten", "sitting")
0.5714285714285714
>>> xdistances.osa_distance("ac", "cba")
3
>>> xdistances.damerau_levenshtein("ac", "cba")
2
>>> xdistances.normalized_damerau_levenshtein("levenshtein", "löwenbräu")
0.2727272727272727
>>> xdistances.jaro("Friedrich Nietzsche", "Jean-Paul Sartre")
0.39188596491228067
>>> xdistances.jaro_winkler("cheeseburger", "cheese fries")
0.9111111111111111
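To make the note about the unlimited common prefix concrete, here is a minimal sketch of the Winkler adjustment on top of an already-computed Jaro score. It assumes the usual scaling factor of 0.1 and a result clamped to 1.0; `winkler_adjust` and `max_prefix` are hypothetical names for this illustration and are not part of xdistances. The Jaro score 7/9 for this pair was worked out by hand (8 matching characters in each 12-character string, no transpositions).

```python
from typing import Optional


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading characters that a and b share."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


def winkler_adjust(jaro_score: float, a: str, b: str,
                   max_prefix: Optional[int] = None,
                   scaling: float = 0.1) -> float:
    """Boost a Jaro score by the shared prefix (hypothetical helper).

    max_prefix=None leaves the prefix length unlimited; max_prefix=4
    mimics the cap used in many other Jaro-Winkler implementations.
    """
    prefix = common_prefix_length(a, b)
    if max_prefix is not None:
        prefix = min(prefix, max_prefix)
    # Clamp to 1.0 because a long shared prefix could otherwise overshoot.
    return min(1.0, jaro_score + scaling * prefix * (1.0 - jaro_score))


jaro_score = 7 / 9  # Jaro score of the pair below, computed by hand
unlimited = winkler_adjust(jaro_score, "cheeseburger", "cheese fries")
capped = winkler_adjust(jaro_score, "cheeseburger", "cheese fries", max_prefix=4)
print(round(unlimited, 4), round(capped, 4))
```

The unlimited variant uses the full six-character prefix "cheese" and lands near 0.9111, in line with the example above, while a four-character cap would give roughly 0.8667.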
If you don't want to install Rust itself and have Docker installed, you can run $ ./dev for a development CLI.
Benchmarks require a nightly toolchain. Run $ cargo +nightly bench.