Compare Fuzzy Matching Algorithms

So, What Is Fuzzy Matching?

Fuzzy matching (also called approximate string matching) is a technique for measuring how similar two strings of text are. It is used where an exact match is not possible, for example when data contains spelling errors, or when names and other text can be written in several different ways. A fuzzy matching algorithm scores the degree of similarity between two strings, and a program can use that score to make decisions or to offer suggestions to a user. For example, a spell checker might use fuzzy matching to suggest alternative spellings for a word that is not in its dictionary, and a search engine might use it to suggest related searches.
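To make the spell-checker example concrete, here is a minimal sketch using Python's standard-library `difflib` module. Note that `difflib` scores similarity with its own sequence-matching heuristic (`SequenceMatcher.ratio`), which is not one of the algorithms discussed in this article, and the tiny word list is made up for illustration.

```python
import difflib

# A toy dictionary (illustrative only); a real spell checker would use a full word list.
dictionary = ["ape", "apple", "peach", "puppy"]

# get_close_matches ranks candidates by SequenceMatcher.ratio() and returns at most
# n of them whose score is >= cutoff (defaults: n=3, cutoff=0.6), best match first.
suggestions = difflib.get_close_matches("appel", dictionary)
print(suggestions)  # ['apple', 'ape']

# The underlying similarity score in [0, 1] for a single pair of strings.
score = difflib.SequenceMatcher(None, "appel", "apple").ratio()
print(round(score, 2))  # 0.8
```

Swapping in a different similarity function (and threshold) is the main design decision in a real system; the rest of this article compares the functions you could use.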

There are many different algorithms for fuzzy matching, and the best one depends on the specific situation and the type of data being matched. Commonly used ones include Levenshtein distance, Damerau-Levenshtein distance, and Jaro-Winkler similarity. Levenshtein distance is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one string into another, and is often used for spelling correction. Damerau-Levenshtein distance extends Levenshtein distance by also counting the transposition of two adjacent characters (for example, "teh" versus "the") as a single edit, which makes it a better fit for common typing errors. Jaro-Winkler similarity works differently: instead of counting edits, it counts the characters the two strings have in common within a limited distance of each other and how many of those are out of order (transpositions), then raises the score for strings that share a common prefix. This makes it well suited to short strings such as personal names.
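The three measures described above can be sketched in plain Python as follows. This is a minimal, unoptimized illustration of the textbook definitions, not Tilores' implementation; the function names are hypothetical. The Damerau-Levenshtein function implements the optimal string alignment (OSA) variant, in which no substring is edited more than once, and Jaro-Winkler uses the commonly cited defaults of a 0.1 prefix scale and a prefix of at most four characters.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # row 0: distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein, optimal string alignment (OSA) variant:
    Levenshtein plus swapping two adjacent characters as a single edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1 means identical."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters only "match" if equal and no further apart than this window.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    m = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Compare the matched characters in order; each position where the two
    # sequences differ counts as half a transposition.
    matched1 = [c for c, u in zip(s1, used1) if u]
    matched2 = [c for c, u in zip(s2, used2) if u]
    t = sum(c1 != c2 for c1, c2 in zip(matched1, matched2)) / 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


print(levenshtein("teh", "the"), osa_distance("teh", "the"))  # 2 1
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

Levenshtein needs two substitutions to turn "teh" into "the", while OSA needs a single transposition. The OSA and unrestricted (adjacent transpositions) variants can disagree: for "CA" and "ABC", OSA gives 3 while the unrestricted Damerau-Levenshtein distance is 2, because the unrestricted variant may edit a substring again after transposing it.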

Fuzzy Matching Algorithms in Tilores

We provide the following fuzzy matching algorithms for deduplicating and linking data in Tilores (docs):

Cosine (Cosine similarity)

DamerauLevenshteinAT (Damerau-Levenshtein distance with adjacent transpositions)

DamerauLevenshteinOSA (optimal string alignment distance, a restricted Damerau-Levenshtein variant)

Jaccard (Jaccard index)

Jaro (Jaro similarity)

JaroWinkler (Jaro-Winkler similarity)

LCS (longest common subsequence)

Levenshtein (Levenshtein distance)

SorensenDice (Sørensen–Dice coefficient)

QGram (q-gram)

Hamming (Hamming distance)
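Several entries in the list are either set-based comparisons of q-grams (overlapping substrings of length q) or simple sequence measures. The sketch below shows common textbook definitions in Python. The helper names, the use of bigrams (q = 2) without padding, whether sets or counts are compared, and how empty inputs are handled are all assumptions; Tilores' own implementations may normalize or parameterize these measures differently.

```python
from collections import Counter
from math import sqrt


def qgrams(s: str, q: int = 2) -> Counter:
    """Counts of the overlapping length-q substrings of s (no padding).
    Strings shorter than q yield no q-grams at all."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def jaccard(a: str, b: str, q: int = 2) -> float:
    """Shared q-grams divided by all distinct q-grams (sets, not counts).
    Two strings with no q-grams are treated as identical (assumed convention)."""
    A, B = set(qgrams(a, q)), set(qgrams(b, q))
    return len(A & B) / len(A | B) if A | B else 1.0


def sorensen_dice(a: str, b: str, q: int = 2) -> float:
    """Twice the shared q-grams divided by the sum of the two q-gram set sizes."""
    A, B = set(qgrams(a, q)), set(qgrams(b, q))
    return 2 * len(A & B) / (len(A) + len(B)) if A or B else 1.0


def cosine(a: str, b: str, q: int = 2) -> float:
    """Cosine of the angle between the two q-gram count vectors."""
    A, B = qgrams(a, q), qgrams(b, q)
    if not A and not B:
        return 1.0
    dot = sum(count * B[g] for g, count in A.items())
    norm = sqrt(sum(v * v for v in A.values())) * sqrt(sum(v * v for v in B.values()))
    return dot / norm if norm else 0.0


def qgram_distance(a: str, b: str, q: int = 2) -> int:
    """Sum of absolute differences between q-gram counts (0 means identical profiles)."""
    A, B = qgrams(a, q), qgrams(b, q)
    return sum(abs(A[g] - B[g]) for g in A.keys() | B.keys())


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (characters in order, gaps allowed)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(c1 != c2 for c1, c2 in zip(a, b))


print(jaccard("night", "nacht"), sorensen_dice("night", "nacht"))
print(lcs_length("ABCBDAB", "BDCABA"), hamming("karolin", "kathrin"))  # 4 3
```

For "night" and "nacht" the bigrams are {ni, ig, gh, ht} and {na, ac, ch, ht}. They share only "ht", so the Jaccard index is 1/7, the Sørensen–Dice coefficient is 2/8 = 0.25, and the cosine similarity is also 0.25. LCS returns a length; turning it into a similarity score usually means dividing by the length of one or both strings, which is another convention that varies between implementations.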

Are we missing a fuzzy matching algorithm you would like to test? Let us know.

About Tilores

When you need fuzzy matching on high-volume data in real time, you need purpose-built technology: enter Tilores.

Consistently fast search response times

Built for unlimited serverless scaling

Real-time data ingestion and simultaneous search

Configure matching rules easily in the UI

Data privacy compliant by design