Sørensen-Dice Coefficient
Similarity metric that measures overlap between two samples using character bigrams.
Tilores uses Sørensen-Dice in production, so you can automate matching with rules you configure.
How it works
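The Sørensen-Dice coefficient is similar to Jaccard but weights the intersection more heavily: it divides twice the size of the intersection by the sum of both set sizes, where Jaccard divides the intersection by the union. The result ranges from 0 (no shared bigrams) to 1 (identical bigram sets). It is never lower than the Jaccard score for the same pair and is strictly higher for partial matches. Like Jaccard, it compares sets of character bigrams, so it ignores where in the string each bigram occurs and is largely insensitive to word order. It is commonly used in data deduplication where a slightly more lenient matching threshold is desired.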
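To make the calculation concrete, here is a minimal Python sketch of the set-based formula described above, with a Jaccard function alongside for comparison. It is an illustration only, not the Tilores implementation: the function names, the lowercasing step, and the choice to score two strings with no bigrams as 1.0 are all assumptions made for this example. Some implementations count repeated bigrams (multisets) rather than using plain sets, which can give slightly different scores.

```python
def bigrams(text: str) -> set:
    """Return the set of adjacent character pairs in text.

    Lowercasing is an assumption of this sketch, not part of the metric.
    """
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Sørensen-Dice: twice the intersection over the sum of both set sizes."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Assumption: two inputs with no bigrams at all are treated as identical.
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def jaccard(a: str, b: str) -> float:
    """Jaccard on the same bigram sets: intersection over union."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0  # same assumption as in dice()
    return len(x & y) / len(x | y)


if __name__ == "__main__":
    # "night" and "nacht" share only the bigram "ht".
    print(dice("night", "nacht"))     # 2 * 1 / (4 + 4) = 0.25
    print(jaccard("night", "nacht"))  # 1 / 7, lower than the Dice score

    # Swapping the words changes only the bigrams around the space.
    print(dice("john smith", "smith john"))  # 2 * 7 / (9 + 9)
```

In the second example, seven of the nine bigrams on each side survive the word swap, so the score stays high even though the strings differ in every character position.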
Use cases in entity resolution
Related tools
Don't implement this yourself
Tilores Studio runs the full matching engine (this algorithm plus configurable rules and real-time entity resolution) locally on your machine. Free, no account, no cloud. Load your own data and see it working in minutes.