Precision and Recall in Entity (Identity) Resolution

Measuring the matching performance of an entity/identity resolution system can be challenging. If labelled data is available, however (i.e. a set of record data where you already know which records should be matched or linked to each other), then the classical "precision and recall" approach can be taken, and the overall performance of the entity resolution system can be summarised as an F-score.

Precision in Entity Resolution

Precision measures how many of the matches your entity resolution system identified are actually correct. For example, if your system says "John Smith from Company A" and "J. Smith from Company A" are the same person, and it's right, that's a true positive. But if it incorrectly matches two different John Smiths, that's a false positive. Precision is calculated as: True Positives / (True Positives + False Positives).

Recall in Entity Resolution

Recall measures how many of the actual matches in your dataset your system successfully found. If there are 100 pairs of records that should be matched because they refer to the same entity, and your system only finds 80 of them, your recall would be 80%. Recall is calculated as: True Positives / (True Positives + False Negatives).

F-Score in Entity Resolution 

The F-score (or F1 score) combines precision and recall into a single metric, giving equal weight to both. It's particularly useful in ER because you often need to balance between being too aggressive in matching (which hurts precision) and too conservative (which hurts recall). The F1 score is calculated as: 2 * (Precision * Recall) / (Precision + Recall).
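The three formulas above can be sketched as small Python helpers (a minimal sketch; the function names are illustrative, not from any particular library):

```python
def precision(tp: int, fp: int) -> float:
    # Fraction of predicted matches that are actually correct.
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # Fraction of true matches that the system found.
    return tp / (tp + fn)

def f1(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```

Because F1 is a harmonic mean, it is pulled towards the lower of the two inputs: a system with 99% precision but 10% recall scores far worse than one with 70% of each.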

Example

Let's say you have a database of 1000 customer records, and there are actually 100 duplicate pairs (200 records that should be matched as pairs). Your entity/identity resolution system:

  • Identifies 90 pairs as matches
  • Of these 90 pairs, 80 are correct matches (true positives)
  • This means 10 are incorrect matches (false positives)
  • And it missed 20 actual matches (false negatives)

In this case:

  • Precision = 80/90 = 89% (89% of the matches it found were correct)
  • Recall = 80/100 = 80% (it found 80% of all actual matches)
  • F1 Score = 2 * (0.89 * 0.80) / (0.89 + 0.80) = 84%
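The arithmetic above can be checked directly, using the counts from the example (80 true positives, 10 false positives, 20 false negatives):

```python
tp, fp, fn = 80, 10, 20  # counts from the worked example

precision = tp / (tp + fp)  # 80/90  ~ 0.889
recall = tp / (tp + fn)     # 80/100 = 0.800
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.0%}")  # 89%
print(f"recall    = {recall:.0%}")     # 80%
print(f"F1        = {f1:.0%}")         # 84%
```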

When developing an entity resolution system, you might tune your matching thresholds or rules based on whether precision or recall is more important for your use case. For instance:

  • In a medical records system, you might prioritize precision because incorrectly merging two different patients' records could be dangerous
  • In a customer database deduplication task, you might lean towards higher recall to ensure you don't miss opportunities to consolidate customer information

F-Score Variations in Entity Resolution

The F1 score gives equal weight to precision and recall; however, as discussed above, in certain circumstances we may want to prioritise one over the other.

The general formula for F-beta scores is: F_β = (1 + β²) * (precision * recall) / (β² * precision + recall)

Where β is a parameter that determines the weight of recall relative to precision:

  • When β = 1, you get the standard F1 score (equal weight)
  • When β = 2, you get the F2 score (weights recall higher)
  • When β = 0.5, you get the F0.5 score (weights precision higher)
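The general formula translates directly into code (a sketch; `f_beta` is an illustrative name, not a library function):

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    # Weighted harmonic mean: beta > 1 favours recall, beta < 1 favours precision.
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# beta = 1 reduces to the standard F1 formula.
print(f_beta(0.9, 0.3, beta=1))   # same as 2*p*r/(p+r)
# With beta = 2 the low recall drags the score down further;
# with beta = 0.5 the high precision props it up.
print(f_beta(0.9, 0.3, beta=2))
print(f_beta(0.9, 0.3, beta=0.5))
```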

F2 Score:

  • The F2 score weighs recall twice as much as precision
  • This is useful when finding false negatives (missed matches) is more costly than false positives
  • Example use case: Identifying potential fraud cases where missing a fraudulent transaction is worse than flagging a legitimate one for review

F0.5 Score:

  • The F0.5 score weighs precision twice as much as recall
  • This is useful when false positives are more costly than false negatives
  • Example use case: Automated merging of patient medical records where incorrect matches could cause serious problems

F Score Examples

Using our previous example: With precision = 89% and recall = 80%:

  • F1 score = 84% (as we calculated before)
  • F2 score = 82% (favoring recall, so slightly lower because our recall was lower than precision)
  • F0.5 score = 87% (favoring precision, so slightly higher because our precision was higher than recall)
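Plugging the example's precision (80/90) and recall (80/100) into the F-beta formula reproduces these scores:

```python
def f_beta(p: float, r: float, beta: float) -> float:
    # General F-beta: (1 + b^2) * p * r / (b^2 * p + r)
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

p, r = 80 / 90, 80 / 100  # precision and recall from the example

print(f"F1   = {f_beta(p, r, 1):.1%}")    # 84.2%
print(f"F2   = {f_beta(p, r, 2):.1%}")    # 81.6%
print(f"F0.5 = {f_beta(p, r, 0.5):.1%}")  # 87.0%
```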

Summary

In entity resolution, you might choose different F-scores based on your specific needs:

  • Use F0.5 when the cost of wrongly merging records is high
  • Use F2 when missing actual matches is more problematic than making a few wrong matches
  • Use F1 when both types of errors are equally costly
