Can LLMs be used for Entity Resolution?

By Steven Renwick

With the rise of Large Language Models (LLMs) and their seeming omnipotence, it is perhaps not surprising that one of the most frequent questions I hear these days from senior (non-technical) executives is: can’t LLMs do entity resolution?

To answer this question in one word, allow me to use a handy German portmanteau: Jein (which is Ja (yes) and Nein (no) combined to basically mean “hmm, kinda”).

Yes, if you have two data records, such as customer profiles, you could ask an LLM whether these two records belong to the same identity, and it will probably work quite well. Similarly, if you have a short list of customer records, you could ask an LLM to identify duplicate records and it will probably do a decent job.

But this is really a case of “when you have a sledgehammer, every problem looks like a nut”.

So what are the drawbacks of using LLMs for entity resolution? And where do they actually make a lot of sense?

Consistency & Reliability

Ask an LLM the same question in slightly different ways and there is a good chance you get a different answer. For example, asking 'Are John Smith at 123 Main St and J. Smith at 123 Main Street the same person?' versus 'Do these customer records refer to the same individual?' might yield different confidence levels or even contradictory answers from the same LLM.

Then there are the outright hallucinations. Taken together, this means you can’t rely on LLM-based entity resolution to always give the same fuzzy matching answer for the same data, which is especially dangerous in a regulated environment.

A rules-based entity resolution system will always give the same results when the same data is repeatedly run through the system.
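To make the contrast concrete, here is a minimal sketch of a deterministic matching rule applied to the John Smith example above. The abbreviation table, the field names and the rule itself are my own illustrative assumptions, not how any particular product is configured.

```python
import re

# Hypothetical abbreviation table; a real system would use a far larger one.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation and expand common street abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def same_person(a: dict, b: dict) -> bool:
    """Illustrative rule: same surname, same first initial, same normalized address."""
    first_a = a["first_name"].lower().rstrip(".")
    first_b = b["first_name"].lower().rstrip(".")
    return (
        a["last_name"].lower() == b["last_name"].lower()
        and first_a[:1] == first_b[:1]
        and normalize_address(a["address"]) == normalize_address(b["address"])
    )


record_1 = {"first_name": "John", "last_name": "Smith", "address": "123 Main St"}
record_2 = {"first_name": "J.", "last_name": "Smith", "address": "123 Main Street"}

# The rule is a pure function of its inputs, so repeated runs cannot disagree.
results = {same_person(record_1, record_2) for _ in range(1000)}
print(results)
```

However the question is "phrased", the outcome depends only on the data and the rule, which is the property a regulated environment needs.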

Explainability

“LLM says so” is not good enough to explain why two data records have been linked. Yes, you could ask the LLM to explain every pairwise connection, but this adds overhead and still has the above consistency and reliability problem.

In a rule-based entity resolution system, the rule that was used to trigger the linking between two records can be contained in the entity graph itself. This allows a user (or regulator!) to see exactly why two records were linked, down to the level of seeing which algorithms were applied to what data attributes with what thresholds and weightings.
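As a rough illustration of what "the rule is contained in the entity graph" can look like, the sketch below stores the triggering rule, its threshold and the achieved score on every edge. The `Rule` and `Edge` classes, the rule names and the use of Python's `difflib` as the similarity measure are all assumptions made for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Rule:
    name: str          # hypothetical rule identifier
    attribute: str     # which record attribute the rule compares
    threshold: float   # minimum similarity needed to link


@dataclass(frozen=True)
class Edge:
    left_id: str
    right_id: str
    rule: Rule         # the rule that triggered the link travels with the edge
    score: float


def similarity(a: str, b: str) -> float:
    """Simple stand-in similarity; real systems offer many algorithms."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link(left: dict, right: dict, rules: list) -> list:
    """Return one edge per rule that fires, each carrying its own explanation."""
    edges = []
    for rule in rules:
        score = similarity(left[rule.attribute], right[rule.attribute])
        if score >= rule.threshold:
            edges.append(Edge(left["id"], right["id"], rule, round(score, 3)))
    return edges


rules = [Rule("exact_email", "email", 1.0), Rule("fuzzy_name", "name", 0.9)]
a = {"id": "r1", "name": "Jon Smith", "email": "j.smith@example.com"}
b = {"id": "r2", "name": "John Smith", "email": "j.smith@example.com"}

edges = link(a, b, rules)
for edge in edges:
    print(f"{edge.left_id} -> {edge.right_id}: {edge.rule.name} on "
          f"'{edge.rule.attribute}' scored {edge.score} (threshold {edge.rule.threshold})")
```

An auditor reading these edges does not need to ask anyone why the records were linked: the answer is stored next to the link.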

Precision & Fine-tuning

When matching record data in entity resolution, you need to be able to control the precision (the share of proposed matches that are correct, which falls as false positives rise) and the recall (the share of true matches that are found, which falls as false negatives rise) based on your use case. In some situations - such as in a credit bureau - a false positive match can be very dangerous, leading to potential fines, so you need to have very high precision. In other cases, such as fraud detection, you are looking for potential customer account matches, so you want to have high recall and can tolerate false positives.

With LLM-based entity resolution, you can’t easily adjust matching thresholds or implement use-case-specific matching tolerance.

Rules-based entity resolution allows bespoke matching accuracy based on your use case. The effect of any matching adjustment can immediately be seen and measured, enabling you to fine-tune your matching configuration and, ultimately, reach the appropriate accuracy for your use case.
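A small sketch of what "seen and measured" means in practice: given similarity scores for pairs whose true status is known, moving a single threshold trades precision against recall. The scores and labels below are made up purely for illustration.

```python
def precision_recall(scored_pairs, threshold):
    """Compute precision and recall when linking every pair scoring >= threshold."""
    tp = sum(1 for score, is_match in scored_pairs if score >= threshold and is_match)
    fp = sum(1 for score, is_match in scored_pairs if score >= threshold and not is_match)
    fn = sum(1 for score, is_match in scored_pairs if score < threshold and is_match)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical (similarity score, is a true match) pairs from a labelled sample.
labelled = [
    (0.98, True), (0.95, True), (0.91, True), (0.88, False),
    (0.85, True), (0.80, False), (0.72, True), (0.60, False),
]

for threshold in (0.9, 0.7):
    precision, recall = precision_recall(labelled, threshold)
    print(f"threshold {threshold}: precision {precision:.2f}, recall {recall:.2f}")
```

On this toy sample the strict threshold suits the credit bureau case (no false positives, but some matches missed), while the loose one suits fraud detection (every true match found, at the price of some false positives).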

Temporal and Transitive Linking

In a specialised entity resolution solution, data can get complex but the system will handle it in a logical way so that it is still understandable and usable. For instance, individuals may change over time - a person’s name may change, or they may have multiple addresses or email addresses over time. A company might change its name, or merge with another.

Representing that data relationship might mean records A and D are only linked transitively, via records B and C, in a so-called entity graph. Such an association needs to be persisted to make sense for future updates.
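The A-to-D example can be sketched with a union-find (disjoint set) structure, a common way to turn pairwise links into entities. This is a generic illustration of transitive linking, not a description of how Tilores stores its entity graph.

```python
def resolve_entities(record_ids, links):
    """Group records into entities by following pairwise links transitively."""
    parent = {record_id: record_id for record_id in record_ids}

    def find(x):
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    entities = {}
    for record_id in record_ids:
        entities.setdefault(find(record_id), set()).add(record_id)
    return list(entities.values())


records = ["A", "B", "C", "D", "E"]
# A and D never match directly; they are connected only through B and C.
links = [("A", "B"), ("B", "C"), ("C", "D")]

entities = resolve_entities(records, links)
print(entities)
```

Note that the link list has to be kept: if record B is later corrected or removed, the system needs the persisted edges to work out whether A and D still belong together.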

This is something that an LLM would not be able to handle when looking at a large body of data, since an LLM cannot reliably keep track of the relationships between thousands or millions of data records.

Performance & Scalability Issues

LLMs are just slow. Processing a single record could take a few seconds, meaning they are completely unsuitable for real-time use cases, such as fraud detection or customer on-boarding. Try to do entity resolution on hundreds or thousands of records simultaneously and you will face significant problems.

High-performance entity resolution systems like Tilores are able to ingest and link records in milliseconds, and also simultaneously match thousands of records per second in parallel, without any performance degradation.

Cost

Token-based pricing makes large-scale LLM-based entity resolution extremely expensive. Processing millions of entity pairs through API calls becomes prohibitively costly.

Purpose-built entity resolution solutions are built with cost-efficiency in mind, meaning lower operational costs for high-volume workloads. We are not talking about small differences here. Using an LLM for entity resolution could easily be 100x more expensive, and possibly thousands of times more expensive, than a bespoke entity resolution solution.

Then there is the API rate limiting that most LLM providers apply…
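To get a feel for why the bill grows so quickly, here is a back-of-the-envelope sketch of a naive approach in which every record is compared with every other record through an LLM call. The tokens-per-pair figure and the price per million tokens are placeholders I have picked for the arithmetic, not real vendor pricing; plug in your own numbers.

```python
def pair_count(n_records: int) -> int:
    """Number of distinct record pairs if every record is compared with every other."""
    return n_records * (n_records - 1) // 2


def llm_cost_usd(n_pairs: int, tokens_per_pair: int, usd_per_million_tokens: float) -> float:
    """Token-based cost of sending each pair to an LLM once."""
    return n_pairs * tokens_per_pair / 1_000_000 * usd_per_million_tokens


# Placeholder assumptions, NOT real pricing.
TOKENS_PER_PAIR = 300
USD_PER_MILLION_TOKENS = 1.0

for n in (1_000, 10_000, 100_000):
    pairs = pair_count(n)
    cost = llm_cost_usd(pairs, TOKENS_PER_PAIR, USD_PER_MILLION_TOKENS)
    print(f"{n:>7,} records -> {pairs:>13,} pairs -> ${cost:,.2f}")
```

The point is the shape of the curve rather than the exact figures: ten times more records means roughly a hundred times more pairs, and token-based pricing scales with every one of them.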

Sledgehammer for a Nut

There are many other drawbacks of using LLMs for entity resolution, but I think you get the point by now. Yes, they can do entity resolution, but LLMs are not the right tool for a job which is already solved by existing bespoke entity resolution systems. They are the proverbial sledgehammer being used to open a nut, when the nutcracker already does a fine job.

So when should we use LLMs in entity resolution?

It might sound like we are just being biased and anti-LLM, but that could not be further from the truth (ok, we are biased). LLMs are one of the most amazing technical transformations of our lifetime, and they do have their place in entity resolution, if used appropriately.

LLMs in Named Entity Recognition

Named entity recognition (NER) is the term for the extraction of entity attribute data from unstructured or semi-structured data. Imagine you had a large number of free-text criminal court records. An LLM would excel at extracting the names of the defendants and the lawyers involved to create structured court records.

It is highly likely, however, that an LLM would extract the details of the same person inconsistently from one court record to the next, especially if that person is involved in multiple cases. If I then wanted to link all court records that relate to a particular individual, entity resolution would be needed as a subsequent step.

In this respect, LLMs make a perfect pre-processing tool ahead of entity resolution, performing named entity recognition that would otherwise be difficult to achieve with regular natural language processing (NLP) techniques.

LLMs in the Loop

An important feature of an entity resolution system is the so-called “human in the loop”, whereby a human (e.g. a data analyst or domain specialist) reviews proposed pairwise matches that do not meet a definitive threshold but are close enough to warrant manual review.

It may be feasible to replace the human in this process with an agentic LLM to make these decisions. The above LLM shortcomings would still be applicable, but on a low enough volume of matches (on the assumption that even two human agents could propose different matching results) this would be an appropriate use for LLMs in entity resolution.

Guardrails for an “LLM in the Loop” would include justifying, in natural language, why each pairwise connection was made. And such agentic record connections would be flagged in the entity resolution system for potential future human auditing.
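A minimal sketch of how such guardrails might be wired up: clear-cut scores are decided by the rules, only the borderline band is sent to a reviewer, and every reviewer decision carries a justification and an audit flag. The thresholds, the `Decision` fields and the `stub_reviewer` function (which stands in for a real LLM call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Pair = Tuple[str, str]
# A reviewer takes a pair and its score, and returns (linked?, justification).
Reviewer = Callable[[Pair, float], Tuple[bool, str]]


@dataclass
class Decision:
    pair: Pair
    linked: bool
    decided_by: str      # "rule" or "llm"
    justification: str
    needs_audit: bool    # flagged for potential future human auditing


def route(pair: Pair, score: float, reviewer: Reviewer,
          lower: float = 0.7, upper: float = 0.9) -> Decision:
    """Auto-decide clear cases; send only the borderline band to the reviewer."""
    if score >= upper:
        return Decision(pair, True, "rule", f"score {score:.2f} >= {upper}", False)
    if score < lower:
        return Decision(pair, False, "rule", f"score {score:.2f} < {lower}", False)
    linked, justification = reviewer(pair, score)
    return Decision(pair, linked, "llm", justification, True)


def stub_reviewer(pair: Pair, score: float) -> Tuple[bool, str]:
    """Stand-in for an LLM call; a real reviewer would prompt a model with both records."""
    return score >= 0.8, f"stub reviewer: borderline pair {pair} scored {score:.2f}"


for pair, score in [(("r1", "r2"), 0.95), (("r3", "r4"), 0.82), (("r5", "r6"), 0.40)]:
    print(route(pair, score, stub_reviewer))
```

Keeping the reviewer behind a plain function signature means a human queue and an LLM can be swapped for one another without touching the routing logic.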

Entity Resolution based RAG for LLMs

Where we at Tilores are bullish about the potential for LLMs in entity resolution is actually the opposite way around: using an entity resolution system as a data source for LLMs - so-called retrieval augmented generation (RAG).

At Tilores we call this “IdentityRAG” where an LLM is connected, via LangChain, to a Tilores instance containing identity (e.g. customer) data. In cases where you need the LLM to be accurately retrieving information about specific identities - such as for customer service - an entity/identity resolution system excels at 1) ensuring that the correct identity is retrieved and 2) providing all possible information about the individual from disparate data sources within an organisation.

Where LLMs still make sense in Entity Resolution

For quick analysis of small data sets, where I would probably use an LLM before an entity resolution system like Tilores.
As a pre-processing step for Named Entity Recognition.
As a replacement for humans in agentic edge case analysis.