IdentityRAG July 4, 2024 · 5 min read

Improving LLM accuracy with Entity RAG

Steven Renwick
Tilores

Author: Steven Renwick, CEO, Tilores. Published 4 July 2024 (2024-07-04). Updated 31 May 2026 (2026-05-31).

TL;DR (as of 2026): A vector database is the right RAG source for general document knowledge. But when an LLM prompt is about a specific customer, a vector database alone falls short — inaccurate search input can miss the relevant customer, and one customer may have several records that must be linked together for a complete 360° view. Adding entity resolution — which deduplicates and links a customer's scattered records into one resolved profile at ingestion — gives the LLM the correct, complete customer context at query time. It complements rather than replaces the vector database. Tilores calls this combination EntityRAG.

What is the difference between a vector database and an entity resolution API for AI applications?

The three data sources below are all used to ground an LLM in RAG. They differ in what they are good at, and in how well they handle a prompt about one specific customer. The full reasoning is preserved in the sections beneath this table.

| RAG data source | Best at | Limitation for customer-specific RAG |
| --- | --- | --- |
| Vector database | Retrieving similar unstructured knowledge (e.g. policy documents) | Can miss the right customer on inexact input; can't link a customer's multiple records into one profile |
| Graph database | Representing connected, interlinked data | No fuzzy search by default; slow on transitive hops; hard to guarantee retrieved data belongs to a single customer |
| Entity resolution API (EntityRAG) | Deduplicating and linking a customer's records into one resolved profile at ingestion | Complements rather than replaces the vector DB; used together for customer-aware RAG |

Since Large Language Models (LLMs) exploded in popularity, largely led by OpenAI's ChatGPT, enterprises have been exploring how to run LLMs on private, internal data sets that LLMs have never seen before and which must remain private.

Retrieval Augmented Generation (RAG) is the technique of augmenting LLMs with additional data from specific sources that you control. In this post we introduce EntityRAG, created by Tilores, as a significant improvement to the accuracy and reliability of LLMs, especially in regulated environments.

Typically, a RAG data source is based on a vector database. Vector search excels at retrieving data from unstructured data sources. For example, if you were building RAG for an insurance company, you might store all of your company's standard insurance policy terms in a vector database. Given an appropriate prompt, the LLM could then retrieve the most similar policy text from the vector database and use it as context.

However, what data source should an LLM use when the prompt is about a specific customer? In this case, a vector database alone is not sufficient, for the following reasons:

  • Potential inaccuracies in search input could fail to find the relevant customer.
  • The relevant customer might have more than one customer record, which must be accurately linked together to have a complete customer 360° view.

To address this, we propose the use of entity resolution technology as a complementary data source to vector databases in RAG. We name this EntityRAG.

What is Entity Resolution?

Entity resolution is the process of deduplicating and linking data records that relate to the same real-world entity (e.g. a person, company, or product) and originate from one or more data sources, and of making those records searchable. When used specifically for person data, it is often referred to as "identity resolution".

It relies on fuzzy matching algorithms, such as cosine similarity (which is also often used in vector databases), to recognise when records differ slightly in specific attributes but should still be considered the same.
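To make the idea of fuzzy matching concrete, here is a minimal sketch of cosine similarity applied to two name strings. It is an illustration under stated assumptions, not a description of how Tilores or any vector database computes matches: the strings are represented as counts of character bigrams (vector databases typically compare learned embeddings instead), and the function names and the 0.8 threshold are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams of a lower-cased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    """Cosine similarity between the n-gram count vectors of two strings."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    # Counter returns 0 for missing keys, so unshared n-grams contribute nothing.
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative threshold only (an assumption, not a recommended setting).
MATCH_THRESHOLD = 0.8


def is_probable_match(a: str, b: str) -> bool:
    """Treat two attribute values as the same when they are similar enough."""
    return cosine_similarity(a, b) >= MATCH_THRESHOLD


if __name__ == "__main__":
    # A misspelled name still scores close to the correct spelling...
    print(is_probable_match("Phillip Mitchel", "Philipp Mitchell"))
    # ...while an unrelated name scores far below the threshold.
    print(is_probable_match("Phillip Mitchel", "Martha Lindqvist"))
```

In practice a single similarity score on one attribute is rarely enough; a real matching setup would likely combine several attributes and rules, which is part of what makes the results tunable and explainable.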

Typically, entity resolution is used for financial crime detection, fraud prevention and as part of a customer data platform (CDP) to match customer profiles from different sources.

Importantly for regulated industries, where accuracy of data is critical, entity resolution systems can be fine-tuned to provide highly accurate, reliable and explainable results.

What about Graph Databases?

Graph databases, such as Neo4j, have also been gaining popularity lately as a RAG source. Indeed, graphs are exceptionally well-suited to organizing and depicting diverse, interlinked data in a structured way, and they can be complementary to vector databases in RAG.

However, they have certain limitations that make them less than ideal for certain LLM applications.

  • Lack of fuzzy search
  • Difficulty in handling transitive hops when interlinked records need to be retrieved quickly.
  • Lack of accuracy for distinguishing and delineating distinct entities.

By default, graph databases do not include fuzzy search. So if you search for "Phillip Mitchel" you would not retrieve the data for "Philipp Mitchell" due to the spelling mistake. You can overcome this by using a search engine, such as Elasticsearch, but then you are introducing another complex system that needs to be managed.

Transitive hops are a particular problem with graph databases. Because all data is, in theory, connected to all other data, retrieving data when the graph needs to traverse several nodes (data records) can be very slow.
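One way to see why resolving links at ingestion avoids this cost is the following minimal sketch. It assumes that pairwise matches between records are already known and groups them with a union-find structure, so that a query becomes a single dictionary lookup rather than a multi-hop traversal. The record IDs, class, and function names are hypothetical, and this is not a description of how Tilores or any graph database is implemented.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge records that match transitively."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, item: str) -> str:
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def resolve_entities(record_ids, matches):
    """Build an index from each record ID to all records of the same entity."""
    uf = UnionFind()
    for rid in record_ids:
        uf.find(rid)  # register records that have no match at all
    for a, b in matches:
        uf.union(a, b)
    groups = defaultdict(set)
    for rid in record_ids:
        groups[uf.find(rid)].add(rid)
    return {rid: frozenset(groups[uf.find(rid)]) for rid in record_ids}


# Invented sample data: crm-1 matches billing-7, and billing-7 matches claims-3,
# so all three belong to one entity even though crm-1 never matched claims-3.
record_ids = ["crm-1", "billing-7", "claims-3", "crm-2"]
matches = [("crm-1", "billing-7"), ("billing-7", "claims-3")]
entity_index = resolve_entities(record_ids, matches)

if __name__ == "__main__":
    # Query time is one lookup, regardless of how many hops linked the records.
    print(sorted(entity_index["claims-3"]))  # ['billing-7', 'claims-3', 'crm-1']
```

The trade-off in this sketch is that the grouping work happens when records arrive, not when a prompt needs the customer's data.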

The accuracy of distinct entities is a particular problem in regulated industries. When you retrieve the data related to one customer, and it contains data from records that have been linked together, perhaps from disparate sources, you need to be certain that they belong together.

While graph databases excel at representing complex relationships, they may face challenges in certain scenarios, particularly when dealing with customer data in regulated industries. The interconnected nature of graph structures can sometimes make it difficult to guarantee that retrieved data pertains only to a single customer, especially when dealing with large, complex datasets.

EntityRAG for accuracy and reliability

To examine whether entity resolution in combination with a vector database would produce superior results to a vector database alone, we teamed up with enterprise LLM specialists Kern.ai to create a demonstration scenario in a travel insurance company example. Scroll to the bottom for the webinar video.

In this demo, we tested three setups:

  • Baseline RAG - a vector DB and an LLM
  • Enhanced RAG - using Kern.ai's custom reranking model
  • Enhanced RAG + EntityRAG - enriching the enhanced RAG with results from Tilores

Baseline RAG

Baseline RAG, as configured in Kern.ai's cognition platform and using a vector DB alone, produced very vague results.

Enhanced RAG

Enhanced RAG, using Kern.ai's custom reranking model, produced a more nuanced response, but some ambiguity remained.

Enhanced RAG + Tilores EntityRAG

In this setup, the customer's first and last name were extracted from the customer message and used to search for the customer in Tilores. The extracted name contained spelling mistakes, yet the search still returned two linked customer profiles, which together include information about all current and previous insurance plans.

The resulting response is far more specific and useful than the previous two because, thanks to EntityRAG, the LLM knows which customer is relevant and which insurance policies they currently hold.
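The flow described in this section can be sketched roughly as follows. This is a simplified, self-contained illustration and not the demo's actual code or the Tilores API: the in-memory profile store, the `ResolvedProfile` class, both function names, the 0.85 threshold, and all sample data are hypothetical, the name is assumed to have been extracted from the message already, and Python's standard-library `difflib` stands in for a real entity resolution search.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class ResolvedProfile:
    """Hypothetical resolved customer: several source records linked into one."""
    names: list[str]
    record_ids: list[str]
    policies: list[str] = field(default_factory=list)


def find_customer(name: str, profiles: list[ResolvedProfile],
                  threshold: float = 0.85) -> Optional[ResolvedProfile]:
    """Return the profile with the best fuzzy name match, if it clears the threshold."""
    best, best_score = None, 0.0
    for profile in profiles:
        for variant in profile.names:
            score = SequenceMatcher(None, name.lower(), variant.lower()).ratio()
            if score > best_score:
                best, best_score = profile, score
    return best if best_score >= threshold else None


def build_prompt(question: str, policy_chunks: list[str],
                 profile: Optional[ResolvedProfile]) -> str:
    """Combine document context and customer context into one LLM prompt."""
    lines = ["Policy context:"]
    lines += [f"- {chunk}" for chunk in policy_chunks]
    if profile is not None:
        linked = ", ".join(profile.record_ids)
        lines.append(f"Customer context (linked records: {linked}):")
        lines += [f"- {policy}" for policy in profile.policies]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Invented sample data for illustration only.
profiles = [
    ResolvedProfile(
        names=["Philipp Mitchell", "P. Mitchell"],
        record_ids=["crm-1", "claims-3"],
        policies=["Travel Basic (expired)", "Travel Premium (active)"],
    ),
    ResolvedProfile(
        names=["Maria Schneider"],
        record_ids=["crm-2"],
        policies=["Travel Basic (active)"],
    ),
]

# The misspelled name from the message still resolves to the right profile.
match = find_customer("Phillip Mitchel", profiles)
prompt = build_prompt(
    "Is my lost luggage covered?",
    ["<policy text retrieved from the vector database>"],
    match,
)

if __name__ == "__main__":
    print(prompt)
```

The design point is that the two sources play different roles: the vector database supplies general policy text, while the resolved profile tells the LLM which customer is asking and which plans apply to them.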

Conclusion

Our collaboration with Kern.ai has demonstrated that an advanced entity resolution system like Tilores can significantly enhance the accuracy and relevance of large language model (LLM) responses. This innovation, embodied in EntityRAG, is set to help regulated enterprises develop LLM applications with improved accuracy and relevance.

Once your customer data is unified within Tilores, this unified, deduplicated data serves as a reliable source of customer truth, powering functions like fraud prevention, KYC (Know Your Customer), and marketing operations — without the task of building an internal data infrastructure or the expense of migrating data into a centralized data warehouse.

The current view (2026): GraphRAG and where entity resolution still wins

Since this article was first published, the graph-database approach to RAG has been formalised under the name GraphRAG. Microsoft Research introduced GraphRAG on 13 February 2024, using an LLM-generated knowledge graph as the retrieval layer. Microsoft's stated strengths are "connecting the dots" across disparate information and holistic, whole-dataset summarization, plus provenance so an answer can be shown to be grounded in the dataset.

Neo4j Labs now positions GraphRAG explicitly as a hybrid: it "combines scored vector search with structural graph search," using vector search to find relevant chunks and then following graph relationships outward (multi-hop), and it frames this as replacing "black box vector only search" with a traceable path.

This validates the original point of this article rather than overturning it: graph and vector are complementary, and a knowledge graph adds structure and provenance. But GraphRAG is optimised for understanding a corpus of documents — not for guaranteeing that the records retrieved for one named customer actually belong to that customer.

That single-customer guarantee is exactly where entity resolution still wins. It adds fuzzy matching by default (so "Phillip Mitchel" still resolves to "Philipp Mitchell"), it deduplicates and links a customer's scattered records into one resolved profile at ingestion, and in regulated industries it can be tuned for explainable, auditable matches. So in 2026 the practical pattern is not "vector vs graph vs entity resolution" but: use a vector database (or GraphRAG) for document knowledge, and add an entity resolution layer when the answer depends on getting one customer's identity exactly right.

Want to try this yourself?

Build your own EntityRAG: sign up for a free Tilores account to unify your customer data for your own Entity RAG application, or explore Tilores entity resolution software to see how real-time matching links a customer's records into one resolved profile.

Frequently asked questions

What is the difference between a vector database and an entity resolution API for AI applications?
A vector database excels at retrieving similar knowledge from unstructured sources, such as an insurance company's standard policy terms. But when the prompt is about a specific customer, a vector database can fail: inaccuracies in the search input may miss the relevant customer, and a single customer may have more than one record that must be accurately linked for a complete 360° view. Entity resolution addresses this by deduplicating, linking, and making customer records searchable — so it works as a complementary RAG data source alongside the vector database. Tilores calls this combination EntityRAG.
Why are graph databases not ideal as the customer-data source for RAG in regulated industries?
Graph databases lack fuzzy search by default (a search for "Phillip Mitchel" would not retrieve "Philipp Mitchell" without adding a separate search engine such as Elasticsearch), they can be slow when retrieval needs several transitive hops across linked records, and in regulated industries it can be hard to guarantee that interlinked data retrieved for one customer belongs only to that customer.
Does combining entity resolution with a vector database actually improve LLM answers?
In a travel-insurance demonstration built with enterprise LLM specialists Kern.ai, three setups were compared: baseline RAG (vector DB + LLM) produced vague results; enhanced RAG with Kern.ai's custom reranking model was more nuanced but still ambiguous; enhanced RAG plus Tilores EntityRAG returned the most specific, useful response, because the misspelled customer name resolved to two linked customer profiles containing all current and previous insurance plans.
Does a vector database alone work for customer-specific prompts?
Not reliably. A vector database is excellent for retrieving similar unstructured knowledge, such as an insurance company's standard policy terms. But when the prompt is about one named customer, two things break: inaccuracies in the search input can fail to find the relevant customer, and that customer may have more than one record that must be linked together for a complete 360° view. Entity resolution closes that gap by linking the scattered records into one resolved profile — which is why Tilores proposes it as a complementary RAG source alongside the vector database.

Webinar video

Watch the EntityRAG webinar
