TL;DR
- IdentityRAG is Retrieval-Augmented Generation that resolves a customer's identity across every source system before retrieval, instead of letting the model match on names and hope for the best.
- It eliminates the “two customers with similar names” failure mode that breaks naive RAG over CRM, support and billing data.
- The pattern was publicly introduced by Tilores in 2024, with a September 2024 technical write-up and an October 2024 Product Hunt launch around the LangChain integration.
What is IdentityRAG?
IdentityRAG is a Retrieval-Augmented Generation pattern in which an identity-resolution step runs between the LLM's question and the data sources, producing a single resolved customer record before any retrieval happens. The LLM never has to disambiguate "Brandon Perez in Salesforce" from "Brandon Perez in HubSpot" from "Brandon E. Perez in the support system" — the identity layer has already done it.
The result is a RAG pipeline grounded in entity-resolved customer data instead of substring-matched fragments scattered across systems. Hallucinations from cross-customer confusion drop sharply; explanations cite the right person; downstream prompts can ask "have they had any recent support requests?" without name collisions polluting the answer.
Where the term came from
Tilores publicly introduced the term "IdentityRAG" in 2024, with a September 2024 technical write-up and an October 2024 Product Hunt launch around the LangChain integration. It was meant as a more precise name for a pattern emerging across enterprise LLM deployments: applications that needed to answer customer questions over data that lived in five or six different systems, each with its own version of the same person. Generic RAG didn't describe the work; vector RAG actively obscured it.
IdentityRAG vs vector RAG vs generic RAG
| Dimension | Generic RAG | Vector RAG | IdentityRAG |
|---|---|---|---|
| Primary index | Varies — keyword, vector, hybrid, SQL, graph, or tool retrieval | Embedding similarity | Resolved customer entity graph |
| Handles "two Brandon Perez" problem | No | No (often worse — similar embeddings) | Yes |
| Real-time updates | Depends on source/index refresh | Changed content must be re-embedded or index-updated | Real-time entity update/search where source connectors / API ingestion are configured |
| Source-of-truth | The chunks you indexed | The corpus you embedded | The customer record, unified across systems |
| Audit & explainability | Retrieval logs | Cosine similarity scores | Resolved entity + source attribution per attribute |
| Best for | Static docs, knowledge bases | Semantic search over unstructured text | Multi-system customer questions in agentic AI |
The crucial distinction: vector RAG asks "which chunks look semantically similar to this question?". IdentityRAG asks "who is the customer this question is about, and what does every system know about that customer right now?". The second question is the one regulated industries, KYC workflows, support agents, and fraud teams actually need answered.
The problem IdentityRAG solves
Customer data is splintered. A single person lives in:
- a CRM with one spelling of their name and a primary email
- a support tool with a slightly different name and a secondary email
- a billing system keyed on a different identifier entirely
- a marketing platform with a tracking ID
- a data warehouse with deduplication that's six months out of date
When an LLM agent — a customer-support assistant, an agentic billing helper, a fraud reviewer — needs to answer a question about that person, naive RAG returns chunks from each system without resolving them. Two outcomes follow:
- Wrong-customer answers: the model conflates two distinct people with similar names. In regulated industries (banking KYC, healthcare, public sector) this is a compliance failure, not just a UX bug.
- Missing-context answers: the model retrieves only the system where the customer was easiest to find and produces a confidently incomplete answer.
IdentityRAG eliminates both by doing the identity resolution before retrieval, not after. The LLM receives a resolved entity object with every attribute the question might need, attribution intact, and a freshness timestamp.
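To make "a resolved entity object with attribution and a freshness timestamp" more concrete, here is a minimal Python sketch of what such an object might look like. The class and field names (`ResolvedCustomer`, `AttributeValue`, `resolved_at`) and the sample values are hypothetical illustrations, not the Tilores schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AttributeValue:
    """One attribute value plus the source system it came from."""
    value: str
    source: str  # illustrative labels such as "crm", "support", "billing"


@dataclass
class ResolvedCustomer:
    """Hypothetical shape of the resolved entity handed to the LLM."""
    entity_id: str
    attributes: dict[str, list[AttributeValue]] = field(default_factory=dict)
    # Freshness timestamp: when this resolved view was assembled (UTC).
    resolved_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def add(self, name: str, value: str, source: str) -> None:
        """Record a value for an attribute, keeping its source attribution."""
        self.attributes.setdefault(name, []).append(AttributeValue(value, source))

    def sources_for(self, name: str) -> list[str]:
        """List the source systems that contributed to an attribute."""
        return sorted({a.source for a in self.attributes.get(name, [])})


customer = ResolvedCustomer(entity_id="entity-001")
customer.add("email", "brandon.perez@example.com", "crm")
customer.add("email", "b.perez@example.com", "support")
customer.add("name", "Brandon E. Perez", "support")
print(customer.sources_for("email"))
```

Keeping every value alongside its source, rather than collapsing to one "golden" value, is what lets a later audit trace an answer back to the system it came from.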
How IdentityRAG works (architecture)
A working IdentityRAG pipeline has four stages:
- Identity query. The agent receives a natural-language question ("what's the latest support ticket for Brandon Perez?") and extracts the identity signals it can: name, email, partial phone, account ID.
- Entity resolution. Those signals hit an identity-resolution API that returns either a single resolved customer object (combining matched records from every connected system) or a small candidate set with confidence scores. This is the step naive RAG skips.
- Context assembly. The resolved entity is used to fetch the specific attributes the question needs (recent tickets, last marketing email, outstanding balance) directly from the source systems, with attribution preserved.
- LLM call. The model receives the resolved customer object + the requested context and answers the question. There is no name-matching guesswork in the prompt.
The identity-resolution step is the load-bearing part. It needs to handle fuzzy matching on partial inputs, transitive linking across systems, real-time updates as new data lands, and explainability for compliance. Vector similarity is not a substitute.
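The four stages above can be sketched end to end in a few lines of Python. This is a toy under stated assumptions: the in-memory `RECORDS` and `TICKETS` stand in for a real identity-resolution API and source systems, the exact-match `resolve` function stands in for real fuzzy and transitive matching, and every name, ID, and helper here is hypothetical.

```python
import re

# Toy stand-ins for connected source systems (illustrative data only).
RECORDS = [
    {"entity": "E1", "system": "crm", "name": "Brandon Perez", "email": "brandon.perez@example.com"},
    {"entity": "E1", "system": "support", "name": "Brandon E. Perez", "email": "brandon.perez@example.com"},
    {"entity": "E2", "system": "crm", "name": "Brandon Perez", "email": "bperez@example.org"},
]
TICKETS = {"E1": ["#481 login failure"], "E2": ["#502 refund request"]}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z]\.)? [A-Z][a-z]+\b")


def extract_signals(question):
    """Stage 1: pull whatever identity signals the question contains."""
    signals = {}
    email = EMAIL_RE.search(question)
    if email:
        signals["email"] = email.group(0).rstrip(".").lower()
    name = NAME_RE.search(question)
    if name:
        signals["name"] = name.group(0)
    return signals


def resolve(signals):
    """Stage 2: return the entity IDs consistent with the strongest signal."""
    if "email" in signals:
        hits = [r for r in RECORDS if r["email"] == signals["email"]]
    elif "name" in signals:
        hits = [r for r in RECORDS if r["name"] == signals["name"]]
    else:
        hits = []
    return sorted({r["entity"] for r in hits})


def answer(question, llm):
    """Run all four stages; refuse to guess when resolution is ambiguous."""
    entities = resolve(extract_signals(question))
    if len(entities) != 1:
        return f"Ambiguous: {len(entities)} candidate customers, need more detail."
    entity = entities[0]
    tickets = TICKETS.get(entity, [])                      # Stage 3: context assembly
    prompt = f"Customer {entity}. Tickets: {tickets}. Question: {question}"
    return llm(prompt)                                     # Stage 4: LLM call


def echo_llm(prompt):
    """Placeholder model that returns its prompt, so the sketch runs offline."""
    return prompt


print(answer("What's the latest support ticket for Brandon Perez?", echo_llm))
print(answer("Latest ticket for brandon.perez@example.com?", echo_llm))
```

The design point is the early return: when the name alone matches two distinct entities, the pipeline asks for more detail instead of passing both customers' tickets to the model.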
When you need IdentityRAG
IdentityRAG is the right pattern when all of the following are true:
- The LLM is answering questions about people or accounts, not documents.
- The underlying customer data lives in two or more source systems.
- Those systems have known duplication, naming inconsistencies, or independent identifiers.
- A wrong-customer answer carries a real cost: compliance, support quality, fraud, contract risk.
- You need to explain or audit why the model answered the way it did.
If you're building a docs Q&A bot, a code-search assistant, or any RAG application over unstructured content where identity isn't load-bearing, generic or vector RAG is fine. IdentityRAG is specifically the pattern for customer-aware AI over messy, multi-system data.
Implementing IdentityRAG: LangChain, MCP, serverless
IdentityRAG plugs into the standard agentic AI stacks via three integration points:
LangChain / LangGraph tool. Tilores ships a native LangChain integration (tilores-langchain on PyPI; the langchain-tilores repo is the upstream source). The TiloresTools class exposes typed tools built from a specific Tilores instance; tilores_search lets an LLM agent resolve identity from natural-language queries. The agent learns to invoke it before any customer-specific retrieval, and the resolved entity is added to the prompt's context window. The same integration sits behind Tilores' published support for AWS Bedrock and Anthropic Claude as the model layer.
Model Context Protocol (MCP) wrapper. MCP is the Anthropic-introduced standard (November 2024) for connecting AI applications to external tools and data sources. Tilores' GraphQL API is a clean fit for the MCP tool pattern: an MCP server can wrap it and expose it to Claude, Claude Code, Cursor, or any MCP-compatible client. For example, an MCP server could expose a tilores_search or resolve_customer tool that returns a structured matching entity. This is the pattern that lets an AI coding assistant safely query CRM data without hand-coded join logic.
Serverless function from the application backend. For applications calling LLMs directly (no agent framework), run identity resolution as an inline serverless function before the prompt is built. The resolution adds tens of milliseconds; it eliminates a class of name-collision hallucinations the model would otherwise produce.
In every case the identity-resolution call sits in the critical path, so it needs to support both deterministic and probabilistic matching and be fast enough not to gate the user experience. Tilores publishes sub-150 ms end-to-end response times for IdentityRAG queries on its GraphQL API; the matching layer itself runs in single-digit milliseconds.
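For the serverless option, the shape is simply "resolve first, then build the prompt", with the resolution time measured so it can be compared against a latency budget. The sketch below assumes a hypothetical `resolve_identity` helper backed by an in-memory dictionary; in a real deployment that function would call the identity-resolution API, and the event fields are illustrative.

```python
import time


def resolve_identity(email):
    """Hypothetical stand-in for a call to an identity-resolution API."""
    known = {
        "brandon.perez@example.com": {"entity_id": "E1", "name": "Brandon E. Perez"},
    }
    return known.get(email.strip().lower())


def handler(event):
    """Serverless-style entry point: resolve identity before building the prompt."""
    started = time.perf_counter()
    customer = resolve_identity(event["email"])
    resolution_ms = (time.perf_counter() - started) * 1000

    if customer is None:
        # No resolved entity: do not build a customer-specific prompt at all.
        return {"prompt": None, "resolution_ms": resolution_ms}

    prompt = (
        f"You are answering about customer {customer['entity_id']} "
        f"({customer['name']}).\nQuestion: {event['question']}"
    )
    return {"prompt": prompt, "resolution_ms": resolution_ms}


result = handler({"email": "Brandon.Perez@example.com", "question": "Any open invoices?"})
print(result["prompt"])
```

Returning the measured `resolution_ms` alongside the prompt makes it straightforward to log and alert on the resolution step separately from model latency.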
Performance: latency, accuracy, hallucination reduction
Two measurements matter for IdentityRAG deployments:
Latency. Identity resolution sits in the critical path of every LLM call that needs customer context. The bar for an interactive chatbot is sub-150 ms end-to-end — which is what Tilores publishes on its production GraphQL API, with the matching step itself running in single-digit milliseconds. Agentic workflows that fire many resolution calls per task push for tighter still. Batch deduplication generally cannot meet either bar; CDP-style activation profiles should be checked case-by-case — some now publish real-time identity/profile APIs.
Wrong-customer rate. This is the metric most enterprise teams haven't been tracking but should. Without an explicit identity-resolution layer, customer-matching errors compound across systems with naming inconsistencies, partial inputs, and stale dedupe — published academic benchmarks on name-variation matching put even strong multi-agent LLM approaches at roughly 94% accuracy, leaving a single-digit error tail that becomes visible the moment an agentic workflow chains many calls together. IdentityRAG deployments target dramatically lower error rates by treating identity resolution as a deterministic step the LLM does not have to guess at.
Hallucination reduction follows directly: when the model receives a single resolved customer object instead of a candidate set, the surface area for confident-but-wrong generation collapses.
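The compounding effect behind the wrong-customer rate can be checked with simple arithmetic. The sketch below treats the roughly 94% figure as a per-call accuracy and assumes errors are independent across calls; both are simplifying assumptions for illustration, not properties of any particular benchmark or system.

```python
def chain_success_probability(per_call_accuracy: float, calls: int) -> float:
    """Probability that every call in a chain matches correctly,
    assuming each call errs independently of the others."""
    return per_call_accuracy ** calls


for calls in (1, 5, 10, 20):
    p = chain_success_probability(0.94, calls)
    print(f"{calls:>2} chained calls: {p:.1%} chance of no matching error")
```

Under these assumptions, a workflow that chains ten identity-dependent calls gets all ten right only about 54% of the time, which is why a single-digit error tail stops being negligible in agentic settings.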
IdentityRAG vs alternative approaches
| Approach | What it does | Why it falls short for customer-data LLMs |
|---|---|---|
| No identity layer | Pass raw chunks from each system into the prompt | Two-customer confusion; hallucinations; no audit trail |
| Vector RAG | Embed all customer records, retrieve by similarity | Similar embeddings for similar names is worse, not better; can't enforce identity |
| Batch deduplication | Periodic dedupe job on a data warehouse | Stale by hours or days; LLM answers based on out-of-date matches |
| CDP (customer data platform) | Resolved profiles for marketing activation | Many CDPs are optimised for segmentation and activation; check latency, explainability, and API shape before using one as inline LLM context. |
| MDM (master data management) | Full governance + resolution platform | Heavy, slow to deploy, designed for analyst workflows not API/agent latencies |
| IdentityRAG (Tilores pattern) | Real-time entity resolution as an API the LLM calls | Requires a dedicated identity-resolution layer and connector work; MCP support is an integration pattern until/unless shipped. |
The vector RAG point is worth dwelling on: cosine similarity over customer embeddings can amplify the two-Brandon-Perez problem rather than solve it — similar names tend to produce similar embeddings, which surfaces multiple distinct people as a single candidate set and invites the model to merge them. Many production failures attributed to "LLM hallucination" are in fact identity-resolution failures upstream of the model, surfaced as a generation error rather than a retrieval one. Masking or hashing names before embedding helps with privacy but doesn't fix the underlying ambiguity problem; the only durable fix is a deterministic identity step.
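A small Python sketch illustrates why similarity alone cannot separate people. It uses `difflib.SequenceMatcher` from the standard library as a loose stand-in for embedding similarity (real embedding models behave differently in detail), and the records and the email-based rule are hypothetical examples.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a crude proxy for embedding similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Two records for the same person, plus a different person with the same name.
crm_record = {"name": "Brandon Perez", "email": "brandon.perez@example.com"}
support_record = {"name": "Brandon E. Perez", "email": "brandon.perez@example.com"}
other_person = {"name": "Brandon Perez", "email": "bperez@example.org"}


def same_identity(x: dict, y: dict) -> bool:
    """Deterministic rule (illustrative): match only on a stable identifier."""
    return x["email"].lower() == y["email"].lower()


for label, rec in (("support record", support_record), ("other person", other_person)):
    sim = name_similarity(crm_record["name"], rec["name"])
    print(f"{label}: similarity={sim:.2f}, same identity={same_identity(crm_record, rec)}")
```

In this toy data the different person scores higher on name similarity than the true duplicate does, so ranking by similarity would surface the wrong record first; the deterministic identifier check gets both cases right.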
Frequently asked questions
Is IdentityRAG the same as identity resolution?
No. Identity resolution is the underlying capability — matching records that refer to the same real-world entity. IdentityRAG is the pattern of using that capability inside a RAG pipeline specifically so an LLM gets a single resolved customer object before it generates an answer.
Can I do IdentityRAG without a dedicated identity-resolution platform?
You can hand-write joins and fuzzy matching across systems, but you'll spend engineering cycles solving a problem that's been solved at scale. The categories of failure (transitive matches, attribute-similarity tuning, real-time updates, compliance-grade explainability) are why the pattern emerged as its own thing.
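As one example of what hand-writing this involves, transitive matching alone needs something like a union-find structure: if record A shares an email with B, and B shares a phone number with C, all three belong to one entity even though A and C share nothing directly. The sketch below is a minimal illustration with hypothetical record IDs and exact-match keys; it is not how any particular platform implements matching.

```python
class UnionFind:
    """Minimal disjoint-set structure for grouping linked records."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


# Illustrative records keyed by "system:id".
records = {
    "crm:1": {"email": "brandon.perez@example.com", "phone": None},
    "support:7": {"email": "brandon.perez@example.com", "phone": "+1-555-0100"},
    "billing:42": {"email": None, "phone": "+1-555-0100"},
    "crm:2": {"email": "bperez@example.org", "phone": None},
}

uf = UnionFind()
first_seen = {}  # (attribute, value) -> first record that carried it
for record_id, record in records.items():
    uf.find(record_id)  # register the record even if it links to nothing
    for key in ("email", "phone"):
        value = record[key]
        if value is None:
            continue
        if (key, value) in first_seen:
            uf.union(first_seen[(key, value)], record_id)
        else:
            first_seen[(key, value)] = record_id

# crm:1 and billing:42 share no attribute directly, but link via support:7.
print(uf.find("crm:1") == uf.find("billing:42"))
print(uf.find("crm:1") == uf.find("crm:2"))
```

Even this toy version ignores fuzzy attribute similarity, incremental updates, and explaining why two records were linked, which are the parts that tend to consume the engineering time.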
Does IdentityRAG replace vector databases?
No — they sit at different layers. Vector RAG retrieves semantically similar content. IdentityRAG resolves the customer that content is about. Production deployments routinely use both: IdentityRAG for the customer entity, vector RAG over the customer's documents, tickets, and emails.
How does IdentityRAG fit with MCP?
MCP, introduced by Anthropic in November 2024, is the open standard for exposing tools and data sources to AI applications. Tilores' GraphQL API is wrappable as an MCP tool the same way it's wrapped as a LangChain tool — for example, an MCP server could expose a tilores_search or resolve_customer tool that returns a structured matching entity to scope subsequent retrievals against. This is the cleanest integration shape for Claude, Claude Code, and Cursor as the underlying client.
What's the latency budget for production?
Sub-150 ms end-to-end for interactive deployments is the bar Tilores publishes on its production GraphQL API; the matching step itself runs in single-digit milliseconds. Agentic workflows that fire many resolution calls per task push for tighter still. Batch deduplication generally cannot achieve this; CDP-based approaches vary by vendor and integration pattern — check latency, explainability, and API shape.
Does IdentityRAG help with regulated industries?
Yes — explainability is built in. Every resolved entity carries attribution per attribute, so a compliance team can trace why the model answered the way it did. This is the pattern banks use for AI customer support in KYC workflows.
How does IdentityRAG handle GDPR / data privacy?
Tilores can run in your AWS environment so customer data does not leave your infrastructure; it stores records and entity graphs for resolution while source systems remain operational systems of record.
Can IdentityRAG work in real time?
Yes — that's the point. Updates to any source system propagate to the resolution layer immediately, so an LLM asking about a customer at 14:01 gets context that reflects the 14:00 ticket update. Batch deduplication generally cannot do this; CDP-based approaches vary by vendor and integration pattern — check latency, explainability, and API shape.


