Beyond Vector Databases: How Identity Resolution Powers Customer-Centric AI

The AI infrastructure stack is maturing rapidly. Vector databases have become the default choice for retrieval-augmented generation (RAG), and for good reason — they’re excellent at finding semantically similar content. But when it comes to customer data, vector similarity has a fundamental blind spot.

The Limits of Vector Similarity for Customer Data

Vector databases work by converting text into high-dimensional embeddings and finding records with similar embeddings. This is powerful for document search, knowledge base queries, and content recommendation. But customer identity doesn’t work like semantic similarity.

Consider these two customer records:

Record A: “Sarah Johnson, sarah.j@acme.com, CRM”
Record B: “SARA JOHNSON, +49 555 1234, ERP”

These records share little text beyond the surname. The first name is spelled differently, one has an email and the other has a phone number, and they come from different systems. A vector database might not even retrieve Record B when searching for Record A.

But they’re the same person. And an AI application that only knows about Record A is working with half the picture.
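To make the gap concrete, here is a minimal sketch that measures how many tokens the two records share. It uses plain token overlap (Jaccard similarity) as a crude lexical proxy. It is not an embedding model, and real embedding scores will differ, but it shows how little the two strings have in common on the surface. The helper names `tokens` and `jaccard` are ours, chosen for illustration.

```python
import re


def tokens(record: str) -> set[str]:
    """Lowercase the record and split it on commas and whitespace."""
    return {t for t in re.split(r"[,\s]+", record.lower()) if t}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of distinct tokens that appear in both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


record_a = "Sarah Johnson, sarah.j@acme.com, CRM"
record_b = "SARA JOHNSON, +49 555 1234, ERP"

shared = tokens(record_a) & tokens(record_b)
score = jaccard(tokens(record_a), tokens(record_b))

# Only the surname overlaps: 1 shared token out of 9 distinct tokens.
print(shared, round(score, 2))
```

The only shared token is the surname, which on its own is far too weak a signal to link two customer records.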

Identity Resolution: The Missing Layer

Identity resolution solves a different problem than vector search. Instead of “what text looks similar?”, it answers “which records belong to the same real-world person?”

This requires domain-specific matching logic:

Name matching that understands “Sara” ≈ “Sarah” ≈ “S.” but ≠ “Sandra”
Address normalization that equates “Hauptstr. 14” with “Hauptstraße 14”
Cross-attribute linking: records that share a phone number but have different names may still be the same person
Transitive matching: if A matches B and B matches C, then A, B, and C are one entity, even if A and C share no attributes
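The first two rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Tilores implements matching: the `NAME_VARIANTS` table is a tiny hypothetical stand-in for a curated nickname dictionary, and the street rule only handles the "str." abbreviation and the "ß" spelling.

```python
import re

# Hypothetical variant table; a production system would use a curated dictionary.
NAME_VARIANTS = {"sara": "sarah", "sarah": "sarah"}


def normalize_first_name(name: str) -> str:
    """Lowercase, drop a trailing period, and map known variants to one form."""
    key = name.strip().casefold().rstrip(".")
    return NAME_VARIANTS.get(key, key)


def first_names_match(a: str, b: str) -> bool:
    """True if the names normalize to the same form or one is a matching initial."""
    na, nb = normalize_first_name(a), normalize_first_name(b)
    if not na or not nb:
        return False
    if na == nb:
        return True
    # A bare initial such as "S." is compatible with names starting with that letter.
    if len(na) == 1 or len(nb) == 1:
        return na[0] == nb[0]
    return False


def normalize_street(address: str) -> str:
    """Casefold (which turns 'ß' into 'ss') and expand the 'str.' abbreviation."""
    s = address.strip().casefold()
    s = re.sub(r"str\.", "strasse", s)
    return re.sub(r"\s+", " ", s)


print(first_names_match("Sara", "Sarah"))    # variants of the same name
print(first_names_match("S.", "Sarah"))      # initial is compatible
print(first_names_match("Sara", "Sandra"))   # different names
print(normalize_street("Hauptstr. 14") == normalize_street("Hauptstraße 14"))
```

The point of the sketch is that each rule encodes domain knowledge explicitly: "Sara" and "Sandra" are closer as strings than "S." and "Sarah", yet the rules treat them the opposite way.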

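Better embeddings alone don’t replicate this. Similarity search ranks records by closeness to a query. Identity resolution applies matching rules and then groups the matched records into entities, which is a different operation.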
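Transitive matching in particular is a graph problem: matched pairs are edges, and entities are connected components. Below is a minimal sketch using a union-find structure. The records, field names, and the "exact shared value means a match" rule are simplifying assumptions for illustration; the record IDs mirror the A, B, C example in the list above and are unrelated to Record A and Record B from earlier.

```python
from collections import defaultdict

# Hypothetical records: A and B share an email, B and C share a phone,
# A and C share nothing directly. D is unrelated.
records = {
    "A": {"email": "s.johnson@example.com", "phone": None},
    "B": {"email": "s.johnson@example.com", "phone": "+49 555 0100"},
    "C": {"email": None, "phone": "+49 555 0100"},
    "D": {"email": "someone.else@example.com", "phone": "+49 555 0199"},
}

parent = {rid: rid for rid in records}


def find(x: str) -> str:
    """Return the representative of x's group, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(a: str, b: str) -> None:
    """Merge the groups containing a and b."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[rb] = ra


# Link every record to the first record seen with the same (field, value).
first_seen = {}
for rid, attrs in records.items():
    for field, value in attrs.items():
        if value is None:
            continue
        key = (field, value)
        if key in first_seen:
            union(first_seen[key], rid)
        else:
            first_seen[key] = rid

groups = defaultdict(set)
for rid in records:
    groups[find(rid)].add(rid)

clusters = sorted(sorted(members) for members in groups.values())
print(clusters)  # [['A', 'B', 'C'], ['D']]
```

A and C end up in the same entity even though no single comparison between them would ever match. A nearest-neighbor lookup has no step that corresponds to this merge.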

The Customer-Centric AI Stack

For AI applications that interact with customer data — support chatbots, sales copilots, marketing assistants, fraud detection systems — the ideal stack combines both approaches:

Identity resolution (Tilores) — resolves which records belong to which person, creates unified golden records
Vector database — indexes unstructured content (documents, chat logs, knowledge articles) for semantic search
LLM — generates responses using both the resolved customer context and the retrieved documents

Identity resolution handles the structured customer data. Vector search handles the unstructured content. The LLM synthesizes both into a coherent response.
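A rough sketch of how the three layers fit together is shown below. Everything in it is a stand-in: `GoldenRecord`, `resolve_customer`, `search_documents`, and `build_prompt` are hypothetical names, the resolver returns hard-coded data instead of calling an identity resolution service, the document search ranks by shared words instead of embeddings, and the knowledge-base sentences are made-up placeholders. The LLM call itself is left out; the sketch stops at the prompt it would receive.

```python
from dataclasses import dataclass


@dataclass
class GoldenRecord:
    """Unified view of one customer (hypothetical shape, not a Tilores type)."""
    entity_id: str
    names: list[str]
    emails: list[str]
    phones: list[str]
    sources: list[str]


def resolve_customer(email: str) -> GoldenRecord:
    """Stub for the identity resolution layer; a real one would query the service."""
    return GoldenRecord(
        entity_id="entity-1",
        names=["Sarah Johnson", "SARA JOHNSON"],
        emails=[email],
        phones=["+49 555 1234"],
        sources=["CRM", "ERP"],
    )


# Placeholder content standing in for indexed documents.
KNOWLEDGE_BASE = [
    "Refunds are issued within 14 days of a return.",
    "Orders placed through the portal ship from the central warehouse.",
    "Password resets require a verified email address.",
]


def search_documents(question: str, k: int = 2) -> list[str]:
    """Stub for the vector database: ranks by shared words, not embeddings."""
    words = set(question.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, customer: GoldenRecord, docs: list[str]) -> str:
    """Combine resolved customer context and retrieved documents for the LLM."""
    context = "\n".join(f"- {doc}" for doc in docs)
    return (
        f"Customer {customer.entity_id} (sources: {', '.join(customer.sources)})\n"
        f"Known names: {', '.join(customer.names)}\n"
        f"Emails: {', '.join(customer.emails)}\n"
        f"Phones: {', '.join(customer.phones)}\n\n"
        f"Relevant documents:\n{context}\n\n"
        f"Question: {question}"
    )


question = "how long do refunds take?"
customer = resolve_customer("sarah.j@acme.com")
docs = search_documents(question)
prompt = build_prompt(question, customer, docs)
print(prompt)
```

The design choice worth noting is the order of operations: the customer is resolved first, so the prompt carries data from both the CRM and the ERP record before any document retrieval happens.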

Why This Matters Now

As enterprises move from experimentation to production AI, accuracy becomes non-negotiable. A support chatbot that gives a customer incomplete order history isn’t just unhelpful — it’s a trust violation. A fraud detection system that can’t see all of a customer’s accounts has blind spots that cost real money.

The companies that get customer-centric AI right will be those that invest in their data foundation, not just model capability. The most sophisticated LLM in the world can’t compensate for fragmented input data.

IdentityRAG: Putting It Together

We built IdentityRAG as a reference implementation for this architecture. It’s a LangChain retriever that uses Tilores to resolve customer identities before the LLM generates a response.

The result: AI applications that know who your customers are — not just what text looks similar to their name.

Explore the IdentityRAG reference implementation on GitHub, or start with the free tier to try Tilores with your own data.