Real-Time vs Batch Entity Resolution: When You Need Each

By Steven Renwick, CEO & co-founder, Tilores.

TL;DR: Use real-time entity resolution when a live application, AI agent, fraud check or support workflow must act on the current resolved record. Use batch entity resolution for bounded, offline work: periodic dedupe, historical backfills, warehouse loads and cost-controlled reconciliation where no live decision waits.

See it on your data: book a demo to walk through real-time entity resolution for your live workflows, or get the evaluation build to try it locally. Then review the Tilores entity resolution software for teams that need current resolved customer context in live systems.

Real-time vs batch entity resolution: what changes?

Entity resolution, identity resolution and record linkage all describe the same core problem: deciding which records refer to the same real-world person, company or object. The Springer reference on data matching describes this as identifying, matching and merging records that correspond to the same entities across one or more databases.

The real-time versus batch decision is not about whether matching is deterministic or probabilistic. Good entity resolution often combines deterministic rules, probabilistic ML matching and fuzzy matching. The decision is about when the match happens, when the resolved view becomes usable, and whether a live system depends on it.

Dimension	Real-time entity resolution	Batch entity resolution
When matching happens	As records or events are ingested, so the resolved entity can update continuously.	During a scheduled or triggered job over a bounded dataset, export or file set.
Latency to a usable resolved record	Designed for live use after ingestion and matching, subject to the implementation and source-system delay.	Available after the batch job finishes and the result is loaded into the system that consumes it.
Fit for AI agents and live decisions	Strong fit when an agent, application or support workflow must retrieve current resolved context at query time.	Weak fit for live decisions; useful for preparing offline datasets or periodic snapshots.
Fit for fraud and AML checks	Strong fit when onboarding, payment risk or monitoring decisions depend on the freshest customer and relationship view.	Useful for periodic review, retrospective investigation and offline risk analysis.
Fit for periodic dedupe and backfill	Can absorb new records after the initial load, but may be unnecessary if nothing live depends on the result.	Strong fit for one-off historical loads, monthly dedupe, static exports and cost-bounded reconciliation.
Freshness of the resolved view	Current to the ingestion path and correction logic.	A snapshot of the data as it existed when the job ran.
Typical cost profile	Ongoing service cost and operational ownership, justified when freshness changes outcomes.	Bounded compute and storage cost, efficient when the work can wait.
Operational complexity	Requires reliable ingestion, idempotent updates, correction handling, observability and query-serving design.	Requires export discipline, scheduling, job monitoring, result loading and rerun handling.

When do you need real-time entity resolution?

You need real-time entity resolution when the answer changes what a system does now. If a customer signs up, starts a payment, opens a support case, asks an AI agent a question or changes an email address, the application should not rely on yesterday’s dedupe output if the identity decision affects risk, access, privacy or customer experience.

This is especially clear in fraud and AML work. FinCEN’s CDD FAQs describe ongoing customer due diligence as including ongoing monitoring to identify and report suspicious transactions and, on a risk basis, maintain and update customer information. That is hard to support well if the identity layer only sees a periodic export.

Real-time resolution is also the right fit when a resolved record is part of the serving path for an application. Tilores resolves and links records at ingestion, then applications retrieve the current resolved context at query time. That distinction matters: the entity is not assembled inside the AI prompt or at query time; the application queries a current resolved view that has already been maintained by the identity layer.
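To make the ingestion-time versus query-time split concrete, here is a minimal in-memory sketch. It is not the Tilores API or its matching logic: the `IdentityLayer` class, its method names and the "same normalized email or phone" rule are all hypothetical, chosen only to show that the linking work happens in `ingest` and that `entity_for` merely reads a view that is already maintained.

```python
class IdentityLayer:
    """Toy in-memory resolver: link records at ingestion, serve entities at query time."""

    def __init__(self):
        self.records = {}    # record id -> latest raw record
        self.parent = {}     # union-find parent pointer per record id
        self.key_owner = {}  # normalized match key -> first record id seen with that key

    def _find(self, rid):
        # Walk to the cluster root, halving the path as we go.
        while self.parent[rid] != rid:
            self.parent[rid] = self.parent[self.parent[rid]]
            rid = self.parent[rid]
        return rid

    def ingest(self, rid, record):
        """Do the matching work now, so queries only read the maintained view."""
        self.records[rid] = record
        self.parent.setdefault(rid, rid)
        for field in ("email", "phone"):  # hypothetical deterministic match keys
            value = record.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            owner = self.key_owner.setdefault(key, rid)
            # Merge the two clusters; a no-op if they are already linked.
            self.parent[self._find(rid)] = self._find(owner)

    def entity_for(self, rid):
        """Return every record id currently resolved to the same entity."""
        root = self._find(rid)
        return sorted(r for r in self.records if self._find(r) == root)


layer = IdentityLayer()
layer.ingest("crm-1", {"email": "Ana@Example.com"})
layer.ingest("web-7", {"email": "ana@example.com ", "phone": "555-0100"})
layer.ingest("pay-3", {"phone": "555-0100"})
layer.ingest("crm-9", {"email": "bo@example.com"})
print(layer.entity_for("pay-3"))  # ['crm-1', 'pay-3', 'web-7']
```

Re-ingesting the same record is harmless in this sketch, but it never splits a cluster when a record is corrected. A production identity layer would also need correction handling, persistence and concurrency control, none of which are shown here.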

Is batch matching enough for AI agents?

Batch matching is enough for AI agents only when the agent is working from an offline dataset and no live customer or transaction decision depends on the answer. It is not enough when the agent has to answer a customer, update a ticket, assess risk, route a case or call a tool during a live interaction.

Modern agents use external functions to fetch data and call application actions. Official function-calling documentation describes this pattern as a way for models to interface with external systems and access data outside the model. Related agent documentation describes the same pattern for fetching data, calling APIs and executing actions.

If the tool returns a stale or fragmented customer view, the model can still produce a fluent answer. The problem is that it may be grounded in the wrong customer, the wrong account, or an incomplete relationship graph. For the same reason, vector search is not a substitute for identity resolution. Vector search can retrieve similar documents; it should not decide whether two customer records are the same person or company.

For more detail on that boundary, see Tilores’ article on whether LLMs can be used for entity resolution and the related EntityRAG article.

Where does batch entity resolution still make sense?

Batch entity resolution still makes sense when the input is bounded and the output is not needed by a live workflow. That includes monthly marketing-list dedupe, a one-off migration from a legacy CRM, a historical backfill before launch, warehouse reconciliation, analytics preparation and periodic data-quality review.

Apache Flink’s training material gives a useful general distinction: batch processing works over bounded streams, while stream processing works over unbounded streams that are processed as data arrives. Its execution-mode documentation also states that batch mode is for bounded jobs with known fixed input that do not run continuously, while streaming mode is for continuous incremental processing.

That maps cleanly to record linkage. If the input is a fixed export and the business only needs a final reconciled file, batch is often simpler and cheaper. If new records keep arriving and the application must query the current entity view, treating the problem as a nightly file job creates avoidable staleness.

A useful test: if the system can wait for the job to finish, batch may be fine. If a customer, agent, transaction or risk decision is waiting, use real-time resolution.

Streaming vs batch record linkage: what is the real difference?

Streaming record linkage means the identity layer can incorporate new records and update the resolved view while the system continues to run. Batch record linkage means the system processes a known set of records, produces an output, and then another system consumes that output.

The matching logic can be similar in both modes. You still need normalization, blocking or indexing, field comparison, deterministic rules, fuzzy matching, probabilistic scoring, thresholds, review handling and correction. The difference is that real-time systems must maintain the resolved entity state safely as records arrive, while batch systems can postpone the output until the job is complete.
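The shared steps can be sketched in a few lines of standard-library Python. Everything specific here is an assumption made for illustration: the field names, the blocking key (surname initial plus postcode), the use of `difflib.SequenceMatcher` as the fuzzy comparator, and the 0.85 and 0.65 thresholds. Probabilistic scoring and correction handling are left out.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lowercase, trim and collapse whitespace; drop empty fields."""
    return {k: " ".join(str(v).lower().split()) for k, v in record.items() if v}


def blocking_key(record):
    # Hypothetical blocking choice: surname initial plus postcode.
    return (record.get("surname", "")[:1], record.get("postcode", ""))


def candidate_pairs(records):
    """Only compare records that share a block, instead of every possible pair."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec)
    for block in blocks.values():
        yield from combinations(block, 2)


def classify(a, b, match_at=0.85, review_at=0.65):
    """Return 'match', 'review' or 'no_match' for one candidate pair."""
    # Deterministic rule: identical normalized email is treated as decisive.
    if a.get("email") and a.get("email") == b.get("email"):
        return "match"
    # Fuzzy comparison: average string similarity over the shared name fields.
    fields = [f for f in ("given_name", "surname") if f in a and f in b]
    if not fields:
        return "no_match"
    score = sum(SequenceMatcher(None, a[f], b[f]).ratio() for f in fields) / len(fields)
    if score >= match_at:
        return "match"
    return "review" if score >= review_at else "no_match"


raw = [
    {"id": "a", "given_name": "Jon", "surname": "Smith", "postcode": "AB1 2CD",
     "email": "J.Smith@Example.com"},
    {"id": "b", "given_name": "Jonny", "surname": "Smith ", "postcode": "ab1 2cd",
     "email": "j.smith@example.com"},
    {"id": "c", "given_name": "John", "surname": "Smyth", "postcode": "AB1 2CD"},
    {"id": "d", "given_name": "Alice", "surname": "Wong", "postcode": "ZZ9 9ZZ"},
]
records = [normalize(r) for r in raw]
decisions = {(a["id"], b["id"]): classify(a, b) for a, b in candidate_pairs(records)}
print(decisions)
```

In this toy data the shared email links records a and b deterministically, the near-miss names on record c fall into the review band, and record d is never compared because it sits in a different block. A batch job would run `candidate_pairs` over the whole export; a real-time system would run the same comparison for each arriving record against its block only.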

This is why a simple “real-time is faster” framing is too thin. The question is whether the resolved entity is part of the live operating surface. In a live fraud check, a stale or fragmented identity can let repeat abuse look like a new customer. In an AI support agent, stale context can cause the agent to answer from the wrong ticket or account. In a monthly warehouse dedupe, the same delay may be perfectly acceptable.
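The fraud case can be reduced to a toy comparison between a frozen snapshot and a view that includes every record ingested so far. The `resolve` and `linked_accounts` helpers, the account ids and the email-only matching rule are hypothetical, and the "live" view is simply recomputed here rather than maintained incrementally as a real system would do.

```python
def resolve(records):
    """Group record ids that share a normalized email (toy deterministic matcher)."""
    clusters = {}
    for rid, email in records:
        clusters.setdefault(email.strip().lower(), set()).add(rid)
    return clusters


def linked_accounts(view, email):
    """Look up which accounts a resolved view associates with an email."""
    return sorted(view.get(email.strip().lower(), set()))


events = [
    ("acct-1", "mia@example.com"),   # account later closed for abuse
    ("acct-2", "leo@example.com"),
    ("acct-3", "MIA@example.com"),   # new sign-up arriving after the last batch run
]

snapshot = resolve(events[:2])  # batch: frozen when the job ran
live = resolve(events)          # real-time: includes everything ingested so far

print(linked_accounts(snapshot, "MIA@example.com"))  # ['acct-1']
print(linked_accounts(live, "MIA@example.com"))      # ['acct-1', 'acct-3']
```

A check against the snapshot cannot see that the new sign-up belongs to an existing entity until the next job runs and its output is loaded. For a monthly warehouse dedupe that gap may not matter; at a payment or onboarding decision it does.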

Tilores is built for the real-time side of that decision: records are resolved as they are ingested, and applications can retrieve current resolved context through an API. The Tilores comparison of AWS and Tilores entity resolution systems is a useful related read for teams evaluating how batch and API-oriented approaches behave in practice.

When to use each: decision list

AI agent answering a live customer query → real-time. The agent must act on the current resolved record, not last night’s snapshot.
Monthly marketing-list dedupe of a static export → batch. No live decision depends on the result; cost-bounded offline reconciliation is fine.
Real-time fraud or payment risk check → real-time. A stale or fragmented identity can make a returning customer, account or device look unrelated at the point of decision.
One-off historical backfill or initial load → batch. A bounded job over a fixed dataset is usually the cleanest way to create the first resolved baseline.
Customer support assistant updating a live ticket → real-time. The assistant needs the current account, contact, entitlement and relationship context before it drafts or acts.
Warehouse analytics load → batch. If the warehouse consumes a daily or weekly snapshot and analysts are not making live operational decisions from it, batch is appropriate.
Ongoing KYC or AML relationship monitoring → usually real-time for the operating view, batch for review packs. The live system needs current relationships; compliance teams may still use periodic review extracts.
Cost-capped reconciliation of an old archive → batch. If the archive is static and the goal is to clean it once, an offline job is easier to budget and review.
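The list above comes down to roughly two questions, which can be written as a small helper. This is only a sketch of the rule of thumb: the `Workload` type, its field names and the `recommend` function are hypothetical, and real decisions will also weigh cost and operational ownership.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    live_decision_waits: bool             # a customer, agent, transaction or risk decision is waiting
    needs_periodic_extract: bool = False  # e.g. review packs or warehouse snapshots


def recommend(workload):
    """Return the resolution modes suggested by the rule of thumb in the list above."""
    modes = []
    if workload.live_decision_waits:
        modes.append("real-time")
    if workload.needs_periodic_extract or not workload.live_decision_waits:
        modes.append("batch")
    return modes


print(recommend(Workload("AI agent answering a live query", True)))          # ['real-time']
print(recommend(Workload("Monthly marketing-list dedupe", False)))           # ['batch']
print(recommend(Workload("Ongoing KYC or AML monitoring", True, True)))      # ['real-time', 'batch']
```

The mixed result for the monitoring case mirrors the list: the operating view stays current while periodic extracts are still produced offline.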

How should this fit next to MDM, CDP, governance and warehouses?

Entity resolution should not be positioned as a replacement for MDM, CDP, data governance, KYC and AML systems or a warehouse. It sits next to them. MDM may own stewardship and master-data operating models. A CDP may own activation and audience workflows. A warehouse may own analytics. Governance systems may own policy, lineage and retention. KYC and AML platforms may own screening, case management and regulatory workflow.

The identity-resolution layer answers a narrower question: which source records belong to the same real-world entity, with enough evidence and freshness for the consuming system to use that answer. Once that answer exists, the other systems can do their own jobs with less duplication and less fragmentation.

That complementary role matters for architecture. A batch job can feed a warehouse or a stewardship review. A real-time identity-resolution layer can serve an AI agent, fraud check, onboarding flow or customer-facing application. Many mature teams need both. The mistake is letting the slower offline path become the dependency for a live decision.

FAQ

What is the difference between real-time and batch entity resolution?

Real-time entity resolution resolves records as they arrive and keeps the resolved view current for live queries. Batch entity resolution runs over a bounded dataset or export and produces results after the job finishes. Use real-time for live decisions; use batch when the data and decision can wait.

When do I need real-time identity resolution?

You need real-time identity resolution when an application, AI agent, fraud check, onboarding flow or support workflow must act on the current resolved record. If the decision is made while a customer or transaction is live, last night’s match file is usually the wrong dependency.

Is batch matching enough for AI agents?

Batch matching can prepare offline datasets for AI systems, but it is not enough when an AI agent must answer or act during a live interaction. The agent should retrieve current resolved context at query time from records that were resolved as they were ingested.

Can real-time and batch entity resolution run side by side?

Yes. Many teams use batch jobs for historical backfills, warehouse reconciliation and periodic dedupe, while using real-time resolution for new events and live decisions. The important point is to make the serving path explicit so applications know which view they are using.