Enterprise Entity Resolution at Scale: Inside a 110-Million-Record Company Graph
Most entity resolution demos look great on a few thousand clean records. Enterprise entity resolution is a different problem entirely. At enterprise scale you are resolving hundreds of millions of records, from sources that were never designed to agree with each other, while the system stays live, fast, and auditable — and while the rest of your data platform keeps running around it.
This article looks at what enterprise-scale entity resolution actually requires, using a real enterprise deployment as the reference point: how Exiger built a 60-million-cluster company entity graph across 110 million source records to power its AI-driven supply chain intelligence. The full customer story lives on the case study page — here we focus on the lessons for any enterprise evaluating entity resolution software.
What “enterprise-scale” actually means
When teams say they need entity resolution, they usually mean matching. At enterprise scale, match accuracy is only one of six requirements, and rarely the one that breaks first:
- Volume — hundreds of millions of records, with cluster sizes that can run into the thousands. Naïve O(n²) comparison is unusable here; an enterprise system needs blocking and graph strategies that hold up as data grows.
- Real-time ingestion — new records have to resolve into the graph continuously as they arrive, not in an overnight batch. A static, batch-only system is a non-starter for most enterprise workflows.
- Accuracy on both axes — high precision is easy if you sacrifice recall. Enterprise data quality depends on getting both right, because under-clustering quietly fills the database with duplicates.
- Explainability — every resolution decision needs to be inspectable: which records matched, on which attributes, under which rule. Enterprise software that can’t show its work is a compliance and debugging liability.
- Customisable rules — enterprise data is specific. A modifiable, transparent rules system beats a black box you can’t adapt.
- Architecture fit — the system has to sit inside existing enterprise infrastructure (cloud, data warehouse, streaming) without forcing everything else to be rebuilt around it.
Miss any one of these and an entity resolution tool that benchmarks beautifully will still fail in production.
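To make the volume requirement concrete, here is a minimal blocking sketch in Python. It is an illustration only, not the method used in the Exiger deployment: the `blocking_key` function (first four alphanumeric characters of the name) and the sample records are hypothetical, and a production system would combine several far more robust keys.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(name: str) -> str:
    """Hypothetical blocking key: first 4 alphanumeric chars, lowercased."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    return cleaned[:4]


def candidate_pairs(records: dict[str, str]) -> set[tuple[str, str]]:
    """Only compare records that share a blocking key."""
    blocks = defaultdict(list)
    for record_id, name in records.items():
        blocks[blocking_key(name)].append(record_id)

    pairs = set()
    for ids in blocks.values():
        # Quadratic work happens inside each (small) block, not across all data.
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Invented sample data for illustration.
records = {
    "r1": "Acme Industrial Ltd",
    "r2": "ACME Industrial Limited",
    "r3": "Borealis Shipping Co",
    "r4": "Borealis Shipping Company",
    "r5": "Cobalt Freight GmbH",
}

pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

The trade-off is recall: two records that belong together but land in different blocks are never compared, which is why a single crude key like this one would not be enough on messy data.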
The proof point: 110M records, resolved
Exiger is an enterprise leader in supply chain risk and due diligence, trusted by more than 150 Fortune 500 companies and over 60 federal agencies. To map supply chains reliably, it first had to answer a deceptively hard question at enterprise scale: which companies are actually in the chain?
Working with Tilores, Exiger now resolves:
- 110M+ source records, drawn from 16+ distinct data sources
- into 60M canonical company clusters
- with a full initial load completing in under 24 hours
- and searches returning in under 100 milliseconds, even under parallel load
- lifting the overall pairwise F1 score by roughly 30 points over the previous in-house baseline
That combination — enterprise volume, real-time resolution, and a measurable accuracy gain — is the bar enterprise entity resolution has to clear.
Why enterprise data breaks naïve matching
Enterprise sources span a wide spectrum. At one end sit clean, well-structured legal-entity registries, ideal for high-confidence matching. At the other sit global shipping and customs records. That data is extraordinarily valuable for tracking real-world trade flows, but its quality reflects how it is captured at the point of origin: company names can carry embedded address fragments, extra tokens, and transliteration variants, and attributes can be missing altogether.
That spread is exactly why standard fuzzy matching struggles in the enterprise. Address-based matching is strong when addresses are reliable — but across enterprise trade data, much of the time there is no usable address. The decisive capability for Exiger was token-weighted matching, which weights rare, meaningful tokens over common ones, plus consistency rules that reject a record from a cluster when a designated field (such as a VAT ID) conflicts — even when other rules would have matched it.
“The other options had their strengths — a lot of them were strong on address-based entity resolution. The issue is that a lot of our entities, we just don’t have an address. It doesn’t really solve the problem we have, which is grouping records from different companies regardless of their addresses.” — Simon Baker, SVP of AI Products & Supply Chain Intelligence, Exiger
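The two ideas above, token-weighted matching and a consistency rule on a designated field, can be sketched as follows. This is a simplified pairwise illustration under stated assumptions, not Tilores's implementation: the smoothed IDF weighting, the 0.6 threshold, the `vat_id` field name, and all company names are hypothetical.

```python
import math
from collections import Counter


def tokenize(name: str) -> set[str]:
    return set(name.lower().replace(",", " ").replace(".", " ").split())


def idf_weights(corpus: list[str]) -> dict[str, float]:
    """Smoothed inverse document frequency: rare tokens get higher weight."""
    n = len(corpus)
    df = Counter(tok for name in corpus for tok in tokenize(name))
    return {tok: math.log((n + 1) / (count + 1)) + 1.0 for tok, count in df.items()}


def weighted_similarity(a: str, b: str, weights: dict[str, float]) -> float:
    """Weighted Jaccard: shared token weight over total token weight."""
    ta, tb = tokenize(a), tokenize(b)
    shared = sum(weights.get(t, 1.0) for t in ta & tb)
    total = sum(weights.get(t, 1.0) for t in ta | tb)
    return shared / total if total else 0.0


def consistent(rec_a: dict, rec_b: dict, field: str = "vat_id") -> bool:
    """Conflict only if both values are present and differ."""
    va, vb = rec_a.get(field), rec_b.get(field)
    return va is None or vb is None or va == vb


def is_match(rec_a: dict, rec_b: dict, weights: dict[str, float],
             threshold: float = 0.6) -> bool:
    # The consistency rule vetoes the match even if the names agree perfectly.
    if not consistent(rec_a, rec_b):
        return False
    return weighted_similarity(rec_a["name"], rec_b["name"], weights) >= threshold


# Invented corpus: "ltd" and "logistics" are common, "zephyrine" is rare.
corpus = [
    "Zephyrine Logistics Ltd", "Zephyrine Ltd", "Harbour Logistics Ltd",
    "Northgate Logistics Ltd", "Pioneer Logistics Ltd", "Summit Trading Ltd",
]
weights = idf_weights(corpus)

a = {"name": "Zephyrine Logistics Ltd", "vat_id": "GB111"}
b = {"name": "Zephyrine Ltd", "vat_id": None}
c = {"name": "Harbour Logistics Ltd", "vat_id": "GB222"}
d = {"name": "Zephyrine Logistics Ltd", "vat_id": "GB999"}

print(is_match(a, b, weights))  # shares the rare token, no VAT conflict
print(is_match(a, c, weights))  # shares only common tokens
print(is_match(a, d, weights))  # identical name, conflicting VAT ID
```

Note that no address is involved anywhere: the rare token carries the match, and the VAT conflict blocks a merge that a name-only rule would have accepted.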
Build versus buy at enterprise scale
The instinct at many enterprises is to build entity resolution in-house. It is worth being honest about the cost. A production-grade engine — ingestion, transformation, blocking, matching, the entity graph, a query API, and real-time updates — is years of specialised work, and the hardest part isn’t getting matches; it’s controlling over-clustering while holding onto recall as volume grows.
Even with a capable vendor, reaching enterprise production took Exiger close to nine months of iterative engineering. That included a fix for clusters that grew past 6,000 records, which led to the clique-based graph compression now available to every Tilores customer. The lesson for most enterprises: unless entity resolution is your product, buying a system that already solves blocking, real-time updates, and explainability frees your team to build on top of a clean entity layer instead of maintaining one.
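A rough way to see why very large clusters are a problem: if every record in a group matches every other, the number of pairwise edges grows quadratically. The sketch below is not Tilores's algorithm, whose details are not described here. It only illustrates the general idea that a fully connected group can be stored as a member list rather than edge by edge; the `compress` and `expand` helpers are hypothetical.

```python
from itertools import combinations


def clique_edge_count(n: int) -> int:
    """Edges in a fully connected group of n records."""
    return n * (n - 1) // 2


def compress(members: list[str]) -> tuple[str, ...]:
    """Hypothetical compressed form: store the member list once."""
    return tuple(sorted(members))


def expand(clique: tuple[str, ...]) -> set[frozenset]:
    """Recover the implied pairwise edges on demand."""
    return {frozenset(pair) for pair in combinations(clique, 2)}


members = [f"rec-{i}" for i in range(6)]
clique = compress(members)

# 6 stored members stand in for 15 explicit edges.
print(len(clique), len(expand(clique)))

# At 6,000 fully connected records the explicit form needs 17,997,000 edges.
print(clique_edge_count(6000))
```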
An evaluation checklist for enterprise entity resolution
If you are comparing enterprise entity resolution software, test against your hardest data, not a clean sample:
- Run a real benchmark. Measure precision, recall, F1, and cluster purity on annotated ground truth from your own sources — vendors’ headline numbers are easy to flatter.
- Test the messy edge, not the clean middle. The system that wins on address-rich data may collapse on the records that actually matter to you.
- Demand explainability. Every match should return the attributes, source records, and rules behind it.
- Check real-time behaviour. Confirm single-record resolution latency under load — not just batch throughput.
- Confirm architecture fit. It should slot into your cloud, warehouse, and streaming stack without a rebuild.
- Inspect the rules. You want a transparent, modifiable rules system, not a black box.
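For the benchmark step, a minimal sketch of pairwise precision, recall, F1, and cluster purity in Python. The record IDs and cluster labels are invented for illustration; in practice `truth` would come from annotated ground truth drawn from your own sources, and the pair enumeration would need to be sampled or streamed at real volumes.

```python
from collections import Counter, defaultdict
from itertools import combinations


def pairs_from_labels(labels: dict[str, str]) -> set[frozenset]:
    """All record pairs that share a cluster label."""
    groups = defaultdict(list)
    for record_id, cluster_id in labels.items():
        groups[cluster_id].append(record_id)
    return {frozenset(p) for ids in groups.values() for p in combinations(ids, 2)}


def pairwise_scores(predicted: dict[str, str], truth: dict[str, str]):
    pred_pairs = pairs_from_labels(predicted)
    true_pairs = pairs_from_labels(truth)
    tp = len(pred_pairs & true_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 1.0
    recall = tp / len(true_pairs) if true_pairs else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cluster_purity(predicted: dict[str, str], truth: dict[str, str]) -> float:
    """Share of records that belong to the majority true entity of their cluster."""
    groups = defaultdict(list)
    for record_id, cluster_id in predicted.items():
        groups[cluster_id].append(truth[record_id])
    majority = sum(Counter(g).most_common(1)[0][1] for g in groups.values())
    return majority / len(predicted)


# Invented example: r3 is wrongly split from A and merged with a B record.
truth = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
predicted = {"r1": "c1", "r2": "c1", "r3": "c2", "r4": "c2", "r5": "c3"}

precision, recall, f1 = pairwise_scores(predicted, truth)
purity = cluster_purity(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} purity={purity:.2f}")
```

Reporting precision and recall separately matters here: a system can score well on purity while still leaving most true pairs unmerged, which is the under-clustering failure described earlier.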
Enterprise entity resolution isn’t a feature you switch on — it’s foundational data infrastructure that everything downstream depends on. As Simon Baker put it: “if you start off with bad initial input, you could end up with garbage in, garbage out. A lot of models will benefit from a good entity system early in the pipeline.”
See the full story: read the Exiger × Tilores case study, or explore how Tilores powers Supply Chain Intelligence at enterprise scale.