Resolving the world's companies at scale
How Exiger built a 60-million-cluster company entity graph across 110 million source records to power AI-driven supply chain intelligence.
AI-powered supply chain risk, due diligence & compliance
Company entity resolution — 110M+ records into 60M clusters
5.5 billion underlying shipment records, near-global coverage
In production
A foundation for everything they map
Exiger is a market leader in due diligence and supply chain risk intelligence — trusted by more than 150 Fortune 500 companies and over 60 federal agencies, and named a Leader in the 2025 Gartner Magic Quadrant for Supplier Risk Management.
At the heart of its supply chain intelligence sits a deceptively hard problem: to map a supply chain reliably, you first need to know precisely which companies you are mapping. Entity resolution — recognising that records from different sources refer to the same real-world company — is foundational infrastructure for everything Exiger does.
“To map a supply chain really well, you need to understand which companies you are mapping to. This is where an entity resolution system is extremely crucial for our mission.”
A database of the world's companies — from registries to global trade data
There is no universal census of companies, so Exiger resolves more than 16 distinct sources into a single entity graph — and those sources span a wide spectrum. At one end sits clean, well-structured legal-entity reference data, ideal for high-confidence matching. At the other sits global shipping and customs data: a dataset of extraordinary value for tracking real-world trade flows, whose nature reflects how it is captured at the point of origin.
Customs and shipping records the world over are captured under real-world operational conditions — handwritten forms, scanned documents passing through international ports, and downstream optical character recognition. The data that emerges is rich and uniquely valuable, but inherently variable: company names can carry embedded address fragments, extraneous tokens, transliteration variants, and missing attributes. Matching records of this kind reliably against clean registry data is a different class of problem from standard entity matching.
Exiger's previous in-house, rule-based system had been tuned for high precision, and paid for it in recall. Lower recall meant under-clustering: records for the same company stayed split across separate clusters, so duplicates accumulated across the database and made supply-chain maps and due-diligence searches harder for customers to use. The goal going into the evaluation was to raise recall dramatically without sacrificing precision.
“The names can be very challenging to work with — extraneous tokens, transliteration variants, and sometimes parts of the address embedded in the company name itself. A normal fuzzy name search, or any kind of exact name search, is just not going to match those records together.”
The decision came down to the hardest data
Exiger ran a rigorous two-month evaluation across multiple solutions, scoring precision, recall, F1, and cluster purity on an annotated ground-truth dataset drawn from its own sources. Competing solutions were strong on address-based resolution, but much of Exiger's data has no usable address, or no clear way to know which address to pick. On the shipping and customs data, Tilores' weighted-token matching proved decisive.
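To make the evaluation metrics concrete, here is a minimal Python sketch of pairwise precision, recall, F1, and cluster purity on a toy ground truth. The record IDs, cluster labels, and function names are invented for illustration; the source does not describe Exiger's actual evaluation harness, and cluster purity has several definitions (this one uses the majority-label share).

```python
from collections import Counter
from itertools import combinations


def same_cluster_pairs(assignment):
    """Return the set of unordered record pairs that share a cluster."""
    by_cluster = {}
    for record_id, cluster_id in assignment.items():
        by_cluster.setdefault(cluster_id, []).append(record_id)
    pairs = set()
    for members in by_cluster.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall, and F1 of predicted vs. ground-truth clusters."""
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    correct = len(pred_pairs & true_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 1.0
    recall = correct / len(true_pairs) if true_pairs else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def cluster_purity(predicted, truth):
    """Share of records that carry the majority true label of their predicted cluster."""
    by_cluster = {}
    for record_id, cluster_id in predicted.items():
        by_cluster.setdefault(cluster_id, []).append(truth[record_id])
    majority = sum(Counter(labels).most_common(1)[0][1] for labels in by_cluster.values())
    return majority / len(predicted)


# Toy ground truth (hypothetical): r1, r2, r3 are one company; r4, r5 another.
truth = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
# A high-precision, low-recall result: r3 was left out of its cluster.
predicted = {"r1": "c1", "r2": "c1", "r3": "c2", "r4": "c3", "r5": "c3"}

precision, recall, f1 = pairwise_scores(predicted, truth)
purity = cluster_purity(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} purity={purity:.2f}")
```

The toy result is under-clustered: every predicted pair is correct and every cluster is pure, yet half of the true pairs are missed. Purity alone cannot detect this kind of error, which is one reason to score recall and F1 alongside it.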
Two further factors confirmed the choice. First, flexibility: Tilores exposes a transparent, modifiable rules system rather than a black box. Second, architecture: Tilores fitted cleanly into Exiger's existing AWS infrastructure as an entity layer, leaving upstream pipelines and downstream systems untouched, with no need to "carve out a sandbox."
Two people named John Smith in one city are distinguishable. Two companies called Ideal Industries Inc. in the same state may be genuinely different organisations with no reliable discriminating attribute. Company names are reused, abbreviated, transliterated, embedded within larger fields, and stripped of their legal forms — matching them at global scale, with real recall, needs a different technical approach.
“The other options had their strengths — a lot of them were strong on address-based entity resolution. The issue is that a lot of our entities, we just don't have an address, or we didn't know which address to pick. It doesn't really solve the problem we have, which is grouping records from different companies regardless of their addresses.”
Nine months to production at scale
Exiger is cloud-native on AWS. Custom Spark pipelines route records from its sources into Tilores via the GraphQL API; a Kafka event stream propagates cluster events back into Exiger's downstream systems and data warehouse — continuously and automatically. Tilores acts as a pure entity layer, so no existing infrastructure had to be rebuilt around it.
Reaching production took close to nine months of iterative, technically demanding collaboration. Improving recall came early; the harder challenge was reducing over-clustering while holding onto that recall. The final configuration comprises 23 distinct clustering rules:
- Weighted-token matching: handles variable, partial company names while down-weighting common terms that cause over-clustering
- Geographical distance matching based on geo-coordinates
- ID-based matching where reliable identifiers exist
- Dedicated rules for Chinese business names and complex diacritics
- Fuzzy search and common abbreviation normalisation
- Consistency rules: reject a record from a cluster when a designated field (e.g. VAT ID) conflicts, even if other rules fired
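The idea behind weighted-token matching can be sketched with an IDF-style weighting, in which tokens that appear in many company names (such as "Co" or "Ltd") contribute little to a match and rare tokens contribute a lot. This is a generic illustration, not Tilores' implementation: the weighting formula, the sample names, and every function name below are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(name):
    """Lowercase a name and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", name.lower())


def build_token_weights(names):
    """IDF-style weights: the more names a token appears in, the lower its weight."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(tokenize(name)))
    total = len(names)
    weights = {tok: math.log((1 + total) / (1 + df)) for tok, df in doc_freq.items()}
    unseen_weight = math.log(1 + total)  # tokens never seen before count as rare
    return weights, unseen_weight


def weighted_similarity(name_a, name_b, weights, unseen_weight):
    """Weighted Jaccard: weight of shared tokens over weight of all tokens."""
    tokens_a, tokens_b = set(tokenize(name_a)), set(tokenize(name_b))
    shared = sum(weights.get(t, unseen_weight) for t in tokens_a & tokens_b)
    union = sum(weights.get(t, unseen_weight) for t in tokens_a | tokens_b)
    return shared / union if union else 0.0


def plain_jaccard(name_a, name_b):
    """Unweighted token overlap, for comparison."""
    tokens_a, tokens_b = set(tokenize(name_a)), set(tokenize(name_b))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


# Hypothetical reference names used only to estimate token frequencies.
corpus = [
    "Zenith Polymers Co Ltd",
    "Harbor Trading Co Ltd",
    "Sunrise Trading Co Ltd",
    "Pacific Trading Co Ltd",
    "Harbor Logistics Inc",
]
weights, unseen_weight = build_token_weights(corpus)

# Pair A: same company, one record missing its legal form.
weighted_a = weighted_similarity("Zenith Polymers Co Ltd", "ZENITH POLYMERS", weights, unseen_weight)
plain_a = plain_jaccard("Zenith Polymers Co Ltd", "ZENITH POLYMERS")

# Pair B: different companies that share only common terms.
weighted_b = weighted_similarity("Harbor Trading Co Ltd", "Sunrise Trading Co Ltd", weights, unseen_weight)
plain_b = plain_jaccard("Harbor Trading Co Ltd", "Sunrise Trading Co Ltd")
```

On this toy data, unweighted overlap ranks the two different trading companies as more similar than the two Zenith Polymers records, while the weighted score reverses that order. That reversal is the effect the rule description points to: partial names can still match, and common terms stop driving over-clustering.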
Further reading: how Tilores consistency rules work, and the clique-based graph compression that emerged from this project.
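As a rough illustration of a consistency rule, the sketch below vetoes a record from joining a cluster when a designated field conflicts, even though a matching rule has fired. The record layout, the `vat_id` field name, the sample values, and the choice to treat a missing value as non-conflicting are all assumptions; the source does not specify how Tilores evaluates these rules.

```python
def violates_consistency(candidate, cluster_records, field="vat_id"):
    """True when the candidate's value for `field` differs from a cluster member's.

    Missing values on either side are treated as "unknown" and never conflict
    (an assumption made for this sketch).
    """
    value = candidate.get(field)
    if not value:
        return False
    return any(
        bool(record.get(field)) and record.get(field) != value
        for record in cluster_records
    )


def may_join(candidate, cluster_records, match_rule_fired, field="vat_id"):
    """A record joins only if some matching rule fired and no consistency rule objects."""
    return match_rule_fired and not violates_consistency(candidate, cluster_records, field)


# Hypothetical cluster for one company; the second record has no VAT ID.
cluster = [
    {"name": "Ideal Industries Inc.", "vat_id": "VAT-001"},
    {"name": "Ideal Industries", "vat_id": None},
]

same_company = {"name": "IDEAL INDUSTRIES INC", "vat_id": "VAT-001"}
other_company = {"name": "Ideal Industries Inc.", "vat_id": "VAT-999"}
no_identifier = {"name": "Ideal Inds Inc", "vat_id": None}

# Assume a name-based rule fired for all three candidates.
joins_same = may_join(same_company, cluster, match_rule_fired=True)
joins_other = may_join(other_company, cluster, match_rule_fired=True)
joins_unknown = may_join(no_identifier, cluster, match_rule_fired=True)
```

Here the record with a conflicting identifier is kept out despite an identical name, which is the kind of safeguard that helps separate two genuinely different companies sharing one name.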
When the scale exceeded expectations
At Exiger's scale, some clusters grew past 6,000 records with tens of thousands of matching edges — beyond anything Tilores had previously encountered. The fix, one of the most complex the team had tackled, was in production within a few weeks of the problem being reported. The resulting clique-based graph compression is now a core capability available to every Tilores customer.
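The general idea behind clique-based compression can be shown with a small sketch: when a group of records all match one another, the pairwise edges inside the group can be replaced by a single membership set without losing information. This illustrates the concept only. The source does not describe how Tilores detects cliques or stores them, and the function names and data below are hypothetical.

```python
from itertools import combinations


def clique_edges(nodes):
    """All pairwise edges among `nodes`, each stored as a frozenset."""
    return {frozenset(pair) for pair in combinations(sorted(nodes), 2)}


def compress(edges, clique_nodes):
    """Replace the edges inside a fully connected group by one membership set."""
    internal = clique_edges(clique_nodes)
    if not internal <= edges:
        raise ValueError("nodes are not fully connected, so they do not form a clique")
    return {"cliques": [frozenset(clique_nodes)], "edges": edges - internal}


def expand(compressed):
    """Rebuild the original edge set, showing the compression is lossless."""
    edges = set(compressed["edges"])
    for clique in compressed["cliques"]:
        edges |= clique_edges(clique)
    return edges


# Five records that all match each other, plus one record linked only to r1.
group = ["r1", "r2", "r3", "r4", "r5"]
edges = clique_edges(group) | {frozenset(("r1", "r6"))}

compressed = compress(edges, group)
stored_before = len(edges)  # explicit pairwise edges
stored_after = len(compressed["edges"]) + sum(len(c) for c in compressed["cliques"])
```

A clique of k records needs k(k-1)/2 pairwise edges but only k membership entries, so the saving grows quickly with group size. In this toy case, 11 stored edges shrink to 1 edge plus 5 members.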
“It's a complicated problem. We really emphasise data quality. We needed to make sure this was the best system we could find, and we never lowered our standards — we kept our bar of accuracy and we just had to make sure we met it.”
A resolved entity graph at global scale
The Tilores-powered entity graph now sits at the centre of Exiger's data infrastructure — resolving 110 million source records into roughly 60 million canonical company clusters, with searches returning in under 100 milliseconds even under parallel load. New records resolve continuously as they arrive, and a full initial load of 110 million records completes in under 24 hours. Across the collaboration, the overall pairwise F1 score rose roughly 30 points above the starting baseline.
Adoption is building across Exiger's product portfolio as teams integrate the entity API. Exiger has also begun running ML models over the graph to flag potential clustering issues and feed corrections back into Tilores — a reinforcing quality loop. Future entity types, including products and persons, are already under discussion.
“This is a very fundamental piece of data that will have downstream effects. We have AI models all over the place, and if you start off with bad initial input, you could end up with garbage in, garbage out. A lot of models will benefit from a good entity system early in the pipeline.”
More than a vendor relationship
The iterative nature of the build meant the Tilores team was effectively embedded alongside Exiger's data science and engineering function. Questions posted to a shared Slack channel were answered in detail by senior Tilores engineers — including the CTO — whatever the complexity.
“The Tilores team is one of the most professional teams I've ever worked with in my career. They really were part of our team — it felt like a family. They can have discussions at a very pedantic technical level and explain things at a business level when needed. They were a joy to work with.”
“What surprised me most was the amount of attention we received from the Tilores team during implementation. That kind of attention from a vendor — I haven't really seen it before. It's a phenomenal application, and the amount of work that gets done is a real credit to the team.”
Start from accurate company identity, build better intelligence on top
See how Tilores resolves company records at scale. Available on AWS Marketplace.
Working with supply-chain data? See the Supply Chain Intelligence solution.