
Resolving the world's companies at scale

How Exiger built a 60-million-cluster company entity graph across 110 million source records to power AI-driven supply chain intelligence.

Industry: AI-powered supply chain risk, due diligence & compliance
Use case: Company entity resolution (110M+ records into 60M clusters)
Scale: 5.5 billion underlying shipment records, near-global coverage
Status: In production


  • 110M+ source records ingested and resolved
  • 60M canonical clusters (resolved company entities)
  • <24h full initial load (110M records clustered)
  • +30pts F1 improvement vs. in-house baseline
  • <100ms query latency across 60M clusters
  • 23 clustering rules tuned jointly
  • 5.5B underlying shipment records
  • 16+ distinct data sources

About Exiger

A foundation for everything they map

Exiger is a market leader in due diligence and supply chain risk intelligence — trusted by more than 150 Fortune 500 companies and over 60 federal agencies, and named a Leader in the 2025 Gartner Magic Quadrant for Supplier Risk Management.

At the heart of its supply chain intelligence sits a deceptively hard problem: to map a supply chain reliably, you first need to know precisely which companies you are mapping. Entity resolution — recognising that records from different sources refer to the same real-world company — is foundational infrastructure for everything Exiger does.

“To map a supply chain really well, you need to understand which companies you are mapping to. This is where an entity resolution system is extremely crucial for our mission.”
Simon Baker
SVP of AI Products & Supply Chain Intelligence, Exiger

The Challenge

A database of the world's companies — from registries to global trade data

There is no universal census of companies, so Exiger resolves more than 16 distinct sources into a single entity graph, and those sources span a wide spectrum of quality. At one end sits clean, well-structured legal-entity reference data, ideal for high-confidence matching. At the other sits global shipping and customs data: extraordinarily valuable for tracking real-world trade flows, but shaped by how it is captured at the point of origin.

Customs and shipping records the world over are captured under real-world operational conditions — handwritten forms, scanned documents passing through international ports, and downstream optical character recognition. The data that emerges is rich and uniquely valuable, but inherently variable: company names can carry embedded address fragments, extraneous tokens, transliteration variants, and missing attributes. Matching records of this kind reliably against clean registry data is a different class of problem from standard entity matching.

Exiger's previous in-house, rule-based system had been tuned for high precision and paid for it in recall. Lower recall meant under-clustering: records for the same company stayed in separate clusters, so duplicates accumulated across the database and made supply-chain maps and due-diligence searches harder for customers to use. The goal going into the evaluation was to raise recall dramatically without sacrificing precision.

“The names can be very challenging to work with — extraneous tokens, transliteration variants, and sometimes parts of the address embedded in the company name itself. A normal fuzzy name search, or any kind of exact name search, is just not going to match those records together.”
John Willcox
Data Scientist, Exiger

Why Tilores

The decision came down to the hardest data

Exiger ran a rigorous two-month evaluation across multiple solutions, scoring precision, recall, F1, and cluster purity on an annotated ground-truth dataset drawn from its own sources. Competing solutions were strong on address-based resolution, but for much of Exiger's data there is no usable address, or no clear way to know which one to pick. Against the shipping and customs data, Tilores' token-weighted matching proved decisive.
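To make the evaluation metrics concrete, here is a minimal sketch of pairwise precision, recall, and F1 computed against a ground-truth clustering. It is an illustration of the standard pairwise definitions, not Exiger's evaluation harness; the function names and the toy record IDs are assumptions.

```python
from itertools import combinations


def cluster_pairs(clusters):
    """Return every unordered pair of record IDs that share a cluster."""
    pairs = set()
    for members in clusters:
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall and F1 of a predicted clustering."""
    pred_pairs = cluster_pairs(predicted)
    true_pairs = cluster_pairs(truth)
    true_positives = len(pred_pairs & true_pairs)
    precision = true_positives / len(pred_pairs) if pred_pairs else 1.0
    recall = true_positives / len(true_pairs) if true_pairs else 1.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: r1, r2 and r3 are the same company; r4 is another.
truth = [{"r1", "r2", "r3"}, {"r4"}]
# A high-precision, low-recall system leaves r3 on its own (under-clustering).
under_clustered = [{"r1", "r2"}, {"r3"}, {"r4"}]

precision, recall, f1 = pairwise_scores(under_clustered, truth)
```

In this toy case every predicted pair is correct, so precision is perfect, but only one of the three true pairs is found, which is the precision-for-recall trade-off described for the earlier in-house system.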

Two further factors confirmed the choice. First, flexibility: Tilores exposes a transparent, modifiable rules system rather than a black box. Second, architecture: Tilores sat cleanly as an entity layer inside Exiger's existing AWS infrastructure, leaving upstream pipelines and downstream systems untouched, with no need to "carve out a sandbox."

Why company matching is harder than person matching

Two people named John Smith in one city can usually be told apart by attributes such as date of birth. Two companies called Ideal Industries Inc. in the same state may be genuinely different organisations with no reliable discriminating attribute. Company names are reused, abbreviated, transliterated, embedded within larger fields, and stripped of their legal forms, so matching them at global scale, with real recall, needs a different technical approach.
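As a rough illustration of the name variation described above, the sketch below folds diacritics, lowercases, and drops a few legal-form tokens. The `LEGAL_FORMS` set and the function name are hypothetical, and this is not how Tilores normalises names. It also shows the limit of the approach: normalisation makes variants comparable, but it cannot tell two genuinely different companies with the same name apart.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny list of legal-form tokens to ignore.
LEGAL_FORMS = {"inc", "incorporated", "ltd", "limited", "llc", "gmbh", "corp"}


def normalise_name(raw):
    """Fold diacritics, lowercase, and drop legal-form tokens."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keeps Latin letters and digits only; non-Latin scripts such as Chinese
    # would be discarded here and need dedicated handling.
    tokens = re.findall(r"[a-z0-9]+", folded.lower())
    return " ".join(tok for tok in tokens if tok not in LEGAL_FORMS)


variant_a = normalise_name("Ideal Industries, Inc.")
variant_b = normalise_name("IDEAL INDUSTRIES INCORPORATED")
folded = normalise_name("Müller GmbH")
```

Both "Ideal Industries" variants collapse to the same string, which helps recall, but two unrelated companies with that name collapse to it as well, so further evidence is still needed to separate them.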

“The other options had their strengths — a lot of them were strong on address-based entity resolution. The issue is that a lot of our entities, we just don't have an address, or we didn't know which address to pick. It doesn't really solve the problem we have, which is grouping records from different companies regardless of their addresses.”
Simon Baker
SVP of AI Products & Supply Chain Intelligence, Exiger

Implementation

Nine months to production at scale

Exiger is cloud-native on AWS. Custom Spark pipelines route records from its sources into Tilores via the GraphQL API; a Kafka event stream propagates cluster events back into Exiger's downstream systems and data warehouse — continuously and automatically. Tilores acts as a pure entity layer, so no existing infrastructure had to be rebuilt around it.

Reaching production took close to nine months of iterative, technically demanding collaboration. Recall improved early; the harder challenge was reducing over-clustering while holding onto that recall. The final configuration comprises 23 distinct clustering rules, including:

  • Weighted-token matching — handles variable, partial company names while down-weighting common terms that cause over-clustering
  • Geographical distance matching based on geo-coordinates
  • ID-based matching where reliable identifiers exist
  • Dedicated rules for Chinese business names and complex diacritics
  • Fuzzy search and common abbreviation normalisation
  • Consistency rules — reject a record from a cluster when a designated field (e.g. VAT ID) conflicts, even if other rules fired
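Two of the rule types above can be sketched in a few lines. The example below uses inverse-document-frequency weights as one common way to down-weight frequent tokens, and the haversine formula for distance between geo-coordinates. Both are generic techniques chosen for illustration; the case study does not describe how Tilores implements either rule, and the corpus, function names, and `unseen_weight` fallback are assumptions.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def idf_weights(names):
    """Give rare tokens a high weight and common tokens a low one."""
    n = len(names)
    doc_freq = Counter(tok for name in names for tok in set(name.split()))
    return {tok: math.log((1 + n) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def weighted_overlap(a, b, weights, unseen_weight=1.0):
    """Weighted Jaccard similarity over the tokens of two names."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    shared = sum(weights.get(t, unseen_weight) for t in tokens_a & tokens_b)
    total = sum(weights.get(t, unseen_weight) for t in tokens_a | tokens_b)
    return shared / total if total else 0.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical corpus: "trading" is everywhere, "acme" is rare.
corpus = ["acme trading", "zenith trading", "orion trading",
          "vega trading", "acme trading shanghai"]
weights = idf_weights(corpus)

# Same company, one name carrying an embedded address fragment.
same = weighted_overlap("acme trading", "acme trading shanghai", weights)
# Different companies that share only the generic token "trading".
different = weighted_overlap("acme trading", "zenith trading", weights)
```

With plain token overlap the second pair would score one third; weighting pushes it lower because the only shared token is the common one, which is the over-clustering guard the first bullet describes.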

Further reading: how Tilores consistency rules work, and the clique-based graph compression that emerged from this project.
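The idea behind a consistency rule can be sketched as a veto applied after the matching rules have fired. This is a minimal illustration under stated assumptions, not the Tilores implementation: the record layout, the `vat_id` field name, and the choice that a missing value never conflicts are all hypothetical.

```python
def conflicts(record, cluster, field):
    """True if the record's value for `field` differs from the cluster's."""
    value = record.get(field)
    if value is None:
        return False  # assumption: a missing value never conflicts
    existing = {r[field] for r in cluster if r.get(field) is not None}
    return bool(existing) and value not in existing


def may_join(record, cluster, matched, consistency_fields=("vat_id",)):
    """Admit a matched record only if no designated field conflicts."""
    if not matched:
        return False
    return not any(conflicts(record, cluster, f) for f in consistency_fields)


# Hypothetical cluster with one known VAT ID.
cluster = [
    {"name": "acme trading", "vat_id": "DE111"},
    {"name": "acme trading co", "vat_id": None},
]

same_vat = may_join({"name": "acme", "vat_id": "DE111"}, cluster, matched=True)
other_vat = may_join({"name": "acme", "vat_id": "DE222"}, cluster, matched=True)
no_vat = may_join({"name": "acme"}, cluster, matched=True)
```

The record with a different VAT ID is rejected even though a name rule matched, while records with the same or no VAT ID are admitted.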

When the scale exceeded expectations

At Exiger's scale, some clusters grew past 6,000 records with tens of thousands of matching edges — beyond anything Tilores had previously encountered. The fix, one of the most complex the team had tackled, was in production within a few weeks of the problem being reported. The resulting clique-based graph compression is now a core capability available to every Tilores customer.
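The case study does not describe how the clique-based compression works internally, so the following is only a sketch of the general idea: when a group of records all match one another, the group can be stored as a member list instead of one edge per pair. The function names and the way candidate groups are supplied are assumptions.

```python
from itertools import combinations


def clique_edges(members):
    """All pairwise edges implied by a fully connected group."""
    return {tuple(sorted(pair)) for pair in combinations(members, 2)}


def compress(edges, candidates):
    """Store each candidate that is a true clique as a member set."""
    edge_set = {tuple(sorted(edge)) for edge in edges}
    cliques, covered = [], set()
    for members in candidates:
        implied = clique_edges(members)
        # Only compress groups whose every pair is really an edge.
        if len(members) > 2 and implied <= edge_set:
            cliques.append(frozenset(members))
            covered |= implied
    return cliques, edge_set - covered


def decompress(cliques, residual):
    """Rebuild the full edge set from cliques plus leftover edges."""
    edges = set(residual)
    for members in cliques:
        edges |= clique_edges(members)
    return edges


# Hypothetical group: 200 records that all match each other, plus one extra edge.
ids = [f"r{i}" for i in range(200)]
edges = clique_edges(ids) | {("r0", "x")}

cliques, residual = compress(edges, [ids])
```

Here 19,900 pairwise edges shrink to one 200-member list plus a single leftover edge, and `decompress` recovers the original edge set exactly.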

“It's a complicated problem. We really emphasise data quality. We needed to make sure this was the best system we could find, and we never lowered our standards — we kept our bar of accuracy and we just had to make sure we met it.”
Simon Baker
SVP of AI Products & Supply Chain Intelligence, Exiger

Results

A resolved entity graph at global scale

The Tilores-powered entity graph now sits at the centre of Exiger's data infrastructure — resolving 110 million source records into roughly 60 million canonical company clusters, with searches returning in under 100 milliseconds even under parallel load. New records resolve continuously as they arrive, and a full initial load of 110 million records completes in under 24 hours. Across the collaboration, the overall pairwise F1 score rose roughly 30 points above the starting baseline.
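As a back-of-the-envelope check on these figures, the arithmetic below derives the minimum sustained throughput implied by loading 110 million records within 24 hours, and the average cluster size. Both are derived from the rounded numbers quoted above, so they are approximations rather than measured values.

```python
RECORDS = 110_000_000           # source records, as quoted
CLUSTERS = 60_000_000           # "roughly 60 million" clusters, as quoted
LOAD_WINDOW_SECONDS = 24 * 60 * 60

# Finishing inside the window needs at least this many records per second.
min_records_per_second = RECORDS / LOAD_WINDOW_SECONDS

# Average number of source records behind each canonical company.
avg_records_per_cluster = RECORDS / CLUSTERS

print(f"{min_records_per_second:.0f} records/s, "
      f"{avg_records_per_cluster:.2f} records per cluster")
```

The load finishes in under 24 hours, so the actual average rate is higher than this lower bound of about 1,273 records per second.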

Adoption is building across Exiger's product portfolio as teams integrate the entity API. Exiger has also begun running ML models over the graph to flag potential clustering issues and feed corrections back into Tilores — a reinforcing quality loop. Future entity types, including products and persons, are already under discussion.

“This is a very fundamental piece of data that will have downstream effects. We have AI models all over the place, and if you start off with bad initial input, you could end up with garbage in, garbage out. A lot of models will benefit from a good entity system early in the pipeline.”
Simon Baker
SVP of AI Products & Supply Chain Intelligence, Exiger

In Their Own Words

More than a vendor relationship

The iterative nature of the build meant the Tilores team was effectively embedded alongside Exiger's data science and engineering function. Questions posted to a shared Slack channel were answered in detail by senior Tilores engineers — including the CTO — whatever the complexity.

“The Tilores team is one of the most professional teams I've ever worked with in my career. They really were part of our team — it felt like a family. They can have discussions at a very pedantic technical level and explain things at a business level when needed. They were a joy to work with.”
Simon Baker
SVP of AI Products & Supply Chain Intelligence, Exiger
“What surprised me most was the amount of attention we received from the Tilores team during implementation. That kind of attention from a vendor — I haven't really seen it before. It's a phenomenal application, and the amount of work that gets done is a real credit to the team.”
John Willcox
Data Scientist, Exiger

Start from accurate company identity, build better intelligence on top

See how Tilores resolves company records at scale. Available on AWS Marketplace.

Working with supply-chain data? See the Supply Chain Intelligence solution.