Companies House directors deduplicated

Companies House is the UK's registry of companies and their directors. The register contains many duplicate director records, so we used Tilores to deduplicate the entire dataset.


The Results

10 million records. 614,121 duplicates linked.

10,108,807
Director records ingested

Full UK Companies House register

9,494,686
Unique director entities

After Tilores identity resolution

614,121
Duplicate records linked

About 6% of all records were duplicates

Static snapshot of Companies House data as of end of July 2023. Results are experimental and for research purposes only — false positives are unavoidable given the data's limitations.
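The headline numbers can be cross-checked with a few lines of arithmetic. This is only a sanity check on the figures reported above; the variable names are illustrative.

```python
# Figures as reported on this page.
ingested = 10_108_807  # director records ingested
unique = 9_494_686     # unique director entities after identity resolution

# Every record beyond the first one in an entity counts as a linked duplicate.
linked = ingested - unique
rate = linked / ingested

print(linked)         # 614121
print(f"{rate:.2%}")  # 6.08%
```

The difference matches the 614,121 linked duplicates, and the share rounds to the quoted 6%.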


Profile Metric Map

Duplicate director density across the UK

Each region shows the number of director entities, total profiles, and how far the data deviates from the ideal of one profile per person.


Real Examples

Even famous directors have duplicates

David Beckham
6 profiles · 15 companies

The husband of Victoria Beckham has 6 separate Companies House profiles across 15 companies, including Victoria Beckham Limited. Companies House actually shows seven profiles, but the seventh is linked only to a dissolved company.

Ed Sheeran
3 profiles · 11 companies

Three versions of Edward Christopher Sheeran exist in the registry, across 11 companies — including FAT PUNT LTD and the catchily named HAYAGOTATOURBOI TOURING LLP.

Duncan Bannatyne
8 profiles · 33 companies

Our favourite former Dragons' Den dragon — once in the Royal Navy, court-martialed for threatening to throw his commanding officer overboard. Monaco resident Duncan has 8 profiles and is linked to 33 companies.

Peter Jones
8 profiles · 43 companies

Another Dragons' Den personality, with 8 duplicate profiles across 43 companies. A data entry error even listed Peter's former name as "James Holdgate", who turns out to be a company secretary.


Root Cause

Why is Companies House director data so bad?

Every time you register a new company and list yourself as a director, Companies House creates a completely new director record if you use a different registered postal address.

Companies are only added once to Companies House and have a unique ID — the company registration number. Individual directors, by contrast, are added multiple times by the directors themselves, with no unique identifier connected to them. A classic identity resolution problem.

UK company registration costs as little as £12 and takes minutes, but Companies House does not verify the accuracy of the information filed. The Guardian has described the registry as full of fakes, fast sign-ups and frauds.

Same person — three records
JOHN SMITH
14 High Street, London · ABC Trading Ltd
J. Smith
14 High St, London · Smith & Partners LLP
John A. Smith
14 High Street, London EC2 · XYZ Holdings
Address entered differently at each company registration = three separate director records

Why It Matters

The consequences of unresolved director data

Due diligence is broken

Searching for a director in Companies House doesn't show all their associated companies. You have to search separately for each name variant and manually work out which records belong to the same person.

KYC and AML gaps

When directors have multiple unlinked profiles, compliance teams can miss the full picture — directorships at companies that should have triggered screening are invisible.

Banking compliance liability

Customers with multiple accounts can become a compliance liability. Banks performing perpetual KYC should know the instant a new customer joins whether they're related to an existing account.

Fraud goes undetected

Hidden connections between companies through shared directors — using slightly different name spellings — can mask fraud networks that only become visible through entity resolution.


The Method

How we deduplicated 10 million director records

01
Download the data

Downloaded ~5 million companies and their directors via the Companies House API, which is rate limited to 600 queries per 5 minutes. Total download time for all 10 million director records: over one month.

02
Develop matching rules

Used data samples to build fuzzy matching rules based on Metaphone (phonetic encoding) and Levenshtein (edit) distance, which catch name variants, abbreviations, and spelling differences.

03
Run entity resolution

Imported all 10 million records into Tilores. Because Tilores runs on AWS serverless architecture, import speed is limited only by how many resources are deployed, not by a technical bottleneck.
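The download time in step 01 can be sanity-checked from the rate limit alone. The sketch below assumes a hypothetical one or two API calls per company; the real number of calls per company (for example, extra pages for long officer lists) is not stated here, so treat the results as rough lower bounds.

```python
import math

RATE_LIMIT_QUERIES = 600   # from the text: 600 queries ...
RATE_LIMIT_WINDOW_MIN = 5  # ... per 5 minutes
COMPANIES = 5_000_000      # approximate number of companies downloaded

def min_download_days(queries_needed: int) -> float:
    """Fastest possible download if every rate-limit window is fully used."""
    windows = math.ceil(queries_needed / RATE_LIMIT_QUERIES)
    return windows * RATE_LIMIT_WINDOW_MIN / (60 * 24)

# Assumption: one call per company (hypothetical best case).
one_call = min_download_days(COMPANIES)
# Assumption: two calls per company (hypothetical, e.g. profile plus officers).
two_calls = min_download_days(2 * COMPANIES)

print(round(one_call, 1))   # 28.9
print(round(two_calls, 1))  # 57.9
```

Even in the best case the rate limit alone accounts for roughly a month, which is consistent with the download time quoted above.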

Matching rules

Two director records were linked when they matched on similar name + same birth month/year, or similar name + identical postcode. Name similarity was calculated using Metaphone combined with Levenshtein distance.
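To make the two similarity measures concrete, here is a minimal sketch. The Levenshtein function is the standard edit-distance algorithm. The phonetic key is a deliberately crude stand-in written for this example and is not the real Metaphone algorithm. How Tilores combines the two measures, and which thresholds it uses, is not described here, so the sketch shows each measure separately.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]

def crude_phonetic_key(name: str) -> str:
    """Simplified phonetic key (NOT real Metaphone): keep the first letter,
    drop later vowels and H/W/Y, and collapse repeated letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = [letters[0]]
    for c in letters[1:]:
        if c in "AEIOUHWY":
            continue
        if c != key[-1]:
            key.append(c)
    return "".join(key)

print(levenshtein("john smith", "jon smith"))                     # 1
print(crude_phonetic_key("Smith"), crude_phonetic_key("Smyth"))   # SMT SMT
```

Edit distance catches typos and dropped letters, while a phonetic key catches spellings that sound alike. A phonetic key on its own is very permissive, which is one reason the rules pair name similarity with a second attribute.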

Rule A
Similar name + same birth month/year
Rule B
Similar name + identical postcode
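Rules A and B can be sketched as a single predicate over two records. Everything here is illustrative: the `Director` fields, the sample data, and the postcode normalisation are assumptions, and `names_similar` uses Python's `difflib` ratio as a stand-in rather than the Metaphone and Levenshtein combination used in the actual rules.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional

@dataclass(frozen=True)
class Director:
    # Hypothetical record shape, not the Companies House schema.
    name: str
    birth_month: Optional[int]
    birth_year: Optional[int]
    postcode: Optional[str]

def names_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    # Stand-in similarity measure; the threshold is an arbitrary assumption.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def normalise_postcode(postcode: Optional[str]) -> Optional[str]:
    # Assumption: ignore spacing and case when comparing postcodes.
    return "".join(postcode.split()).upper() if postcode else None

def should_link(x: Director, y: Director) -> bool:
    if not names_similar(x.name, y.name):
        return False
    # Rule A: similar name + same birth month/year (both must be present).
    same_birth = (
        x.birth_month is not None and x.birth_year is not None
        and (x.birth_month, x.birth_year) == (y.birth_month, y.birth_year)
    )
    # Rule B: similar name + identical postcode.
    px, py = normalise_postcode(x.postcode), normalise_postcode(y.postcode)
    same_postcode = px is not None and px == py
    return same_birth or same_postcode

# Invented sample records.
a = Director("John Smith", 3, 1975, "EC2A 1AA")
b = Director("Jon Smith", 3, 1975, None)
c = Director("John Smith", 7, 1982, "ec2a1aa")
d = Director("John Smith", 7, 1990, "M1 1AE")

print(should_link(a, b))  # True  (Rule A)
print(should_link(a, c))  # True  (Rule B)
print(should_link(a, d))  # False (same name, nothing else matches)
```

Requiring a second attribute alongside the name is what keeps two unrelated people called John Smith from being linked on name alone.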

Important: Companies House only publicly provides the month and year of birth, not the day. In a general population this would produce too many false positives, but for company directors (a smaller, specific subset) the match rate is acceptable for a public showcase. False positive matches are unavoidable with this data.
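Once pairs of records have been linked, they still need to be grouped into entities, because links are transitive: if record 1 matches record 2 and record 2 matches record 3, all three belong to one person. A common generic way to do this is a union-find structure, sketched below. This illustrates the idea only and makes no claim about how Tilores implements it internally; the record IDs are invented.

```python
from collections import defaultdict

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to flatten the tree.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

def group_entities(record_ids, linked_pairs):
    """Group records into entities, following links transitively."""
    uf = UnionFind()
    for a, b in linked_pairs:
        uf.union(a, b)
    groups = defaultdict(list)
    for rid in record_ids:
        groups[uf.find(rid)].append(rid)
    return list(groups.values())

records = ["r1", "r2", "r3", "r4"]
pairs = [("r1", "r2"), ("r2", "r3")]  # r1 and r3 are never compared directly

entities = group_entities(records, pairs)
print(entities)                       # [['r1', 'r2', 'r3'], ['r4']]
print(len(records) - len(entities))   # 2 records counted as linked duplicates
```

The last line mirrors how the headline figure is derived: records ingested minus unique entities. Transitive grouping also means a single false positive link can merge two otherwise separate people, which is worth keeping in mind when reading the results.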


Legal Basis

Why we can do this

Companies House information is publicly available and free to search. The data forms the basis of all business credit reports supplied by agencies like Experian and Creditsafe, and is used by many organisations for due diligence, marketing, and lead generation.

Tilores relies on the legitimate interest ground for processing this personal data — providing a free service to help people conduct better due diligence on companies and their directors, to reduce risk and avoid fraud.

This is an experimental showcase. Results should not be considered definitive and are for research purposes only. No accuracy guarantees are made.


Want to resolve your own data?

If Companies House itself wanted to fix this, it could: its systems are hosted on AWS, so Tilores could deduplicate its data quickly. The same goes for any organisation that relies on Companies House data.