Does the US have a duplicate voter problem? We analysed 50 million records to find out.

Voter fraud is a politically charged topic in the United States. But strip away the politics and what remains is a data quality problem — one that entity resolution is well-suited to diagnose.

We gathered publicly available voter registration data from seven US states — Georgia, Florida, Michigan, North Carolina, Pennsylvania, Ohio, and Arkansas — covering approximately 50 million voter profiles. We then ran Tilores entity resolution across the full dataset. Here's what we found.

The methodology

Our analysis uses three matching techniques in combination:

Text similarity matching — fuzzy name matching to catch variations like "James" vs "Jim", hyphenated surnames, and transcription errors
Geographic proximity matching — linking records where addresses are similar or represent the same location under different formats
Temporal range matching — using date of birth ranges to account for data entry inconsistencies
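
As a rough illustration of how three signals like these can be blended into one confidence score, here is a minimal Python sketch. It is not the Tilores implementation: the `VoterRecord` fields, the weights, and the 31-day date-of-birth window are all assumptions made for the example, and plain string similarity on the address stands in for real geographic matching.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass
class VoterRecord:
    # Hypothetical record shape, chosen only for this example.
    name: str
    address: str
    date_of_birth: date


def text_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1], ignoring case and repeated whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def dob_within_range(a: date, b: date, max_days: int = 31) -> bool:
    """True when two dates of birth are at most max_days apart."""
    return abs((a - b).days) <= max_days


def match_confidence(r1: VoterRecord, r2: VoterRecord) -> float:
    """Weighted blend of name, address and date-of-birth agreement."""
    name_score = text_similarity(r1.name, r2.name)
    # Address string similarity is a stand-in for geographic proximity.
    address_score = text_similarity(r1.address, r2.address)
    dob_score = 1.0 if dob_within_range(r1.date_of_birth, r2.date_of_birth) else 0.0
    # Illustrative weights only; they are not taken from the analysis.
    return 0.4 * name_score + 0.3 * address_score + 0.3 * dob_score
```

Requiring a high blended score is one simple way to enforce agreement across multiple fields, since no single field can carry a pair over the threshold on its own.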

Records flagged as potential duplicates were scored for match confidence, and a match was confirmed only when multiple fields aligned. The 61 confirmed fraud cases discussed later, all in Ohio and Pennsylvania, are instances where the same person appears to have voted more than once.

Data was sourced September–November 2023. All source data is publicly available under state open records laws.

The headline numbers

Across all seven states, we identified approximately 400,000 potential duplicate registrations, or around 0.8% of the roughly 50 million voter records analysed.

That figure demands context. Not all duplicates represent fraud. The majority represent a genuine data management problem: voters who moved state or county, voters whose names appear in multiple formats, and voters who died but whose registrations were not removed in time. The system is producing duplicates faster than it can clean them up.

When voters move

The most common source of duplicates is interstate movement. When a voter moves from Georgia to Florida and registers in their new state, their old Georgia registration is not automatically cancelled. Both records remain active.

Name variations compound the problem. "James R. Wilson" in Georgia may appear as "Jim Wilson" in Florida: different enough that simple exact-match rules fail to link them, but similar enough that fuzzy matching catches the connection.
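
To make the "James R. Wilson" versus "Jim Wilson" case concrete, here is a minimal sketch of one way such a pair can be linked. The nickname table and the normalisation rules are hypothetical and deliberately tiny; this is an illustration of the idea, not the matching logic used in the analysis.

```python
from difflib import SequenceMatcher

# Tiny hypothetical nickname table; a production table would be far larger.
NICKNAMES = {"jim": "james", "jimmy": "james", "bill": "william", "bob": "robert"}


def normalise_name(name: str) -> str:
    """Lower-case, strip punctuation, drop single-letter initials, expand nicknames."""
    cleaned = name.lower().replace(".", " ").replace(",", " ").replace("-", " ")
    tokens = [t for t in cleaned.split() if len(t) > 1]
    return " ".join(NICKNAMES.get(t, t) for t in tokens)


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalisation."""
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()


georgia, florida = "James R. Wilson", "Jim Wilson"
print(georgia == florida)                 # False: an exact-match rule misses the pair
print(name_similarity(georgia, florida))  # 1.0: both normalise to "james wilson"
```

A name match alone is weak evidence, so in practice it would be combined with address and date-of-birth agreement before two records are treated as the same person.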

This is not a detection failure so much as a structural one. US voter registration is managed at the state level with no real-time cross-state deduplication layer. Tilores identified thousands of these cross-state pairs in the data.

Close calls: where duplicates approached or exceeded election margins

The most striking finding involves two states that were decided by narrow margins in the 2020 presidential election.

Pennsylvania was decided by 80,555 votes. We found 80,142 potential duplicate registrations in the state, a count just short of the margin of victory. We are not suggesting these duplicates affected the outcome; the confirmed fraud cases remain a small fraction. But the data illustrates how unresolved duplicate records create uncertainty in close elections.

Georgia was decided by 11,779 votes. We found 51,876 potential duplicates — more than four times the winning margin.

Florida had the highest absolute count: 148,516 potential duplicates at a 1.1% rate.
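
The comparison between duplicate counts and winning margins is simple arithmetic, and it can be checked directly from the figures quoted in this section. The short sketch below uses only those numbers; the function name is ours.

```python
# Figures quoted in this article: (potential duplicates, 2020 winning margin).
STATES = {
    "Pennsylvania": (80_142, 80_555),
    "Georgia": (51_876, 11_779),
}


def duplicates_to_margin(duplicates: int, margin: int) -> float:
    """How many times the potential duplicate count covers the winning margin."""
    return duplicates / margin


for state, (dups, margin) in STATES.items():
    ratio = duplicates_to_margin(dups, margin)
    print(f"{state}: {dups:,} duplicates vs {margin:,} margin = {ratio:.2f}x")

# Pennsylvania: 80,142 duplicates vs 80,555 margin = 0.99x
# Georgia: 51,876 duplicates vs 11,779 margin = 4.40x
```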

County-level variation

Duplicate rates vary significantly below the state level. In Arkansas, the statewide average is 0.92% — but Searcy County reaches 2.06%, more than double. This kind of county-level variation suggests the problem is partly driven by local data management practices rather than systemic factors alone.

Unusually, Arkansas shows a statewide voter registration rate of around 26% — far below the national average. This likely reflects aggressive purging practices rather than low civic participation, and it affects how duplicates are counted.

The dead may be voting

Michigan presents a different problem. We found registration rates exceeding 90% in some counties — a figure that becomes statistically implausible when you consider population turnover. The most likely explanation: deceased voters remain on the rolls because death registries are not synchronised with voter registration databases in real time.

A voter who dies while registered creates an orphaned record. Without automatic deduplication against death registries, that record persists indefinitely. It cannot vote on its own — but it creates an exploitable gap, and it inflates registration counts in ways that distort analysis.

Party affiliation in duplicate registrations

We analysed party affiliation among duplicate registrations where that data was available. Democrats showed a duplicate rate of 0.96%; Republicans 0.89%. The gap is too small to suggest any systematic pattern by party.

Among the 61 confirmed fraud cases in Ohio and Pennsylvania:

31 were affiliated with the Democratic Party
21 were affiliated with the Republican Party
6 cases involved both parties
3 had no party affiliation

Two of these 61 cases were cross-state, spanning Pennsylvania and Ohio.

The confirmed fraud rate is vanishingly small relative to the total voting population — but it is real, and it was detectable through entity resolution.

What would actually fix this

Three changes would materially reduce the duplicate problem in US voter data:

Real-time cross-state deduplication. When a voter registers in a new state, that event should trigger a query against other state rolls. If a match is found above a confidence threshold, the old registration should be flagged for review and cancellation. This requires inter-state data sharing agreements and a common entity resolution layer — neither of which currently exists at scale.
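
A minimal sketch of that trigger, assuming a hypothetical `Registration` record and an in-memory list standing in for the other states' rolls. No such shared service exists today, so every name here is illustrative, and the 0.85 threshold is an arbitrary example value.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Registration:
    # Hypothetical record shape for the example.
    state: str
    name: str
    date_of_birth: date


def confidence(a: Registration, b: Registration) -> float:
    """Name similarity in [0, 1], zeroed out when dates of birth disagree."""
    if a.date_of_birth != b.date_of_birth:
        return 0.0
    return SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()


def on_new_registration(new, other_rolls, threshold=0.85):
    """Return out-of-state registrations to flag for human review."""
    return [
        existing
        for existing in other_rolls
        if existing.state != new.state and confidence(new, existing) >= threshold
    ]
```

The function only flags candidates. Cancellation stays a separate, reviewed step, which matters because a false match here would remove a legitimate voter from the rolls.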

Automatic death registry synchronisation. Social Security Administration death data should be matched against voter rolls continuously, not on an annual or ad hoc basis. The matching must use fuzzy logic, not exact match — death records and voter records use different name and address formats.
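
To show why exact matching breaks down across differently formatted sources, here is a small sketch that compares a voter-roll name with a death-record name. The two formats ("James R. Wilson" versus "WILSON, JAMES R") are assumed for illustration and are not taken from any specific registry; dates of birth are passed as ISO strings.

```python
from difflib import SequenceMatcher


def name_key(name: str) -> str:
    """Order-independent key: lower-case, punctuation removed, tokens sorted."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def matches_death_record(voter_name, voter_dob, death_name, death_dob, threshold=0.9):
    """Fuzzy name match, gated on an identical date of birth."""
    if voter_dob != death_dob:
        return False
    score = SequenceMatcher(None, name_key(voter_name), name_key(death_name)).ratio()
    return score >= threshold


# An exact string comparison fails, but the normalised keys agree.
print("James R. Wilson" == "WILSON, JAMES R")  # False
print(matches_death_record("James R. Wilson", "1940-03-02",
                           "WILSON, JAMES R", "1940-03-02"))  # True
```

Sorting the name tokens makes the comparison indifferent to "surname, forename" ordering, while the similarity threshold tolerates small transcription errors.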

Fuzzy matching as a standard, not exact-match rules. Georgia's "exact-match" requirement for voter registration verification shows the limits of the approach: exact matching misses many genuine duplicates because real-world names and addresses are rarely entered consistently. Fuzzy entity resolution catches what exact-match rules miss.

Identity resolution and democracy

The deeper issue this analysis surfaces is a structural one: democratic processes that depend on accurate entity data will always be vulnerable when that data is managed without entity resolution.

This is not unique to voter registration. The same problem — records of the same entity fragmenting across systems, formats, and time — appears in healthcare, financial services, supply chains, and anywhere else that data is collected from multiple sources over time.

Voter data is just the most politically visible instance of a problem that is fundamentally about data infrastructure.

Why we did this

Tilores was built for situations like this: large datasets, real-world name and address variation, high stakes if the resolution is wrong. We ran this analysis because the voter registration dataset is one of the few large-scale, publicly available identity datasets in the US — and because the results illustrate what entity resolution can find, and what is missed without it.

If your organisation is working with data where duplicate records carry real-world consequences — in fraud, compliance, healthcare, or government — entity resolution is the right starting point.
