150 Million Records, 99.5% Accuracy, 36 Hours
When a major enterprise approached us with the challenge of unifying 150 million customer records scattered across dozens of source systems, we knew this would be the ultimate stress test for Tilores. What we didn’t know was just how fast the results would come.
The Challenge
The customer — a large financial services organization operating across multiple European markets — had a problem that will sound familiar to any data engineering team:
- 150 million customer records spread across 30+ source systems
- No shared identifier between systems — each had its own customer ID scheme
- Massive duplication — the same customer appeared under different names, addresses, and email formats
- Regulatory pressure — GDPR compliance required a unified view for data subject access requests
- Previous attempts failed — two prior projects over 18 months had not delivered usable results
The company had tried building entity resolution in-house using Elasticsearch and custom matching logic. After 18 months and significant engineering investment, the accuracy was below 80% and the system couldn’t handle the full data volume without multi-day batch processing windows.
The Approach
We connected Tilores to the customer’s data sources using a combination of prebuilt connectors (for Salesforce and their data warehouse) and the REST API (for legacy systems). The process followed three phases:
Phase 1: Data Ingestion (4 hours)
All 150 million records were ingested into Tilores via bulk import. Tilores’s built-in data transformation layer handled normalization automatically — standardizing name formats, parsing addresses, normalizing phone numbers across country codes, and handling character encoding differences between legacy systems.
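To make the normalization step concrete, here is a minimal sketch of the kind of cleanup involved. It is not Tilores's transformation layer: the function names, the default country code, and the simplifications (no full phone-number validation, and letters without an accent decomposition such as "ß" are dropped rather than transliterated) are all assumptions made for illustration.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Lowercase, strip accents and punctuation, and reorder 'Last, First' to 'first last'."""
    if "," in raw:
        last, first = raw.split(",", 1)
        raw = f"{first} {last}"
    # NFKD splits an accented letter into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", raw)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z\s-]", "", without_marks.lower())
    return " ".join(cleaned.split())


def normalize_phone(raw: str, default_country_code: str = "49") -> str:
    """Return '+<country><number>' digits; the default country code is an assumption."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    if digits.startswith("00"):
        # International prefix written as 00 instead of +.
        return "+" + digits[2:]
    # National format: drop a single trunk prefix 0, then add the country code.
    national = digits[1:] if digits.startswith("0") else digits
    return "+" + default_country_code + national
```

With these helpers, "Müller, Jürgen" and "Jürgen MÜLLER" both become "jurgen muller", and "030 1234567", "0049 30 1234567", and "+49 30 1234567" all collapse to the same string, which is what lets the later matching phase compare like with like.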
Phase 2: Entity Resolution (28 hours)
Tilores’s matching engine processed the entire dataset, comparing records across multiple attributes using configurable fuzzy matching rules. The matching considered:
- Name similarity (handling nicknames, transliterations, maiden names)
- Address normalization (street abbreviations, postal code formats across countries)
- Phone number matching (international formats, mobile vs. landline)
- Email domain equivalence (gmail.com vs. googlemail.com)
- Custom business attributes (account numbers, policy references)
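As a rough illustration of two of the attribute comparisons listed above, the sketch below scores name similarity with nickname handling and treats equivalent email domains as equal. It uses Python's standard `difflib` as a stand-in for a fuzzy matcher; the lookup tables, function names, and record fields are hypothetical and are not Tilores's rule configuration.

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables; real ones would be far larger.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}
EQUIVALENT_DOMAINS = {"googlemail.com": "gmail.com"}


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] after mapping known nicknames to a canonical form."""
    def canonical(name: str) -> str:
        return " ".join(NICKNAMES.get(tok, tok) for tok in name.lower().split())
    return SequenceMatcher(None, canonical(a), canonical(b)).ratio()


def canonical_email(email: str) -> str:
    """Lowercase the address and map equivalent domains to one spelling."""
    local, _, domain = email.strip().lower().rpartition("@")
    return f"{local}@{EQUIVALENT_DOMAINS.get(domain, domain)}"


def compare(rec_a: dict, rec_b: dict) -> dict:
    """Per-attribute similarity scores for a pair of records."""
    same_email = canonical_email(rec_a["email"]) == canonical_email(rec_b["email"])
    return {
        "name": name_similarity(rec_a["name"], rec_b["name"]),
        "email": 1.0 if same_email else 0.0,
    }
```

Keeping the scores per attribute, rather than collapsing them into a single number straight away, is what makes it possible to weight and inspect each signal separately.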
Phase 3: Validation and Tuning (4 hours)
A sample of resolved entities was reviewed by the customer’s data quality team. Initial accuracy was 98.2%. After one round of rule tuning — adjusting the weight given to address matching for cross-border customers — accuracy reached 99.5%.
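The effect of that tuning round can be sketched with a weighted score. Everything numeric here is invented for illustration (the similarity scores, both weight sets, and the threshold); the post does not state the actual values used in the project.

```python
def match_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-attribute similarity scores, each in [0, 1]."""
    total = sum(weights[attr] for attr in scores)
    return sum(scores[attr] * weights[attr] for attr in scores) / total


# Same person, but the address was captured in two countries' formats
# and therefore compares poorly. Illustrative values only.
scores = {"name": 0.95, "email": 1.0, "address": 0.2}

default_weights = {"name": 0.4, "email": 0.3, "address": 0.3}
cross_border_weights = {"name": 0.5, "email": 0.4, "address": 0.1}

THRESHOLD = 0.85  # hypothetical match threshold

before = match_score(scores, default_weights)       # about 0.74, below the threshold
after = match_score(scores, cross_border_weights)   # about 0.895, above the threshold

print(f"default weights:      {before:.3f} match={before >= THRESHOLD}")
print(f"cross-border weights: {after:.3f} match={after >= THRESHOLD}")
```

In this toy case the pair is missed under the default weights and matched once address similarity counts for less, which is the kind of shift a single tuning round can produce for cross-border customers.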
The Results
After 36 hours end to end (4 hours of ingestion, 28 of matching, and 4 of validation and tuning), the customer had:
- 150 million records resolved into 47 million unique entities — a 69% reduction in apparent customer count
- 99.5% matching accuracy validated by the data quality team
- Full data lineage — every resolved entity links back to its source records with confidence scores
- Real-time updates — new records are now resolved on ingestion in under 10ms
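For readers who want to check the headline reduction, the arithmetic is short. The average records per entity is derived here from the two published counts and is not a figure reported by the customer.

```python
records = 150_000_000
entities = 47_000_000

reduction = (records - entities) / records
records_per_entity = records / entities

print(f"reduction: {reduction:.1%}")                     # reduction: 68.7%
print(f"records per entity: {records_per_entity:.2f}")   # records per entity: 3.19
```

The 68.7% rounds to the 69% quoted above, and it implies that a typical customer appeared a little over three times across the source systems.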
For comparison, the previous in-house attempt had stayed below 80% accuracy after 18 months of development and required 72-hour batch processing windows for each full run.
Why It Worked
Three architectural decisions made this possible:
Serverless scaling. Tilores’s serverless architecture scaled automatically to handle the ingestion and matching load. No capacity planning, no cluster sizing, no infrastructure provisioning. The system allocated the compute it needed and released it when done.
Purpose-built matching engine. Unlike generic search engines repurposed for entity resolution, Tilores’s matching engine was designed from the ground up for this exact problem. It combines data transformation, fuzzy matching, and entity assembly in a single optimized pipeline.
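The entity assembly step can be pictured as grouping records that are connected by matches, directly or through intermediate records. The sketch below uses a union-find structure, a standard technique for this; the post does not describe how Tilores implements assembly internally, and the record IDs are made up.

```python
class UnionFind:
    """Tracks which records belong to the same group."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to keep trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def assemble_entities(record_ids, matched_pairs):
    """Group record IDs into entities: records linked by any chain of matches share one."""
    uf = UnionFind()
    for rid in record_ids:
        uf.find(rid)  # register records that have no matches at all
    for a, b in matched_pairs:
        uf.union(a, b)
    groups = {}
    for rid in record_ids:
        groups.setdefault(uf.find(rid), set()).add(rid)
    return list(groups.values())


entities = assemble_entities(
    ["crm-1", "dwh-7", "legacy-3", "crm-9"],
    [("crm-1", "dwh-7"), ("dwh-7", "legacy-3")],
)
```

Here "crm-1" and "legacy-3" end up in the same entity even though they were never compared directly, because both matched "dwh-7"; "crm-9" remains an entity of its own.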
Explainable rules, not black-box ML. The matching rules are configurable and explainable. When the data quality team found edge cases in the initial run, they could understand exactly why two records were (or weren’t) matched, and adjust the rules accordingly. This made the tuning phase take hours instead of weeks.
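To show what "explainable" means in practice, here is a small sketch in which every match decision carries the list of rules that fired. The rule names, record fields, and the two-rule minimum are hypothetical and do not reflect Tilores's rule syntax.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    name: str
    predicate: Callable[[dict, dict], bool]


# Hypothetical rules for illustration.
RULES = [
    Rule("same_email", lambda a, b: a["email"].lower() == b["email"].lower()),
    Rule("same_phone", lambda a, b: a["phone"] == b["phone"]),
    Rule(
        "same_surname_and_postcode",
        lambda a, b: a["surname"].lower() == b["surname"].lower()
        and a["postcode"] == b["postcode"],
    ),
]


def explain_match(a: dict, b: dict, min_rules: int = 2) -> dict:
    """Return the decision together with the names of the rules that fired."""
    fired = [rule.name for rule in RULES if rule.predicate(a, b)]
    return {"matched": len(fired) >= min_rules, "fired": fired}


a = {"email": "ana@example.com", "phone": "+49301234567", "surname": "Kovac", "postcode": "10115"}
b = {"email": "ANA@example.com", "phone": "+49309999999", "surname": "Kovac", "postcode": "10115"}
decision = explain_match(a, b)
```

A reviewer looking at `decision` sees that the pair matched on email and on surname plus postcode but not on phone, so a disputed match can be traced to a specific rule and that rule adjusted.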
What’s Next
The customer is now using Tilores in production for real-time entity resolution. Every new customer interaction — whether it’s a new account opening, a support ticket, or a transaction — is resolved against the unified entity graph in under 10ms.
The unified customer view now powers their GDPR compliance workflow, fraud detection systems, and personalized marketing campaigns. What used to require quarterly batch jobs and manual data quality reviews now happens automatically, continuously, in real time.
Want to see what Tilores can do with your data? Start with the free tier — no credit card required.