Identity resolution glossary
Key terms and concepts in entity resolution, data matching, and customer data unification.
Processing all records at once in a scheduled job. Suitable for initial data loads and periodic cleanup, but introduces delays between data ingestion and entity resolution.
A technique used to reduce the number of record comparisons in entity resolution. Records are grouped into "blocks" based on shared attributes, and comparisons only happen within blocks.
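A minimal sketch of blocking follows. The blocking key (first letter of the surname plus postal code) and the field names `last_name` and `zip` are assumptions chosen for illustration; real systems often combine several keys.

```python
from collections import defaultdict
from itertools import combinations


def block_key(record):
    # Hypothetical blocking key: first letter of surname plus postal code.
    return (record["last_name"][:1].lower(), record["zip"])


def candidate_pairs(records):
    """Group records into blocks and return only within-block ID pairs."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(ids, 2))
    return pairs


records = [
    {"id": 1, "last_name": "Smith", "zip": "10115"},
    {"id": 2, "last_name": "Smyth", "zip": "10115"},
    {"id": 3, "last_name": "Jones", "zip": "10115"},
    {"id": 4, "last_name": "Schmidt", "zip": "80331"},
]
pairs = candidate_pairs(records)
print(pairs)  # only records 1 and 2 share a block
```

With four records, a full comparison would need six pairs; this key leaves one. The trade-off is that a key chosen too narrowly can separate true matches into different blocks.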
A numerical value (typically 0-1 or 0-100%) that indicates how likely it is that two records refer to the same entity. Higher scores mean more certainty in the match.
A comprehensive view of a customer that combines data from all touchpoints and systems into a single profile. Entity resolution is the foundation that makes Customer 360 possible.
Software that creates a unified customer database from multiple sources. CDPs use entity resolution to match and merge customer records into unified profiles.
The process of comparing records to identify those that refer to the same entity. The core computational step in entity resolution, using algorithms like Jaro-Winkler, Levenshtein, and Soundex.
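As an illustration of the comparison step, here is a small Levenshtein edit-distance sketch. The `similarity` helper, which scales the distance into a 0 to 1 score, is an assumption for this example and only one of several possible normalizations.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    # Hypothetical normalization: 1.0 means identical, 0.0 means nothing shared.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("jon", "john"))         # 0.75
```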
The process of standardizing data formats before matching — converting names to consistent casing, parsing addresses, formatting phone numbers, handling character encoding.
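The normalization steps mentioned above might look like the sketch below. The function names and the default country code are assumptions; production phone and address parsing usually relies on dedicated libraries.

```python
import re
import unicodedata


def normalize_name(name):
    # Unify Unicode forms, collapse whitespace, and fold case for comparison.
    name = unicodedata.normalize("NFKC", name)
    return " ".join(name.split()).casefold()


def normalize_email(email):
    return email.strip().lower()


def normalize_phone(phone, default_country_code="1"):
    # Assumption: 10-digit numbers are national numbers missing a country code.
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits


print(normalize_name("  JANE   Doe "))    # jane doe
print(normalize_email(" Jane.Doe@Example.COM "))
print(normalize_phone("(555) 123-4567"))  # +15551234567
```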
Tracking the origin and history of data — which source system a record came from, when it was ingested, and how it was transformed. Essential for audit trails and compliance.
The process of identifying and removing duplicate records within a single dataset or across multiple datasets. A subset of entity resolution focused on eliminating redundancy.
Matching records based on exact agreement of one or more identifiers (e.g., exact email match). High precision but misses variations and typos.
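A small sketch of exact-identifier matching, assuming records are dictionaries and `email` is the chosen identifier (both are illustrative choices, not a prescribed schema):

```python
def deterministic_match(a, b, keys=("email",)):
    """True only if every key is present in both records and exactly equal."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)


crm = {"name": "Jane Doe", "email": "jane.doe@example.com"}
shop = {"name": "J. Doe", "email": "jane.doe@example.com"}
typo = {"name": "Jane Doe", "email": "jane.deo@example.com"}

same = deterministic_match(crm, shop)    # names differ, email agrees
missed = deterministic_match(crm, typo)  # one transposed letter breaks the match
print(same, missed)
```

The second comparison shows the limitation described above: a single typo in the identifier is enough to miss the match.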
A graph structure where nodes represent entities and edges represent relationships or shared attributes between them. Used to visualize and navigate resolved identity data.
The process of determining whether two or more records refer to the same real-world entity (person, company, or object). Closely related terms include record linkage, deduplication, and data matching, which are often used interchangeably.
When two records that refer to the same entity are not matched. Leads to incomplete customer views and missed fraud connections.
When two records are incorrectly matched as the same entity when they actually refer to different entities. A major concern in fraud detection and compliance.
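Missed matches and incorrect matches are commonly counted over record pairs and summarized as recall and precision. The sketch below assumes a labeled set of true matching pairs is available; the pair IDs are made up for illustration.

```python
def pair_metrics(predicted, actual):
    """Compare predicted matching pairs against known true pairs."""
    predicted = {frozenset(p) for p in predicted}  # (1, 2) equals (2, 1)
    actual = {frozenset(p) for p in actual}
    tp = len(predicted & actual)
    fp = len(predicted - actual)  # matched, but different entities
    fn = len(actual - predicted)  # same entity, but not matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"fp": fp, "fn": fn, "precision": precision, "recall": recall}


metrics = pair_metrics(
    predicted=[(1, 2), (3, 4), (5, 6)],
    actual=[(2, 1), (3, 4), (7, 8)],
)
print(metrics)
```

Here the pair (5, 6) is an incorrect match and (7, 8) is a missed match, so both precision and recall come out at two thirds.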
String comparison techniques that find approximate matches rather than exact matches. Handles typos, abbreviations, and format variations. Common algorithms include Levenshtein and Jaro-Winkler (edit-based) and Soundex (phonetic).
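To show how a phonetic approach tolerates spelling variation, here is a sketch of American Soundex, which encodes a name as its first letter plus three digits. It is a compact illustration rather than a production implementation, and it treats every non-coded letter as a vowel-like separator.

```python
# Consonant groups that share a Soundex digit.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(name):
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "hw":  # h and w do not separate repeated codes
            prev = code
    return (out + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```

Two names with the same code are candidates for a match, not confirmed matches; phonetic codes are typically combined with other comparisons.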
The single, authoritative, unified view of an entity assembled from all matched source records. Contains the best-available data from each source system.
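How "best-available data" is chosen depends on survivorship rules. The sketch below assumes one simple rule, the most recent non-empty value per field wins; the record layout (`source`, `updated_at`, `data`) is hypothetical.

```python
def build_golden_record(records):
    """Merge matched source records; newer non-empty values overwrite older ones."""
    golden = {}
    # ISO 8601 date strings sort chronologically as plain strings.
    for rec in sorted(records, key=lambda r: r["updated_at"]):
        for field, value in rec["data"].items():
            if value not in (None, ""):
                golden[field] = value
    return golden


matched = [
    {"source": "crm", "updated_at": "2023-01-10",
     "data": {"name": "Jane Doe", "email": "jane@old.example", "phone": ""}},
    {"source": "shop", "updated_at": "2024-03-02",
     "data": {"name": "Jane Doe", "email": "jane@new.example"}},
    {"source": "support", "updated_at": "2023-06-01",
     "data": {"phone": "+15551234567", "email": None}},
]
golden = build_golden_record(matched)
print(golden)
```

Other rules are equally plausible, such as preferring a trusted source per field or the most frequent value.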
A specific type of entity resolution focused on identifying whether records belong to the same person. Used in fraud detection, KYC, marketing, and customer data platforms.
Retrieval-Augmented Generation enhanced with identity resolution. Instead of relying solely on vector similarity, IdentityRAG resolves customer identities before providing context to LLMs.
A discipline that ensures an organization has a single, consistent definition of its key data entities (customers, products, suppliers). Entity resolution is a core capability within MDM.
Matching records using statistical models that account for the probability of agreement and disagreement across multiple attributes. Handles uncertainty better than deterministic matching.
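One classic formulation of this idea is the Fellegi-Sunter model, sketched below. Each attribute has an m-probability (agreement given a true match) and a u-probability (agreement given a non-match); the values, field names, and prior used here are invented for illustration and would normally be estimated from data.

```python
import math

# Hypothetical (m, u) probabilities per attribute.
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "zip": (0.90, 0.05),
}


def match_weight(a, b):
    """Sum of log2 likelihood ratios over all attributes present in both records."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # missing values contribute no evidence
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def match_probability(weight, prior=0.001):
    # Assumed prior: 1 in 1000 compared pairs is a true match.
    odds = prior / (1 - prior) * 2 ** weight
    return odds / (1 + odds)


a = {"first_name": "jane", "last_name": "doe", "zip": "10115"}
b = {"first_name": "jane", "last_name": "doe", "zip": "10115"}
c = {"first_name": "john", "last_name": "roe", "zip": "80331"}
print(match_probability(match_weight(a, b)))
print(match_probability(match_weight(a, c)))
```

The resulting probability can serve as a match score on a 0 to 1 scale, with thresholds deciding between match, non-match, and manual review.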
Resolving entities at the moment data is ingested, rather than in periodic batch jobs. Enables use cases like real-time fraud detection and instant Customer 360 updates.
The process of identifying records in one or more data sources that refer to the same entity. Closely related to entity resolution but often used in statistical and academic contexts.
When record A matches B, and B matches C, all three are considered the same entity — even if A and C don't directly match. Also called "chaining" or "closure."
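The A, B, C example above can be sketched with a union-find (disjoint-set) structure, a common way to compute this kind of closure over pairwise matches. The function name and the record IDs are illustrative.

```python
def resolve_clusters(record_ids, matched_pairs):
    """Group records into entities by following chains of pairwise matches."""
    parent = {r: r for r in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in matched_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for r in record_ids:
        clusters.setdefault(find(r), set()).add(r)
    return sorted(sorted(c) for c in clusters.values())


clusters = resolve_clusters(
    ["A", "B", "C", "D"],
    [("A", "B"), ("B", "C")],  # A and C never match directly
)
print(clusters)  # [['A', 'B', 'C'], ['D']]
```

Because one incorrect link merges two whole clusters, chaining tends to amplify the cost of any single wrong pairwise match.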
A single record that represents a resolved entity, combining data from all matched source records. The primary unit of measurement in Tilores pricing.