Automating Data Deduplication in Snowflake with Tilores
Preventing Free Trial Abuse using Identity Resolution with your Snowflake Customer Data
The problem
Ending up with duplicate customer data is almost unavoidable, whether it is due to acquiring data from different sources, customers registering multiple times or mistakes made during manual data entry. Another possible cause of duplicate customer data is when customers sign up online multiple times, each time slightly changing their details, to take advantage of free trials or first-use discounts.
This leads to multiple rows in a table that represent the same customer, company or any other kind of entity. A classic identity resolution problem. So how can one solve this in Snowflake?
What to expect?
In this guide we will start with a Snowflake table that contains duplicated customer data and we will end up with another table introducing a new field (entity_id) which is a unique identifier for each customer.

An **entity_id **will be automatically calculated for any new data added to the table.
Create test table in Snowflake
Create a new worksheet and run the following:
Download the result as CSV. We will use this in the next step to automatically generate the matching rules.
Connecting Snowflake to Tilores
Sign up for a few Tilores account at app.tilores.io. After signing in choose “Upload Data File”. And then upload the CSV file from the previous step, Then click “Yes, use rules” and complete the wizard. It will take around 3 minutes to finish setting up.
Now we will create an API integration which will allow Snowflake to make calls to Tilores.
Snowflake will always be the one initiating the connection, sending and requesting data from Tilores, not the other way around.
In the Tilores UI go to Integration — Snowflake and under the section “Where to get these values?” copy the code sections and run them in a new worksheet in your Snowflake workspace.
The result of the last query should have API_AWS_IAM_USER_ARN, copy the value and use it in User ARN field in Tilores UI. Also copy the value of **API_AWS_EXTERNAL_ID **and use it in External ID in Tilores UI then click **UPDATE **to activate the connection.
The following banner should show:

Now Snowflake and Tilores are connected.
Setup automatic entity identification in Snowflake
Next we will create a task which will process any new data added to the **customers **table we created at the beginning and create a row with an **entity_id **in a new table customers_entities. Run the following in Snowflake:
Testing the automation
After two minutes, a row is created in customers_entities for each customer. So if we perform a join we should be able to tell which customer rows belong to which actual customer.
Should result in:

Based on the entity ids, it shows that these rows represent only three actual customers.
And if we now add another customer with a similar name and a few spelling mistakes, and a different email domain. It should end up with the same **entity_id **as the first row.
Wait two minutes, then run the following again:
The result should now be:

In this case entity_id is the actual customer identifier.
Conclusion - Identity Resolution in Snowflake
By introducing unique entity IDs for customers, this identity resolution approach ensures an accurate customer 360 view in Snowflake and allows for easy customer segmentation for marketing. Further use cases include detecting duplicate account creation associated with free trial abuse.
Ready to try entity resolution?
Start Building Free →