How do you treat your duplicates?
Stop for a moment and think about this… The natural reaction for most companies is to delete duplicate records without much thought. But we LOVE duplicates! There is so much information to glean from them.
So, there are really 3 main choices to consider.
These choices are not mutually exclusive, and when used in combination they produce favorable results:
Kill – One of the top reasons for deleting duplicates is that you have already combined the best fields into a single record (see Combine). Another reason is that the information in every field is exactly the same. If and when you DO choose to delete duplicates outright, be sure you are deleting the CORRECT records based on a PRIORITY of the best fields! Then, and this is key, keep the deleted duplicates and compare them to your final file to ensure you removed and kept the correct records.
Keep – There are many reasons and methodologies for keeping duplicates. Duplicates can indicate a customer visiting multiple store locations, or completing both online and offline transactions. There may be times when you need to maintain historical information, such as previous addresses, and prefer to keep the duplicates but move them to a separate file. Very often, data originates from different systems and it is not feasible to dedupe between those systems. Instead, code your duplicates so you can identify and handle them correctly.
Combine – Extracting the BEST data points from each duplicate and combining them into one record is often the ideal solution. It works well when the duplicates carry additional contact information such as phones, emails or digital identifiers, as well as notes, purchases, and dates of contact. Duplicates can be combined in many different configurations, and doing so brings great power and insight into your data.
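To make the three choices concrete, here is a minimal Python sketch of how they can work together. Everything in it is an assumption made for illustration: the field names (email, phone, source), the SOURCE_PRIORITY ranking, and the rule that the most trusted source wins unless its value is empty. It is not a description of any particular tool, and a real file would need its own matching and priority rules.

```python
from collections import defaultdict

# Hypothetical ranking of source systems: lower number = more trusted.
SOURCE_PRIORITY = {"crm": 0, "web": 1, "store": 2}


def combine_duplicates(records, key_field="email"):
    """Group records on key_field and merge each group into one record.

    Returns (merged, removed). `merged` has one combined record per key,
    coded with a dup_count. `removed` holds the records that were folded
    in, kept so they can be compared against the final file.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_field].strip().lower()].append(rec)

    merged, removed = [], []
    for key, group in groups.items():
        # Most trusted source first, so its non-empty values win.
        group = sorted(group, key=lambda r: SOURCE_PRIORITY.get(r["source"], 99))
        best = {}
        for rec in group:
            for field, value in rec.items():
                # Fill a field only if it is still missing or empty.
                if field not in best or best[field] in ("", None):
                    best[field] = value
        best[key_field] = key           # store the normalized key
        best["dup_count"] = len(group)  # "Keep": code the duplicates
        merged.append(best)
        removed.extend(group[1:])       # "Kill", but keep an audit trail
    return merged, removed


# Made-up sample data for illustration only.
records = [
    {"email": "Ann@Example.com", "phone": "", "source": "web"},
    {"email": "ann@example.com", "phone": "555-0100", "source": "crm"},
    {"email": "bob@example.com", "phone": "555-0199", "source": "store"},
]
merged, removed = combine_duplicates(records)
```

In this sketch the two "ann" records collapse into one record that takes its phone number from the CRM row, while the web row lands in removed instead of disappearing, so it can still be checked against the final file.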
Within a complete Data Quality Management Program, there are some questions to consider BEFORE embarking on any deduping. The details are shared in a previous video that we created, but as a quick overview, these questions include:
What are the inputs/sources of the data? What do you consider a duplicate? (Do you even know?) And how will you use the data?
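One way to pin down "What do you consider a duplicate?" is to write the definition as a match key. The Python sketch below is only an illustration: the function name match_key and the default fields (name, postal_code) are hypothetical, and your own definition may call for different fields or for fuzzier matching.

```python
import re


def match_key(record, fields=("name", "postal_code")):
    """Build a normalized key; records that share a key count as duplicates."""
    parts = []
    for field in fields:
        value = str(record.get(field, "")).lower()
        # Drop spaces and punctuation so "Ann O'Neil" and "ann oneil" match.
        parts.append(re.sub(r"[^a-z0-9]", "", value))
    return "|".join(parts)
```

Changing the fields tuple changes the definition of a duplicate, which is why the answer depends on your sources and on how you will use the data.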
Not sure what is best for your data? Email us and unload your issues!