Duplicate Data: Performing quantitative risk analysis for lead generation

Duplicates in your Salesforce.com CRM cause murky lead attribution and poor results.

As I blogged on February 28, I met a gentleman whose lead generation agency is tasked with generating leads for a healthcare services company.

I asked him, “Does your client have a master data management strategy?” The answer was no.

The client is a large organization. They have millions of records of legacy data in Siebel and they are migrating to Salesforce CRM and Pardot. This legacy data could be a great asset if the client had a master data management system in place. The client does not, and this poses a risk for the lead generation agency.

Duplicate data can impede the agency’s ability to receive credit for leads they generate. Not receiving credit or attribution can make it look like the agency did not perform as well as it did – not good for managing stakeholder expectations.

I’m reading the PMBOK and studying for my Project Management Professional (PMP) certification, and I recommend the agency perform risk assessment by:

  • Modeling how duplicates could occur (qualitative)
  • Counting known and “unknown” duplicates (quantitative)

Below are two questions for the agency to ask their client to assess the risk that duplicate data poses to this lead generation project.


Do last-in leads get attribution credit?

If yes

Leads you generate are going to be credited to your agency – as long as duplicates don’t get in the way.

If yes, but we have no way to enforce the policy

After migrating from Siebel to Salesforce, use DemandTools to perform an initial dedupe. Then configure DupeBlocker to merge old leads into new leads in Salesforce.

Warning: Do not configure DupeBlocker to merge new Lead records into old Lead records in Salesforce. The new Lead record is what is linked to the record in Pardot. To preserve the link between the two systems, merge old Salesforce Lead records into new Lead records – meaning delete the old and copy relevant fields into the new.
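The merge direction described above can be sketched in Python. This is an illustrative sketch only – the field names and the merge logic are assumptions for demonstration, not DupeBlocker’s actual API:

```python
# Sketch of the merge direction described above: the old Salesforce Lead
# is merged INTO the new one (the record linked to Pardot), and the old
# record is then deleted. Field names here are illustrative assumptions.

def merge_old_into_new(old_lead: dict, new_lead: dict,
                       fields_to_copy=("Phone", "Company", "LeadSource")) -> dict:
    """Copy relevant fields from the old Lead onto the new Lead.

    The new Lead's Id (and therefore its Pardot link) is preserved;
    only fields the new record is missing are filled from the old one.
    """
    merged = dict(new_lead)  # keep the new record's Id and its Pardot link
    for field in fields_to_copy:
        if not merged.get(field) and old_lead.get(field):
            merged[field] = old_lead[field]
    return merged

old = {"Id": "00Q_old", "Phone": "858-335-1414", "Company": "Acme Health"}
new = {"Id": "00Q_new", "Phone": "", "Email": "d25engel@yahoo.com"}

merged = merge_old_into_new(old, new)
# merged keeps Id "00Q_new" but gains the old record's Phone and Company
```

Merging in the opposite direction would keep the old Id and sever the Pardot link, which is exactly the failure mode the warning describes.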

If no

It’s unlikely the client will say no to this question. Most clients give last-in attribution.


How complete are the client’s records in Salesforce and Pardot? Here are the 6 Necessary Fields for identifying duplicate customer data: 1) First name, 2) Last name, 3) Email, 4) Phone, 5) Lat/long and 6) Social media handle.

If 90% of records have 4 out of the 6 Necessary Fields

You’re in good shape. You will be able to manage duplicates 90% of the time.

If 90% of records have fewer than 4 out of the 6 Necessary Fields

Then no matter how you slice it, there is a substantial risk of having “unknown” duplicates which can cause your agency not to receive credit for generating leads.

Say I am in your database twice. I signed up seven years ago with: Eisaiah, Engel, 858-335-1414. I respond today to an offer and your database creates a new record: Eisaiah, Engel, d25engel@yahoo.com. Your system is not going to be able to match my two records; they will remain duplicates.
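A toy matcher makes the failure above concrete. The matching rule here – same name plus at least one shared unique identifier – is an assumption for illustration, not any particular vendor’s algorithm:

```python
# Illustrative duplicate matcher: two records are flagged as duplicates
# only when they agree on first name, last name, AND at least one unique
# identifier (email, phone, lat/long, or social handle). The rule is an
# assumption for demonstration purposes.

UNIQUE_FIELDS = ("email", "phone", "latlong", "social")

def is_duplicate(a: dict, b: dict) -> bool:
    if a.get("first") != b.get("first") or a.get("last") != b.get("last"):
        return False
    # Require a non-empty value shared by both records in some unique field.
    return any(a.get(f) and a.get(f) == b.get(f) for f in UNIQUE_FIELDS)

old_record = {"first": "Eisaiah", "last": "Engel", "phone": "858-335-1414"}
new_record = {"first": "Eisaiah", "last": "Engel", "email": "d25engel@yahoo.com"}

print(is_duplicate(old_record, new_record))  # False
```

The two records share only a name – no email, phone, lat/long, or social handle in common – so the matcher cannot link them, and they persist as an “unknown” duplicate.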

If I make a purchase and my seven-year-old Lead record gets attribution for it, then your agency is not going to get credit – even though you generated the new Lead that caused me to buy.

As I am learning in my Project Management Professional (PMP) certification class, the risk of duplicate data is best explained to the client in the planning stage – before project execution begins. Here are some thoughts on how a project manager can forecast the quantity of “unknown” duplicates in a Salesforce CRM.

First, perform a de-dupe on the data. Save the logs because you will need to know:

  • How many total records existed at the beginning and end?
  • How many duplicates were found?
  • By which fields were duplicates matched?
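A short script can pull those three answers out of the dedupe logs. The log format shown here (one entry per merge, listing the matched field) is an assumption – real DemandTools logs will differ:

```python
# Summarize a dedupe log into the three counts listed above. The log
# structure (one dict per merge with a "matched_on" field) is a
# hypothetical format for illustration.

from collections import Counter

merge_log = [
    {"surviving_id": "00Q1", "merged_id": "00Q9", "matched_on": "Email"},
    {"surviving_id": "00Q2", "merged_id": "00Q8", "matched_on": "Phone"},
    {"surviving_id": "00Q3", "merged_id": "00Q7", "matched_on": "Email"},
]

records_before = 100_000
duplicates_found = len(merge_log)
records_after = records_before - duplicates_found
matches_by_field = Counter(entry["matched_on"] for entry in merge_log)

print(records_after)      # 99997
print(duplicates_found)   # 3
print(matches_by_field)   # Counter({'Email': 2, 'Phone': 1})
```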

Make a matrix report showing how many records contain each of the 6 Necessary Fields. Mash the counts from this matrix report against the following “macroeconomic” data.

The average person has:

When you mash this data together, you will obtain a measure of the total number of “unknown” duplicates. I performed this exercise for a single field (phone number) in an example at the bottom of my previous blog post.

Say your rate of “unknown” duplicates is 6.5%. If your client is expecting 100,000 leads then you can justify falling short by 6,500 leads. Those leads are likely there but are not attributed to your agency due to the old, duplicate records receiving credit. It is important for the client to agree to this reduced expectation at the beginning of the project.
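The arithmetic behind that adjusted expectation is straightforward; here it is spelled out, using the 6.5% rate from the example above:

```python
# Adjusted-expectation arithmetic from the paragraph above: an estimated
# "unknown" duplicate rate translates directly into leads the agency
# generates but cannot claim credit for.

unknown_duplicate_rate = 0.065   # 6.5% estimated "unknown" duplicates
expected_leads = 100_000

unattributed = round(expected_leads * unknown_duplicate_rate)
adjusted_target = expected_leads - unattributed

print(unattributed)      # 6500 leads likely generated but credited to old records
print(adjusted_target)   # 93500 leads the agency can commit to delivering
```

Getting the client to sign off on the 93,500 figure during planning is what turns the duplicate-data risk into a managed expectation rather than a perceived shortfall.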

Ideally, the client would eliminate duplicates by implementing an enterprise master data management system (like one that I designed) that merges records across all enterprise databases and appends the 6 Necessary Fields from data service providers like TowerData or FullContact. Master data would improve productivity for the lead generation agency and the client’s own sales, customer service and operations departments.

Implementing a master data management strategy is out of scope for this project. But it is something that the client’s IT governance team should consider in 2017.
