Wednesday, November 22, 2017

Do Customer Data Platforms Need Identity Matching? The Answer May Surprise You.

I spend a lot of time with vendors trying to decide whether they are, or should be, a Customer Data Platform. I also spend a lot of time with marketers trying to decide which CDPs might be right for them. One topic that’s common in both discussions is whether a CDP needs to include identity resolution – that is, the ability to decide which identifiers (name/address, phone number, email, cookie ID, etc.) belong to the same person.

It seems like an odd question. After all, the core purpose of a CDP is to build a unified customer database, which requires connecting those identifiers so data about each customer can be brought together. So surely identity resolution is required.

Turns out, not so much. There are actually several reasons.

- Some marketers don’t need it. Companies that deal only in a single channel often have just one identifier per customer. For example, Web-only companies might use just a cookie ID. True, channel-specific identifiers sometimes change (e.g., cookies get deleted). But there may be no practical way to link old and new identifiers when that happens, or marketers may simply not care. A more common situation is that companies have already built an identity resolution process, often because they’re dealing with customers who identify themselves by logging in or who transact through accounts. Financial institutions, for example, often know exactly who they’re dealing with because all transactions are associated with an account that’s linked to a customer’s master record (or perhaps not linked because the customer prefers it that way). Even when identity resolution is complicated, mature companies often (well, sometimes) have mature processes to apply a customer ID to all data before it reaches the CDP. In any of these cases, the CDP can use the ID it’s given and doesn’t need an identity resolution process of its own.
- Some marketers can only use it if it’s perfect. Again, think of a financial institution: it can’t afford to guess who’s trying to take money out of an account, so it requires the customer to identify herself before making a transaction. In many other circumstances, absolute certainty isn’t required but a false association could be embarrassing or annoying enough that the company isn’t willing to risk it. In those cases, all that’s needed is an ability to “stitch” together identifiers based on definite connections. That might mean two devices are linked because they both sent emails using the same email address, or an email and phone number linked because someone entered them both into a registration form. Almost every CDP has this sort of “deterministic” linking capability, which is so straightforward that it barely counts as identity resolution in the broader sense.

- Specialized software already exists. The main type of matching that CDPs do internally – beyond simple stitching – is “fuzzy” matching.  This applies rules to decide when two similar-looking records really refer to the same person. It's most commonly applied to names and postal addresses, which are often captured inconsistently from one source to the next. It might sometimes be applied to other types of data, such as different forms of an email address (e.g. draab@raabassociates.com and draab@raabassociatesinc.com). The technology for this sort of matching gets very complicated very quickly, and it’s something that specialized vendors offer either for purchase or as a service. So CDP vendors can quite reasonably argue they needn’t build this for themselves but should simply integrate an external product.

- Much identity resolution requires external data. This is the heart of the matter.  Most of the really interesting identity resolution today involves linking different devices or linking across channels when there’s no known connection. This sort of “probabilistic” linking is generally done by vendors who capture huge amounts of behavioral data by tracking visitors to popular Web sites or users of popular mobile applications, or by gathering deterministic links from many different sources. They then build giant databases (or "graphs" if you want to sound trendy) with these connections.  Even matching of offline names and addresses usually requires external data, both to standardize the inputs (to make fuzzy matching more accurate) and to incorporate information such as address and name changes that cannot be known by inspecting the data itself.  In all these situations, marketers need to use the external vendors’ data to find connections that don’t exist within the marketers’ own, much more limited information. If the external vendor provides matching functions in addition to the data, the CDP is relieved of the need to do the matching internally.
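
The deterministic stitching described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular CDP does it: the `IdentityGraph` class, its method names, and the sample identifiers are all hypothetical. It uses a standard union-find structure to group identifiers that were seen together in a definite connection, such as a login or a registration form.

```python
from collections import defaultdict


class IdentityGraph:
    """Hypothetical sketch: union-find over identifiers.

    Each connected group of identifiers is treated as one person.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, identifier):
        # Register unseen identifiers as their own single-member group.
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[identifier] != root:
            next_id = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_id
        return root

    def link(self, a, b):
        """Record a definite connection between two identifiers."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a, b):
        """True if the two identifiers have been stitched together."""
        return self._find(a) == self._find(b)

    def profiles(self):
        """Return the stitched groups as a list of identifier sets."""
        groups = defaultdict(set)
        for identifier in list(self._parent):
            groups[self._find(identifier)].add(identifier)
        return list(groups.values())


# Sample data (made up): identifiers are (type, value) pairs.
graph = IdentityGraph()
# Two devices that logged in with the same email address.
graph.link(("cookie", "c-123"), ("email", "pat@example.com"))
graph.link(("device", "d-456"), ("email", "pat@example.com"))
# An email and phone number entered on the same registration form.
graph.link(("email", "pat@example.com"), ("phone", "555-0100"))
# A different customer.
graph.link(("cookie", "c-999"), ("email", "lee@example.com"))

for profile in graph.profiles():
    print(sorted(profile))
```

Because links are added only when two identifiers were actually observed together, the sketch never guesses. Fuzzy and probabilistic matching would have to come from somewhere else, which is the point of the last two items in the list.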

In short, there’s a surprisingly strong case that identity resolution isn’t a required feature in a CDP. All the CDP really needs is basic stitching and connections to external services for more advanced approaches. As cross-device and cross-channel matching become more important, CDPs will be more reliant on external vendors no matter what capabilities they’ve built for themselves. One important qualifier is that the CDP implementation team still needs expertise in matching, so it can help clients set it up properly. But while it’s great to find a CDP vendor with its own matching technology, lack of that technology shouldn’t exclude a vendor from being considered a CDP.
