With More Marketing Data Sources in 2023, Watch Out for These 3 Pitfalls

By Bobby Atefi

More data means more insights. At least, that seems to be the presumption behind a recent Salesforce survey of 6,000 marketers about their plans for 2023. The research finds that brands are looking to greatly diversify their data sources, from identity data to transactional records. Marketers used an average of 10 data sources in 2021 and 15 in 2022, and they expect to use 18 this year.

Increasing data diversity is a wise move, especially given the data deprecation caused by Apple's policy changes, GDPR and more. But more data also means more complexity and noise to contend with. Brands will have more work to do, such as deciding what to do when sources conflict and how to merge more types of records into a single customer profile. All of that can lead to confusion and misalignment.

Here are three key challenges marketers will need to consider when working with multiple data sources:

  • Same person, different data. More than a quarter of Americans have more than four email addresses, and the average U.S. household has 22 connected devices. That's a lot of signals per person. With data coming from different sources, and potentially handled by different platforms, brands will need a system in place to deduplicate and reconcile records.

    For a simple example, a customer might join a brand's loyalty program with the email jdoe@xyz.com but subscribe to its newsletter with janed@abc.com. The company needs to recognize that both email addresses, and all the data attached to them, tie back to one individual. If it doesn't, it risks missteps in customer journey management and wasted spend on frequency capping, since one person split across two profiles can be shown an ad up to twice as often as the cap allows.

  • More data to corral. Pulling data from more outlets also requires extensive cleanup and standardization so that records can be connected and compared accurately. This sounds like a minor detail, but it's mission-critical for building a single record from multiple sources that may format information like timestamps and locations differently.

    For many companies, using more types of data will also mean higher data volume, which means more work filtering out bad data, from dummy addresses like "test@test.com" to IP addresses stored in unusable formats. Companies must establish processes and automation to handle that prep work (often called data wrangling) so that more data yields sharper understanding, not more confusion.

  • Overconnected identifiers. Starbucks gets an average of 500 customers per store per day in the U.S. Customers pass through shared locations like these constantly, and often connect to the same Wi-Fi network while they're there. Friends and relatives also regularly share login credentials across households and share devices within a household.

    To be sure, multiple data sources can strengthen data models by providing more reference points to work from. But more data points can also create a "hall of mirrors" effect, making already muddy connections even more opaque. Without the right approach, it's possible to mistakenly assume that all the devices that logged in at a shared location such as a coffee shop belong to a single household, a presumption that leads to misreading data signals.

    Overconnectedness is just one example of a data issue that calls for an inquisitive approach, one that asks questions like "How many devices are in a typical household?" or "How long and how often are devices typically logged in at that location?" Companies that ask these questions will be much more effective at using their data.
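The cleanup and overconnection checks described in the three pitfalls above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline. The record fields (`email`, `ip`, `device`, `ts`), the placeholder-address blocklist and the 25-device threshold are all hypothetical. The threshold was chosen only because it sits just above the 22-device household average cited earlier.

```python
import ipaddress
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional

# Hypothetical blocklist of placeholder addresses; a real list would be longer.
DUMMY_EMAILS = {"test@test.com", "none@none.com", "noreply@example.com"}

# Hypothetical threshold: more distinct devices than this on one IP address is
# treated as a shared location (cafe, office, campus) rather than one household.
MAX_HOUSEHOLD_DEVICES = 25


def clean_record(raw: dict) -> Optional[dict]:
    """Standardize one raw record, or return None if it is unusable."""
    email = (raw.get("email") or "").strip().lower()
    if "@" not in email or email in DUMMY_EMAILS:
        return None  # missing, malformed, or placeholder email

    try:
        # Parses IPv4/IPv6 and returns a canonical string form.
        ip = str(ipaddress.ip_address((raw.get("ip") or "").strip()))
    except ValueError:
        return None  # IP address in an unusable format

    device = raw.get("device")
    if not device:
        return None

    # Assumption: sources send either epoch seconds or ISO 8601 strings.
    ts = raw.get("ts")
    try:
        if isinstance(ts, (int, float)):
            when = datetime.fromtimestamp(ts, tz=timezone.utc)
        else:
            when = datetime.fromisoformat(ts)
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # assume naive means UTC
            when = when.astimezone(timezone.utc)
    except (TypeError, ValueError):
        return None  # missing or unparseable timestamp

    return {"email": email, "ip": ip, "device": device, "ts": when}


def overconnected_ips(records, max_devices=MAX_HOUSEHOLD_DEVICES):
    """Return IPs seen with more distinct devices than a household plausibly has."""
    devices_per_ip = defaultdict(set)
    for r in records:
        devices_per_ip[r["ip"]].add(r["device"])
    return {ip for ip, devices in devices_per_ip.items() if len(devices) > max_devices}


raw = [
    {"email": " JDoe@XYZ.com ", "ip": "203.0.113.7", "device": "d1", "ts": 1672531200},
    {"email": "test@test.com", "ip": "203.0.113.7", "device": "d2", "ts": 1672531200},
    {"email": "janed@abc.com", "ip": "not-an-ip", "device": "d3",
     "ts": "2023-01-01T00:00:00+00:00"},
]
cleaned = [c for c in map(clean_record, raw) if c is not None]
print(len(cleaned), cleaned[0]["email"], cleaned[0]["ts"].isoformat())
# prints: 1 jdoe@xyz.com 2023-01-01T00:00:00+00:00
```

Only one of the three sample records survives. One record is dropped for its dummy address and another for its unusable IP. IPs flagged by `overconnected_ips` would then be excluded as evidence that devices share a household.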

When it comes to data sources, more information can mean more power, but potentially also more confusion, errors, and missed opportunities. One takeaway from all this is that identity resolution, which can provide a single, accurate view of the user across the many available data sets and records, is only becoming more critical as the data landscape evolves.

Identity resolution has several layers, and companies are well served to account for all of them. A customer data platform (CDP) or cloud partner can help resolve two data sets that share a common "match key," such as a first-party cookie or email address. Matching on shared keys alone, however, will leave companies with gaps and a lack of scale.
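To make the limits of key-based matching concrete, here is a minimal Python sketch of a deterministic join on a shared email match key. It assumes each record is a dict with hypothetical `id` and `email` fields. It hashes normalized emails with SHA-256, a common privacy practice that is assumed here rather than taken from any particular CDP.

```python
import hashlib


def match_key(email: str) -> str:
    """Normalize an email address, then hash it so sources can join without sharing raw emails."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deterministic_join(source_a, source_b):
    """Join two record lists on the hashed-email match key.

    Returns (matched id pairs, ids from source_a with no match in source_b).
    """
    index_b = {match_key(r["email"]): r["id"] for r in source_b}
    matched, unmatched = [], []
    for r in source_a:
        other_id = index_b.get(match_key(r["email"]))
        if other_id is None:
            unmatched.append(r["id"])
        else:
            matched.append((r["id"], other_id))
    return matched, unmatched


# Hypothetical records: Jane used a different email in each program.
loyalty = [{"id": "L1", "email": "jdoe@xyz.com"}, {"id": "L2", "email": "sam@example.org"}]
newsletter = [{"id": "N1", "email": "janed@abc.com"}, {"id": "N2", "email": " Sam@Example.org"}]

matched, unmatched = deterministic_join(loyalty, newsletter)
print(matched)    # [('L2', 'N2')]
print(unmatched)  # ['L1']
```

Normalization lets Sam's two records match despite different casing and whitespace. Jane's loyalty profile stays unmatched because her two records share no key, which is the kind of gap key-based matching leaves behind.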

To get a richer picture of their audience and to have what they need for targeting at scale, companies also need to resolve disparate data sets that don't share a match key. This requires, among other things, access to an identity graph. Luckily, as identity becomes a bigger part of the advertising landscape, more partners are recognizing the need for more robust identity resolution and building these capabilities. Companies should seek out partners that understand the full scope of the work required to manage data effectively and accurately.
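One simple way to illustrate how an identity graph links records without a shared match key is a union-find structure over identifiers. In this structure, each connected component is treated as one person. The Python sketch below is hypothetical throughout: the identifier strings and observations are invented. It treats every link as certain, whereas production identity graphs may also weigh probabilistic evidence and operate at far larger volumes.

```python
from collections import defaultdict


class IdentityGraph:
    """Minimal union-find over identifiers; each connected component is treated as one person."""

    def __init__(self):
        self.parent = {}

    def _find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def link(self, a, b):
        """Record evidence (e.g., a login on a shared device) that a and b are the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_person(self, a, b):
        return self._find(a) == self._find(b)

    def clusters(self):
        groups = defaultdict(set)
        for node in list(self.parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


# Hypothetical observations: neither email matches the other, but both were
# used on the same phone, which links them transitively.
graph = IdentityGraph()
graph.link("email:jdoe@xyz.com", "device:phone-123")
graph.link("email:janed@abc.com", "device:phone-123")
graph.link("email:sam@example.org", "device:laptop-9")

print(graph.same_person("email:jdoe@xyz.com", "email:janed@abc.com"))    # True
print(graph.same_person("email:jdoe@xyz.com", "email:sam@example.org"))  # False
print(len(graph.clusters()))                                             # 2
```

Every link merges whole clusters. A single overconnected identifier, such as a café IP address treated like a personal device, could therefore fuse strangers into one "person." For that reason, shared locations need to be filtered out before links are added.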

The data explosion is arriving just as companies are eager to harness the power of data. While "more is more" isn't the answer, brands will still need to process huge amounts of data in near real time to make identity work at scale. Managing large volumes of data safely, working with trusted partners, and resolving data accurately are foundational elements of success.

The views and opinions expressed are solely those of the contributor and do not necessarily reflect the official position of the ANA or imply endorsement from the ANA.

Bobby Atefi is chief data scientist of MediaWallah.