Finding Truth in Data

Through research and experiential learning engagements, Wharton Customer Analytics (WCA) works with corporate partners to transform business thinking, translate research findings, teach students analytics, and demystify machine learning and AI application technology. Along with Nielsen's Jonathon Wells, WCA's Mary Purk discussed some approaches marketers can take to ensure truth in data at the 2022 ANA Masters of Data & Technology Conference.

Using big data associated with TV, the pair gave examples of how biases can appear in all data sets and how big data can sometimes leave out key context, which makes useful analysis more difficult. Additionally, the pair provided some key tips for how marketers can overcome these obstacles to get the most out of their data sets.

Words of Wisdom

"There is no data that is unbiased and there is no data that is error free. Understanding and acknowledging that is the first step toward finding truth in data."
     — Jonathon Wells, SVP of data science at Nielsen

Key Takeaways

Finding truth in marketing data is critical because accurate data is what allows marketers to best influence consumers' behaviors. The future of data analysis lies in the democratization of artificial intelligence and machine learning. This democratization will create scalable computing platforms, open-source developer tools, and marketplaces for data. These developments will in turn create new industry disruptions and risks.

Automated data decisions stand to create three key types of risk:

  • Social: Disadvantaged minorities may continue to be left behind by subconscious biases affecting data algorithms and AI.
  • Reputation: Brands can be perceived as biased or prejudiced due to actions taken through automated data.
  • Regulation: Brands or companies may be sued for unfair practices.

Big data sets are a must, but they do have limitations. Typically, these types of data sets are built around billing information or online behaviors, not demographic profiles. As a result, they lack rich profile details like age, income, and ethnicity. For this reason, big data sets come with an increased possibility of waste and fraud. AI and machine learning can help to mitigate some of these limitations.

No data point can substitute for data from real, verified people. For example, big data around the number of TV sets in the average home is derived from reporting by set-top box providers and smart TV brands. Forty percent of TVs currently in use in the market would not be covered by these two main data sources because they are not connected to them. Additionally, these big data sources are not able to accurately track who is watching a particular program or, in the instance of set-top box providers, if anyone is even watching at all. The latter uncertainty arises because users often leave set-top boxes on when they turn off their TVs, giving the illusion of a viewer who isn't actually there. These are examples of situations in which big data can be supplemented by data from real, verified individuals.

To address some of the issues surrounding big data accuracy, there is a four-step process for integrating big data into overall measurement:

  • Step One: Ensure data quality by cleaning big data sources.
  • Step Two: Aggregate supplemental data, such as household characteristics and demographics in the case of big data around TV. This can be done via machine learning.
  • Step Three: Undertake a process of bias and error correction. In the instance of TV data, this process would work to get a clearer idea of who is watching a given program at a given time.
  • Step Four: Calculate data point quality to ensure you're only using the most accurate data.
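
The four steps above can be sketched in code. The following Python is a minimal illustration under stated assumptions, not the method described by the speakers: the record fields, the 240-minute plausibility cap, the panel lookup (a stand-in for a machine learning model), and the use of post-stratification weights for bias correction are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical cap: sessions longer than this may be a box left on with the TV off.
MAX_PLAUSIBLE_MINUTES = 240


@dataclass
class TuningRecord:
    household_id: str
    program: str
    minutes: float                    # minutes the set-top box reported as tuned
    age_group: Optional[str] = None   # filled in by Step Two
    weight: float = 1.0               # filled in by Step Three
    quality: float = 0.0              # filled in by Step Four


def clean(records: List[TuningRecord]) -> List[TuningRecord]:
    """Step One: drop records with a missing ID or a non-positive duration."""
    return [r for r in records if r.household_id and r.minutes > 0]


def enrich(records: List[TuningRecord],
           panel_profiles: Dict[str, str]) -> List[TuningRecord]:
    """Step Two: attach demographics from a verified-panel lookup.

    A plain dictionary stands in for a machine learning model here.
    """
    for r in records:
        r.age_group = panel_profiles.get(r.household_id)
    return records


def correct_bias(records: List[TuningRecord],
                 population_share: Dict[str, float]) -> List[TuningRecord]:
    """Step Three: weight records so the age-group mix matches the population.

    This is simple post-stratification. Records with unknown demographics
    are dropped, and every known group must appear in population_share.
    """
    known = [r for r in records if r.age_group is not None]
    counts: Dict[str, int] = {}
    for r in known:
        counts[r.age_group] = counts.get(r.age_group, 0) + 1
    for r in known:
        sample_share = counts[r.age_group] / len(known)
        r.weight = population_share[r.age_group] / sample_share
    return known


def score_quality(records: List[TuningRecord]) -> List[TuningRecord]:
    """Step Four: assign a quality score; implausibly long sessions score low."""
    for r in records:
        r.quality = 1.0 if r.minutes <= MAX_PLAUSIBLE_MINUTES else 0.25
    return records


def integrate(records: List[TuningRecord],
              panel_profiles: Dict[str, str],
              population_share: Dict[str, float],
              min_quality: float = 0.5) -> List[TuningRecord]:
    """Run the four steps in order and keep only records above the quality bar."""
    records = clean(records)
    records = enrich(records, panel_profiles)
    records = correct_bias(records, population_share)
    records = score_quality(records)
    return [r for r in records if r.quality >= min_quality]
```

In practice each step would be far richer, but the ordering matters: demographics must be attached before weights can be computed, and quality scoring comes last so that it filters the corrected data rather than the raw feed.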

Action Steps

Tips for how to break biases in data include:

  • Educate your teams broadly on data and algorithmic bias issues.
  • Understand the potential data biases and problems that might occur within training datasets that influence AI models.
  • Lean on leaders, who must be responsible for how tools, data, and models are used in the organization.

Q&A with Mary Purk, executive director of Wharton Customer Analytics at The Wharton School at the University of Pennsylvania, and Jonathon Wells, SVP of data science at Nielsen

Q. What's your No. 1 recommendation for how companies should invest in their data over the next 12 to 24 months?

Jonathon Wells: It's really about investing in the data architecture, the data structure, and the literacy in data. We have a tendency, at the moment, to spend a lot of time focusing on the AI and the machine learning, but the reality is those things are only as good as the data that comes in. Understanding data's imperfections and how to best interpret it is going to unlock the full value of your data sets.

Mary Purk: Invest in people who are working in data and strive to make these teams more diverse and more inclusive to begin to mitigate some of the issues with data bias that we discussed. It will also bring in new and different ways of thinking, which is always useful for solving a problem. You should require that of your vendors, as well.

Q. How can companies manage the cost of data management and of achieving unbiased and truthful data sets? Are there certain tools that companies should be using?

Wells: There's a wide range of tools and, depending on your use case, some are better suited than others. Balancing the cost again comes back to the notion of how you unlock the true value of your data. The more you can invest in data knowledge and data automation, the less taxing and costly it will be for your data teams.

Purk: Make sure you define your problems clearly and correctly so that you can understand exactly what you need to invest in to fix them.


"Finding Truth in Data." Mary Purk, executive director of Wharton Customer Analytics at The Wharton School at the University of Pennsylvania; Jonathon Wells, SVP of data science at Nielsen. 2022 ANA Masters of Data & Technology Conference, 3/28/22.
