The Hidden Cost of Bad Data: How Bad Data Skews Insights and Decisions

In the age of AI, DIY research tools, and instant surveys, data has never been easier or faster to access. Recently, however, the market research industry has seen a very concerning trend: low-quality, unreliable, or “bad” data, often generated by AI bots, is infiltrating datasets at dangerous rates.

These responses come on top of the run-of-the-mill bad responses we have historically encountered (e.g., careless respondents rushing through a survey, straight-lining through responses, and leaving gibberish in open-ended questions). AI-generated responses, by contrast, are thoughtful, human-like, and hard to detect. If left unchecked, this data can distort insights and mislead even experienced marketing professionals.

To understand the impact bad data could have on any research project, we conducted a study on the sharing economy using a split sample: one half was rigorously cleaned and vetted (through the same data-quality process every Lab42 project undergoes), and the other half was left largely uncleaned, to reflect the data you would receive from a DIY platform or any other sample that has not been quality controlled.

We looked for differences in the following areas:
1. Demographics
2. Behaviors
3. Motivations

The differences between the two datasets were subtle but important, and could lead a marketing professional to make different business decisions, ones that could be misdirected, costly, and carry long-term implications.

This analysis isn’t just about the benefits of cleaning data. It’s about recognizing that, as an industry, we have an AI problem that we need to address immediately, before we lose credibility, trust, and relevance, and cause harm to our clients and their businesses.

1. Demographics:

We first analyzed our datasets to detect any demographic differences between them.

Observation: The two datasets skewed differently across gender, age, income, and employment status.

  • Cleaned data:

    • Skews male, ages 35–54, higher incomes, more employed full-time (also more unemployed, interestingly).

  • Non-cleaned data:

    • Skews female, 55+, lower incomes, more homemakers or retired.

[Chart: demographic comparison]

Implications: Differences in respondent composition can completely shift the story your data tells.

Examples of how demographic differences could impact decisions:

  • Segmentation: Misidentifying your audience can lead you to focus on and target a completely different group of consumers, one that may have very little in common with your actual audience or customer base.

  • Messaging: Misidentifying your audience could lead to messaging that is irrelevant to, and fails to resonate with, your true audience or customer base.

  • Pricing: Misreading the profile of your core consumers can result in over- or under-priced offerings.

2. Behaviors:

The second area we analyzed was the audiences’ stated behaviors.

Observation: While some patterns are consistent across both datasets (the same top categories of claimed behavior change in the past 3 months: groceries, dining out, and gas), respondents in the cleaned dataset indicate a trend towards spending less, while respondents in the non-cleaned dataset show more mixed behaviors.

  • Cleaned data:

    • Generally, their stated behaviors indicate a trend towards spending less. Significantly more reported changing their behavior around eating out at restaurants in the past 3 months due to increased prices.

    • More expect their future spending on clothing and accessories, and on childcare, to decrease. Significantly more claimed to have participated in Amazon Prime Days.

    • More broadly, respondents in the cleaned dataset were more likely to participate in various aspects of the sharing economy, from subscribing to streaming services and using rideshare services to buying secondhand clothes.

  • Non-cleaned data:

    • For these respondents, claimed behavior was more mixed. Although the top 3 categories in which their behavior changed due to price increases were the same as for respondents in the cleaned dataset, eating out at restaurants was a distant second. In addition, more expect their spending on both clothing and accessories and childcare to increase in the next 3 months.

    • More respondents in this dataset claimed to have participated in Walmart Days, and significantly more claim to have spent more this year than in previous years.

    • Respondents in the non-cleaned dataset were less likely to participate in the sharing economy across the board.

Implications: Misrepresenting claimed behavior (past or future) and the magnitude of behavior change can distort your view of consumer priorities and result in misdirected investment in channels, offers, and messages.

Examples of how behavioral differences could impact decisions:

  • Promotional Strategy: Incorrect assumptions about how strongly consumers’ claimed past or intended future spending affects each category can lead companies to overspend or underspend in certain categories relative to actual consumer demand.

  • Channel Allocation: Misjudging participation in promotional events or channels may result in under- or over-investment in those events or channels.

  • Missed Opportunities: Misreading consumer trends can cause brands to underestimate and underinvest in growing markets or new brands.

3. Motivations:

The third area we analyzed was what motivates consumers to participate in promotional events.

Observation: Both datasets showed that the number one driver of participation in promotional events is taking advantage of deals. Beyond that, the ranking and magnitude of the other motivators differed.

  • Cleaned data:

    • The number one driver of participation was taking advantage of deals to counteract rising prices on everyday items, followed at a distant second by “I was planning to purchase, so waited for the sale.” Very few respondents indicated they were motivated by social media or other advertising.

  • Non-cleaned data:

    • Although the number one driver of participation was also taking advantage of deals to counteract rising prices on everyday items, directionally fewer respondents selected this answer. In addition, significantly more respondents in the non-cleaned sample indicated they were motivated to participate by social media and advertising.

Implications: Misunderstanding what truly motivates consumers, and to what extent, can lead to ineffective messaging and promotions.

Examples of how motivational differences could impact decisions:

  • Campaign Messaging: Emphasizing secondary motivators while neglecting core drivers can reduce message effectiveness.

  • Channels: Overestimating the influence of social media can lead to misplaced investment in digital campaigns that may have limited impact.

Conclusion

This analysis highlights the cost of bad data: even subtle differences in demographics, behaviors, and motivations can significantly change our insights and lead to bad business decisions and wasted investment. As hard-to-detect AI-generated responses make up a growing share of research responses, companies cannot afford to treat data cleaning as optional. Rigorous vetting of sample suppliers, screening protocols, and data cleaning processes must become standard practice. When the data is trustworthy, so are the decisions you make.

Athos Maimarides

Athos has over 20 years of market research experience. He began his career at a boutique market research firm in Dallas before working for Millward Brown, where he gained experience across different methodologies and industries. Athos has a Master’s in Market Research from the University of Texas, Arlington and a Bachelor’s Degree in Accounting from the University of Texas, Austin.
