Fighting Bots and Bad Responses Without Burning Out Your Audience

If you’ve conducted online research lately, you know the data quality can be questionable at best and outright fraudulent at worst.

But would you believe that in a recent study we ran to investigate data quality issues, 41% of respondents were flagged as poor quality for failing at least one of our data quality questions?

We’ve seen this uptick firsthand. Historically, 25%-30% of the data we gather has been classified as bad quality. To counteract this, we already build multiple data quality questions into our surveys, on top of our manual data quality checks, to ensure our clients receive reliable insights.

But as bots become more sophisticated and respondents become more inattentive, pushing data quality down further, we wanted to dig deeper to understand how many questions, and which types, are most effective at catching bad actors, whether they’re bots, humans using AI to answer survey questions, or simply disengaged humans.

To find out, we designed a study with twelve data quality questions spanning different types and categories: direct asks, factual questions, brand awareness tests, visual tasks, etc. Our approach was deliberately strict. If a respondent failed even one question, we flagged them as unreliable. The goal was to find out how many and which types of questions caught the most bad respondents without kicking out the good ones.
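As a concrete illustration of that strict rule, here is a minimal Python sketch that flags any respondent who misses even one quality check. The column names, sample data, and pandas-based approach are illustrative assumptions, not our actual scoring pipeline.

```python
import pandas as pd

# Each quality-check column holds True if the respondent answered correctly.
# (Hypothetical data and column names, for illustration only.)
responses = pd.DataFrame({
    "respondent_id": [101, 102, 103],
    "factual_1":     [True, True, False],
    "trap_1":        [True, False, True],
    "bot_catcher_1": [True, True, True],
})

quality_cols = ["factual_1", "trap_1", "bot_catcher_1"]

# Strict rule from the study design: failing even one check flags the respondent.
responses["flagged"] = ~responses[quality_cols].all(axis=1)

print(responses[["respondent_id", "flagged"]])
print(f"Flagged as poor quality: {responses['flagged'].mean():.0%}")
```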

How Each Question Performed

Category             Caught %
Factual question     17%
Common knowledge     12%
Attention checker    12%
Trap question        11%
Bot catcher          8%
Factual question     8%
Common knowledge     6%
Consistency          5%
Factual question     4%
Factual question     3%
Direct ask           3%
Attention checker    1%

Looking at these results individually, you can see the variation: some questions caught 17% of bad respondents, while others barely made a dent, catching only 1%.

Each question has value on its own, but we wanted to figure out the optimal combination of questions that gives maximum coverage without inflicting survey fatigue on respondents.

To answer this, we ran a TURF analysis (Total Unduplicated Reach and Frequency) to identify which combination of questions would catch the most unique bad respondents with the fewest questions.
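For readers who want to approximate this themselves, here is a minimal Python sketch of the greedy logic behind unduplicated reach: at each step, pick the question that catches the most bad respondents not already caught by earlier picks. The respondent sets are invented for illustration, and the function is a simplification of a full TURF analysis.

```python
# Invented data: which bad respondents each question catches (by respondent ID).
caught_by = {
    "Factual question": {1, 2, 3, 4, 5, 6},
    "Common knowledge": {4, 5, 6, 7, 8},
    "Trap question":    {8, 9},
    "Bot catcher":      {2, 9, 10},
}

def greedy_turf(caught_by, max_questions=3):
    """Greedily pick questions that add the most not-yet-caught bad respondents."""
    covered, picks = set(), []
    for _ in range(max_questions):
        best = max(caught_by, key=lambda q: len(caught_by[q] - covered))
        gain = len(caught_by[best] - covered)
        if gain == 0:
            break  # diminishing returns: nothing new left to catch
        picks.append((best, gain))
        covered |= caught_by[best]
    return picks, covered

picks, covered = greedy_turf(caught_by)
for question, gain in picks:
    print(f"{question}: +{gain} unique bad respondents")
print(f"Total unique coverage: {len(covered)}")
```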

What We Found: Optimal Question Combo

Question                                   Unique Bad Respondents Caught
Factual question                           406
+ Common knowledge                         +295
+ Trap question                            +96
+ Common knowledge                         +76
+ Bot catcher                              +48
Total Unique Bad Respondents Captured      921 out of 1,132 (81%)

The first three questions caught the majority of bad respondents (70%), and adding the additional Common knowledge and Bot catcher questions increased coverage to 81%. Beyond that, additional questions showed diminishing returns, each catching only about 20 more bad respondents.

This research tells us a lot. If you aren’t actively screening for data quality within your surveys, nearly half of your research budget could be paying for garbage data. 

The good news is that you don’t need a dozen data quality questions and the risk of annoying or tiring out your good respondents. A combination of 3-5 well-written data quality questions, mixing factual, common knowledge, trap, and bot-catcher types, can catch and remove over 80% of poor-quality respondents before they ever hit the Thank You page, saving you time and money on cleaning and reconciliation.

Layering in strategically placed open-ended questions and traditional metrics like speeding, straightlining, and IP deduplication will give you a more accurate dataset and, more importantly, research you can trust to make decisions on.
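If you script your own cleaning, those traditional checks are straightforward to automate. Below is a minimal Python sketch of speeding and straightlining flags; the thresholds, column names, and sample data are illustrative assumptions, and IP deduplication would typically be a simple duplicate check on the IP field.

```python
import pandas as pd

# Hypothetical survey export: completion time plus a four-item rating grid.
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3],
    "duration_sec":  [540, 95, 610],   # total completion time in seconds
    "q1": [4, 3, 5], "q2": [2, 3, 1], "q3": [5, 3, 4], "q4": [3, 3, 2],
})

grid_cols = ["q1", "q2", "q3", "q4"]

# Speeding: finishing far faster than the median respondent (illustrative cutoff).
survey["speeder"] = survey["duration_sec"] < survey["duration_sec"].median() / 3

# Straightlining: giving the identical answer to every item in a grid.
survey["straightliner"] = survey[grid_cols].nunique(axis=1) == 1

print(survey[["respondent_id", "speeder", "straightliner"]])
```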

Jon Pirc

Jon has spent his professional career as an entrepreneur and is constantly looking to disrupt traditional industries by using new technologies. After working at Sandbox Industries as a ‘Founder in Residence’, Jon founded Lab42 in 2010 as a way to make research more accessible to smaller companies. Jon has a Bachelor of Science in Psychology from Northern Illinois University.
