Have AI Bots Infiltrated Your Survey Data?
The explosive growth of AI has brought incredible technological advancements in a very short amount of time. While well-regulated AI offers countless benefits, it can also create or worsen significant problems - especially in quantitative market research, where the accuracy and reliability of data are critical.
AI bots, posing as real respondents, have become a scourge of online research, distorting data and wreaking havoc on survey results. It’s a real issue that the team at Lab42 deals with on a daily basis, and we have seen the growth firsthand. We’re constantly learning, adapting, and executing best practices to reduce the number of fraudulent respondents, including identifying and removing any bots that get into a survey.
Who / What are the bots?
We hear the phrase all the time, but what exactly is a ‘bot’? At a high level, a bot is a piece of software programmed to act like real people online - from leaving comments in online forums to answering online surveys and polls. Since they don’t have genuine opinions or experiences, they either copy answers from elsewhere or randomly respond to survey questions.
Studies have shown that 15-30% of responses in online survey research could be from bots - skewing the data collected and rendering it nearly useless - and potentially leaving a stain on your brand and reputation.
What are the motives?
What’s in it for the bots? It’s generally the same thing that drives most people - money. Online bots can help bad actors get paid for the false responses they enter into surveys. And because the quality of generative AI has improved so much over just the past 6 months, it’s becoming more and more difficult to identify and eliminate the bots from online surveys - especially while the surveys are live and in the field.
What are the risks?
Bots pose a variety of risks to online quantitative research - not only to the accuracy of the data being collected, but also the operations and reputation of your company. A few specific risks include:
Corrupt data
Bots are able to pose as humans and get past automated data quality checks, leading to corrupted data sets. This can mislead researchers and skew the results, potentially leading to incorrect conclusions and misguided strategies.
More work
If you are using a DIY platform and are not well versed in identifying and cleaning online survey results, but still want to be vigilant about data quality, you will likely spend a significant amount of time on bot detection and mitigation techniques during survey development, programming, and analysis. That time diverts resources from analyzing the data and using it to drive good decisions.
Financial losses
As seen in the 2023 State of Bot Mitigation Report, bots can cause financial losses to organizations conducting or relying on online research due to corrupted data and the subsequent misinformed decisions.
Preventative Measures
To stop these bots, it takes a mix of smart moves and constant diligence. Here are just a few techniques that Lab42 uses to eliminate bots and poor quality respondents from our surveys:
Unbiased sample sourcing - By getting respondents from several different sources, we create a higher barrier for bots to get through. The multi-source approach not only distributes the risk, but also adds layers of verification that help filter out fraudulent respondents.
Layering of fraud/bot mitigation techniques - A multi-pronged approach is key to gaining ground in this battle. At Lab42, we combine automated, programmatic quality checks with old-fashioned manual data quality checks.
Automated data quality checks include:
Respondent deduplication via cookie and IP address tracking
Speed traps
Rotating questions drawn from our bank of data quality questions
Identifying duplicate answers across multiple questions
Identifying copy/paste within a survey
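As a rough illustration, checks like these can be scripted against a batch of responses. The sketch below is a minimal, hypothetical example (the field names `ip`, `cookie`, `seconds`, and `open_ends` are assumptions, not Lab42's actual schema):

```python
from collections import Counter

def flag_response(resp, seen_ids, median_seconds):
    """Run basic automated quality checks on one survey response.

    `resp` is a hypothetical dict:
    {"ip": str, "cookie": str, "seconds": float,
     "open_ends": {question_id: answer_text}}.
    Returns a list of flags; an empty list means the response passed.
    """
    flags = []

    # Deduplication: have we already seen this IP address or cookie?
    for key in ("ip", "cookie"):
        if resp[key] in seen_ids:
            flags.append(f"duplicate_{key}")
        seen_ids.add(resp[key])

    # Speed trap: finishing far faster than the median is suspicious.
    if resp["seconds"] < 0.3 * median_seconds:
        flags.append("speeder")

    # Duplicate answers: the same open-end text reused (copy/pasted)
    # across multiple different questions within one survey.
    texts = [t.strip().lower() for t in resp["open_ends"].values()]
    repeats = Counter(texts).most_common(1)
    if repeats and repeats[0][1] > 1:
        flags.append("copy_paste")

    return flags
```

In practice a flagged response would be routed to manual review rather than auto-deleted, since legitimate respondents occasionally trip a single check.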
Manual data checks
Every day, our research team downloads all client survey data and reviews it, respondent by respondent, to ensure the open-ended questions are answered thoughtfully and actually address the questions being asked.
It is through these manual data checks that we identify most bot attacks. A few identifying marks of bot answers:
The responses are longer: Bot responses are typically longer and seemingly more thought out than human responses.
They share the same theme: We’ve noticed many bots answer a given survey question in similar ways. For example, in a past survey where we asked why respondents replaced their smartphones, nearly 75% of the respondents we identified as bots answered that their phone fell into a body of water - a lake, a bucket of liquid, or some similar variation.
They typically come from the same sample source: When a bot attack happens, we’ve found that they usually come from one sample source. We’ll see large influxes of completed responses coming from specific sources, and then manually turn off that sample source.
They come in during off hours: If you see a large number of completed responses that came through between midnight and 3:00 am, there’s a chance your survey may have been hit by some bad bots. We’ve noticed the bots attempt to complete surveys during off-hours to avoid any real-time checking that may take place while employees are in the office.
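Batch-level signals like the last two - one source dominating the completes, or a spike between midnight and 3:00 am - are simple enough to screen for programmatically. A minimal sketch, assuming each complete carries a source ID and a finish timestamp (hypothetical field names, and the 50% thresholds are illustrative, not Lab42's actual cutoffs):

```python
from collections import Counter
from datetime import datetime

def suspicious_patterns(completes, share=0.5, off_hours=(0, 3)):
    """Flag batch-level bot-attack signals in a list of completes.

    `completes` is a hypothetical list of dicts:
    {"source": str, "finished_at": datetime}.
    Returns a dict of warnings; empty means nothing stood out.
    """
    warnings = {}

    # One sample source dominating the batch of completes.
    counts = Counter(r["source"] for r in completes)
    source, n = counts.most_common(1)[0]
    if n / len(completes) > share:
        warnings["dominant_source"] = source

    # A large share of completes arriving in the off-hours window.
    start, end = off_hours
    night = sum(1 for r in completes
                if start <= r["finished_at"].hour < end)
    if night / len(completes) > share:
        warnings["off_hours_spike"] = night

    return warnings
```

A warning here wouldn't prove a bot attack on its own - it's a prompt to pause the flagged sample source and review those completes by hand.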
Continuous improvement
This is a cat-and-mouse game, and unfortunately the bots are continuously evolving to outmaneuver detection methods. It’s important to stay up to date on the latest available techniques - and to develop some of your own. For example, Lab42 has recently incorporated images, audio, and video into some of our data quality check questions in a new attempt to prevent fraudulent respondents from getting into our surveys.
What’s next?
The growth and evolution of AI bots highlight both the positives and negatives of advancing technology. As businesses and market researchers deal with the implications of an increase in the frequency and strength of these fraudulent responses, understanding more about these bots and how they bypass mitigation techniques is crucial to building a strong game plan to block them. This will help ensure the insights collected through online research are genuine, valid, and from real people - not just pieces of code.