Data Collection & Processing

AI tools can streamline many stages of data collection and processing, improving speed, efficiency, and in some cases, participant engagement. For example, AI-powered chatbots can support data collection by guiding respondents through surveys in an engaging, conversational format.

That said, AI should be seen as a tool to support—not replace—human researchers, especially when it comes to ensuring data quality and ethical oversight.

AI can support behavioural researchers by helping to:

  • Collect and process both qualitative and quantitative data (e.g. using chatbots or AI transcription tools)
  • Create synthetic data or populations to study in isolation or supplement human datasets
    • Synthetic data is artificially generated to mimic the statistical properties of real data without containing any identifiable information, helping to protect individuals' privacy
  • Deliver real-time behavioural interventions or guide participants through them (e.g. via virtual coaches)
  • Automate reminders and adapt interventions based on user responses


Tools

Common survey platforms, such as Qualtrics (paid), now integrate AI features such as Conversational Feedback, which uses the GPT-4 Turbo LLM to generate follow-up questions based on respondents' initial answers. More flexible solutions include customisable chatbots built with platforms like Rasa (free & paid), which can be tailored for behavioural studies to simulate interviews or collect narrative data. Researchers have also begun using LLMs, such as GPT-based bots, in moderated or semi-structured conversational formats to guide participants through qualitative tasks, improving scalability and responsiveness while maintaining conversational depth.
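As a simplified sketch of this pattern, the snippet below shows how a researcher might ask an LLM for one probing follow-up question after each survey response. It assumes the official `openai` Python package and an `OPENAI_API_KEY` environment variable; the model name and prompt wording are illustrative, not the actual setup of any platform mentioned above.

```python
# Hedged sketch: LLM-generated follow-up questions for a conversational survey.
# Assumes the `openai` package and an OPENAI_API_KEY; prompt text is illustrative.
import os


def build_followup_messages(survey_question: str, answer: str) -> list[dict]:
    """Construct chat messages asking the model for one neutral follow-up question."""
    return [
        {"role": "system",
         "content": ("You are a survey moderator. Ask exactly one short, neutral "
                     "follow-up question that probes the respondent's answer.")},
        {"role": "user",
         "content": f"Question: {survey_question}\nAnswer: {answer}"},
    ]


def generate_followup(survey_question: str, answer: str) -> str:
    """Call the chat completions API; only runs if a key is configured."""
    from openai import OpenAI  # imported lazily so the sketch loads without the package
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=build_followup_messages(survey_question, answer),
    )
    return response.choices[0].message.content


if __name__ == "__main__" and os.environ.get("OPENAI_API_KEY"):
    print(generate_followup("How do you choose what to watch?", "Mostly trailers."))
```

In a real study, the generated question would be shown to the respondent and their reply appended to the conversation history, so each probe builds on the full exchange rather than a single answer.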

Vink is a free, open-source transcription tool that adapts OpenAI’s Whisper speech-to-text model to support rigorous and efficient qualitative research. Designed to be accessible and privacy-conscious, Vink enables researchers to transcribe multilingual interview data on local machines without cost or internet dependency. It has been positively evaluated for ease of use and performance across 14 languages, offering a scalable, secure alternative to expensive or limited transcription software. For qualitative interview data, Otter.ai can also transcribe audio recordings to text; however, because it is a cloud-based service, enhanced informed consent, secure data management, and human verification are highly recommended in academic research.
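To illustrate what local, privacy-preserving transcription looks like, here is a minimal sketch using the open-source `openai-whisper` package (the same underlying model Vink builds on). The audio file name and model size are placeholders; the formatting helper renders Whisper-style segments as timestamped lines for qualitative coding.

```python
# Sketch of local speech-to-text with OpenAI's open-source Whisper model.
# Assumes `pip install openai-whisper`; file name and model size are illustrative.

def format_segments(segments: list[dict]) -> str:
    """Render Whisper-style segments as [MM:SS] timestamped lines."""
    lines = []
    for seg in segments:
        start = int(seg["start"])
        lines.append(f"[{start // 60:02d}:{start % 60:02d}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_locally(audio_path: str) -> str:
    """Transcribe an audio file entirely on the local machine."""
    import whisper  # imported lazily so the helper above works without the package
    model = whisper.load_model("base")     # larger models trade speed for accuracy
    result = model.transcribe(audio_path)  # returns {"text": ..., "segments": [...]}
    return format_segments(result["segments"])
```

Because nothing leaves the machine, this workflow avoids the consent and data-management complications that arise when interview audio is uploaded to a third-party service.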

For data processing, Julius AI’s Data Transformation feature can produce a preliminary assessment report on your dataset that identifies errors and outliers before transformation. This validation step allows users to detect missing values and add relevant data points before analysis. This blog post gives an accessible overview.
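The kind of pre-transformation checks such tools automate can be reproduced by hand. The sketch below, a simplified stand-in rather than Julius AI's actual method, flags missing values and IQR-based outliers per numeric column; the column names and example values are invented.

```python
# Minimal data-assessment sketch: count missing values and IQR outliers per
# numeric column before any transformation. Column names are illustrative.
import pandas as pd


def assess(df: pd.DataFrame) -> dict:
    """Return missing-value counts and 1.5*IQR outlier counts per numeric column."""
    report = {"missing": df.isna().sum().to_dict(), "outliers": {}}
    for col in df.select_dtypes("number"):
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
        report["outliers"][col] = int(mask.sum())
    return report


# Toy behavioural dataset: one implausibly slow reaction time, one missing score.
df = pd.DataFrame({"rt_ms": [512, 498, 530, 5200, 505],
                   "score": [3, 4, None, 5, 4]})
print(assess(df))
```

Running a report like this before transformation makes it explicit which rows will need exclusion or imputation, rather than discovering them mid-analysis.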

There are general-purpose synthetic data generation tools, such as YData’s SDK, that use generative AI to create behavioural, tabular, and time-series data under specific conditions, as outlined in their user guides. More specialised tools also exist; for example, Synthea is an open-source synthetic patient generator that simulates complete, realistic (but fictional) medical histories. Designed for public health research and intervention testing, Synthea data is free from privacy, cost, and access restrictions, making it well-suited for academic, clinical, and government use.
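A toy example conveys the core idea behind these generators: produce new records that mimic a real dataset's statistical properties without copying any individual's row. Real tools fit generative models that also preserve correlations; this deliberately simplified sketch just resamples each numeric column from a fitted normal distribution, and the example columns are invented.

```python
# Toy synthetic-data sketch: match each column's mean and spread independently.
# Real generators (e.g. GAN- or copula-based) also preserve correlations.
import numpy as np


def synthesize(real: np.ndarray, n: int, seed: int = 0) -> np.ndarray:
    """Draw n synthetic rows from per-column normal fits of the real data."""
    rng = np.random.default_rng(seed)
    mean = real.mean(axis=0)
    std = real.std(axis=0)
    return rng.normal(mean, std, size=(n, real.shape[1]))


# Illustrative "real" dataset: age and a task score for four participants.
real = np.array([[34.0, 120.0],
                 [29.0, 98.5],
                 [41.0, 133.2],
                 [37.0, 110.7]])
synthetic = synthesize(real, n=1000)
print(synthetic.mean(axis=0))  # close to the real column means
```

Because the synthetic rows are fresh random draws, none of them reproduces an actual participant's record, which is precisely the privacy property the bullet list above describes.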

Examples

Researchers developed an AI-powered chatbot to conduct open-ended surveys using Juji (juji.io), a platform where users can create customised chatbots. In partnership with a market research firm, they conducted a field study exploring players’ perceptions of new video game trailers. The chatbot collected more informative and relevant responses than traditional online surveys, improving the overall quality of the data.

  • Xiao, Z., Zhou, M. X., Liao, Q. V., Mark, G., Chi, C., Chen, W., & Yang, H. (2020). Tell Me About Yourself: Using an AI-Powered Chatbot to Conduct Conversational Surveys with Open-ended Questions. ACM Transactions on Computer-Human Interaction, 27(3), 1–37. https://doi.org/10.1145/3381804

However, other researchers compared the effectiveness of chatbots and traditional web surveys in longitudinal data collection on daily news consumption. The findings indicated that web surveys generally yielded higher data quality and more favourable user evaluation. While chatbots offer the potential for increased interactivity and engagement, the study cautioned that they do not necessarily enhance data quality.

  • Zarouali, B., Araujo, T., Ohme, J., & de Vreese, C. (2024). Comparing Chatbots and Online Surveys for (Longitudinal) Data Collection: An Investigation of Response Characteristics, Data Quality, and User Evaluation. Communication Methods and Measures, 18(1), 72–91. https://doi.org/10.1080/19312458.2022.2156489

Researchers conducted a systematic review to evaluate the feasibility and effectiveness of AI-based chatbots for promoting health behaviour change across a range of domains, such as smoking cessation. The review found that chatbots can deliver personalised, engaging, and scalable interventions, with several studies reporting positive outcomes on behaviour and user satisfaction. However, the evidence base remains limited by short follow-up durations, variability in chatbot design, and inconsistent reporting of outcomes. The authors conclude that while AI chatbots are a promising tool for health promotion, more rigorous and long-term evaluations are needed to establish their effectiveness and guide best practices.

  • Aggarwal, A., Tam, C. C., Wu, D., Li, X., & Qiao, S. (2023). Artificial Intelligence–Based Chatbots for Promoting Health Behavioral Changes: Systematic Review. Journal of Medical Internet Research, 25(1), e40789. https://doi.org/10.2196/40789

Researchers developed an agentic AI architecture to simulate the attitudes and behaviours of over a thousand real individuals. By applying large language models to qualitative interviews, the generative agents replicated participants' responses on the General Social Survey with 85% accuracy. This approach demonstrates the potential of AI-driven agents to emulate human behaviour in social science research, offering a scalable method for studying individual and collective behaviours.

  • Park, J. S., Zou, C. Q., Shaw, A., Hill, B. M., Cai, C., Morris, M. R., Willer, R., Liang, P., & Bernstein, M. S. (2024). Generative Agent Simulations of 1,000 People (arXiv:2411.10109). arXiv. https://doi.org/10.48550/arXiv.2411.10109

However, while agentic AI provides opportunities, it also carries risks. The authors of a recent paper on potential harms from increasingly agentic AI systems argue that as machine learning systems gain more autonomy, they can introduce new forms of harm, particularly affecting marginalised communities. They emphasise the importance of proactively anticipating these harms rather than merely responding to them after deployment. The paper concludes that recognising the agency of algorithmic systems does not absolve human responsibility but highlights the need for robust governance to mitigate systemic and long-term impacts.

  • Chan, A., Salganik, R., Markelius, A., Pang, C., Rajkumar, N., Krasheninnikov, D., Langosco, L., He, Z., Duan, Y., Carroll, M., Lin, M., Mayhew, A., Collins, K., Molamohammadi, M., Burden, J., Zhao, W., Rismani, S., Voudouris, K., Bhatt, U., … Maharaj, T. (2023). Harms from Increasingly Agentic Algorithmic Systems. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 651–666. https://doi.org/10.1145/3593013.3594033