Common survey platforms such as Qualtrics (paid) are now integrating AI features. One example is Conversational Feedback, which uses the GPT-4 Turbo LLM to generate follow-up questions based on respondents' initial answers. More flexible solutions include customisable chatbots built with platforms like Rasa (free and paid), which can be tailored for behavioural studies to simulate interviews or collect narrative data. Researchers have also started using LLMs, such as GPT-based bots, in moderated or semi-structured conversational formats to guide participants through qualitative tasks, improving scalability and responsiveness while maintaining conversational depth.
Vink is a free, open-source transcription tool that adapts OpenAI’s Whisper speech-to-text model to support rigorous and efficient qualitative research. Designed to be accessible and privacy-conscious, Vink lets researchers transcribe multilingual interview data on local machines without cost or internet dependency. It has been positively evaluated for ease of use and performance across 14 languages, offering a scalable, secure alternative to expensive or limited transcription software. Another option for qualitative interview data is Otter AI, which transcribes audio recordings to text. However, enhanced informed consent, secure data management and human verification of the transcripts are highly recommended in academic research.
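The local transcription workflow described above can be sketched with the open-source `openai-whisper` Python package directly. This is not Vink's own code: the helper names, the default model size `base`, and the file name `interview.wav` in the usage note are assumptions made for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time offset in seconds to HH:MM:SS."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Turn Whisper-style segments (dicts with start, end, text) into timestamped lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_locally(audio_path: str, model_size: str = "base") -> str:
    """Transcribe an audio file on the local machine and return a timestamped transcript."""
    # Imported lazily so the formatting helpers work without the package installed.
    import whisper  # pip install openai-whisper (ffmpeg must also be available)

    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return format_segments(result["segments"])
```

Calling `transcribe_locally("interview.wav")` would return one line per spoken segment, which makes it easier for a human reviewer to check the transcript against the recording. Note that the package downloads the model weights the first time a model size is loaded, so that step needs a connection; later runs can work offline.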
For data processing, Julius AI Data Transformation can produce a preliminary assessment report on your dataset that identifies errors and outliers before transformation. This validation step lets users detect missing values and add relevant data points before the analysis. This blog post might give an accessible overview.
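To make the idea of a preliminary assessment concrete, the sketch below flags missing values and outliers in a single numeric column using only the Python standard library. It is a hand-rolled illustration, not how Julius AI works internally: the function name `assess_column` and the 1.5 × IQR outlier rule are assumptions chosen for simplicity.

```python
import statistics


def assess_column(values):
    """Report missing entries and IQR-based outliers for one numeric column.

    Missing entries are assumed to be None or empty strings.
    """
    missing_rows = []
    numbers = []  # (row index, numeric value) pairs
    for i, v in enumerate(values):
        if v is None or v == "":
            missing_rows.append(i)
        else:
            numbers.append((i, float(v)))

    # Quartiles of the observed values; needs at least two data points.
    q1, _, q3 = statistics.quantiles([x for _, x in numbers], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    outlier_rows = [i for i, x in numbers if x < low or x > high]
    return {
        "missing_rows": missing_rows,
        "outlier_rows": outlier_rows,
        "bounds": (low, high),
    }
```

For a column such as `[10, 12, 11, None, 13, 12, 95, "", 11]`, the report lists the row indices of the two missing entries and of the value 95, leaving the decision about whether to correct, impute, or drop them to the researcher.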
There are general-purpose synthetic data generation tools, such as YData’s SDK, that use generative AI to create behavioural, tabular, and time-series data under specific conditions, as outlined in their user guides. More specialised tools also exist. For example, Synthea is an open-source synthetic patient generator that simulates complete, realistic (but fictional) medical histories. Designed for public health research and intervention testing, Synthea data is free from privacy, cost, and access restrictions, making it well-suited for academic, clinical, and government use.
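As a rough illustration of what "synthetic tabular data" means, the toy sketch below learns simple per-column summaries from a small dataset and samples new rows from them. It is not YData's SDK or Synthea, both of which use far richer models; the function names are hypothetical, and because each column is fitted independently, relationships between columns are not preserved.

```python
import random
import statistics
from collections import Counter


def fit_column(values):
    """Summarise one column: mean and standard deviation for numbers, frequencies otherwise."""
    if all(isinstance(v, (int, float)) for v in values):
        return ("numeric", statistics.mean(values), statistics.stdev(values))
    counts = Counter(values)
    return ("categorical", list(counts.keys()), list(counts.values()))


def synthesise(rows, n, seed=0):
    """Draw n synthetic rows whose columns mimic the marginal distributions of `rows`."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    columns = {name: fit_column([r[name] for r in rows]) for name in rows[0]}

    synthetic = []
    for _ in range(n):
        row = {}
        for name, spec in columns.items():
            if spec[0] == "numeric":
                # Sample from a normal distribution fitted to the real column.
                row[name] = rng.gauss(spec[1], spec[2])
            else:
                # Sample a category in proportion to its observed frequency.
                row[name] = rng.choices(spec[1], weights=spec[2])[0]
        synthetic.append(row)
    return synthetic
```

Given a few real rows such as `{"age": 34, "group": "control"}`, `synthesise(rows, 100)` returns 100 fictional rows with plausible ages and group labels. Dedicated tools go much further by modelling dependencies between variables and, in Synthea's case, whole patient histories over time.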