Can AI help us review evidence faster without losing rigour?

By Drs Rebekah Rees & Niamh Hart

Reviews are the backbone of research and policy. They pull together what we already know, highlight gaps, and guide future decisions. However, research reviews (systematic, scoping, umbrella) can take over a year, and they demand huge amounts of time and resources. We wondered: could generative AI help speed things up without cutting corners? After all, BR-UK's own recent Capability Scoping study found that AI could enable innovative research design and analysis with careful implementation.

Generative AI involves deep-learning models creating content such as text, images, or audio from training data. Well-known types include large language models (LLMs), such as OpenAI’s ChatGPT and Google’s Gemini. These systems analyse, edit, translate, and generate human-like content, and they are increasingly used to improve the efficiency of review processes.

AI in Research

Others are already testing AI for research processes. The National Collaborating Centre for Methods and Tools has built AI tools like DAISY Rank and AI Screening to cut screening time and improve accuracy (Rogers et al., 2024; Dobbins et al., 2024). The Behavioural Insights Team conducted a head-to-head test: two researchers reviewed the same topic, one using AI tools like ChatGPT-4, Claude 2, Elicit, and Consensus, while the other did not. The AI-assisted review was 23% faster overall (and 56% faster for analysis) and produced results of similar quality, though drafts needed human verification (Egan et al., 2025).

What we did

For BR-UK’s Health & Wellbeing theme, we trialled AI in our umbrella review of childhood obesity. Normally, you’d spend months screening and extracting data from thousands of studies. Instead, we asked ChatGPT to help at each stage: generating search strategies, screening abstracts, checking full texts, and pulling out data. We tested different AI models, but ChatGPT at the time (February 2025) proved the most accurate, consistent, and practical.

After searching the databases, we asked ChatGPT what format it would prefer for extracting data. To keep things reliable, we then worked in small batches (asking ChatGPT to read 10–20 PDFs rather than the entire ZIP file of 480 documents), used precise prompts (see Table 1 below for examples), and verified results manually.

When ChatGPT produced its list of included reviews, we manually re-screened 100% of the titles and abstracts to ensure relevance. This involved checking that each review focused on obesity and included at least one behavioural determinant (such as diet, physical activity, sleep, or screen use). Reviews that concentrated solely on non-behavioural factors, such as genetics, biological mechanisms, or socioeconomic status, were excluded, which led us to manually remove 84 reviews that were not relevant. During this process, we also double-checked study numbers and meta-analysis outcomes for accuracy. We then full-text screened just over 50% of the reviews for additional rigour.
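The batching approach described above can also be scripted rather than run through the chat interface. The sketch below is purely illustrative: we worked in ChatGPT itself, so the model name ("gpt-4o"), the prompt wording, and the helper functions (pdf_text, batched) are assumptions for the example, not the setup we actually used.

```python
# Illustrative sketch only: a batched extraction workflow via the OpenAI API.
# Model name, prompt text, and helpers are assumptions, not the authors' setup.
from pathlib import Path

from openai import OpenAI    # pip install openai
from pypdf import PdfReader  # pip install pypdf

BATCH_SIZE = 15  # keep batches small (10-20 PDFs) rather than sending everything at once

EXTRACTION_PROMPT = (
    "For each review below, complete one row of the extraction template: "
    "Review ID, title, authors, publication year, review type, population/age, "
    "country, number of included studies, meta-analysis details, and behavioural "
    "versus non-behavioural determinants of obesity. Summarise aims and findings "
    "in 1-2 sentences per review."
)


def pdf_text(path: Path, max_chars: int = 20_000) -> str:
    """Extract plain text from a PDF, truncated to keep the prompt a manageable size."""
    reader = PdfReader(str(path))
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return text[:max_chars]


def batched(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


client = OpenAI()  # reads OPENAI_API_KEY from the environment
pdf_paths = sorted(Path("reviews/").glob("*.pdf"))

for batch in batched(pdf_paths, BATCH_SIZE):
    docs = "\n\n".join(f"### {p.stem}\n{pdf_text(p)}" for p in batch)
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; the review used ChatGPT as of February 2025
        messages=[
            {"role": "system",
             "content": "You are assisting with an umbrella review of determinants of childhood obesity."},
            {"role": "user", "content": f"{EXTRACTION_PROMPT}\n\n{docs}"},
        ],
    )
    print(response.choices[0].message.content)
    # Each batch's output was then verified manually against the full texts.
```

Whatever the tooling, the principle is the same: keep batches small and the extraction rules explicit, so every output can be checked against the source PDFs.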
Table 1. Instructions and prompts provided to ChatGPT for the determinants of obesity umbrella review

| Prompt Component | Instruction Given to ChatGPT |
| --- | --- |
| Objective | “Conduct a rapid review of determinants of obesity in children and young people.” |
| Uploaded Inputs | • ZIP file with 362 full-text review PDFs • Excel file containing Review IDs, titles, and a structured data extraction template |
| Extraction Rules | For each review, complete one row in the Excel file using information from the full-text reports. Ensure the Review ID matches the correct PDF. |
| Core Variables | Review ID, title, authors, publication year, review type, study setting, population/age, gender, health status, country, income classification, social/deprivation indicators, study quality, study design, number of included studies, and meta-analysis details |
| Behavioural Focus | Identify behavioural determinants of obesity (e.g., diet, screen time, physical activity, sleep) and non-behavioural determinants (e.g., socioeconomic status, comorbidities) separately |
| Output Expectations | Summarise aims and findings clearly in 1–2 sentences per review. Maintain accuracy, completeness, and alignment with the Excel structure. |

What we found

AI could pick out key behavioural determinants, such as physical activity, sedentary time, regular meals, and consumption of ultra-processed foods. But it also misclassified some studies and occasionally included irrelevant data not directly linked to obesity or our research aim. For example, AI excluded certain studies, but human review re-added 34 of them because they focused on children and young people or were systematic reviews. That’s why we checked every behavioural determinant manually and corrected errors where needed.

Lessons learned

- AI is a powerful helper, not a replacement.
- Good prompts and small batches make outputs more reliable.
- Manual verification is essential: AI alone can’t yet capture all the nuance required for a reliable review.
- ChatGPT worked best for us, and its accessibility supports open and replicable science.

Looking ahead

AI won’t take over research, but it can help us work faster and smarter. For now, the best approach is hybrid: AI for speed, humans for rigour. As these tools evolve (and they are changing rapidly), they could transform how behavioural research handles evidence, turning months of work into weeks while maintaining quality.

Want to explore further? Check out BR-UK’s AI Tools and Resources Repository, which pulls together guidance on how researchers can start experimenting with AI in their own reviews.

This article was published on 2025-10-06.