Hypothesis Generation & Study Design

Generating research questions is a core part of behavioural science, but it’s not easy. Researchers often spend months narrowing down a testable idea, as this process relies on reading and assessing large volumes of prior work. With thousands of new studies published each year, it’s becoming harder to keep up—and easier to miss important gaps in the literature.

AI tools can help by scanning and synthesising large bodies of research to suggest new hypotheses, highlight underexplored areas, and turn vague ideas into testable questions. Some tools can also recommend appropriate methods or generate draft materials such as surveys, interview guides, and behavioural interventions.

AI can help you:

  • Analyse existing literature or large datasets and suggest hypotheses.
  • Design research materials, such as surveys, stimuli, and interview guides.
  • Refine vague or complex research questions and hypotheses into clear, specific, and testable formats that are easier to operationalise and assess (see the sketch after this list).
  • Suggest appropriate methods based on your research question.
  • Generate tailored intervention content (e.g., personalised health messages, therapy scripts, or learning modules).
  • Analyse past intervention data to refine strategies.
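For instance, turning a vague question into testable hypotheses can be as simple as a structured prompt. Below is a minimal sketch using the OpenAI Python SDK; the model name and prompt wording are illustrative assumptions, and any LLM interface would work similarly.

```python
# Minimal sketch: asking an LLM to refine a vague research question into
# testable hypotheses. Model name and prompt are illustrative assumptions.
from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

client = OpenAI()

vague_question = "Does social media make teenagers unhappy?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {"role": "system",
         "content": "You are a behavioural science methods adviser. "
                    "Rewrite the user's question as three specific, "
                    "testable hypotheses with operationalised variables."},
        {"role": "user", "content": vague_question},
    ],
)
print(response.choices[0].message.content)
```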


Tools

Many of the examples below use large language models (LLMs) like ChatGPT, Claude, or Copilot, often adapted by researchers for specific study tasks. However, there are also standalone tools designed to support early research stages like hypothesis generation and study design. These include:

  • HyperWrite’s Hypothesis Maker: Generates hypotheses based on your research question. Free trial and paid versions available.
  • CoQuest (GitHub | arXiv paper): An AI-powered tool designed to generate research questions, particularly useful in early study planning. Developed by researchers at MIT.

A recent study evaluated six AI tools—ChatGPT-4o, ChatGPT-4, ChatGPT-3.5, Perplexity AI, SciSpace, and Julius AI—for identifying research gaps in the educational management literature. Results varied: while some tools returned accurate, useful references, others generated fabricated sources. Read the full study:

  • Soriano, S., Obrero, M. M., Rabago, J. B., Pacaldo-Pabalan, A., & Angulo, R. R. D. G. and L. R. (2024). Evaluating the Accuracy of selected AI Tools in Identifying Methodological and Theoretical Gaps in Research Literature. Library Progress International, 44(3), Article 3. https://doi.org/10.48165/bapas.2024.44.2.1

Examples

Researchers developed an open-source machine learning model to identify novel predictors of unethical behaviour—such as hoarding and violating social-distancing guidelines—during the COVID-19 pandemic. Analysing responses to over 700 items from the World Values Survey, the model highlighted optimism about humanity’s future as a key predictor. The researchers then designed an experiment showing that framing messages about the pandemic in an optimistic (vs. pessimistic) way influenced how people justified unethical behaviours, with optimism reducing justification.

  • Sheetal, A., Feng, Z., & Savani, K. (2020). Using Machine Learning to Generate Novel Hypotheses: Increasing Optimism About COVID-19 Makes People Less Willing to Justify Unethical Behaviors. Psychological Science, 31(10), 1222–1235. https://doi.org/10.1177/0956797620959594
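The modelling step in this kind of work can be illustrated with a short sketch. The code below is not the authors' pipeline: it trains a gradient-boosted classifier on synthetic stand-ins for survey items and ranks them by feature importance to surface candidate predictors worth investigating experimentally.

```python
# Minimal sketch of ML-driven predictor discovery, in the spirit of
# Sheetal et al. (2020). Column names and data are synthetic placeholders,
# not the World Values Survey items used in the paper.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Hypothetical survey items; "optimism_about_future" is built to matter.
X = pd.DataFrame({
    "optimism_about_future": rng.normal(size=n),
    "trust_in_institutions": rng.normal(size=n),
    "religiosity": rng.normal(size=n),
    "income_decile": rng.integers(1, 11, size=n),
})
# Outcome: justification of unethical behaviour (binary, synthetic).
y = (0.8 * X["optimism_about_future"] + rng.normal(size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Rank items by importance to surface candidate hypotheses.
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
print("held-out accuracy:", model.score(X_test, y_test))
```

A high-ranking item is only a lead, not a finding: as in the study above, the point is to follow the model's suggestion with a designed experiment.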

A recent study showed how AI can support hypothesis generation by combining a large language model (GPT-4) with causal graphs, which are maps of cause-and-effect relationships in research. Analysing over 43,000 psychology papers, the researchers used this method to generate 130 new hypotheses about well-being. The combined AI approach produced ideas as original as those from PhD students and stronger than those from the LLM alone, showing how AI can help researchers spot promising directions in large bodies of literature.

  • Tong, S., Mao, K., Huang, Z., Zhao, Y., & Peng, K. (2024). Automating psychological hypothesis generation with AI: When large language models meet causal graph. Humanities and Social Sciences Communications, 11(1), 1–14. https://doi.org/10.1057/s41599-024-03407-5
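As a rough illustration of the causal-graph half of this approach (not the authors' actual pipeline), the sketch below builds a toy graph with networkx and proposes hypotheses for indirect, as-yet-untested links into a target construct. The edges are invented placeholders, not relationships extracted from the 43,000 papers.

```python
# Minimal sketch: using a causal graph to surface candidate hypotheses,
# loosely inspired by Tong et al. (2024). Edges are illustrative only.
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("social support", "loneliness"),
    ("loneliness", "well-being"),
    ("exercise", "sleep quality"),
    ("sleep quality", "well-being"),
])

# Candidate hypotheses: constructs with an indirect path into the target
# but no direct edge yet, i.e. plausible untested cause -> effect links.
target = "well-being"
for node in g.nodes:
    if node != target and not g.has_edge(node, target):
        if nx.has_path(g, node, target):
            path = nx.shortest_path(g, node, target)
            via = " -> ".join(path[1:-1])
            print(f"Hypothesis: {node} -> {target} (via {via})")
```

In the paper, an LLM then elaborates and phrases these candidate links as full hypotheses; the graph constrains the model to directions the literature makes plausible.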

At BR-UK, one of our projects exploring how public support for policies shifts in response to different types of evidence—statistical, anecdotal, and combinations of the two—used ChatGPT to help generate the various evidence stimuli needed across the study. You can read more about the study and access the protocol outlining how ChatGPT was used to create the stimuli here.

  • Rodger, A., Sanna, G. A., Cheung, V., Raihani, N., & Lagnado, D. (2025). Negative Anecdotes Reduce Policy Support: Evidence from Three Experimental Studies on Communicating Policy (In)Effectiveness. OSF. https://doi.org/10.31219/osf.io/e2kxc_v1

Also see this great overview blog on The Promise & Pitfalls of AI-Augmented Survey Research.


Researchers are exploring how generative AI can support better experimental design by helping identify mediators (why something works), moderators (for whom it works), and alternative treatment arms. A recent paper suggests that tools like large language models can help researchers brainstorm these elements, simulate synthetic participants to test different ideas, and assess whether a study is likely to scale to other settings. This approach can make it easier to design studies that are more comprehensive and policy-relevant from the start. However, researchers still need to critically assess which AI-generated ideas are useful and which are not.

  • Chang, S., Kennedy, A., Leonard, A., & List, J. A. (2024). 12 Best Practices for Leveraging Generative AI in Experimental Research (Working Paper 33025). National Bureau of Economic Research. https://doi.org/10.3386/w33025
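One of the paper's suggestions, piloting materials on synthetic participants, can be sketched in a few lines. The personas, prompt, and model name below are illustrative assumptions rather than the authors' protocol, and any such outputs would still need validation against real participants.

```python
# Minimal sketch of "synthetic participants" for piloting study materials,
# in the spirit of Chang et al. (2024). Personas and prompts are invented.
from openai import OpenAI  # pip install openai; needs OPENAI_API_KEY set

client = OpenAI()

def synthetic_response(persona: str, stimulus: str) -> str:
    """Ask the model to answer a study item in character."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": f"You are a survey respondent. Persona: {persona}. "
                        "Answer in one sentence, in character."},
            {"role": "user", "content": stimulus},
        ],
    )
    return resp.choices[0].message.content

personas = ["a 68-year-old retired nurse, sceptical of new policies",
            "a 24-year-old student, broadly pro-intervention"]
stimulus = "On a 1-7 scale, how much do you support a sugar tax, and why?"
for p in personas:
    print(p, "->", synthetic_response(p, stimulus))
```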

Researchers leveraged GPT-3.5 to create 1,150 SMS messages aimed at improving medication adherence among individuals with type 2 diabetes. The messages were aligned with established behaviour change techniques and met standards for readability and tone.

  • Harrison, R. M., Lapteva, E., & Bibin, A. (2024). Behavioral Nudging With Generative AI for Content Development in SMS Health Care Interventions: Case Study. JMIR AI, 3(1), e52974. https://doi.org/10.2196/52974
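One part of that quality pipeline, screening candidate messages for readability, is straightforward to sketch. The example below uses the textstat package with an arbitrary Flesch reading-ease threshold; the messages and cut-off are illustrative, not the study's values.

```python
# Minimal sketch: readability screening of generated SMS messages, one of
# the quality checks described in Harrison et al. (2024). Illustrative only.
import textstat  # pip install textstat

candidates = [
    "Taking your medication at the same time each day helps it work best.",
    "Adherence to your pharmacotherapeutic regimen optimises glycaemic outcomes.",
]

for msg in candidates:
    score = textstat.flesch_reading_ease(msg)  # higher = easier to read
    verdict = "keep" if score >= 60 else "revise"  # placeholder threshold
    print(f"{score:5.1f}  {verdict}  {msg}")
```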

Researchers developed "MindShift," an AI-powered tool that generates personalised persuasive messages to reduce problematic smartphone use. Considering users' mental states and contexts, the intervention led to significant reductions in smartphone usage and improvements in self-efficacy over a five-week period.

  • Wu, R., Yu, C., Pan, X., Liu, Y., Zhang, N., Fu, Y., Wang, Y., Zheng, Z., Chen, L., Jiang, Q., Xu, X., & Shi, Y. (2024). MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use Intervention. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1–24. https://doi.org/10.1145/3613904.3642790

Mass General Brigham piloted LLMs to help physicians respond to patient messages, finding that a substantial proportion of AI-generated draft responses were safe to send without further editing.

  • Chen, S., Guevara, M., Moningi, S., Hoebers, F., Elhalawani, H., Kann, B. H., Chipidza, F. E., Leeman, J., Aerts, H. J. W. L., Miller, T., Savova, G. K., Gallifant, J., Celi, L. A., Mak, R. H., Lustberg, M., Afshar, M., & Bitterman, D. S. (2024). The effect of using a large language model to respond to patient messages. The Lancet Digital Health, 6(6), e379–e381. https://doi.org/10.1016/S2589-7500(24)00060-8

Agentic AI is being applied to personalised health coaching, where AI systems adapt to users' needs based on their feedback. For instance, one AI agent identifies barriers to healthy behaviours through motivational interviewing, while another provides tailored strategies using behavioural science models like COM-B. This approach reduces reliance on human coaches, improving accessibility and scalability of wellness programs. However, ensuring data privacy and protecting against unintended harms remain key challenges for responsible implementation.

  • Yang, E., Garcia, T., Williams, H., Kumar, B., Ramé, M., Rivera, E., Ma, Y., Amar, J., Catalani, C., & Jia, Y. (2024). From Barriers to Tactics: A Behavioral Science-Informed Agentic Workflow for Personalized Nutrition Coaching (arXiv:2410.14041). arXiv. https://doi.org/10.48550/arXiv.2410.14041
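The division of labour described here can be sketched as a simple two-agent loop. In the sketch below, both agents are stand-ins (keyword matching rather than LLM calls), and the barrier-to-COM-B mapping is an invented illustration, not the authors' workflow.

```python
# Minimal sketch of a two-agent coaching loop in the style described by
# Yang et al. (2024): one agent elicits barriers, another maps them to
# COM-B components and tactics. Mapping and messages are illustrative.
BARRIER_TO_COMB = {
    "I don't know what to cook": ("Capability", "share two simple recipes"),
    "I forget to plan meals": ("Opportunity", "suggest a weekly reminder"),
    "I don't feel it's worth it": ("Motivation", "reframe short-term benefits"),
}

def barrier_agent(user_message: str) -> str:
    """Stand-in for an LLM doing motivational interviewing; here we just
    match the message against known barrier phrasings."""
    for barrier in BARRIER_TO_COMB:
        if barrier.lower() in user_message.lower():
            return barrier
    return "I don't know what to cook"  # fallback for the sketch

def strategy_agent(barrier: str) -> str:
    """Stand-in for the second agent: map a barrier to a COM-B tactic."""
    component, tactic = BARRIER_TO_COMB[barrier]
    return f"[{component}] Suggested tactic: {tactic}."

user_message = "Honestly, I forget to plan meals most weeks."
print(strategy_agent(barrier_agent(user_message)))
```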

To learn more about agentic AI, read LSE’s blog on its Application in Behavioural Science.


Other key readings that informed this page:

Banker, S., Chatterjee, P., Mishra, H., & Mishra, A. (2024). Machine-assisted social psychology hypothesis generation. American Psychologist, 79(6), 789–797. https://doi.org/10.1037/amp0001222

Wei, J., Kim, S., Jung, H., & Kim, Y.-H. (2024). Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data. Proc. ACM Hum.-Comput. Interact., 8(CSCW1), 87:1-87:35. https://doi.org/10.1145/3637364