Research use of health and social care data is constrained by the current reliance on data recorded using rigidly structured methods. However, qualitative health and social care data is often recorded in free text. Building on our £4.3M UoE/City Deal investment in the regional DataLoch™, we will develop and deploy tailored Natural Language Processing and Artificial Intelligence methods to enhance existing routine data with data extracted from free-text clinical records. This will have a wide range of potential applications for research and for health and social care.

What are our intentions?

To develop, evaluate and routinely implement processing of free-text health and social care records to obtain a complete and deep understanding of people’s medical profiles and circumstances (including diagnoses, social and family history, the presence of geriatric syndromes, functional deficits and frailty markers), place of residence (home, extra-care housing, care home) and household composition (living alone, fitness and frailty of other members of the household).

Specifically, we have the following objectives:
- Understand requirements and datasets
- Develop standardised terminologies, a geriatric syndrome ontology and computable phenotypes (parameters)
- Analyse deep data using natural language processing and machine learning
- Build a collaborative community with academics, geriatricians, primary care physicians, palliative care physicians, nurses, allied health professionals, social carers and regional/national health data initiatives

The work in this work-package will be published in international peer-reviewed conferences and journals.

Why is this important?

We have access to world-class linked routine health care data in Scotland and in other UK countries through our leadership role in Health Data Research UK. The £4.3M City Deal investment in the ‘DataLoch’ will enhance access, linkage, data security and the core analytical platform for the regional population of 1.3 million people.
This provides a superb foundation to harness the potential of quantitative data to inform our understanding of health and care, to underpin new prediction tools, and to support the implementation and evaluation of new models of care with the potential to spread across the UK. However, even in centres of excellence, existing routine data is not ideal for research in later life, because critically important data for this context is often recorded only in free-text fields (a problem in NHS data, and particularly so in social care data).

Figure 1. WP3 Architecture of Research Design

How will we achieve this?

This work-package will provide a data infrastructure to support various data-driven research activities in ACRC. It comprises four task areas, as depicted in Figure 1. Lower components provide the essential basis for upper ones, and components on the right provide key support to those on the left.

Task 1

The first task, probably the most important at the initial stage of this work-package, is understanding the ‘data infrastructure’ requirements for realising ACRC goals. This will be achieved via a forum that brings together all stakeholders. It will have two deliverables:
- Data infrastructure requirement specifications.
- Dataset identification and access.

Task 2

Task 2 is to work on terminology standardisation and computable phenotypes. Standardisation is essential in health and social data research, and multiple levels of data standardisation are relevant to the ACRC data infrastructure. It will deliver:
- A terminology for late life health and social care.
- A geriatric syndrome ontology (standardised classifications).
- A phenotype (standardised definitions) library for geriatric syndrome and frailty.

Task 3

Task 3 is to use Natural Language Processing (NLP) to analyse deep data from various unstructured data sources to complement structured datasets. This builds upon the team’s current NLP work.
It will deliver:
- Adapted NLP models on structured reports of medical imaging data for geriatric medicine.
- New NLP models for late life health and social care.
- The transfer of learning NLP and machine learning models for ACRC research.

Task 4

Task 4 is to establish an active Research Community for co-design and collaboration on the technical work in this work-package. The community will comprise leads of other ACRC work-packages, healthcare and social care professionals, NLP research groups, national health data initiatives (HDR UK), regional health data initiatives and biomedical AI / MRC precision medicine CDTs.

Who are we working with?

We work closely with the UK clinical NLP groups under the HDR UK text analytics project, including King’s College London, University College London, University of Birmingham, Cambridge University, Swansea University, Manchester University and University of Sheffield. As part of the Edinburgh Clinical NLP Group’s collaborations, we work closely with the Mayo Clinic. We will seek collaborations with other top NLP groups, such as the Stanford NLP Group, and in particular establish connections with industry players in clinical NLP such as DeepMind, Facebook and Amazon.

Meet the Team: Enhancing the Data Infrastructure

Workpackage Lead - Dr. Honghan Wu

Honghan Wu jointly leads the Clinical NLP Group at the University of Edinburgh. Dr Wu is a Lecturer in Health Informatics at UCL, London and a Rutherford Research Fellow working in clinical data science at the Usher Institute, University of Edinburgh.

Workpackage Lead - Dr. Beatrice Alex

Beatrice Alex jointly leads the Clinical NLP Group at the University of Edinburgh. Dr Alex is a Chancellor’s Fellow at the Edinburgh Futures Institute and Turing Fellow at the Alan Turing Institute and the School of Informatics at the University of Edinburgh.

Research Fellow - Dr. Imane Guellil

Imane Guellil has nine years of experience across a range of Natural Language Processing (NLP) and data science topics, with her most recent year focused on clinical NLP. She has consistently demonstrated enthusiasm for applying NLP and Artificial Intelligence (AI) approaches to intricate and challenging problems, always striving to think outside the box. As an illustration, during her Ph.D. she pioneered a sentiment analysis approach tailored to the Algerian dialect. This dialect poses unique challenges, involving various NLP tasks such as transliteration, translation, and diverse orthographic, syntactic and morphological analyses. While preparing her thesis, she also served as an assistant professor at the Higher School of Applied Science in Algeria. This dual role allowed her to teach algorithmics and programming languages to a diverse group of undergraduate students. The experience was not only impactful and successful but also brought her immense joy, making it one of the most rewarding periods in her life.

Her NLP expertise was also applied in industry when she worked as a knowledge transfer associate with Aston University. In this position, she had the opportunity to apply and improve state-of-the-art approaches to a real business problem (the deidentification of customers' sensitive data).

She has also contributed significantly to the fields of Natural Language Processing (NLP) and Machine Learning (ML), particularly through literature reviews and research papers, the most recent of which is dedicated to the detection of geriatric syndromes and adverse events as part of her contribution to the AIM-CISC/ACRC projects at the University of Edinburgh. She is now leading the annotation and automatic detection of geriatric syndromes and adverse events from free text.

Research Fellow - Dr. Fahrurrozi Rahman

Fahrurrozi Rahman is a Research Fellow in Clinical Natural Language Processing. His research includes continual learning and natural language processing, alongside explorations in poetry and harmony generation. He is working on analysing geriatric phenotypes in clinical text under the supervision of Dr. Beatrice Alex.

This article was published on 2024-09-24