Linking electronic health records for research on a nationwide cohort

New linked health data resource covering more than 54 million people in England, over 96% of the English population

For the first time, researchers from across the UK can collaborate on a new linked health data resource covering 54.4 million people (over 96% of the English population) within NHS Digital’s secure research environment. The resource will enable vital research into COVID-19 and cardiovascular disease, with the aim of improving treatments and care for patients.

This work has been led by the CVD-COVID-UK consortium in partnership with NHS Digital. The new resource links GP records, hospital records, death records, COVID-19 laboratory test results and data on medicines dispensed by pharmacies, and is accessible to CVD-COVID-UK consortium researchers in NHS Digital’s Trusted Research Environment (TRE) Service for England.

The CVD-COVID-UK consortium is a collaborative group of more than 130 members across 40 institutions working to understand the relationship between COVID-19 and cardiovascular diseases. The consortium is managed by the British Heart Foundation (BHF) Data Science Centre, led by Health Data Research UK.

Because of our partnership with NHS Digital, researchers are now able to access health data at a scale that was hardly imaginable a year ago. The combination of the data and the new Trusted Research Environment is allowing research teams across the UK to work together to answer questions about a very wide range of common and rare health conditions. This will help health professionals, patients, carers and health service planners make better decisions to benefit the health of the whole country, including people of all ages, ethnic groups, social backgrounds and geographic locations.

Access to health data in the TRE is available only to approved researchers, through a secure process that ensures researchers have access to the data they need while protecting people’s privacy. The data made available for research are de-identified (i.e., a person’s name, address and exact date of birth are removed) and pseudonymised (i.e., a person’s unique NHS number is replaced with a non-identifying unique master key).
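To illustrate the idea, pseudonymisation of this kind is often done with a keyed hash, so that the same NHS number always maps to the same pseudonym but cannot be reversed without the secret key. The sketch below assumes HMAC-SHA256 and hypothetical field names (`nhs_number`, `pseudo_id`, and so on); it is not a description of NHS Digital's actual method, which this article does not specify.

```python
import hashlib
import hmac

# Hypothetical secret key. In practice it would be held securely by the data
# controller and never shared with researchers.
SECRET_KEY = b"example-secret-key"

# Direct identifiers removed during de-identification, as listed in the text.
DIRECT_IDENTIFIERS = {"name", "address", "date_of_birth"}


def pseudonymise(record: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a copy of `record` with direct identifiers and the NHS number
    removed, and a keyed-hash pseudonym (HMAC-SHA256) added as `pseudo_id`."""
    out = {k: v for k, v in record.items()
           if k not in DIRECT_IDENTIFIERS and k != "nhs_number"}
    digest = hmac.new(key, record["nhs_number"].encode("utf-8"), hashlib.sha256)
    out["pseudo_id"] = digest.hexdigest()
    return out
```

Because the pseudonym is deterministic for a given key, records about the same person from GP, hospital and other sources can still be linked on `pseudo_id` without researchers ever seeing the NHS number.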

Professor Cathie Sudlow
Personal Chair of Neurology and Clinical Epidemiology, Usher Institute; Director of the BHF Data Science Centre

The ability to link different types of health data from almost the entire population of England provides a more complete and accurate picture of the impact of COVID-19 on patients with diseases of the heart and circulation than has been possible before now. It will also provide the data to understand whether patients with COVID-19 are more likely to go on to develop diseases of the heart and circulation, such as heart attack and stroke.

The linked health data include comprehensive information on key details such as sex, age and ethnicity. Looking at a single dataset in isolation (e.g. only GP data) gives a more limited picture. For example, linking GP and hospital data provides information on ethnicity for 95% of the population, whereas GP data alone provide it for less than 70%.
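The gain in ethnicity coverage comes from filling gaps in one source with values from another. A minimal sketch, assuming records are already keyed by a pseudonymised ID and using a hypothetical rule (prefer the GP value, fall back to the hospital value); the study's actual precedence rules are not described here, and the toy data are invented and do not reproduce the figures above.

```python
from typing import Optional

Records = dict[str, Optional[str]]  # pseudonymised ID -> ethnicity (None if missing)


def link_ethnicity(gp: Records, hospital: Records) -> Records:
    """Combine ethnicity across sources by pseudonymised ID: use the GP value
    if recorded, otherwise the hospital value (a hypothetical rule)."""
    return {pid: gp.get(pid) or hospital.get(pid)
            for pid in gp.keys() | hospital.keys()}


def coverage(records: Records) -> float:
    """Fraction of people in `records` with a recorded ethnicity."""
    if not records:
        return 0.0
    return sum(v is not None for v in records.values()) / len(records)
```

For example, with GP records `{"a": "White", "b": None, "c": None, "d": "Asian"}` and hospital records `{"b": "Black", "c": None}`, GP coverage alone is 0.5, while the linked coverage is 0.75 because person `b` gains a value from hospital data.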

Related links

Pre-print publication describing the cohort

Information on the linked health data resource, on the HDR Innovation Gateway

NHS Digital’s Trusted Research Environment (TRE) Service for England

BHF Data Science Centre website | @BHFDataScience