Using genetic evidence and Bayesian shrinkage methods to discover proteomics biomarkers for the early diagnosis of autoimmune rheumatic diseases | Precision Medicine Doctoral Training Programme

Precision Medicine Project - Using genetic evidence and Bayesian shrinkage methods to discover proteomics biomarkers for the early diagnosis of autoimmune rheumatic diseases

Supervisor(s): Dr Athina Spiliopoulou, Prof Paul McKeigue & Dr Svitlana Braichenko

Centre/Institute: Usher Institute

Background

Autoimmune rheumatic diseases (ARDs) are diseases in which the body’s immune system mistakenly attacks its own tissues, commonly the joints, causing chronic inflammation, pain, and fatigue. Early effective treatment can provide substantial long-term benefits by preventing or slowing progression to irreversible tissue damage and disability. However, diagnosing ARDs early is challenging. Around 30% of all GP consultations involve non-specific musculoskeletal symptoms. Among patients presenting with these symptoms, only about 1 in 40 has an ARD, so referral to rheumatology is typically delayed until more specific symptoms have developed. This postpones access to treatment, leaving the damaging inflammatory process uncontrolled.

The aim of this project is to develop proteomics-based models for distinguishing individuals with ARDs from those without in the early stages of the disease, and to identify a sparse set of proteins suitable for inclusion in a diagnostic biomarker panel. We will analyse high-throughput proteomic data from the UK Biobank, a large-scale population cohort study that has measured 3,000 circulating proteins in 50,000 participants. Measurements of 5,000 proteins in all 500,000 participants are currently under way and are expected to be available to researchers by Q1 of 2027. Adding proteomic measurements to standard clinical covariates has been shown to improve prediction of incident disease for several conditions¹. For developing useful diagnostic panels for ARDs, a key challenge is how to select the smallest number of proteins that are most predictive, given the relatively low numbers of cases and the high dimensionality of the proteomics data.

We will address this challenge in three ways:

1. Exploit recent advances in Bayesian computation to learn sparse predictive models, in which most biomarkers have no effect, using hierarchical shrinkage priors on the effect sizes². This approach overcomes limitations of older methods such as LASSO: large effects are not shrunk, and cross-validation is not required to learn the penalty parameters. Cross-validation is required only in a final step, to evaluate predictive performance on data not seen before. We will combine this approach with projection predictive variable selection, a method that automatically selects the most informative subset of variables from the full model by quantifying the percentage of predictive information gained with each additional variable.

2. Exploit our recent findings on putative core genes for rheumatoid arthritis³, systemic lupus erythematosus⁴, and other ARDs, to prioritise proteins that are likely to have a causal role in these diseases.

3. Learn proteomics models to classify prevalent disease cases and controls and use findings from these models to set informative prior distributions in the models for incident disease.
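The claim in item 1 that large effects are not shrunk can be illustrated with a toy calculation. The sketch below is not the project's model. It assumes a single observation y ~ N(beta, 1), takes the horseshoe prior discussed in reference 2 as one example of a hierarchical shrinkage prior, and fixes the global scale tau = 0.1 and the LASSO-style threshold at 1.0 purely for illustration. The function names are hypothetical.

```python
import math

def horseshoe_posterior_mean(y, tau=0.1, n_grid=20000):
    """Posterior mean of beta in the toy model
       y ~ N(beta, 1), beta ~ N(0, tau^2 * lam^2), lam ~ half-Cauchy(0, 1).
    tau is fixed here (an assumption); in practice it would have its own prior."""
    # Substituting lam = tan(u) makes the half-Cauchy prior uniform on (0, pi/2),
    # so a midpoint rule over u integrates out the local scale lam.
    h = (math.pi / 2) / n_grid
    log_weights, keep_fraction = [], []
    for i in range(n_grid):
        lam = math.tan((i + 0.5) * h)
        var_marg = 1.0 + (tau * lam) ** 2            # Var(y | lam)
        log_weights.append(-0.5 * math.log(var_marg) - 0.5 * y * y / var_marg)
        keep_fraction.append(1.0 - 1.0 / var_marg)   # E[beta | y, lam] / y
    top = max(log_weights)
    weights = [math.exp(lw - top) for lw in log_weights]
    total = sum(weights)
    return y * sum(w * k for w, k in zip(weights, keep_fraction)) / total

def soft_threshold(y, threshold=1.0):
    """LASSO estimate in the same one-parameter model (soft thresholding)."""
    return math.copysign(max(abs(y) - threshold, 0.0), y)

for y in (0.5, 2.0, 10.0):
    print(y, round(horseshoe_posterior_mean(y), 3), soft_threshold(y))
```

In this toy setting the horseshoe estimate stays close to a large observation while pulling a small one strongly towards zero, whereas soft thresholding subtracts the same amount from every observation that survives the threshold.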

Aims

1. Develop and evaluate the performance of a sparse proteomics model to discriminate incident disease cases from controls for each of six ARDs that have at least 400 incident cases in the full UK Biobank cohort (rheumatoid arthritis, axial spondyloarthritis, psoriatic arthritis, polymyalgia rheumatica, systemic lupus erythematosus, systemic sclerosis).

2. Develop and evaluate the performance of a sparse proteomics model to discriminate incident ARD cases from incident cases of osteoarthritis and other conditions that cause musculoskeletal symptoms. Such a diagnostic would be useful for triaging referral to rheumatology in primary care.
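As a rough illustration of what evaluating discrimination could involve, the sketch below computes the area under the ROC curve (AUC) from held-out risk scores. The choice of AUC as the metric is an assumption, since the project description does not name one, and the scores and the function name `auc` are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a randomly chosen case has a higher score than a
    randomly chosen control, counting ties as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical predicted risks for individuals held out from model fitting.
cases = [0.81, 0.64, 0.35]
controls = [0.40, 0.22, 0.35, 0.10]
print(auc(cases, controls))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls. The pairwise loop is quadratic in sample size, so a rank-based formula would be preferable for cohort-scale data.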

Training outcomes

The student will develop an in-depth understanding of data science methodologies and the ability to analyse and interpret complex data (including genomic, biomarker and electronic health record data) to inform personalised care strategies. Core technical areas of learning will include Bayesian inference and machine learning, biostatistics, statistical genetics, scientific programming and database management. Soft skills in science communication and collaboration will be fostered through participation in regular meetings with clinical academic rheumatologists, patient partners and other stakeholders, scheduled as part of Dr Spiliopoulou’s Arthritis UK fellowship.

References

1. Carrasco-Zanini J, Pietzner M, Davitte J, et al. Proteomic signatures improve risk prediction for common and rare diseases. Nat Med. 2024;30(9):2489-2498. doi:10.1038/s41591-024-03142-z
2. Piironen J, Vehtari A. On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior. In: Singh A, Zhu J, eds. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. Vol 54. PMLR; 2017:905-913. http://proceedings.mlr.press/v54/piironen17a.html
3. Spiliopoulou A, Iakovliev A, Plant D, et al. Genome-Wide Aggregated Trans Effects Analysis Identifies Genes Encoding Immune Checkpoints as Core Genes for Rheumatoid Arthritis. Arthritis & Rheumatology. 2025;77(7):817-826. doi:10.1002/art.43125
4. Iakovliev A, Castellini-Pérez O, Erabadda B, et al. Discovery of core genes for systemic lupus erythematosus via genome-wide aggregated trans-effects analysis. Genes Immun. Published online September 3, 2025:1-12. doi:10.1038/s41435-025-00352-4

Apply Now


The deadline for 26/27 applications is Monday 12th January 2026
Applicants must apply to a specific project. Please ensure you include details of the project on the Recruitment Form below, which you must submit to the research proposal section of your EUCLID application.
Please ensure you upload as many of the requested documents as possible, including a CV, at the time of submitting your EUCLID application.

Precision Medicine Recruitment Form (878.56 KB / DOCX)

Q&A Sessions

The supervisor(s) of each project will hold a 30-minute Q&A session in the first two weeks of December.

If you have any questions regarding this project, you are invited to attend the session on Wednesday 10th December at 11am GMT via Microsoft Teams.

This article was published on 2024-11-04