Applying Joint Modelling Regression Approaches in Biomedical Data Science
- Díaz Louzao, Carla
- Francisco Gude Sampedro Director
Defence university: Universidade de Santiago de Compostela
Defence date: 15 December 2022
- Javier Roca Pardiñas Chair
- Angel Salgado Barreira Secretary
- Ana Claveria Fontan Committee member
Type: Thesis
Abstract
The research carried out in this thesis stems from the collaboration between two research groups: a biostatistics group (Group in Biostatistics and Biomedical Data Science, GI-2127: GRID-BDS) belonging to the University of Santiago de Compostela (USC), and a clinical medicine group (Research Methods Group, C017) belonging to the Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS). These groups have worked together for years on interdisciplinary research in the field of biostatistics. This collaboration is particularly noteworthy in the current era, in which technology makes it possible to collect huge numbers of variables of all kinds, creating enormously complex databases that are challenging to process, analyse and interpret. This complexity requires statisticians and clinicians to work together, applying data science tools, so that biomedical research studies can be carried out rigorously.
In this thesis, biostatistical models for multivariate responses are studied. In general, in this type of modelling the different response variables are related to each other, so analysing each one individually can lead to biased results. Moreover, sometimes the importance of the study lies not only in the modelling of the responses, but also in their correlation. Joint modelling arose in this context and, despite its great usefulness, these techniques are not yet widely used in biomedicine. Consequently, the main objective of this thesis is to bring this type of model closer to clinical practice and to future epidemiological studies. To this end, we use real databases on liver damage in patients with COVID-19, perinatal mental health during the COVID-19 pandemic, and thyroid-related hormones in a healthy adult population.
Specifically, we start with a review of joint distributional modelling for bivariate responses, following the methodology of Copula Generalised Additive Models for Location, Scale and Shape (CGAMLSS), introduced by Marra and Radice (2017). These models allow the distribution of two response variables to be fully determined by flexibly modelling each parameter of the response distribution (not just its mean) as a function of other covariates of interest. They also allow the correlation between the two responses to be modelled. This methodology is applied in the study of the correlation between transaminases and inflammation markers in patients with COVID-19, as well as in the study of the influence of the measures taken by governments during the pandemic on perinatal mental health.
More recently, Multivariate Conditional Transformation Models (MCTM; Klein et al., 2019) have emerged as a more flexible and faster alternative to CGAMLSS models. However, the software implementing them is still under development, and it is currently limited to a single explanatory variable. We apply this methodology to data from healthy patients to determine the joint distribution of the three main thyroid-related hormones (TSH, free T3 and free T4), as well as their variation with patient age. The aim is for the resulting model to serve as an aid in the diagnosis of thyroid diseases.
We should also highlight a specific type of joint modelling in which the response variables are, on the one hand, longitudinal markers (repeated measures) and, on the other hand, one or several events of interest (e.g. death and the development of a certain disease) together with the time to these events. In longitudinal studies, in which the values of some markers are collected repeatedly over time, it is very common to find variables of this kind.
To study the association between longitudinal and survival processes, it is necessary to analyse them jointly by treating the longitudinal markers as response variables, and then incorporating their corresponding estimates into the survival model as covariates. Joint regression models for longitudinal and survival data (JMLS; Tsiatis and Davidian, 2004) have emerged precisely for this type of analysis. In this thesis we introduce the most recent and complete extension of these models (Mauff et al., 2020), illustrating it in practice by analysing the influence of three transaminases (AST, ALT and GGT), collected over time, on survival in patients with COVID-19. Finally, in order to bring these models closer to clinical practice, we provide a tutorial with the code needed to reproduce the analyses carried out in this doctoral thesis.
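As a rough illustration of the copula idea behind the bivariate models described in the abstract, the following Python sketch couples two arbitrary margins through a Gaussian copula and checks that the dependence (measured by Kendall's tau) is governed by the copula parameter alone. It is not the CGAMLSS methodology of Marra and Radice (2017) and not the thesis code: the choice of a Gaussian copula, the exponential and log-normal margins, the function names and all parameter values are assumptions made purely for illustration, and no parameter depends on covariates here.

```python
import math
import random
from statistics import NormalDist

def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs (u, v) on the unit square from a Gaussian copula (assumed choice)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    eps = 1e-12  # keep probabilities strictly inside (0, 1) for the inverse CDFs
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Correlated second normal variate with correlation rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u = min(max(std_normal.cdf(z1), eps), 1.0 - eps)
        v = min(max(std_normal.cdf(z2), eps), 1.0 - eps)
        pairs.append((u, v))
    return pairs

def to_margins(pairs, rate=0.5, mu=0.0, sigma=1.0):
    """Map uniform pairs to hypothetical margins: exponential (x) and log-normal (y)."""
    log_scale = NormalDist(mu, sigma)
    data = []
    for u, v in pairs:
        x = -math.log(1.0 - u) / rate        # inverse CDF of Exponential(rate)
        y = math.exp(log_scale.inv_cdf(v))   # inverse CDF of LogNormal(mu, sigma)
        data.append((x, y))
    return data

def kendall_tau(data):
    """Kendall's tau-a: (concordant - discordant) / number of pairs, O(n^2)."""
    n = len(data)
    score = 0
    for i in range(n):
        xi, yi = data[i]
        for j in range(i + 1, n):
            xj, yj = data[j]
            prod = (xi - xj) * (yi - yj)
            if prod > 0:
                score += 1
            elif prod < 0:
                score -= 1
    return score / (n * (n - 1) / 2)

def gaussian_copula_tau(rho):
    """Theoretical Kendall's tau of a Gaussian copula with correlation rho."""
    return 2.0 / math.pi * math.asin(rho)

if __name__ == "__main__":
    uniforms = sample_gaussian_copula(400, rho=0.6, seed=1)
    observations = to_margins(uniforms)
    print("tau on copula scale:", kendall_tau(uniforms))
    print("tau on data scale:  ", kendall_tau(observations))
    print("theoretical tau:    ", gaussian_copula_tau(0.6))
```

Because Kendall's tau is invariant under increasing transformations, the value computed on the simulated observations should match the one on the copula scale, which is why the margins and the dependence structure can be specified separately in copula-based joint models.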