Contributions to distributional regression models. Applications in biomedicine

ESPASANDÍN DOMÍNGUEZ, JENIFER

Supervised by:

Carmen María Cadarso Suárez Director
Thomas Kneib Co-director
Francisco Gude Sampedro Co-director

University of defence: Universidade de Santiago de Compostela

Date of defence: 19 July 2019

Examining committee:

María Luz Durbán Reguera Chair
Javier Roca Pardiñas Secretary
Bruno Cecilio de Sousa Member

Department:

Departamento de Estatística, Análise Matemática e Optimización

Type: Thesis

Teseo: 599883 DIALNET

Abstract

The origin of this work lies in the convergence of two lines of research: a statistics research line undertaken by the Group in Biostatistics and Biomedical Data Science, GI-2127, and a clinical research line followed by the Research Methods Group, C017; both groups belong to the Instituto de Investigación Sanitaria de Santiago de Compostela. For many years, they have coordinated their work to perform interdisciplinary biostatistical research. The present study contributes statistical methodology to aid research into the factors involved in protein glycation. Earlier projects generated a sample of the general adult population for which extensive phenotypic details are available, as well as stored biological samples that can be used to study chronic diseases related to ageing, such as diabetes. The results of week-long continuous monitoring of interstitial glucose concentrations are also available for some members of this population; glucose profiles are therefore available as functional data for each of these individuals. The present work proposes statistical methods for functional data that take into account all the information contained in glucose curves. This thesis also discusses frequentist and Bayesian models of distributional regression for univariate responses (Klein et al., 2015). Compared to classical regression models, which estimate only the mean of the response variable through a linear predictor, these models allow great flexibility in modelling both the response variable and the effects of any candidate covariates. The incorporation of reference bands into quantile-quantile plots, as a generalization of Augustin et al. (2012), is one of the major statistical contributions of the present work in the context of distributional regression models. Despite the advantages offered by univariate distributional regression, it has some practical limitations.
For example, it cannot model multivariate responses, which would be of great interest within the framework of this thesis for studying the glycation of several proteins simultaneously (e.g., glycated haemoglobin and fructosamine). Tackling the problem of multivariate response modelling thus requires new multivariate regression techniques such as copula distributional regression models (Klein and Kneib, 2016b; Marra and Radice, 2017a). These techniques are discussed in this work from a frequentist and a Bayesian standpoint and, for the first time, compared in a simulation study and in real-data applications. Functional statistics (Ramsay and Silverman, 2005) are useful for incorporating the above-mentioned glucose profiles into regression models, here by entering them as covariates in distributional regression models (as defined by Klein et al., 2014a). The present work adapts the techniques of Brockhaus et al. (2018) to allow functional data to be incorporated as covariates into univariate distributional regression, and validates the proposed methodology via a simulation study. For the multivariate setting, an extension of the methodology of McLean et al. (2014) is presented that allows functional covariates to be included in the copula-based distributional regression models proposed by Marra and Radice (2017a). No other copula regression model allows functional covariates to be modelled in a flexible manner. Both extensions are applied to continuous glucose monitoring data. The code used in the present work is provided as supplementary material so that others can reproduce and use the statistical techniques discussed.