Estimación espacio-temporal de procesos Hilbert-valuados: Aplicación a la estimación y predicción funcional de mapas de riesgo de enfermedades

  1. Torres Signes, Antoni
Supervised by:
  1. María Dolores Ruiz Medina Director
  2. María del Pilar Frías Bustamante Director

Defence university: Universidad de Granada

Defence date: 21 May 2021

Committee:
  1. José Miguel Angulo Ibáñez Chair
  2. María Dolores Martínez Miranda Secretary
  3. George Christakos Committee member
  4. Tomás Goicoa Mangado Committee member
  5. Rosa M. Crujeiras Casais Committee member

Type: Thesis

Abstract

The application of techniques for the analysis of functional data correlated in time and/or space is a relatively recent area of research, where a number of problems have arisen and remain open. In particular, the derivation of probabilistic (point processes in function spaces) and statistical (functional spatial and time series) models is required for the analysis of high-dimensional data that often exhibit complex correlation structures in time and/or space. Point processes are used to explain the distribution of points generated by random mechanisms in time and/or space. Such processes make it possible to model and analyze the incidence or mortality associated with a disease. In this thesis, we have considered the context of doubly stochastic counting processes, or Cox processes. In particular, an infinite-dimensional statistical approach, based on functional linear models, has been adopted for the statistical description of the random log-intensity. The spatio-temporal dynamics of these models are analyzed through temporal or spatial processes with values in an appropriate function space. The complexity of these models, given the high dimension of the parameter space (which is often infinite-dimensional), makes it essential to implement appropriate dimension reduction techniques and model selection procedures. From the theoretical point of view, new scenarios are introduced in the following chapters in order to apply different estimation methodologies. On the one hand, log-Gaussian Cox processes in Hilbert spaces are developed, with random intensity given by an Ornstein-Uhlenbeck process approximated by an autoregressive Hilbertian (ARH) process. These temporal patterns are analyzed from a time-correlated functional data perspective. On the other hand, Cox processes driven by linear infinite-dimensional spatial log-intensities are developed.
In this case, these spatial patterns are analyzed from a spatially correlated functional data perspective. Regarding the methodological approaches adopted for the estimation, in the case of Cox processes driven by an Ornstein-Uhlenbeck (O-U) Hilbert-valued log-intensity, approximated by an ARH(1) process, the method of empirical moments has been used. In the case of spatial Cox processes driven by an infinite-dimensional spatial linear random log-intensity, functional spectral techniques based on the periodogram operator, extending the Whittle functional, have been applied under the condition of spatial stationarity to estimate the parameters that define the parametric structure of the spectral density operator. As a preliminary analysis, for spatially stationary real-valued spatial processes, we obtain sufficient conditions that guarantee the consistency and asymptotic normality of minimum-contrast estimators based on the tapered periodogram.

Specifically, in this thesis, from the perspective of infinite-dimensional Cox processes, or Cox processes driven by infinite-dimensional linear log-intensities, not necessarily Gaussian, within the field of functional statistical analysis of point patterns in time and/or space, the following contributions have been established:

  1. Study of the consistency and asymptotic normality of minimum-contrast estimators in spatial processes.
  2. Introduction of the class of temporal log-Gaussian Cox processes with random log-intensity defined by an Ornstein-Uhlenbeck Hilbert-valued process.
  3. Approximation of Ornstein-Uhlenbeck Hilbert-valued processes by ARH(1) processes, with estimation by the method of empirical moments and calculation of the associated plug-in predictor.
  4. Introduction of a new class of Cox processes driven by a linear Hilbert-valued log-intensity. Here, the log-Gaussian process condition, or Gaussian log-intensity, is not required. Neither the introduction of this class nor the consistency result requires the log-intensity to be SARH(1); that structure is assumed only in the simulation and application.
  5. Introduction of new estimation techniques by minimum componentwise contrast for the previously introduced family of processes (in particular, with SARH intensity).
  6. Development of conditions guaranteeing the strong consistency of the proposed estimators.
  7. Fitting linear and non-linear trend models in an infinite-dimensional statistical framework for spatio-temporal log-risk processes of disease incidence and mortality.
  8. Residual linear correlation in an autoregressive Hilbertian process framework, under a Bayesian approach.
  9. Comparison, via cross-validation and bootstrapping techniques, of the presented approaches with regression or prediction models based on machine learning.

Epidemiology, and more generally the study of the spatial and temporal evolution of several diseases, has been the fundamental framework for these contributions. Specifically, real data have been used for the estimation and functional prediction in time and space of prostate, breast and brain cancer, as well as respiratory diseases, in Spanish provinces, from annual or monthly observations over periods of around thirty years. Furthermore, the techniques presented throughout the thesis have been applied to real data to analyze the incidence of a disease in a foreign territory. In particular, the evolution of dengue fever in American countries in recent years has been modeled.
On the other hand, given the social emergency caused by the COVID-19 pandemic during the last stage of the thesis, it was considered pertinent to include a statistical study on the estimation of the spatio-temporal evolution of the mortality risk, as well as of the daily mortality cases caused by this disease in the Spanish Autonomous Communities. Among other aspects, this study reflects the effect of the first state of alarm on that evolution. To this end, the daily mortality due to COVID-19 in the Spanish Autonomous Communities during the first wave, specifically from 8 March 2020 to 13 May 2020, has been modeled. These last two applications have been developed on the basis of the infinite-dimensional statistical techniques proposed in the thesis, under both a classical and a Bayesian approach, with modifications in the estimation methodology. Subsequently, in both cases, an empirical comparison has been made with other approaches. In the case of the risk of annual incidence of dengue fever in American countries, a comparison has been made with traditional spatio-temporal models, including a Leroux model, an Intrinsic Conditional Autoregressive model and a Besag, York and Mollié model. In the case of daily mortality risk by COVID-19 in the Spanish Autonomous Communities, the proposed approaches have been compared with another methodology based on estimation by confidence intervals and probability densities using bootstrap techniques, as well as with a battery of Machine Learning models, including Generalized Regression Neural Networks, Multilayer Perceptron, Support Vector Regression, Bayesian Neural Networks, Radial Basis Function Neural Networks, and Gaussian Processes. In addition, model selection in the context of parametric non-linear regression is addressed.
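
As an illustration of the ARH(1) estimation by empirical moments and the associated plug-in predictor mentioned in the abstract, the following Python sketch works with curves already projected onto a finite orthonormal basis, so that each functional observation is a coefficient vector. It is a minimal sketch under that assumption and is not the implementation used in the thesis. The function names (`simulate_arh1`, `estimate_arh1`, `predict_next`), the truncation level `k` and the simulation settings are hypothetical choices made only for this example.

```python
import numpy as np


def simulate_arh1(rho, n, noise_scale, rng):
    """Simulate X_t = rho X_{t-1} + eps_t for basis-coefficient vectors (illustrative)."""
    d = rho.shape[0]
    x = np.zeros((n, d))
    for t in range(1, n):
        x[t] = rho @ x[t - 1] + noise_scale * rng.standard_normal(d)
    return x


def estimate_arh1(x, k):
    """Moment-based estimate of the autocorrelation operator, truncated to k components."""
    xc = x - x.mean(axis=0)
    n = xc.shape[0]
    c0 = xc.T @ xc / n                    # empirical covariance operator
    c1 = xc[1:].T @ xc[:-1] / (n - 1)     # empirical lag-1 cross-covariance operator
    eigvals, eigvecs = np.linalg.eigh(c0)
    order = np.argsort(eigvals)[::-1][:k]  # keep the k leading eigenpairs
    lam, v = eigvals[order], eigvecs[:, order]
    # rho_hat = C1 * (pseudo-inverse of C0 restricted to the leading eigenspace)
    return c1 @ v @ np.diag(1.0 / lam) @ v.T


def predict_next(rho_hat, x_last, mean):
    """Plug-in one-step-ahead predictor: mean + rho_hat (x_last - mean)."""
    return mean + rho_hat @ (x_last - mean)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho_true = np.diag([0.6, 0.3, 0.1])   # hypothetical autocorrelation operator
    x = simulate_arh1(rho_true, n=5000, noise_scale=1.0, rng=rng)
    rho_hat = estimate_arh1(x, k=3)
    print(np.round(rho_hat, 2))
    print(predict_next(rho_hat, x[-1], x.mean(axis=0)))
```

In a log-Gaussian Cox setting such as the one described above, the vectors would stand for the coefficients of the log-intensity curves. Truncating to the leading eigenpairs of the empirical covariance operator is one common way to regularize the inversion, and it also serves as a simple form of dimension reduction.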