# Estimates and bootstrap calibration for functional regression with scalar response

- Martínez Calvo, Adela

- Frédéric Ferraty (Supervisor)
- Wenceslao González Manteiga (Supervisor)
- Philippe Vieu (Supervisor)

University of defense: Universidade de Santiago de Compostela

Date of defense: 4 April 2013

- Manuel Febrero Bande (Committee chair)
- Germán Aneiros Pérez (Committee secretary)
- Ana María Aguilera del Pino (Committee member)
- Ingrid Van Keilegom (Committee member)
- Juan José Romo Urroz (Committee member)

Type: Thesis

## Abstract

Advances in computing (in both memory and processing capacity) now make it possible to create, store, and work with large databases. In many cases, the dataset is made up of observations from a finite-dimensional distribution, measured over a period of time or recorded at different spatial locations. When the temporal or spatial grid is fine enough, the sample can be regarded as an observation of a random variable taking values in a functional space. Analysing this kind of data with standard multivariate methods while ignoring its functional nature may fail dramatically (curse of dimensionality, collinearity, loss of valuable information, etc.). In these cases, specific statistical techniques are required in order to manage, filter, and extract the relevant underlying information. This has turned Functional Data Analysis (FDA) into one of the most active statistical fields in recent years (see Bosq, 2000; Ramsay and Silverman, 1997, 2002, 2005; Ferraty and Vieu, 2006a, 2006b; Ferraty and Romain, 2011; Horváth and Kokoszka, 2012). Because FDA is relatively new, many research lines remain open, for instance the construction of models to explain the relationship between functional variables (parametric and nonparametric regression models) and the development of functional statistical inference (confidence intervals, hypothesis testing, etc.). This thesis deals mainly with these two topics. Regarding the first, the work focuses on the functional linear model with scalar response (although some contributions to nonparametric regression are also included in the last chapter). Regarding the second, a bootstrap procedure has been developed that makes it possible to build confidence intervals and calibrate hypothesis tests related to the linear model.
In addition to the theoretical development of the methodology presented throughout the thesis, all the proposed methods were implemented and applied to both simulated and real datasets. For this purpose, the free statistical software R was chosen (see R Development Core Team, 2010), and R routines were developed for the new techniques compiled in the thesis. The thesis is structured in the following six chapters.

### CHAPTER 1. INTRODUCTION TO FDA

The first chapter fixes the notation and briefly summarizes the state of the art in statistical methods for functional data. First, functional data are defined and some examples of functional datasets are given; these are used to illustrate the methods proposed in later chapters. Throughout the thesis, functional variables are assumed to take values in a real separable Hilbert space. This motivates the introduction of some associated spaces (such as the space of Hilbert-Schmidt operators and the dual space), some tensor notation, and semi-metrics, which are very useful for measuring the closeness of functional observations. Finally, a general background on existing FDA tools is presented: preprocessing techniques (smoothing and registration methods), functional descriptive statistics (measures of position and dispersion), and some key exploratory methods (e.g., Functional Principal Component Analysis (FPCA), which is recalled later to define functional linear regression estimates).

### CHAPTER 2. FUNCTIONAL REGRESSION MODELS

This chapter is devoted to functional regression models. A general review of functional regression is presented, and attention then turns to models with scalar response and functional covariate. There are two main approaches to this problem: the parametric approach and the nonparametric approach.
Within the parametric approach, the most common model is the functional linear model with scalar response. The two most popular types of estimators in this setting are introduced in this chapter: estimators based on basis systems, such as the penalized B-splines estimator (see Cardot et al., 2003), and FPCA-type estimators, such as the standard FPCA estimator (see Cardot et al., 1999, 2003, 2007). Within the nonparametric approach, the functional version of the multivariate kernel-type estimator is analysed (see Ferraty and Vieu, 2004, 2006b; Ferraty et al., 2007).

### CHAPTER 3. PRESMOOTHING IN FUNCTIONAL LINEAR REGRESSION

This chapter focuses on the functional linear model with scalar response and an explanatory variable valued in a functional space. In the recent statistical literature, FPCA has been used to estimate the functional parameter of the model. This chapter proposes a modification of that approach based on presmoothing techniques: presmoothing either via the covariance structure or via the response variable. Specifically, four FPCA-type estimators of the linear model parameter are introduced, all of them based on presmoothing: (i) presmoothing via covariance structure; (ii) presmoothing via response variable; (iii) Pezzulli and Silverman's presmoothed FPCA (Pezzulli and Silverman, 1993); (iv) Silverman's presmoothed FPCA (Silverman, 1996). The first proposal, presmoothing via covariance structure, can be seen as an extension of the ordinary multivariate ridge regression estimator to general Hilbert spaces. The key idea is to avoid ill-conditioning by slightly perturbing the eigenvalues of the second moment operator. Consistency was established for this estimator, and expressions for its conditional mean square errors of prediction and estimation were obtained.
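The ridge analogy described above can be written out as a rough sketch. The notation is an assumption introduced here for illustration and is not taken from this summary: $(X_i, Y_i)_{i=1}^{n}$ is the sample, $\Gamma_n$ and $\Delta_n$ are the empirical second moment and cross second moment operators, $(\hat\lambda_j, \hat v_j)$ are the eigenelements of $\Gamma_n$, $k_n$ is the number of components retained, and $\alpha_n > 0$ is a presmoothing parameter. The exact definitions used in the thesis may differ.

```latex
\[
Y = \langle \beta, X \rangle + \varepsilon, \qquad
\Gamma_n u = \frac{1}{n}\sum_{i=1}^{n} \langle X_i, u \rangle X_i, \qquad
\Delta_n u = \frac{1}{n}\sum_{i=1}^{n} \langle X_i, u \rangle Y_i
\]
\[
\hat\beta_{k_n} = \sum_{j=1}^{k_n} \frac{\Delta_n \hat v_j}{\hat\lambda_j}\, \hat v_j
\quad \text{(standard FPCA)}, \qquad
\hat\beta_{k_n}^{\alpha_n} = \sum_{j=1}^{k_n} \frac{\Delta_n \hat v_j}{\hat\lambda_j + \alpha_n}\, \hat v_j
\quad \text{(ridge-type presmoothing)}
\]
```

Under this reading, adding $\alpha_n$ to each eigenvalue keeps the denominators away from zero, which is the same mechanism by which ridge regression stabilizes the inverse of the design cross-product matrix in the multivariate case. Setting $\alpha_n = 0$ recovers the standard FPCA estimator.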
Using the conditional estimation error, it can be shown theoretically that this presmoothed estimator improves on the FPCA estimator, especially when the model noise is large and/or the sample size is small. For the estimator based on presmoothing via response variable, consistency was established and expressions for the conditional mean square errors of prediction and estimation were computed. For the estimators based on Pezzulli and Silverman's presmoothed FPCA and Silverman's presmoothed FPCA, only the conditional error expressions were computed, heuristically, by means of standard asymptotic expansions. The effectiveness of the presmoothed estimators relative to the standard FPCA estimator and the penalized B-splines estimator is also tested by means of simulation studies and real data applications. The simulations suggest that the presmoothed estimator via covariance structure improves on the standard FPCA estimator, especially when the sample size is small, whereas the presmoothed estimator via response variable does not significantly reduce the conditional errors of the standard FPCA approach.

### CHAPTER 4. BOOTSTRAP IN FUNCTIONAL LINEAR REGRESSION

For the functional linear model with functional explanatory variable and scalar response, one of the most popular methods for estimating the model parameter is based on FPCA. Weak convergence for a wide class of FPCA-type estimates has recently been proved and, as a result, asymptotic confidence intervals for the linear regression operator can be obtained for a fixed confidence level (see Cardot et al., 2007). This chapter proposes an alternative approach that computes pointwise confidence intervals by means of a bootstrap procedure, and establishes its asymptotic validity. In particular, algorithms for naive and wild bootstrap are developed, and bootstrap intervals are constructed using the pointwise bootstrap quantiles.
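The wild bootstrap scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the R routines of the thesis: it is written in Python with the standard library only, curves are observed on an equispaced grid with spacing `h`, the model has no intercept, the multipliers are Rademacher signs, and the interval returned is the "basic" bootstrap interval. All function names and the parameters `k_pilot`, `k`, and `B` are hypothetical.

```python
import math
import random


def inner(f, g, h):
    """Riemann-sum approximation of the L2 inner product on a grid of spacing h."""
    return h * sum(a * b for a, b in zip(f, g))


def apply_gamma(curves, v, h):
    """Empirical second moment operator applied to v: (1/n) sum_i <X_i, v> X_i."""
    n, p = len(curves), len(v)
    out = [0.0] * p
    for x in curves:
        c = inner(x, v, h) / n
        for t in range(p):
            out[t] += c * x[t]
    return out


def leading_eigenpairs(curves, h, k, iters=200):
    """First k eigenpairs of the operator, by power iteration with deflation."""
    p = len(curves[0])
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * t for t in range(p)]  # arbitrary deterministic start
        for _ in range(iters):
            w = apply_gamma(curves, v, h)
            for _, u in pairs:  # project out eigenfunctions already found
                c = inner(w, u, h)
                w = [a - c * b for a, b in zip(w, u)]
            norm = math.sqrt(inner(w, w, h))
            v = [a / norm for a in w]
        lam = inner(apply_gamma(curves, v, h), v, h)
        pairs.append((lam, v))
    return pairs


def fpca_coefs(scores, y, lams):
    """Coefficients of the FPCA estimate in the eigenbasis: (1/(n*lam_j)) sum_i s_ij y_i."""
    n = len(y)
    return [sum(s[j] * yi for s, yi in zip(scores, y)) / (n * lam)
            for j, lam in enumerate(lams)]


def wild_bootstrap_ci(curves, y, x_new, h, k_pilot, k, B=500, alpha=0.05, seed=0):
    """Basic wild-bootstrap interval for <beta, x_new> (illustrative sketch only)."""
    rng = random.Random(seed)
    pairs = leading_eigenpairs(curves, h, max(k_pilot, k))
    lams = [lam for lam, _ in pairs]
    scores = [[inner(x, v, h) for _, v in pairs] for x in curves]
    new_scores = [inner(x_new, v, h) for _, v in pairs]
    coefs = fpca_coefs(scores, y, lams)
    b_pilot, b_hat = coefs[:k_pilot], coefs[:k]
    # Pilot fit and residuals (zip truncates to the first k_pilot scores).
    fitted = [sum(b * s for b, s in zip(b_pilot, row)) for row in scores]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    pilot_pred = sum(b * s for b, s in zip(b_pilot, new_scores))
    pred = sum(b * s for b, s in zip(b_hat, new_scores))
    deviations = []
    for _ in range(B):
        # Wild bootstrap response: pilot fit plus sign-flipped residual.
        y_star = [fi + ri * rng.choice((-1.0, 1.0)) for fi, ri in zip(fitted, resid)]
        b_star = fpca_coefs(scores, y_star, lams)[:k]
        deviations.append(sum(b * s for b, s in zip(b_star, new_scores)) - pilot_pred)
    deviations.sort()
    q_lo = deviations[int(math.floor(alpha / 2 * B))]
    q_hi = deviations[int(math.ceil((1 - alpha / 2) * B)) - 1]
    return pred - q_hi, pred - q_lo
```

A naive bootstrap variant would instead draw each bootstrap error with replacement from the centred residuals. Because the curves are held fixed, the eigenelements are computed once and only the coefficients are recomputed in each replicate.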
A simulation study compares the practical performance of asymptotic and bootstrap confidence intervals in terms of length and coverage rate for two linear regression operators and several sample sizes. With an adequate choice of the pilot parameter involved in the bootstrap procedure, the empirical coverage rates of the bootstrap intervals were closer to the nominal level than those of the asymptotic confidence intervals.

### CHAPTER 5. TESTING IN FUNCTIONAL LINEAR REGRESSION

This chapter considers the functional linear model with scalar response, now including an intercept term. In this context, a consistent bootstrap method is developed to calibrate the distribution of test statistics for testing the lack of dependence, and the related asymptotic theory is presented. Next, two linear models are considered whose functional covariates share the same covariance operator and whose errors have the same variance. A bootstrap method for checking the equality of the two linear models is introduced, and its main asymptotic properties are studied in order to show its consistency and correctness. From a practical point of view, the simulation study showed that bootstrap methods are competitive alternatives to tests based on asymptotic distributions, since they often give test sizes closer to the nominal ones. Finally, a real data example illustrates the performance of the proposed bootstrap techniques in practice.

### CHAPTER 6. THRESHOLDING IN NONPARAMETRIC FUNCTIONAL REGRESSION

This chapter presents an exploratory tool for detecting underlying complex structures in the nonparametric regression model with scalar response and functional covariate. The proposed methodology analyses the existence of hidden patterns related to the functional covariate and/or the scalar response via a threshold procedure.
For this purpose, the user must choose an adequate threshold function according to the structure to be detected. A cross-validation criterion that makes it possible to estimate the threshold model is also introduced, and the usefulness of its graphical representation is studied. A simulation study and applications to real datasets show the effectiveness of the threshold approach from a practical point of view. In the simulation study, the threshold estimators and the standard nonparametric estimator gave similar results in terms of mean square prediction error, whereas the mean square estimation error can be reduced if each subsample detected by the threshold technique is studied separately. In addition, the real data applications showed that the methodology can detect certain kinds of hidden structure, although the effectiveness of the procedure depends on the choice of the threshold function.

### REFERENCES

- Bosq, D. (2000). Linear Processes in Function Spaces: Theory and Applications, volume 149 of Lecture Notes in Statistics. Springer, New York.
- Cardot, H., Ferraty, F., and Sarda, P. (1999). Functional Linear Model. Statistics & Probability Letters, 45(1):11-22.
- Cardot, H., Ferraty, F., and Sarda, P. (2003). Spline estimators for the functional linear model. Statistica Sinica, 13(3):571-591.
- Cardot, H., Mas, A., and Sarda, P. (2007). CLT in functional linear regression models. Probability Theory and Related Fields, 138(3-4):325-361.
- Ferraty, F., Mas, A., and Vieu, P. (2007). Nonparametric regression on functional data: inference and practical aspects. Australian & New Zealand Journal of Statistics, 49(3):267-286.
- Ferraty, F. and Romain, Y., editors (2011). The Oxford Handbook of Functional Data Analysis. Oxford Handbooks in Mathematics. Oxford University Press, Oxford.
- Ferraty, F. and Vieu, P. (2004). Nonparametric models for functional data, with application in regression, time series prediction and curve discrimination. Journal of Nonparametric Statistics, 16(1):111-125.
- Ferraty, F. and Vieu, P. (2006a). Functional Nonparametric Statistics in Action. In Sperlich, S., Härdle, W., and Aydinli, G., editors, The Art of Semiparametrics, Contributions to Statistics, pages 112-129. Physica-Verlag, Heidelberg.
- Ferraty, F. and Vieu, P. (2006b). Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics. Springer, New York.
- Horváth, L. and Kokoszka, P. (2012). Inference for Functional Data with Applications, volume 200 of Springer Series in Statistics. Springer, New York.
- Pezzulli, S. and Silverman, B. W. (1993). Some properties of smoothed principal components analysis for functional data. Computational Statistics, 8:1-16.
- R Development Core Team (2010). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org.
- Ramsay, J. O. and Silverman, B. W. (1997). Functional Data Analysis. Springer Series in Statistics. Springer, New York.
- Ramsay, J. O. and Silverman, B. W. (2002). Applied Functional Data Analysis. Methods and Case Studies. Springer Series in Statistics. Springer, New York.
- Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer Series in Statistics. Springer, New York, second edition.
- Silverman, B. W. (1996). Smoothed functional principal components analysis by choice of norm. Annals of Statistics, 24(1):1-24.