Statistical inference in quantile regression models

Author:
  1. Conde Amboage, Mercedes
Supervised by:
  1. Wenceslao González Manteiga (Director)
  2. César Andrés Sánchez Sellero (Co-director)

Defence university: Universidade de Santiago de Compostela

Date of defence: 28 April 2017

Committee:
  1. Manuel Febrero Bande (Chair)
  2. María Dolores Martínez Miranda (Secretary)
  3. Ingrid Van Keilegom (Committee member)

Department:
  1. Department of Statistics, Mathematical Analysis and Optimisation

Type: Thesis

Abstract

Although mean regression achieved its widest diffusion in the twentieth century, it is striking that the ideas behind quantile regression are older. While the origin of least-squares regression can be dated to 1805 with the work of Legendre, in the mid-eighteenth century Boscovich had already fitted data on the ellipticity of the Earth using concepts of quantile regression. Quantile regression is employed when the aim of the study is the estimation of different positions (quantiles) of the response. This kind of regression allows a more detailed description of the behaviour of the response variable, adapts to more general conditions on the error distribution and enjoys robustness properties. For all these reasons, quantile regression is a very useful statistical tool in a wide variety of disciplines.

The main purpose of this dissertation is to collect different innovative statistical methods in quantile regression. The contributions can be summarized as follows.

First, quantile regression methods are evaluated for computing predictions and prediction intervals of NOx concentrations measured in the vicinity of the power plant at As Pontes (Spain). A new method to construct prediction intervals, based on median regression and bootstrapping the prediction error, is proposed. This method provides better coverage for the NOx data than several competitors available in the literature. A simulation study illustrates the features of the proposed method, showing better performance in obtaining prediction intervals for datasets that do not satisfy the assumptions of homoscedasticity and normality of the error distribution (an illustrative sketch of this bootstrap idea is included at the end of the abstract).

Second, the problem of bandwidth selection for local linear quantile regression has been addressed in the literature through the usual approaches, such as cross-validation or plug-in methods. Most plug-in methods rely on restrictive assumptions relating the quantile regression model to the mean regression, or on parametric assumptions. We present a plug-in bandwidth selector for nonparametric quantile regression that is defined from a completely nonparametric approach. To this end, the curvature of the quantile regression function and the integrated sparsity (the inverse of the conditional density) are both estimated nonparametrically. The new bandwidth selector is shown to work well in different scenarios, particularly when the conditions commonly assumed in the literature are not satisfied.

Third, two different lack-of-fit tests for quantile regression models are presented. The first is a new test that remains suitable even with high-dimensional covariates. It is based on the cumulative sum of residuals with respect to unidimensional linear projections of the covariates, and its critical values are approximated by a wild bootstrap mechanism convenient for quantile regression. An extensive simulation study shows the good performance of the new test, particularly when the dimension of the covariate is high. The test is illustrated with real data on the economic growth of 161 countries.
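As a rough illustration of the projection-based test just described, the following Python sketch computes a Cramér-von Mises type statistic from cumulative sums of quantile residual signs over a finite set of random unidimensional projections, and calibrates it with a simple Rademacher multiplier bootstrap. This is only a simplified stand-in for the wild bootstrap mechanism developed in the thesis (in particular, the effect of estimating the quantile model parameters is ignored), and the simulated data, sample sizes and variable names are illustrative assumptions rather than the procedure of the dissertation.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(2)
tau = 0.5

# Simulated data: two covariates plus an interaction that the linear
# quantile model of the null hypothesis does not capture.
n, p = 300, 2
X = rng.standard_normal((n, p))
y = 1.0 + X @ np.array([1.0, -0.5]) + 0.7 * X[:, 0] * X[:, 1] + rng.standard_normal(n)

# Fit the parametric (linear) quantile model of the null hypothesis.
Xc = sm.add_constant(X)
fit = QuantReg(y, Xc).fit(q=tau)
psi = tau - (y - fit.predict(Xc) <= 0)  # quantile residual "signs"

# Finite set of random unidimensional projections of the covariates,
# standing in for an integral over the whole projection sphere.
n_proj = 50
gammas = rng.standard_normal((n_proj, p))
gammas /= np.linalg.norm(gammas, axis=1, keepdims=True)

def cvm_statistic(weights):
    """Cramer-von Mises statistic of the cumulative-sum process of `weights`,
    with thresholds placed at the observed projected points."""
    stat = 0.0
    for g in gammas:
        order = np.argsort(X @ g)
        cusum = np.cumsum(weights[order]) / np.sqrt(n)
        stat += np.mean(cusum ** 2)
    return stat / n_proj

T_obs = cvm_statistic(psi)

# Rademacher multiplier bootstrap of the cumulative-sum process.
B = 200
T_boot = np.array([cvm_statistic(rng.choice([-1.0, 1.0], size=n) * psi)
                   for _ in range(B)])

p_value = np.mean(T_boot >= T_obs)
print(f"CvM statistic = {T_obs:.3f}, bootstrap p-value = {p_value:.3f}")
```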
The second lack-of-fit test is based on interpreting the residuals from the quantile regression fit as response values of a logistic regression, the predictors of the logistic regression being functions of the covariates of the quantile model. A correct quantile model then implies that all coefficients of the logistic model, except the constant, are zero. Given this property, we use a likelihood ratio test in the logistic regression to check the quantile regression model. In the case of a multivariate quantile regression, we use predictors obtained as functions of univariate projections of the covariates, and we look for a "least favourable" projection under the null hypothesis. A simulation study and an application to real data show the good properties of the new test compared with other nonparametric tests available in the literature.
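The logistic-regression test can be illustrated in the same spirit: under a correct tau-quantile model, the indicators of non-positive residuals are Bernoulli(tau) independently of the covariates, so a logistic regression of these indicators on functions of the covariates should carry no information beyond its intercept. The sketch below uses simulated data and an arbitrary choice of predictors (it does not perform the "least favourable" projection search of the thesis) and applies a likelihood ratio test to the logistic fit.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg
from scipy import stats

rng = np.random.default_rng(1)
tau = 0.5

# Simulated data with a quadratic effect, so the linear quantile model
# of the null hypothesis is misspecified.
n = 400
x = rng.uniform(-2, 2, n)
y = 1.0 + x + 0.8 * x**2 + rng.standard_normal(n)

# Step 1: fit the linear quantile regression model of the null hypothesis.
X_lin = sm.add_constant(x)
qfit = QuantReg(y, X_lin).fit(q=tau)
below = (y - qfit.predict(X_lin) <= 0).astype(float)  # binary residual signs

# Step 2: logistic regression of the indicators on functions of the covariate;
# under a correct quantile model only the intercept should be informative.
Z_full = sm.add_constant(np.column_stack([x, x**2]))
Z_null = np.ones((n, 1))
ll_full = sm.Logit(below, Z_full).fit(disp=0).llf
ll_null = sm.Logit(below, Z_null).fit(disp=0).llf

# Step 3: likelihood ratio test; degrees of freedom equal the number of
# non-intercept predictors in the logistic model.
lr_stat = 2.0 * (ll_full - ll_null)
p_value = stats.chi2.sf(lr_stat, df=Z_full.shape[1] - 1)
print(f"LR statistic = {lr_stat:.2f}, p-value = {p_value:.4f}")
```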
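Finally, here is the illustrative sketch of the bootstrap prediction-interval idea mentioned in the first contribution (the NOx application). The interval at a new point is built from the fitted median regression plus bootstrap quantiles of the prediction error, which combines a resampled residual with the estimation error of a refitted median model. The resampling scheme shown here (global resampling of observations and residuals) is a simplifying assumption and not the specific mechanism proposed in the thesis; in particular it ignores heteroscedasticity, which the thesis method is designed to handle.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(0)

# Simulated heteroscedastic data with heavy-tailed errors (illustration only).
n = 300
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + (0.3 + 0.1 * x) * rng.standard_t(df=3, size=n)

X = sm.add_constant(x)
median_fit = QuantReg(y, X).fit(q=0.5)
resid = y - median_fit.predict(X)

def bootstrap_prediction_interval(x_new, B=500, alpha=0.10):
    """Approximate (1 - alpha) prediction intervals at the points x_new by
    bootstrapping the prediction error around the fitted median regression."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    X_new = np.column_stack([np.ones_like(x_new), x_new])
    point = median_fit.predict(X_new)
    errors = np.empty((B, x_new.size))
    for b in range(B):
        idx = rng.integers(0, n, n)                   # resample observations
        fit_b = QuantReg(y[idx], X[idx]).fit(q=0.5)   # refit the median model
        new_err = rng.choice(resid, size=x_new.size)  # resample a future error
        # bootstrap prediction error = estimation error + new error
        errors[b] = (point - fit_b.predict(X_new)) + new_err
    lo, hi = np.quantile(errors, [alpha / 2, 1 - alpha / 2], axis=0)
    return point + lo, point + hi

lower, upper = bootstrap_prediction_interval([2.0, 5.0, 8.0])
print(np.column_stack([lower, upper]))
```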