Spatial depth-based methods for functional data
- Sguera, Carlo
- Rosa Elvira Lillo Rodríguez (Supervisor)
- Pedro Galeano (Supervisor)
Defense university: Universidad Carlos III de Madrid
Defense date: 28 November 2014
- Juan José Romo Urroz (Chair)
- Manuel Febrero Bande (Secretary)
- Ricardo Fraiman (Member)
Type: Thesis
Abstract
In this thesis we deal with functional data and, in particular, with the notion of functional depth. A functional depth is a measure that allows one to order and rank the curves in a functional sample from the most central to the least central. In univariate statistics, ℝ provides a natural order criterion for observations; in functional data analysis (FDA) there is no such natural order, and the existing functional depths rank curves in different ways. Moreover, there is no agreement on whether a best functional depth exists. For these and other reasons, research on functional depth is still ongoing, and this thesis aims to contribute to progress in this field of FDA. As a first contribution, we enlarge the set of available functional depths by introducing the kernelized functional spatial depth (KFSD). In the course of the dissertation, we show that KFSD results from a modification of an existing functional depth known as the functional spatial depth (FSD). FSD belongs to the category of global functional depths: the FSD value of a given curve relative to a functional sample depends equally on all the other curves in the sample. However, several authors, first in the multivariate framework, where the notion of depth is also used, and later in FDA, have suggested that a local approach to the depth problem may be useful. Accordingly, some local depths, for which the depth value of a given observation depends more on nearby than on distant observations, have been proposed in the literature. Unlike FSD, KFSD is a local depth, and it can be interpreted as a local version of FSD. As its name suggests, we achieve the transition from global to local by proposing a kernel-type modification of FSD. KFSD, like any functional depth, can be useful for several purposes. For instance, KFSD makes it possible to identify the most central curve in a functional sample, that is, the KFSD-based sample median.
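The global/local distinction between FSD and KFSD can be made concrete with a small sketch. It is our reading of the construction, not the thesis's code, and it rests on several assumptions: curves are observed on a common equispaced grid, the L2 norm is approximated by a Riemann sum, FSD takes the usual spatial-sign form 1 - ||(1/n) Σ (x - y_i)/||x - y_i|| ||, and the "kernel-type modification" is read as computing the same spatial signs in the feature space of a Gaussian kernel k(x, y) = exp(-||x - y||²/σ²) via the kernel trick. The bandwidth σ (here, hypothetically, the median pairwise distance) controls how local the depth is; for very large σ the KFSD values approach FSD. The helper names `l2_dist`, `fsd`, `gauss_kernel` and `kfsd` are hypothetical.

```python
import numpy as np

def l2_dist(x, y, dt):
    """Riemann-sum approximation of the L2 distance between curves on a common equispaced grid."""
    return float(np.sqrt(np.sum((x - y) ** 2) * dt))

def fsd(x, sample, dt):
    """Functional spatial depth of x w.r.t. sample: 1 - ||(1/n) sum_i (x - y_i)/||x - y_i|| ||."""
    n = len(sample)
    avg_sign = np.zeros_like(x, dtype=float)
    for y in sample:
        d = l2_dist(x, y, dt)
        if d > 0:  # the spatial sign of a zero difference is taken to be 0
            avg_sign += (x - y) / d
    avg_sign /= n
    return 1.0 - float(np.sqrt(np.sum(avg_sign ** 2) * dt))

def gauss_kernel(x, y, dt, sigma):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / sigma^2) (assumed kernel choice)."""
    return float(np.exp(-l2_dist(x, y, dt) ** 2 / sigma ** 2))

def kfsd(x, sample, dt, sigma):
    """Spatial depth computed in the Gaussian-kernel feature space (assumed reading of KFSD)."""
    n = len(sample)
    kxx = 1.0  # k(x, x) = 1 for the Gaussian kernel
    kxy = np.array([gauss_kernel(x, y, dt, sigma) for y in sample])
    K = np.array([[gauss_kernel(a, b, dt, sigma) for b in sample] for a in sample])
    # Feature-space distances ||phi(x) - phi(y_i)||
    norms = np.sqrt(np.maximum(kxx + np.diag(K) - 2.0 * kxy, 0.0))
    keep = norms > 1e-12  # drop curves identical to x (zero spatial sign)
    kxy, K, norms = kxy[keep], K[np.ix_(keep, keep)], norms[keep]
    # Inner products <phi(x) - phi(y_i), phi(x) - phi(y_j)>
    inner = kxx - kxy[:, None] - kxy[None, :] + K
    sq_norm = float(np.sum(inner / np.outer(norms, norms)))  # ||sum of unit signs||^2
    return 1.0 - np.sqrt(max(sq_norm, 0.0)) / n

# Illustrative data: five vertically shifted sine curves
t = np.linspace(0.0, 1.0, 50)
dt = t[1] - t[0]
curves = [np.sin(2 * np.pi * t) + c for c in (-2.0, -1.0, 0.0, 1.0, 2.0)]
pairwise = [l2_dist(a, b, dt) for i, a in enumerate(curves) for b in curves[i + 1:]]
sigma = float(np.median(pairwise))  # hypothetical bandwidth heuristic

fsd_depths = [fsd(c, curves, dt) for c in curves]  # approx. [0.2, 0.6, 1.0, 0.6, 0.2]
kfsd_depths = [kfsd(c, curves, dt, sigma) for c in curves]
fsd_median = curves[int(np.argmax(fsd_depths))]    # depth-based sample median
kfsd_median = curves[int(np.argmax(kfsd_depths))]
```

For readability the kernel matrix is recomputed on every call, which costs O(n²) kernel evaluations per curve; a practical version would compute it once for the whole sample.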
Also, using the p% most central curves, we can draw a p%-central region (0 < p < 100). Another application is the computation of robust means such as the α-trimmed mean, 0 < α < 1, which is the functional mean computed after deleting the proportion α of least central curves. The use of functional depths in FDA has gone beyond these examples, and functional depths are nowadays also used to solve other types of problems. In particular, in this thesis we consider supervised functional classification and functional outlier detection, and we study and propose methods based on KFSD. Our approach to both classification and outlier detection has a distinctive feature: we are interested in scenarios where the solution of the problem is not graphically obvious. More precisely, in classification we focus on cases in which the groups of curves are hard to distinguish by looking at a plot, and we set aside problems where the classes of curves are easily told apart graphically. Similarly, we do not deal with outliers that lie far away from the rest of the curves; instead, we consider low-magnitude, shape and partial outliers, which are harder to detect. We focus on these problems because in such challenging scenarios important differences among depths and among methods become apparent, whereas these differences tend to be much smaller in easier problems. For classification, depth-based methods are already available, and in this thesis we consider three existing depth-based procedures. For the first time, several functional depths (KFSD and six others) are used to implement these depth-based techniques. The main result is that KFSD stands out among its competitors.
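The central region and the α-trimmed mean only need the depth values of the sample curves, so the sketch below takes them as input; they could come from FSD, KFSD or any other functional depth. It assumes that the p%-central region is the pointwise band spanned by the deepest curves, as in functional boxplots, and it rounds the number of retained curves up for the region and the number of trimmed curves down for the mean. These rounding conventions, the function names and the made-up data are hypothetical.

```python
import numpy as np

def depth_order(depths):
    """Indices of the curves sorted from most to least central (highest depth first)."""
    return np.argsort(-np.asarray(depths, dtype=float), kind="stable")

def central_region(curves, depths, p):
    """Pointwise band spanned by the ceil(p% of n) deepest curves, 0 < p < 100."""
    curves = np.asarray(curves, dtype=float)
    k = max(1, int(np.ceil(p * len(curves) / 100)))
    deepest = curves[depth_order(depths)[:k]]
    return deepest.min(axis=0), deepest.max(axis=0)

def trimmed_mean(curves, depths, alpha):
    """Functional mean after deleting the proportion alpha of least central curves, 0 < alpha < 1."""
    curves = np.asarray(curves, dtype=float)
    n = len(curves)
    n_keep = n - int(np.floor(alpha * n))
    return curves[depth_order(depths)[:n_keep]].mean(axis=0)

# Illustrative data: five constant curves on a 4-point grid with made-up depth values
curves = [np.full(4, level) for level in (0.0, 1.0, 2.0, 3.0, 10.0)]
depths = [0.5, 0.8, 0.9, 0.7, 0.1]
lower, upper = central_region(curves, depths, 60)  # band of the 3 deepest curves: from 1 to 3
mean_trim = trimmed_mean(curves, depths, 0.2)      # drops the level-10 curve: mean 1.5
```

Because the least central curve (level 10) is discarded, the trimmed mean is unaffected by it, whereas the ordinary mean of the five curves would be pulled up to 3.2.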
Indeed, KFSD, when used together with one of the depth-based methods, namely the within maximum depth procedure, shows the most stable and best performance in a simulation study covering six different curve-generating processes, as well as in the classification of two real datasets. These results therefore support the introduction of KFSD as a new functional depth. As for outlier detection, we also consider some existing depth-based procedures together with the same battery of functional depths. In addition, we propose three new methods designed specifically for KFSD. All three rely on a desirable property of a functional depth, namely that it should assign a low depth value to an outlier. In our research we have observed that KFSD has this property. Moreover, thanks to its local approach, KFSD generally succeeds in correctly ranking outliers that do not stand out clearly in a plot. However, a low KFSD value alone is not enough to detect outliers: a threshold value for KFSD is needed to distinguish between normal curves and outliers. The three methods that we present provide alternative ways of choosing such a threshold. The simulation study that we carry out for outlier detection is comparable in scope to the one for classification. Besides our proposals, we consider three existing depth-based methods with seven depths, and two techniques that do not use functional depths. The results of this second simulation study are also encouraging: the proposed KFSD-based methods are the only procedures that achieve good correct outlier detection rates in all six scenarios and for both contamination probabilities considered.
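Once a depth is available, the two uses discussed above reduce to short rules: the within maximum depth procedure assigns a new curve to the group in which it is deepest, and outlier detection flags curves whose depth falls below a threshold. The sketch below illustrates both under stated assumptions. For brevity it plugs in FSD rather than KFSD (any depth function with the same signature could be passed instead), the helper names are hypothetical, and the fixed threshold 0.22 is a hypothetical value chosen for this toy data; it does not reproduce any of the three threshold-selection methods proposed in the thesis.

```python
import numpy as np

def l2_dist(x, y, dt):
    """Riemann-sum approximation of the L2 distance on a common equispaced grid."""
    return float(np.sqrt(np.sum((x - y) ** 2) * dt))

def fsd(x, sample, dt):
    """Functional spatial depth, used here as a stand-in for KFSD."""
    avg_sign = np.zeros_like(x, dtype=float)
    for y in sample:
        d = l2_dist(x, y, dt)
        if d > 0:  # zero differences contribute a zero spatial sign
            avg_sign += (x - y) / d
    avg_sign /= len(sample)
    return 1.0 - float(np.sqrt(np.sum(avg_sign ** 2) * dt))

def within_max_depth(x, groups, depth_fn):
    """Within maximum depth rule: return the index of the group in which x is deepest."""
    return int(np.argmax([depth_fn(x, g) for g in groups]))

def flag_outliers(depths, threshold):
    """Indices of curves whose depth is below the threshold."""
    return np.flatnonzero(np.asarray(depths, dtype=float) < threshold)

t = np.linspace(0.0, 1.0, 50)
dt = t[1] - t[0]

def depth(x, sample):
    return fsd(x, sample, dt)

# Classification: two groups of curves that differ in shape
group_sin = [np.sin(2 * np.pi * t) + c for c in (-0.5, 0.0, 0.5)]
group_cos = [np.cos(2 * np.pi * t) + c for c in (-0.5, 0.0, 0.5)]
new_curve = np.sin(2 * np.pi * t) + 0.1
label = within_max_depth(new_curve, [group_sin, group_cos], depth)  # 0, i.e. the sine group

# Outlier detection: five shifted sine curves plus one shape outlier (a cosine)
sample = [np.sin(2 * np.pi * t) + c for c in (-0.2, -0.1, 0.0, 0.1, 0.2)]
sample.append(np.cos(2 * np.pi * t))
sample_depths = [depth(x, sample) for x in sample]
outliers = flag_outliers(sample_depths, threshold=0.22)  # hypothetical threshold
```

In this toy example the shape outlier receives the lowest depth, but the cut-off is what turns a ranking into a decision, which is why the choice of threshold matters.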
To summarize, in this thesis we present a new local functional depth, KFSD, which turns out to be a useful tool in supervised classification, when used in conjunction with some existing depth-based methods, and in outlier detection, by means of some new procedures that we also present in this work.