Nonparametric inference for classification and association with high dimensional genetic data
- García-Magariños, Manuel
- Antonio Salas Ellacuriaga Supervisor
- Wenceslao González Manteiga Supervisor
- Ricardo Cao Abad Supervisor
Defending university: Universidade de Santiago de Compostela
Defense date: 29 January 2010
- Ángel Carracedo Álvarez Chair
- Carmen María Cadarso Suárez Secretary
- Ignacio López de Ullibarri Galparsoro Member
- Vincent Macaulay Member
- Thore Egeland Member
Type: Thesis
Abstract
Over recent years, advances in genetics have brought about a revolution that extends beyond the borders of genetics, influencing the future of many other scientific areas. As the genetics boom has produced countless high-dimensional datasets containing DNA/RNA profiles, statistics is the science required to deal with them. Not only do new tools need to be developed, but existing methods can also be adapted, and their abilities evaluated, for application to genetic data. The term genetic data covers a wide variety of datasets whose only common feature is that they derive from DNA information: from SNPs (categorical data) to gene expression measures (continuous data). This DNA information may hold the answer to many common diseases with a complex basis (psychiatric disorders, cancer, diabetes, etc.), so the main aim of statistics is to provide proper, powerful techniques able to unravel the underlying nature of complex diseases. This thesis presents several statistical approaches to both gene expression data and SNP/STR data, drawing on penalized regression, machine learning and tree-based methods. Although the emphasis lies on clinical genetics, statistical tools for population and forensic genetics are also explained.