Topological Data Analysis of High-dimensional Correlation Structures with Applications in Epigenetics

  1. Prada Alonso, Sara
Supervised by:
  1. Antonio Gómez Tato, Supervisor
  2. María de los Angeles Casares de Cal, Supervisor

University of defense: Universidade de Santiago de Compostela

Defense date: 5 February 2021

Committee:
  1. Manuel Calaza Cabanas, Chair
  2. Ana María Freire Aradas, Secretary
  3. Desamparados Fernández Ternero, Member
Department:
  1. Departamento de Matemáticas

Type: Thesis

Abstract

This thesis presents a comprehensive study of the correlation structure of high-dimensional datasets from a topological perspective. Motivated by the lack of efficient algorithms for big data analysis and by the importance of finding correlation structures in genomics, we have developed two analytical tools, inspired by the topological data analysis approach, that describe and predict the behavior of the correlation structure. These models allowed us to study epigenetic interactions from both a local and a global perspective, taking into account their different levels of complexity. We applied principles from graph theory and algebraic topology to quantify structural patterns in local correlation networks and, based on them, proposed a network model that was able to predict the locally high correlations in DNA methylation data. This model provided an efficient tool for measuring how correlation evolves with the aging process. Furthermore, we developed a powerful computational algorithm to analyze the correlation structure globally, which was able to detect differentiated methylation patterns across sample groups. This methodology is intended to serve as a diagnostic tool, as it provides a selection of epigenetic biomarkers associated with a specific phenotype of interest. Overall, this work establishes a novel perspective for the analysis and modeling of hidden correlation structures, particularly those of high dimension and complexity; it contributes to the understanding of epigenetic processes and is designed to be useful in non-biological fields as well.
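The abstract does not spell out how a local correlation network is built or which topological quantities are measured on it. Purely as an illustration, here is a minimal sketch assuming that features (for example CpG sites) are linked when their absolute Pearson correlation across samples reaches a threshold, and that the resulting graph is summarised by its Betti numbers: b0 (connected components) and b1 (independent cycles, |E| - |V| + b0). The threshold, the function names `pearson`, `correlation_graph` and `betti_numbers`, and the toy profiles are hypothetical and are not taken from the thesis.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: constant profiles are treated as uncorrelated
    return cov / (sx * sy)


def correlation_graph(profiles, threshold):
    """Hypothetical construction: link two features when |r| >= threshold."""
    nodes = list(profiles)
    edges = set()
    for u, v in combinations(nodes, 2):
        if abs(pearson(profiles[u], profiles[v])) >= threshold:
            edges.add((u, v))
    return nodes, edges


def betti_numbers(nodes, edges):
    """Betti numbers of the graph viewed as a 1-dimensional complex."""
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    b0 = len({find(v) for v in nodes})   # connected components
    b1 = len(edges) - len(nodes) + b0    # cycle rank of the graph
    return b0, b1


# Toy methylation-like profiles over five samples (invented for illustration).
profiles = {
    "cg_a": [0.1, 0.2, 0.3, 0.4, 0.5],
    "cg_b": [0.2, 0.4, 0.6, 0.8, 1.0],  # perfectly correlated with cg_a
    "cg_c": [0.5, 0.4, 0.3, 0.2, 0.1],  # perfectly anti-correlated with cg_a
    "cg_d": [0.3, 0.1, 0.4, 0.1, 0.5],  # weakly correlated (|r| about 0.35)
}
nodes, edges = correlation_graph(profiles, threshold=0.8)
print(betti_numbers(nodes, edges))  # (2, 1)
```

With a threshold of 0.8, cg_a, cg_b and cg_c form a triangle and cg_d stays isolated, giving two components and one cycle. Note that this counts cycles in the graph itself; in a clique (Vietoris–Rips) complex the triangle would be filled in and b1 would drop to 0, which is one reason the choice of complex matters when applying algebraic topology to correlation networks.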