Efficient query over large datasets of analytical chemistry

  1. Luaces Cachaza, David
Supervised by:
  1. José Ramón Ríos Viqueira Director
  2. Tomás F. Pena Director

Defending university: Universidade de Santiago de Compostela

Defense date: 14 July 2023

Examination committee:
  1. Sergio Ilarri Artigas Chair
  2. José Manuel Cotos Yáñez Secretary
  3. Laura Po Member
Department:
  1. Departamento de Electrónica e Computación

Type: Thesis

Abstract

The efficient management of molecular data is one of the technologies most demanded by industry. A particularly important type of search is substructure search, which retrieves the molecules that contain a given query structure. Molecular structures can be encoded as graphs whose vertices and edges represent atoms and bonds, respectively. In this Thesis, a cutting-edge system for storing and querying molecular data has been designed and implemented, with a focus on molecular substructure search. New filter-then-verify (FTV) methods that go beyond the state of the art were designed, implemented, and tested, achieving performance gains of over 75% in the filtering stage. A generic framework for implementing FTV techniques on a distributed architecture was also developed; it enables FTV methods to be applied to very large graph databases and yields substantial performance gains in both index building and query execution. Finally, the Thesis presents a study on the use of different FTV solutions to obtain approximate results in an interactive search application.
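To make the filter-then-verify idea concrete, the following is a minimal sketch, not the method developed in the Thesis. It assumes hydrogen-suppressed molecular graphs and uses a deliberately simple, hypothetical filter feature: counts of atom labels and of bond endpoint-label pairs. These counts are a necessary condition for a substructure match, so the filter can produce false positives but never false negatives. Survivors are then verified with a plain backtracking subgraph-monomorphism check. All names (`Molecule`, `features`, `build_index`, `is_substructure`, `ftv_search`) are illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hydrogen-suppressed molecular graph: vertices are atoms, edges are bonds."""
    atoms: list[str]
    bonds: list[tuple[int, int]]
    adj: dict[int, set[int]] = field(init=False)

    def __post_init__(self):
        self.adj = {i: set() for i in range(len(self.atoms))}
        for a, b in self.bonds:
            self.adj[a].add(b)
            self.adj[b].add(a)


def features(m: Molecule) -> Counter:
    """Hypothetical filter features: atom-label counts and bond-label-pair counts."""
    f = Counter(("atom", a) for a in m.atoms)
    f.update(("bond",) + tuple(sorted((m.atoms[a], m.atoms[b]))) for a, b in m.bonds)
    return f


def build_index(db: list[Molecule]) -> list[Counter]:
    """Index building: precompute the features of every database molecule."""
    return [features(m) for m in db]


def is_substructure(q: Molecule, t: Molecule) -> bool:
    """Verification: backtracking search for a label-preserving subgraph monomorphism."""
    mapping: dict[int, int] = {}
    used: set[int] = set()

    def extend(u: int) -> bool:
        if u == len(q.atoms):
            return True
        for v in range(len(t.atoms)):
            if v in used or t.atoms[v] != q.atoms[u]:
                continue
            # Every already-mapped neighbour of u must map to a neighbour of v.
            if all(mapping[w] in t.adj[v] for w in q.adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                if extend(u + 1):
                    return True
                del mapping[u]
                used.discard(v)
        return False

    return extend(0)


def ftv_search(query: Molecule, db: list[Molecule], index: list[Counter]):
    """Filter with the index, then verify only the surviving candidates."""
    qf = features(query)
    candidates = [i for i, f in enumerate(index)
                  if all(f[k] >= c for k, c in qf.items())]
    answers = [i for i in candidates if is_substructure(query, db[i])]
    return candidates, answers


ethanol = Molecule(["C", "C", "O"], [(0, 1), (1, 2)])
glycol = Molecule(["O", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
dimethyl_ether = Molecule(["C", "O", "C"], [(0, 1), (1, 2)])
db = [ethanol, glycol, dimethyl_ether]
index = build_index(db)

query = Molecule(["C", "O", "C"], [(0, 1), (1, 2)])  # C-O-C pattern
candidates, answers = ftv_search(query, db, index)
print(candidates, answers)  # [1, 2] [2]
```

In this toy run the filter discards ethanol, which has only one C-O bond. Ethylene glycol passes the filter because its counts dominate the query's, but verification rejects it because neither of its oxygens is bonded to two carbons. This false positive shows why a more selective filter saves verification work: every candidate it removes avoids a potentially expensive subgraph-isomorphism test.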