Efficient query over large datasets of analytical chemistry

Author:
  1. Luaces Cachaza, David
Supervised by:
  1. José Ramón Ríos Viqueira (Director)
  2. Tomás F. Pena (Director)

Defence university: Universidade de Santiago de Compostela

Defence date: 14 July 2023

Committee:
  1. Sergio Ilarri Artigas (Chair)
  2. José Manuel Cotos Yáñez (Secretary)
  3. Laura Po (Committee member)
Department:
  1. Department of Electronics and Computing

Type: Thesis

Abstract

The efficient management of molecular data is one of the technologies most in demand in industry. A particularly important type of search is substructure search, which retrieves the molecules that contain a given query structure. Molecular structures can be encoded as graphs in which vertices and edges represent atoms and bonds, respectively. In this Thesis, a cutting-edge system for the storage and querying of molecular data has been designed and implemented, with special attention to molecular substructure search. New filter-then-verify (FTV) methods that go beyond the state of the art were designed, implemented, and tested, achieving performance gains of over 75% in the filtering stage. A generic framework for implementing FTV techniques on a distributed architecture was also developed. It enables FTV methods to be applied to very large graph databases and achieves substantial performance gains in both index construction and query execution. Finally, the Thesis presents a study of the use of different FTV solutions to obtain approximate results in an interactive search application.
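The general filter-then-verify idea can be illustrated with a minimal Python sketch. It assumes a deliberately simple setting that is not taken from the Thesis. Molecules are stored as labelled graphs, with element symbols on vertices and bond orders on edges. The filter index records only counts of atom labels and labelled bonds. Verification is a plain backtracking subgraph-isomorphism search. All names (`MolGraph`, `features`, `may_contain`, `is_substructure`, `ftv_search`) are hypothetical, and the FTV methods developed in the Thesis are not reproduced here.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Hypothetical molecule model: vertices are atoms, edges are bonds."""
    atoms: list[str]  # atoms[i] is the element symbol of vertex i
    # (i, j) with i < j -> bond order (1 = single, 2 = double, ...)
    bonds: dict[tuple[int, int], int] = field(default_factory=dict)

    def bond(self, i: int, j: int) -> int | None:
        """Bond order between vertices i and j, or None if they are not bonded."""
        return self.bonds.get((min(i, j), max(i, j)))


def features(g: MolGraph) -> Counter:
    """Index features (an assumption for this sketch): counts of atom labels and labelled bonds."""
    feats = Counter(("atom", a) for a in g.atoms)
    for (i, j), order in g.bonds.items():
        pair = tuple(sorted((g.atoms[i], g.atoms[j])))
        feats[("bond", pair, order)] += 1
    return feats


def may_contain(g_feats: Counter, q_feats: Counter) -> bool:
    """Filter: a necessary but not sufficient condition for q to occur in g."""
    return all(g_feats[f] >= n for f, n in q_feats.items())


def is_substructure(q: MolGraph, g: MolGraph) -> bool:
    """Verify: backtracking search for an injective, label- and bond-preserving map q -> g."""
    mapping: dict[int, int] = {}  # query vertex -> database vertex

    def extend(k: int) -> bool:
        if k == len(q.atoms):
            return True  # every query atom has been mapped
        for v in range(len(g.atoms)):
            if v in mapping.values() or g.atoms[v] != q.atoms[k]:
                continue
            # each bond from k to an already-mapped query atom must exist in g with the same order
            if all(g.bond(mapping[u], v) == q.bond(u, k)
                   for u in mapping if q.bond(u, k) is not None):
                mapping[k] = v
                if extend(k + 1):
                    return True
                del mapping[k]  # backtrack
        return False

    return extend(0)


def ftv_search(q: MolGraph, database: dict[str, MolGraph],
               index: dict[str, Counter]) -> list[str]:
    """Filter with the precomputed index, then verify only the surviving candidates."""
    q_feats = features(q)
    candidates = [gid for gid, f in index.items() if may_contain(f, q_feats)]
    return [gid for gid in candidates if is_substructure(q, database[gid])]


if __name__ == "__main__":
    db = {
        "methanol": MolGraph(["C", "O"], {(0, 1): 1}),
        "formaldehyde": MolGraph(["C", "O"], {(0, 1): 2}),
        "ethanol": MolGraph(["C", "C", "O"], {(0, 1): 1, (1, 2): 1}),
    }
    index = {gid: features(g) for gid, g in db.items()}  # built once, before querying
    query = MolGraph(["C", "O"], {(0, 1): 1})  # a C-O single bond
    print(ftv_search(query, db, index))  # ['methanol', 'ethanol']
```

In this example formaldehyde is discarded by the filter alone, because its only C-O bond is a double bond, so only methanol and ethanol reach the more expensive verification step. The filter can also admit false positives. For O-C-C-O (ethylene glycol) and a query consisting of one carbon bonded to two oxygens, the feature counts match, but verification rejects the molecule because neither carbon carries two oxygens. This gap between cheap filtering and exact verification is what FTV methods try to narrow.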