Efficient query over large datasets of analytical chemistry

  1. Luaces Cachaza, David
Supervised by:
  1. José Ramón Ríos Viqueira Supervisor
  2. Tomás F. Pena Supervisor

Defended at: Universidade de Santiago de Compostela

Defense date: 14 July 2023

Committee:
  1. Sergio Ilarri Artigas Chair
  2. José Manuel Cotos Yáñez Secretary
  3. Laura Po Member
Departments:
  1. Departamento de Electrónica e Computación

Type: Doctoral thesis

Abstract

The efficient management of molecular data is one of the technologies most in demand in industry. A particularly important type of search is substructure search, which retrieves every molecule in a database that contains a given query structure. Molecular structures can be encoded as graphs whose vertices and edges represent atoms and bonds, respectively. In this Thesis, a cutting-edge system for storing and querying molecular data has been designed and implemented, with particular attention to molecular substructure search. New filter-then-verify (FTV) methods that go beyond the state of the art were designed, implemented, and tested, achieving performance gains of over 75% in the filtering stage. A generic framework for implementing FTV techniques on a distributed architecture was also developed; it enables FTV methods to be applied to very large graph databases and yields substantial performance gains in both index building and query execution. Finally, the Thesis presents a study of the use of different FTV solutions to obtain approximate results in an interactive search application.
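To make the graph encoding and the filter-then-verify idea concrete, here is a minimal Python sketch. It is not the Thesis's method: the class and function names (`MolGraph`, `features`, `build_index`, `passes_filter`, `is_substructure`, `ftv_search`) are hypothetical. The filter uses simple counts of atom labels and labeled bonds as a stand-in for the index features studied in the Thesis, and verification uses naive backtracking subgraph matching rather than an optimized algorithm. The small molecule database is purely illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    # Vertex labels (element symbols); vertex i is atoms[i].
    atoms: list
    # Edges: (i, j) with i < j mapped to the bond order (1 = single, 2 = double, ...).
    bonds: dict = field(default_factory=dict)

    def neighbors(self, v):
        """Yield (neighbor, bond order) pairs for vertex v."""
        for (i, j), order in self.bonds.items():
            if i == v:
                yield j, order
            elif j == v:
                yield i, order

    def bond(self, u, v):
        """Bond order between u and v, or None if they are not bonded."""
        return self.bonds.get((min(u, v), max(u, v)))


def features(g):
    """Hypothetical filter features: counts of atom labels and of labeled bonds."""
    c = Counter(("atom", a) for a in g.atoms)
    for (i, j), order in g.bonds.items():
        a, b = sorted((g.atoms[i], g.atoms[j]))
        c[("bond", a, b, order)] += 1
    return c


def build_index(database):
    """Precompute the features of every database graph (the 'index')."""
    return [features(g) for g in database]


def passes_filter(qf, tf):
    """A graph can only contain the query if it has at least as many of each feature."""
    return all(tf[k] >= n for k, n in qf.items())


def is_substructure(q, t):
    """Backtracking subgraph monomorphism: map each query atom to a distinct target
    atom with the same label so that every query bond exists in t with the same order."""
    mapping, used = {}, set()

    def extend(v):
        if v == len(q.atoms):
            return True
        for w in range(len(t.atoms)):
            if w in used or t.atoms[w] != q.atoms[v]:
                continue
            # Bonds from v to already-mapped query atoms must be present in t.
            if all(t.bond(w, mapping[u]) == order
                   for u, order in q.neighbors(v) if u in mapping):
                mapping[v] = w
                used.add(w)
                if extend(v + 1):
                    return True
                del mapping[v]
                used.discard(w)
        return False

    return extend(0)


def ftv_search(query, database, index):
    """Filter with the cheap index test, then verify the surviving candidates."""
    qf = features(query)
    candidates = [i for i, tf in enumerate(index) if passes_filter(qf, tf)]
    answers = [i for i in candidates if is_substructure(query, database[i])]
    return candidates, answers


# Illustrative database (hydrogens omitted).
database = [
    MolGraph(["C", "C", "O"], {(0, 1): 1, (1, 2): 1}),                  # ethanol
    MolGraph(["C", "C", "O", "O"], {(0, 1): 1, (1, 2): 2, (1, 3): 1}),  # acetic acid
    MolGraph(["C"]),                                                     # methane
    MolGraph(["C", "O", "C"], {(0, 1): 1, (1, 2): 1}),                  # dimethyl ether
    MolGraph(["C", "C", "N", "C", "O"],
             {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1}),             # C-C-N-C-O chain
]
index = build_index(database)
query = MolGraph(["C", "C", "O"], {(0, 1): 1, (1, 2): 1})  # C-C-O fragment

candidates, answers = ftv_search(query, database, index)
print(candidates, answers)  # [0, 1, 4] [0, 1]
```

Entry 4 shows why verification is needed. It has the same atom and bond counts the query requires, so it passes the filter. However, no carbon in it is bonded to both another carbon and an oxygen, so the exact matcher rejects it. The filter's job is to discard most of the database cheaply (here methane and dimethyl ether) so that the expensive isomorphism test runs on few graphs.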