Fuzzy approach to conceptual meaning processing in natural language documents

Author:
  1. Soto Villaverde, Andrés
Supervised by:
  1. José Ángel Olivas Varela (Director)

University: Universidad de Castilla-La Mancha

Defense date: 18 December 2008

Examining committee:
  1. José Luis Verdegay Galdeano (Chair)
  2. Francisco Pascual Romero Chicharro (Secretary)
  3. Alejandro Sobrino (Member)
  4. Manuel Prieto (Member)
  5. Miguel Ángel Sicilia Urbán (Member)

Type: Thesis

Teseo: 184062

Abstract

Developing Information Retrieval methods based on conceptual aspects is vital to reduce the number of irrelevant documents retrieved by today's search engines. This thesis presents several methods and formulas that help to disambiguate the meaning of the terms used in user queries. One of these models uses an approach based on synonymy and polysemy to identify the most relevant concepts that appear in a document; in this way, the document can be better characterized and its relevance better evaluated according to user preferences. Another model introduced in this thesis calculates the frequency of the terms that appear in a dictionary definition in order to determine the frequency of the concept associated with that definition. A third model is similar to the previous one, with one important difference: instead of calculating the frequency of the terms that appear in a dictionary definition, it calculates the frequency of the noun phrases that appear in that definition in order to determine the frequency of the associated concept. The thesis then presents several results obtained by combining these models with clustering algorithms. These algorithms were applied to well-known test collections such as SMART and Reuters, with results that indicate better performance than the classical approaches.
Natural languages (NL) are basically systems for describing perceptions, which are intrinsically imprecise. Zadeh proposed a new approach called NL-Computation (Natural Language Computation), which employs new tools such as Generalized Constraints (GC) and protoforms (PtF). If an NL proposition can be expressed as a GC, then it can be regarded as precise, at least to a certain degree.
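The GC and protoform idea can be illustrated with a minimal sketch. It assumes only the possibilistic case of Zadeh's "X isr R" (r left blank, R a fuzzy set over a numeric universe), represents a constraint such as "Age(Mary) is young", abstracts it to the protoform "A(B) is C", and applies one simple deduction rule: two constraints on the same variable combine by taking the minimum of their membership functions. The class and function names, the membership shapes for "young" and "middle-aged", the example person and the use of min as conjunction are all illustrative assumptions, not the notation developed in the thesis.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Membership function of a fuzzy set over a numeric universe (here: ages).
# Hypothetical representation chosen for this sketch.
Membership = Callable[[float], float]


def left_shoulder(full: float, zero: float) -> Membership:
    """Degree 1 up to `full`, falling linearly to 0 at `zero` (full < zero)."""
    def mu(u: float) -> float:
        if u <= full:
            return 1.0
        if u >= zero:
            return 0.0
        return (zero - u) / (zero - full)
    return mu


def triangular(a: float, b: float, c: float) -> Membership:
    """Triangular fuzzy set: 0 outside (a, c), peak degree 1 at b."""
    def mu(u: float) -> float:
        if u <= a or u >= c:
            return 0.0
        return (u - a) / (b - a) if u <= b else (c - u) / (c - b)
    return mu


@dataclass
class GeneralizedConstraint:
    """Possibilistic GC 'X is R': constrained variable, label of R, membership of R."""
    variable: str
    label: str
    membership: Membership

    def __str__(self) -> str:
        return f"{self.variable} is {self.label}"

    def protoform(self) -> str:
        """Abstract the concrete constraint into its deep structure."""
        # 'Age(Mary) is young' -> 'A(B) is C'; a plain variable -> 'X is A'.
        if re.fullmatch(r"\w+\(\w+\)", self.variable):
            return "A(B) is C"
        return "X is A"


def conjunction_rule(g1: GeneralizedConstraint,
                     g2: GeneralizedConstraint) -> GeneralizedConstraint:
    """Deduction rule: X is A, X is B  =>  X is (A and B), with min as 'and'."""
    if g1.variable != g2.variable:
        raise ValueError("both constraints must restrict the same variable")
    return GeneralizedConstraint(
        g1.variable,
        f"{g1.label} and {g2.label}",
        lambda u: min(g1.membership(u), g2.membership(u)),
    )


young = GeneralizedConstraint("Age(Mary)", "young", left_shoulder(25, 40))
middle_aged = GeneralizedConstraint("Age(Mary)", "middle-aged", triangular(30, 45, 60))
both = conjunction_rule(young, middle_aged)
print(young.protoform())              # A(B) is C
print(both)                           # Age(Mary) is young and middle-aged
print(round(both.membership(33), 3))  # 0.2
```

Because the rule is stated on the protoform rather than on the concrete sentence, the same function applies to any pair of constraints sharing a variable; only the membership functions change.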
The basic idea proposed by Zadeh is the following: a description of a perception in NL is translated into a GC in order to make its meaning precise. The GC is then transformed into a protoform, which is an abstract model of the GC. Finally, by applying the deduction rules associated with the PtF, new information can be deduced.
In this thesis, the characteristics of different NL structures, such as noun phrases, copulative sentences, comparative sentences and superlative sentences, are analyzed, with emphasis on their main syntactic and semantic aspects. These characteristics allow us to specify constraints on the entities involved in those sentences. Methodologies to recognize these structures in NL documents are presented, and a specific formal notation is proposed to represent them as constraining relations. A program is presented that transforms sentences expressed as parse trees into object-oriented (O-O) structures, which it uses to conveniently store and process the sentences and phrases mentioned above. Another program then interprets the O-O structures and provides information about the characteristics of the entities involved in those sentences. The thesis also proposes symbolic expressions that, as prototypical forms, summarize the semantic structure of these sentences and phrases. Several examples show how these structures can be synthesized and manipulated, and that new information not present in the original text can be obtained. These ideas could therefore be used to develop question answering systems.
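To illustrate how new information can be deduced from such structures, here is a minimal sketch under several assumptions: a simplified, hypothetical parse-tree encoding (nested tuples with Penn Treebank-style tags), only the pattern "X is ADJ-er than Y", and comparatives treated as crisp rather than graded relations. It maps each comparative sentence onto a small object-oriented record and then derives, by transitivity, a comparison stated in neither sentence. Names such as `Comparison`, `comparative_from_tree` and `infer_transitive` are illustrative and are not the programs described in the thesis.

```python
from dataclasses import dataclass
from typing import Iterator

# Hypothetical parse-tree encoding: (label, child, ...) for phrases and
# (tag, word) for leaves, loosely following Penn Treebank tags.
Tree = tuple


@dataclass(frozen=True)
class Comparison:
    """Constraining relation 'GREATER is ATTRIBUTE than LESSER' (crisp)."""
    greater: str
    lesser: str
    attribute: str  # comparative adjective, e.g. "taller"


def comparative_tree(x: str, adjective: str, y: str) -> Tree:
    """Build the tree a parser might return for 'X is ADJ-er than Y'."""
    return ("S", ("NP", ("NNP", x)),
            ("VP", ("VBZ", "is"),
             ("ADJP", ("JJR", adjective),
              ("PP", ("IN", "than"), ("NP", ("NNP", y))))))


def leaves(tree: Tree) -> Iterator[tuple[str, str]]:
    """Yield the (tag, word) leaves of a tree from left to right."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield label, children[0]
    else:
        for child in children:
            yield from leaves(child)


def comparative_from_tree(tree: Tree) -> Comparison:
    """Map a simple comparative sentence onto a Comparison object."""
    tagged = list(leaves(tree))
    nouns = [w for t, w in tagged if t == "NNP"]
    adjectives = [w for t, w in tagged if t == "JJR"]
    if len(nouns) != 2 or len(adjectives) != 1 or ("IN", "than") not in tagged:
        raise ValueError("not a simple 'X is ADJ-er than Y' sentence")
    return Comparison(nouns[0], nouns[1], adjectives[0])


def infer_transitive(facts: set[Comparison]) -> set[Comparison]:
    """Close the facts under transitivity of 'more ATTRIBUTE than'."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for a in list(closure):
            for b in list(closure):
                if a.attribute == b.attribute and a.lesser == b.greater:
                    derived = Comparison(a.greater, b.lesser, a.attribute)
                    if derived not in closure:
                        closure.add(derived)
                        changed = True
    return closure


def describe(name: str, facts: set[Comparison]) -> list[str]:
    """Report every known comparison involving the given entity."""
    return sorted(f"{c.greater} is {c.attribute} than {c.lesser}"
                  for c in facts if name in (c.greater, c.lesser))


facts = {comparative_from_tree(comparative_tree("John", "taller", "Peter")),
         comparative_from_tree(comparative_tree("Peter", "taller", "Ann"))}
closure = infer_transitive(facts)
print(closure - facts)  # {Comparison(greater='John', lesser='Ann', attribute='taller')}
print(describe("Ann", closure))  # ['John is taller than Ann', 'Peter is taller than Ann']
```

The derived fact "John is taller than Ann" appears in neither input sentence, which is the kind of result a question answering system could exploit. A graded treatment would replace crisp transitivity with, for example, min-transitivity over membership degrees.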