Towards the Automatic Construction of a Multilingual Dictionary of Collocations using Distributional Semantics

  1. Marcos Garcia 1
  2. Marcos García-Salido 1
  3. Margarita Alonso-Ramos 1
  1. 1 Universidade da Coruña, La Coruña, Spain

    ROR https://ror.org/01qckj285

Book:
Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference. 1-3 October 2019, Sintra, Portugal
  1. Iztok Kosem (ed.)
  2. Tanara Zingano Kuhn (ed.)
  3. Margarita Correia (ed.)
  4. José Pedro Ferreira (ed.)
  5. Maarten Jansen (ed.)
  6. Isabel Pereira (ed.)
  7. Jelena Kallas (ed.)
  8. Miloš Jakubíček (ed.)
  9. Simon Krek (ed.)
  10. Carole Tiberius (ed.)

Publisher: Lexical Computing

Year of publication: 2019

Pages: 747-762

Conference: eLex: Electronic lexicography in the 21st century (6th, 2019, Sintra)

Type: Conference paper

Abstract

This paper presents the method used to create a multilingual online dictionary of collocations in English, Portuguese, and Spanish. The resource is built automatically and contains three types of collocations: verb–object (e.g., “[to] issue [an] invoice”), adjective–noun (“deep shame”), and nominal compounds (“cigarette packet”). We use dependency parsing and statistical association measures to compile the collocations of each language, and then align them with their equivalents in the other languages by means of compositional methods that rely on cross-lingual models of distributional semantics. Collocations are extracted from large and varied corpora, and the cross-lingual models are mapped using unsupervised approaches. For each collocation in a given language, the system shows several equivalents in the other languages, ranked by a confidence value. Besides its multilingual use, the resulting dictionary can also serve as a monolingual resource for retrieving the collocates of a given base, which makes it useful to both native speakers and language learners. The dictionary will be published as an online tool, and all the resources generated in this research will be freely available.
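The two steps named in the abstract (scoring candidate collocations with an association measure, then ranking cross-lingual equivalents compositionally) can be illustrated with a minimal sketch. The abstract does not say which association measure or which composition function is used, so everything specific here is an assumption made for illustration: pointwise mutual information as the measure, vector addition as the composition, cosine similarity as the confidence value, and the toy counts and three-dimensional vectors, which are invented and stand in for corpus data and for embeddings already mapped into a shared cross-lingual space. All function names are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Score (base, collocate) pairs with pointwise mutual information.

    PMI is an assumed stand-in for the unspecified association measure.
    """
    pair_counts = Counter(pairs)
    base_counts = Counter(base for base, _ in pairs)
    collocate_counts = Counter(collocate for _, collocate in pairs)
    total = len(pairs)
    return {
        (base, collocate): math.log2(
            count * total / (base_counts[base] * collocate_counts[collocate])
        )
        for (base, collocate), count in pair_counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compose(collocation, vectors):
    """Additive composition (an assumption): base vector + collocate vector."""
    base, collocate = collocation
    return [a + b for a, b in zip(vectors[base], vectors[collocate])]


def rank_equivalents(source, candidates, source_vectors, target_vectors):
    """Rank target-language candidates by similarity to the source collocation."""
    source_vector = compose(source, source_vectors)
    scored = [
        (candidate, cosine(source_vector, compose(candidate, target_vectors)))
        for candidate in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy (base, collocate) observations, as if read off dependency parses.
observed_pairs = [
    ("invoice", "issue"), ("invoice", "issue"), ("invoice", "pay"),
    ("shame", "deep"), ("water", "deep"), ("bill", "pay"),
]
scores = pmi_scores(observed_pairs)

# Invented vectors, assumed to live in one shared cross-lingual space.
english_vectors = {"invoice": [1.0, 0.0, 0.2], "issue": [0.0, 1.0, 0.1]}
spanish_vectors = {
    "factura": [0.9, 0.1, 0.2],
    "emitir": [0.1, 0.9, 0.1],
    "pagar": [0.2, -0.5, 0.9],
}
ranking = rank_equivalents(
    ("invoice", "issue"),
    [("factura", "pagar"), ("factura", "emitir")],
    english_vectors,
    spanish_vectors,
)
for candidate, confidence in ranking:
    print(candidate, round(confidence, 3))
```

With these toy vectors the candidate ("factura", "emitir") ranks above ("factura", "pagar"), which mirrors the ranked list of equivalents the dictionary is described as showing. A real system would replace the invented counts and vectors with corpus-derived ones.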