La Wikipedia como fuente multilingüe de corpus comparables [Wikipedia as a multilingual source of comparable corpora]

  1. Isaac González López 1
  2. Pablo Gamallo Otero 1
  1. 1 Universidade de Santiago de Compostela

    Santiago de Compostela, Spain

    ROR https://ror.org/030eybx10

Book:
Language Windowing through Corpora
  1. Isabel Moskowich-Spiegel Fandiño (coord.)
  2. Begoña Crespo García (coord.)
  3. Inés Lareo Martín (coord.)
  4. Paula Lojo Sandino (coord.)

Publisher: Servizo de Publicacións ; Universidade da Coruña

ISBN: 978-84-9749-401-4

Year of publication: 2010

Volume title: Part I, A-K

Volume: 1

Pages: 369-378

Conference: International Conference on Corpus Linguistics (2. 2010. A Coruña)

Type: Conference contribution

Abstract

This article describes an automatic method for selecting comparable corpora from Wikipedia, using categories as topic restrictions. Our strategy relies on two properties of Wikipedia: it is a multilingual resource, and it is a free encyclopedia available as an XML file. The tools and the corpus will be distributed under the GPL (General Public License).
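
The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea, not the authors' tool. It assumes that each page in the XML dump stores its wikitext in a `text` element and that categories appear as `[[Category:...]]` links (or `[[Categoría:...]]` in the Spanish edition). The names `CATEGORY_PREFIX`, `local_name`, and `select_articles`, as well as the sample data, are hypothetical and introduced here for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed category-link prefixes per language edition (illustrative only).
CATEGORY_PREFIX = {"en": "Category", "es": "Categoría"}


def local_name(tag):
    """Drop an XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def select_articles(dump, lang, category):
    """Yield (title, text) for pages whose wikitext links to `category`."""
    pattern = re.compile(
        r"\[\[\s*" + re.escape(CATEGORY_PREFIX[lang]) + r"\s*:\s*"
        + re.escape(category) + r"\s*(\|[^\]]*)?\]\]",
        re.IGNORECASE,
    )
    # Stream the dump so the whole file never has to fit in memory.
    for _event, elem in ET.iterparse(dump, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if pattern.search(text):
            yield title, text
        elem.clear()  # release the page's children once it has been handled


# Tiny hand-written stand-ins for two language dumps (not real Wikipedia data).
SAMPLE_EN = """<mediawiki>
  <page><title>Guitar</title><revision><text>A string instrument. [[Category:Musical instruments]]</text></revision></page>
  <page><title>Oak</title><revision><text>A tree. [[Category:Trees]]</text></revision></page>
</mediawiki>""".encode("utf-8")

SAMPLE_ES = """<mediawiki>
  <page><title>Guitarra</title><revision><text>Instrumento de cuerda. [[Categoría:Instrumentos musicales]]</text></revision></page>
  <page><title>Roble</title><revision><text>Un árbol. [[Categoría:Árboles]]</text></revision></page>
</mediawiki>""".encode("utf-8")

# One topic, expressed through the category name used in each language.
corpus_en = list(select_articles(io.BytesIO(SAMPLE_EN), "en", "Musical instruments"))
corpus_es = list(select_articles(io.BytesIO(SAMPLE_ES), "es", "Instrumentos musicales"))
```

Under these assumptions, `corpus_en` and `corpus_es` hold articles on the same topic in two languages, which is the sense in which the resulting corpora are comparable rather than parallel. A real dump would also need handling for redirects, subcategories, and non-article namespaces, none of which this sketch attempts.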