Wikipedia as a multilingual source of comparable corpora
- Isaac González López¹
- Pablo Gamallo Otero¹

¹ Universidade de Santiago de Compostela
- Isabel Moskowich-Spiegel Fandiño (coord.)
- Begoña Crespo García (coord.)
- Inés Lareo Martín (coord.)
- Paula Lojo Sandino (coord.)
Publisher: Servizo de Publicacións, Universidade da Coruña
ISBN: 978-84-9749-401-4
Year of publication: 2010
Volume Title: Part I, A-K
Volume: 1
Pages: 369-378
Congress: 2nd International Conference on Corpus Linguistics (A Coruña, 2010)
Type: Conference paper
Abstract
This article describes an automatic method for selecting comparable corpora from Wikipedia, using categories as topic restrictions. Our strategy relies on two properties of Wikipedia: it is a multilingual resource, and it is a free encyclopedia available as an XML file. The tools and the corpus will be distributed under the GNU General Public License (GPL).
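The strategy in the abstract can be illustrated with a minimal Python sketch. It uses a category as the topic filter and interlanguage links to pair each article with its counterpart in another language. This is not the authors' tool. It assumes a MediaWiki XML dump in which each page's wikitext contains its category links (e.g. `[[Categoría:Ciencia]]` on the Spanish Wikipedia) and old-style interlanguage links such as `[[en:Physics]]`, as dumps of that period did. The helper `select_pages`, its parameters and the sample data are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed old-style interlanguage link in wikitext, e.g. [[en:Physics]].
INTERLANG_RE = re.compile(r"\[\[\s*([a-z]{2,3})\s*:\s*([^\]|]+)\]\]")


def local_name(tag):
    """Drop the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def select_pages(xml_source, category, target_lang, category_prefix="Category"):
    """Hypothetical helper: yield (title, target_title) for every page that
    belongs to `category` and has an interlanguage link to `target_lang`."""
    # Category links such as [[Categoría:Ciencia]] or [[Categoría:Ciencia|sort key]].
    cat_re = re.compile(
        r"\[\[\s*" + re.escape(category_prefix) + r"\s*:\s*([^\]|]+)", re.IGNORECASE
    )
    wanted = category.strip().lower()
    title, text = None, ""
    # Stream the dump so that large files never have to fit in memory.
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            cats = {c.strip().lower() for c in cat_re.findall(text)}
            if wanted in cats:
                links = {lang: t.strip() for lang, t in INTERLANG_RE.findall(text)}
                if target_lang in links:
                    yield title, links[target_lang]
            title, text = None, ""
            elem.clear()  # free the finished page


# Tiny invented sample in the MediaWiki export layout (not real Wikipedia data).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Física</title>
    <revision><text>[[Categoría:Ciencia]] [[en:Physics]] [[gl:Física]]</text></revision>
  </page>
  <page><title>Fútbol</title>
    <revision><text>[[Categoría:Deporte]] [[en:Association football]]</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    source = io.BytesIO(SAMPLE.encode("utf-8"))
    for pair in select_pages(source, "Ciencia", "en", category_prefix="Categoría"):
        print(pair)  # ('Física', 'Physics')
```

Each yielded pair names two articles on the same topic in different languages. Under this sketch, the target-language texts would then be read from the other language's dump to assemble the comparable corpus.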