La Wikipedia como fuente multilingüe de corpus comparables [Wikipedia as a multilingual source of comparable corpora]
- Isaac González López 1
- Pablo Gamallo Otero 1
1 Universidade de Santiago de Compostela
- Isabel Moskowich-Spiegel Fandiño (coord.)
- Begoña Crespo García (coord.)
- Inés Lareo Martín (coord.)
- Paula Lojo Sandino (coord.)
Publisher: Servizo de Publicacións ; Universidade da Coruña
ISBN: 978-84-9749-401-4
Publication date: 2010
Volume title: Part I, A-K
Edition: 1
Pages: 369-378
Conference: International Conference on Corpus Linguistics (2. 2010. A Coruña)
Type: Conference paper
Abstract
This article describes an automatic method for selecting comparable corpora from Wikipedia, using categories as topic restrictions. Our strategy relies on two properties of Wikipedia: it is a multilingual resource, and it is a free encyclopedia available as an XML file. The tools and the corpus will be distributed under the GPL (General Public License).