Entity-Centric Coreference Resolution of Person Entities for Open Information Extraction

  1. Marcos García
  2. Pablo Gamallo
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2014

Issue: 53

Pages: 25-32

Type: Article

Abstract

This work presents a coreference resolution system for person entities whose architecture is based on the sequential application of independent resolution modules and on an entity-centric strategy. Several evaluations indicate that the system obtains promising results in various scenarios (≈71% and ≈81% CoNLL F1). To analyze the influence of coreference resolution on information extraction, an open information extraction system was applied to texts with coreference annotation. The results of this experiment indicate that information extraction improves in both recall and precision. The evaluations were carried out in Spanish, Portuguese, and Galician, and all tools and resources are freely distributed.
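As a rough illustration of what "sequential application of independent resolution modules" with an entity-centric strategy can look like, here is a minimal Python sketch. It is not the authors' implementation: the three passes (`exact_match`, `name_token_subset`, `pronoun_gender`), the `Mention` and `Entity` types, and the `resolve` function are all hypothetical names chosen for this example, and the matching rules are deliberately simplistic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Mention:
    text: str
    gender: Optional[str] = None  # "m", "f", or None when unknown
    is_pronoun: bool = False


@dataclass
class Entity:
    """A cluster of mentions; passes inspect the whole cluster (entity-centric)."""
    mentions: List[Mention] = field(default_factory=list)

    def names(self) -> set:
        return {m.text.lower() for m in self.mentions if not m.is_pronoun}

    def gender(self) -> Optional[str]:
        # First known gender among the cluster's mentions.
        return next((m.gender for m in self.mentions if m.gender), None)


# Hypothetical passes, ordered from most to least precise.
def exact_match(candidate: Entity, antecedent: Entity) -> bool:
    return bool(candidate.names() & antecedent.names())


def name_token_subset(candidate: Entity, antecedent: Entity) -> bool:
    # e.g. "García" is compatible with an entity that contains "Marcos García".
    tokens = {tok for name in antecedent.names() for tok in name.split()}
    return any(set(name.split()) <= tokens for name in candidate.names())


def pronoun_gender(candidate: Entity, antecedent: Entity) -> bool:
    only_pronouns = all(m.is_pronoun for m in candidate.mentions)
    gender = candidate.gender()
    return only_pronouns and gender is not None and gender == antecedent.gender()


Pass = Callable[[Entity, Entity], bool]


def resolve(mentions: List[Mention], passes: List[Pass]) -> List[Entity]:
    """Start with singleton entities, then let each pass merge clusters in turn."""
    entities = [Entity([m]) for m in mentions]
    for matches in passes:
        i = 1
        while i < len(entities):
            # Try the nearest preceding entity first.
            for j in range(i - 1, -1, -1):
                if matches(entities[i], entities[j]):
                    entities[j].mentions.extend(entities[i].mentions)
                    del entities[i]
                    break
            else:
                i += 1  # no antecedent found in this pass; move on
    return entities


doc = [
    Mention("Marcos García", gender="m"),
    Mention("Ana Pérez", gender="f"),
    Mention("García"),
    Mention("she", gender="f", is_pronoun=True),
]
chains = resolve(doc, [exact_match, name_token_subset, pronoun_gender])
print([[m.text for m in e.mentions] for e in chains])
# [['Marcos García', 'García'], ['Ana Pérez', 'she']]
```

Because each pass sees whole clusters rather than isolated mention pairs, information gathered by an earlier pass (such as a full name or a known gender) is available to the later, less precise ones.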
