CLC number: TP391
On-line Access: 2015-07-06
Received: 2014-11-27
Revision Accepted: 2015-05-11
Crosschecked: 2015-06-08
Xiao Ding, Bing Qin, Ting Liu. BUEES: a bottom-up event extraction system[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(7): 541-552.
@article{title="BUEES: a bottom-up event extraction system",
author="Xiao Ding and Bing Qin and Ting Liu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="16",
number="7",
pages="541-552",
year="2015",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1400405"
}
%0 Journal Article
%T BUEES: a bottom-up event extraction system
%A Xiao Ding
%A Bing Qin
%A Ting Liu
%J Frontiers of Information Technology & Electronic Engineering
%V 16
%N 7
%P 541-552
%@ 2095-9184
%D 2015
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1400405
TY - JOUR
T1 - BUEES: a bottom-up event extraction system
A1 - Xiao Ding
A1 - Bing Qin
A1 - Ting Liu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 16
IS - 7
SP - 541
EP - 552
SN - 2095-9184
Y1 - 2015
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1400405
ER -
Abstract: Traditional event extraction systems focus mainly on event type identification and event participant extraction based on pre-specified event type paradigms and manually annotated corpora. However, different domains have different event type paradigms. When transferring to a new domain, we have to build a new event type paradigm and annotate a new corpus from scratch. This kind of conventional event extraction system requires massive human effort, and hence prevents event extraction from being widely applicable. In this paper, we present BUEES, a bottom-up event extraction system, which extracts events from the web in a completely unsupervised way. The system automatically builds an event type paradigm in the input corpus, and then proceeds to extract a large number of instance patterns of these events. Subsequently, the system extracts event arguments according to these patterns. By conducting a series of experiments, we demonstrate the good performance of BUEES and compare it to a state-of-the-art Chinese event extraction system, i.e., a supervised event extraction system. Experimental results show that BUEES performs comparably (5% higher F-measure in event type identification and 3% higher F-measure in event argument extraction), but without any human effort.
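The abstract reports its comparison in terms of F-measure. The following is a minimal sketch of how that metric is computed from a set of predicted items and a set of gold items, assuming the balanced (F1) variant, which the abstract does not specify. The function names and the toy event labels are hypothetical and are not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


def precision_recall_f(predicted: set, gold: set) -> tuple:
    """Score a set of predicted items against a set of gold items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical toy data: (sentence id, event type) pairs.
predicted_events = {(1, "Attack"), (2, "Marry")}
gold_events = {(2, "Marry"), (3, "Die")}
scores = precision_recall_f(predicted_events, gold_events)
```

Because the score is a harmonic mean, a system cannot reach a high F-measure by maximizing precision or recall alone.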
The paper is well motivated and easy to follow. Its contributions are as follows: 1) The authors propose a fully unsupervised bottom-up framework for event extraction, which addresses an interesting problem. 2) The proposed solution is composed of several steps, including event paradigm construction, event argument extraction, and event type identification. The general framework is sound and reasonable. 3) An extensive evaluation demonstrates the effectiveness of the proposed framework. The ACE corpus is used, and the experimental results are compared with baseline systems, which makes the conclusions convincing.