CLC number: TP311
On-line Access: 2012-04-07
Received: 2011-08-10
Revision Accepted: 2012-01-20
Crosschecked: 2012-02-27
Zhi-chun Wang, Zhi-gang Wang, Juan-zi Li, Jeff Z. Pan. Knowledge extraction from Chinese wiki encyclopedias[J]. Journal of Zhejiang University Science C, 2012, 13(4): 268-280.
@article{title="Knowledge extraction from Chinese wiki encyclopedias",
author="Zhi-chun Wang and Zhi-gang Wang and Juan-zi Li and Jeff Z. Pan",
journal="Journal of Zhejiang University Science C",
volume="13",
number="4",
pages="268-280",
year="2012",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C1101008"
}
%0 Journal Article
%T Knowledge extraction from Chinese wiki encyclopedias
%A Zhi-chun Wang
%A Zhi-gang Wang
%A Juan-zi Li
%A Jeff Z. Pan
%J Journal of Zhejiang University SCIENCE C
%V 13
%N 4
%P 268-280
%@ 1869-1951
%D 2012
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C1101008
TY - JOUR
T1 - Knowledge extraction from Chinese wiki encyclopedias
A1 - Zhi-chun Wang
A1 - Zhi-gang Wang
A1 - Juan-zi Li
A1 - Jeff Z. Pan
JO - Journal of Zhejiang University Science C
VL - 13
IS - 4
SP - 268
EP - 280
SN - 1869-1951
Y1 - 2012
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C1101008
ER -
Abstract: The vision of the semantic Web is to build a ‘Web of data’ that enables machines to understand the semantics of information on the Web. The Linked Open Data (LOD) project encourages people and organizations to publish open datasets on the Web in the Resource Description Framework (RDF), which promotes the development of the semantic Web. Among the various LOD datasets, DBpedia has proved to be a successful structured knowledge base and has become the central interlinking hub of the Web of data in English. In Chinese, however, little linked data has been published and linked to DBpedia, which hinders structured knowledge sharing for both Chinese and cross-lingual resources. This paper presents an approach for building a large-scale Chinese structured knowledge base from Chinese wiki resources, including Hudong and Baidu Baike. The approach first builds an ontology based on the wiki category system and infoboxes, and then extracts instances from wiki articles. Using Hudong as the source, the approach builds an ontology containing 19 542 concepts and 2381 properties; 802 593 instances are extracted and described using these concepts and properties, and 62 679 of them are linked to equivalent instances in DBpedia. From Baidu Baike, the approach builds an ontology containing 299 concepts, 37 object properties, and 5590 data type properties; 1 319 703 instances are extracted, and 84 343 of them are linked to instances in DBpedia. We provide RDF dumps and a SPARQL endpoint for accessing the resulting Chinese knowledge bases. The knowledge bases built with our approach can be used not only for building Chinese linked data, but also in applications of large-scale knowledge bases such as question answering and semantic search.
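To illustrate the kind of output described in the abstract, the following minimal sketch shows how one extracted instance might be written as RDF in N-Triples form: its wiki categories become rdf:type statements, its infobox entries become property values, and an owl:sameAs statement links it to DBpedia. The `example.org` namespaces, the function names, and the sample values are assumptions made for illustration only; the abstract does not give the URI scheme or the data model actually used by the authors.

```python
from urllib.parse import quote

# Hypothetical namespaces: the URIs used by the paper are not stated in the abstract.
ONT = "http://example.org/ontology/"
RES = "http://example.org/resource/"

# Standard W3C vocabulary terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, lang="zh"):
    """Build a language-tagged literal, escaping backslash, quote and newline."""
    escaped = (
        value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def instance_to_ntriples(name, categories, infobox, dbpedia_iri=None):
    """Describe one wiki article as a list of N-Triples lines.

    categories -> rdf:type statements against ontology concepts
    infobox    -> one literal-valued statement per attribute
    dbpedia_iri (optional) -> owl:sameAs link to the equivalent instance
    """
    subject = iri(RES + quote(name))
    lines = [f"{subject} {iri(RDFS_LABEL)} {literal(name)} ."]
    for category in categories:
        lines.append(f"{subject} {iri(RDF_TYPE)} {iri(ONT + quote(category))} .")
    for prop, value in infobox.items():
        lines.append(f"{subject} {iri(ONT + quote(prop))} {literal(value)} .")
    if dbpedia_iri:
        lines.append(f"{subject} {iri(OWL_SAME_AS)} {iri(dbpedia_iri)} .")
    return lines


if __name__ == "__main__":
    # Sample values are invented for the demonstration.
    for line in instance_to_ntriples(
        "Beijing",
        ["City"],
        {"country": "China"},
        dbpedia_iri="http://dbpedia.org/resource/Beijing",
    ):
        print(line)
```

Every infobox value is treated here as a plain literal for brevity. Distinguishing object properties from data type properties, as the abstract reports for Baidu Baike, would require deciding for each value whether it names another instance.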