CLC number: TP312

On-line Access: 2018-08-06

Received: 2017-08-04

Revision Accepted: 2017-12-03

Crosschecked: 2018-06-08

ORCID: Rabia Irfan, http://orcid.org/0000-0002-7789-5338

Frontiers of Information Technology & Electronic Engineering  2018 Vol.19 No.6 P.763-782

10.1631/FITEE.1700517


TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data


Author(s):  Rabia Irfan, Sharifullah Khan, Kashif Rajpoot, Ali Mustafa Qamar

Affiliation(s):  School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad 44000, Pakistan

Corresponding email(s):   12phdrirfan@seecs.edu.pk

Key Words:  Taxonomy, Clustering algorithms, Information science, Knowledge management, Machine learning


Rabia Irfan, Sharifullah Khan, Kashif Rajpoot, Ali Mustafa Qamar. TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(6): 763-782.

@article{FITEE.1700517,
title="TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data",
author="Rabia Irfan and Sharifullah Khan and Kashif Rajpoot and Ali Mustafa Qamar",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="19",
number="6",
pages="763-782",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1700517"
}

%0 Journal Article
%T TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data
%A Rabia Irfan
%A Sharifullah Khan
%A Kashif Rajpoot
%A Ali Mustafa Qamar
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 6
%P 763-782
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1700517

TY - JOUR
T1 - TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data
A1 - Rabia Irfan
A1 - Sharifullah Khan
A1 - Kashif Rajpoot
A1 - Ali Mustafa Qamar
JO - Frontiers of Information Technology & Electronic Engineering
VL - 19
IS - 6
SP - 763
EP - 782
SN - 2095-9184
Y1 - 2018
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1700517
ER -


Abstract: 
A taxonomy is generated to organize and access large volumes of data effectively. A taxonomy represents the concepts that exist in the data, and it needs to evolve continuously to reflect changes in that data. Existing automatic taxonomy generation techniques do not handle the evolution of data; therefore, the generated taxonomies do not truly represent the data. The evolution of data can be handled either by regenerating the taxonomy from scratch or by allowing the taxonomy to evolve incrementally whenever the data change. The former approach is not economical in terms of time and resources. The proposed taxonomy incremental evolution (TIE) algorithm is a novel attempt to handle data that evolve over time. It serves as a layer over an existing clustering-based taxonomy generation technique and allows an existing taxonomy to evolve incrementally. The algorithm was evaluated on research articles selected from the computing domain. The taxonomy that evolved with the data using the algorithm required considerably less time and had better quality per unit time than the taxonomy regenerated from scratch.
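The abstract describes TIE only at a high level: it sits on top of an existing clustering-based taxonomy and lets that taxonomy absorb new data instead of being rebuilt. The sketch below illustrates the general idea of incremental insertion into a cluster hierarchy, under assumptions that are ours and not the paper's: documents are whitespace-tokenized bag-of-words vectors, nodes are compared by cosine similarity, and a document that matches no child above a hypothetical `threshold` starts a new leaf cluster. The names `Node`, `insert_document`, and `threshold` are illustrative; this is not the authors' TIE algorithm, whose actual procedure is not reproduced on this page.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class Node:
    """Hypothetical taxonomy node: a cluster of documents.

    The centroid stores summed term counts; cosine similarity ignores
    vector length, so this ranks children the same way a mean centroid would.
    """

    def __init__(self, label):
        self.label = label
        self.centroid = Counter()
        self.docs = []
        self.children = []

    def add(self, doc_id, vec):
        self.docs.append(doc_id)
        self.centroid.update(vec)


def insert_document(root, doc_id, text, threshold=0.2):
    """Hypothetical incremental step: follow the most similar child down
    one root-to-leaf path; if no child is similar enough, attach a new leaf."""
    vec = tf_vector(text)
    node = root
    node.add(doc_id, vec)
    while True:
        best, best_sim = None, 0.0
        for child in node.children:
            s = cosine(vec, child.centroid)
            if s > best_sim:
                best, best_sim = child, s
        if best is None or best_sim < threshold:
            leaf = Node(f"new:{doc_id}")  # no good match: grow a new cluster here
            leaf.add(doc_id, vec)
            node.children.append(leaf)
            return leaf
        best.add(doc_id, vec)
        if not best.children:  # reached an existing leaf cluster
            return best
        node = best


# Usage: a tiny two-cluster taxonomy that absorbs new documents in place.
root = Node("computing")
ml = Node("machine learning")
ml.add("d1", tf_vector("neural network training data"))
db = Node("databases")
db.add("d2", tf_vector("sql query index transaction"))
root.children += [ml, db]

placed = insert_document(root, "d3", "training a neural network on image data")
created = insert_document(root, "d4", "quantum circuit qubit")
```

Each insertion touches only the nodes on one root-to-leaf path, which is why incremental evolution can be much cheaper than reclustering the whole collection. The trade-off in this simplified sketch is that existing cluster boundaries are never revisited, so a real system would likely need an additional restructuring step.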

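TIE algorithm: an upper-layer algorithm over clustering-based hierarchical taxonomy generation techniques for handling evolving data

Summary: A taxonomy enables large volumes of data to be organized and accessed effectively. A taxonomy is a way of representing the concepts in data, and it must evolve continuously to reflect changes in the data. Existing automatic taxonomy generation techniques cannot handle data evolution; therefore, the generated taxonomies do not truly reflect the data. To reflect data evolution, a taxonomy can either be regenerated from scratch or be evolved incrementally whenever the data change. Of these, the former has higher time and resource costs. A novel taxonomy incremental evolution (TIE) algorithm is proposed to handle data that evolve over time. TIE is an upper-layer algorithm over existing clustering-based hierarchical taxonomy generation techniques, and it allows an existing taxonomy to evolve incrementally. The algorithm was evaluated on research papers in the computing domain. The results show that, compared with regenerating the taxonomy from scratch, the taxonomy that evolves with the data takes much less time to generate and performs better per unit time.

Keywords: taxonomy; clustering algorithms; information science; knowledge management; machine learning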



Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE