
CLC number: TP311

On-line Access: 2016-02-02

Received: 2015-06-11

Revision Accepted: 2015-09-11

Crosschecked: 2015-12-09


ORCID: Hui-zong Li, http://orcid.org/0000-0002-1459-989X


Frontiers of Information Technology & Electronic Engineering  2016 Vol.17 No.2 P.122-134

http://doi.org/10.1631/FITEE.1500187


A social tag clustering method based on common co-occurrence group similarity


Author(s):  Hui-zong Li, Xue-gang Hu, Yao-jin Lin, Wei He, Jian-han Pan

Affiliation(s):  1School of Computer and Information, Hefei University of Technology, Hefei 230009, China; more

Corresponding email(s):   lihz_aust@sina.com, jsjxhuxg@hfut.edu.cn, yjlin@mnnu.edu.cn, peter.jhpan@gmail.com

Key Words:  Social tagging systems, Tag co-occurrence, Spectral clustering, Group similarity


Hui-zong Li, Xue-gang Hu, Yao-jin Lin, Wei He, Jian-han Pan. A social tag clustering method based on common co-occurrence group similarity[J]. Frontiers of Information Technology & Electronic Engineering, 2016, 17(2): 122-134.

@article{Li2016,
title="A social tag clustering method based on common co-occurrence group similarity",
author="Hui-zong Li and Xue-gang Hu and Yao-jin Lin and Wei He and Jian-han Pan",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="17",
number="2",
pages="122-134",
year="2016",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.1500187"
}

%0 Journal Article
%T A social tag clustering method based on common co-occurrence group similarity
%A Hui-zong Li
%A Xue-gang Hu
%A Yao-jin Lin
%A Wei He
%A Jian-han Pan
%J Frontiers of Information Technology & Electronic Engineering
%V 17
%N 2
%P 122-134
%@ 2095-9184
%D 2016
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1500187

TY  - JOUR
T1  - A social tag clustering method based on common co-occurrence group similarity
A1  - Hui-zong Li
A1  - Xue-gang Hu
A1  - Yao-jin Lin
A1  - Wei He
A1  - Jian-han Pan
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 17
IS  - 2
SP  - 122
EP  - 134
SN  - 2095-9184
Y1  - 2016
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1500187
ER  -


Abstract: 
Social tagging systems are widely used in Web 2.0. Many users rely on these systems to create, organize, manage, and share Internet resources freely. However, the many ambiguous and uncontrolled tags that such systems produce not only worsen the user experience but also limit the efficiency of resource retrieval. Tag clustering aggregates tags with similar semantics and helps mitigate these problems. In this paper, we first present an approach based on common co-occurrence group similarity, which uses the ternary relation among users, resources, and tags to measure the semantic relevance between tags. We then propose a spectral clustering method to address the high dimensionality and sparsity of the annotation data. Finally, experimental results show that the proposed method is useful and efficient.
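As a rough illustration of measuring tag relevance from the ternary relation, the sketch below groups (user, resource, tag) triples into posts, records which tags co-occur in a post, and scores two tags by the Jaccard overlap of the tags they both co-occur with. The Jaccard score is a stand-in chosen for illustration, not the paper's similarity formula, which this page does not reproduce. The function names and the sample annotations are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_neighbors(annotations):
    """Map each tag to the set of tags it shares a (user, resource) post with."""
    posts = defaultdict(set)
    for user, resource, tag in annotations:
        posts[(user, resource)].add(tag)
    neighbors = defaultdict(set)
    for tags in posts.values():
        for a, b in combinations(sorted(tags), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def jaccard_group_similarity(neighbors, a, b):
    """Jaccard overlap of the co-occurring tag sets of a and b (assumed stand-in)."""
    na, nb = neighbors[a] - {b}, neighbors[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Hypothetical annotations: (user, resource, tag)
annotations = [
    ("u1", "r1", "python"), ("u1", "r1", "programming"), ("u1", "r1", "code"),
    ("u2", "r2", "java"), ("u2", "r2", "programming"), ("u2", "r2", "code"),
    ("u3", "r3", "python"), ("u3", "r3", "snake"), ("u3", "r3", "animal"),
]
neighbors = cooccurrence_neighbors(annotations)
print(jaccard_group_similarity(neighbors, "python", "java"))  # 0.5
print(jaccard_group_similarity(neighbors, "java", "snake"))   # 0.0
```

In this toy data "python" and "java" never appear in the same post, yet they score 0.5 because both co-occur with "programming" and "code". That is the kind of indirect relatedness a group-level measure is meant to capture.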

The introduction of the paper is well presented. The state-of-the-art section is well done: it covers recent research in the area and follows chronological order. In presenting the methodology, the authors begin by describing the notation used to model the social tagging system, as well as the forms of co-occurrence between tags (tags co-occurring for the same resource, for a single user, or for the same user-feature combination), and they use examples to explain this part. In analyzing the results, the authors used two metrics that, according to them, are better suited to clustering (the Silhouette coefficient and the Dunn index) than precision and recall. The results were compared with four other approaches from the state of the art. The algorithm was implemented in Matlab and based on the metric previously proposed. The results obtained are satisfactory.
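For readers unfamiliar with the two internal indices named in this comment, the following is a minimal sketch of their textbook definitions, assuming Euclidean distance between points. It is not the authors' Matlab code, and the sample points and labelings are hypothetical. The Silhouette coefficient averages (b - a) / max(a, b) over all points, where a is the mean distance to the point's own cluster and b is the smallest mean distance to any other cluster. The Dunn index divides the smallest between-cluster distance by the largest within-cluster diameter. Higher is better for both.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points (singleton clusters score 0)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # dist(p, p) is 0, so dividing by len(own) - 1 averages over the others
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster diameter."""
    items = list(zip(points, labels))
    separations, diameters = [], []
    for i, (p, lp) in enumerate(items):
        for q, lq in items[i + 1:]:
            (diameters if lp == lq else separations).append(dist(p, q))
    return min(separations) / max(diameters)


# Hypothetical data: two compact, well-separated groups
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
good = [0, 0, 0, 1, 1, 1]   # labeling that matches the groups
bad = [0, 1, 0, 1, 0, 1]    # labeling that mixes them

print(silhouette_coefficient(points, good), dunn_index(points, good))
print(silhouette_coefficient(points, bad), dunn_index(points, bad))
```

Both indices need only the data and the cluster labels, not ground-truth classes, which is why they suit tag clustering where no reference partition exists.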

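A social tag clustering method based on common co-occurrence group similarity

Objective: Social tagging systems produce large numbers of ambiguous and uncontrolled tags, which degrade the user experience and limit the efficiency of resource retrieval. Tag clustering groups tags with similar semantics and thereby mitigates these problems. Existing social tag clustering methods mostly measure tag similarity from the binary "resource-tag" relation and cluster tags with algorithms such as K-means and hierarchical clustering, which tends to cause high dimensionality, sparsity, and loss of tag semantics. This paper proposes a tag similarity measure based on common co-occurrence groups and uses a spectral clustering algorithm to cluster tags.
Innovation: The ternary annotation relation in social tagging systems is analyzed, and the form of tag co-occurrence that best preserves semantic relations within this ternary relation is identified. Building on an analysis of the co-occurrence similarity of individual tags and drawing on the idea of groups, a common co-occurrence group similarity between tags is proposed, which characterizes the semantic similarity of tags accurately from a global perspective. A spectral clustering method for social tags based on this similarity is also proposed.
Methods: The common co-occurrence group similarity is used to compute the similarity of every pair of tags and to build a similarity matrix (Eq. (4)). Tags are then clustered with a spectral clustering algorithm. First, the similarity matrix is normalized with a Laplacian transformation to build the normalized Laplacian matrix of the tags. Next, the first k eigenvalues of this matrix and their corresponding eigenvectors are computed, and these k eigenvectors form a new feature space. Finally, K-means is applied in this space to group the tags into k clusters (Algorithm 1).
Conclusions: The internal evaluation indices SC and Dunn are used to compare the proposed tag clustering method experimentally with other traditional tag clustering methods. The spectral tag clustering method based on common co-occurrence group similarity scores better than the other traditional methods on both SC and Dunn, and it obtains good clustering results.

Key words: Social tagging systems; Tag co-occurrence; Spectral clustering; Group similarity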
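The spectral clustering steps in the method summary above can be sketched as follows. This is a minimal illustration, not the paper's Algorithm 1. It assumes the convention of Ng, Jordan, and Weiss, in which the similarity matrix W is normalized to D^{-1/2} W D^{-1/2}, the k leading eigenvectors are stacked as columns, each row is rescaled to unit length, and K-means runs on the rows. The row rescaling, the subspace iteration used in place of a library eigensolver, the farthest-first K-means initialization, and the toy similarity matrix are all assumptions made to keep the example self-contained and deterministic.

```python
import math
import random


def normalized_affinity(W):
    """Return D^{-1/2} W D^{-1/2} for a symmetric, non-negative similarity matrix W."""
    inv_sqrt = [1.0 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in W]
    n = len(W)
    return [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]


def orthonormalize(vectors):
    """Gram-Schmidt: make the vectors mutually orthogonal with unit length."""
    basis = []
    for v in vectors:
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis


def top_eigenvectors(M, k, iters=300, seed=0):
    """Approximate the k leading eigenvectors of a normalized affinity matrix M."""
    n = len(M)
    rng = random.Random(seed)
    vecs = orthonormalize([[rng.uniform(-1, 1) for _ in range(n)] for _ in range(k)])
    for _ in range(iters):
        # Multiply by (M + I): eigenvalues of M lie in [-1, 1], so the shift
        # makes the algebraically largest ones dominate the iteration.
        vecs = orthonormalize(
            [[sum(M[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)] for v in vecs]
        )
    return vecs


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clustering(W, k):
    vecs = top_eigenvectors(normalized_affinity(W), k)
    rows = [[v[i] for v in vecs] for i in range(len(W))]       # one row per tag
    rows = [[x / (math.sqrt(sum(y * y for y in r)) or 1.0) for x in r] for r in rows]
    return kmeans(rows, k)


# Hypothetical 6-tag similarity matrix: tags 0-2 and tags 3-5 form two groups
W = [
    [0.0, 0.9, 0.8, 0.1, 0.0, 0.0],
    [0.9, 0.0, 0.85, 0.0, 0.1, 0.0],
    [0.8, 0.85, 0.0, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.9, 0.0, 0.85],
    [0.0, 0.0, 0.1, 0.8, 0.85, 0.0],
]
labels = spectral_clustering(W, 2)
print(labels)
```

On a real tag set, W would come from the pairwise tag similarities, and a numerical library's symmetric eigensolver would normally replace the hand-written subspace iteration.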




Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - Journal of Zhejiang University-SCIENCE