CLC number: TP391

On-line Access: 2012-11-02

Received: 2012-03-05

Revision Accepted: 2012-07-09

Crosschecked: 2012-10-12

Journal of Zhejiang University SCIENCE C 2012 Vol.13 No.11 P.828-839


Overlapping community detection combining content and link

Author(s):  Zhou-zhou He, Zhong-fei (Mark) Zhang, Philip S. Yu

Affiliation(s):  Zhejiang Provincial Key Laboratory of Information Network Technology, Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   zju_hzz@zju.edu.cn, zhongfei@zju.edu.cn, psyu@uic.edu

Key Words:  Overlapping, Content, Link, Community detection

Zhou-zhou He, Zhong-fei (Mark) Zhang, Philip S. Yu. Overlapping community detection combining content and link[J]. Journal of Zhejiang University Science C, 2012, 13(11): 828-839.

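@article{He2012SOC,
title="Overlapping community detection combining content and link",
author="Zhou-zhou He and Zhong-fei (Mark) Zhang and Philip S. Yu",
journal="Journal of Zhejiang University Science C",
volume="13",
number="11",
pages="828--839",
year="2012",
issn="1869-1951",
doi="10.1631/jzus.C1200049",
publisher="Zhejiang University Press \& Springer"
}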

%0 Journal Article
%T Overlapping community detection combining content and link
%A Zhou-zhou He
%A Zhong-fei (Mark) Zhang
%A Philip S. Yu
%J Journal of Zhejiang University SCIENCE C
%V 13
%N 11
%P 828-839
%@ 1869-1951
%D 2012
%I Zhejiang University Press & Springer
%R 10.1631/jzus.C1200049

TY  - JOUR
T1  - Overlapping community detection combining content and link
A1  - Zhou-zhou He
A1  - Zhong-fei (Mark) Zhang
A1  - Philip S. Yu
JO  - Journal of Zhejiang University Science C
VL  - 13
IS  - 11
SP  - 828
EP  - 839
SN  - 1869-1951
Y1  - 2012
PB  - Zhejiang University Press & Springer
DO  - 10.1631/jzus.C1200049
ER  - 

In classic community detection, communities are assumed to be exclusive, whether the method performs hard clustering or soft clustering. Recent literature has observed that many real-world problems violate this assumption, and overlapping community detection has therefore become an active research topic. Existing work on this topic uses either content or link information, but not both. In this paper, we address overlapping community detection by combining content and link information. We develop an effective solution called subgraph overlapping clustering (SOC) and evaluate it against several peer methods from the literature that use either content or link information. The evaluations demonstrate the effectiveness and promise of SOC on large-scale real-world datasets.
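
The contrast between exclusive (hard or soft) and overlapping membership can be made concrete with a small node-by-community membership matrix. The sketch below is a toy illustration only, not SOC: it assumes, as is common for soft clustering, that each node's soft memberships are non-negative and sum to one, and it calls a node overlapping when its weight reaches a hypothetical `threshold` in two or more communities. The helper names `is_hard`, `is_soft` and `overlapping_nodes` are illustrative and do not come from the paper.

```python
from typing import List

# Hypothetical representation: rows are nodes, columns are communities.
Membership = List[List[float]]


def is_hard(m: Membership) -> bool:
    """Hard clustering: every node belongs to exactly one community (one-hot row)."""
    return all(row.count(1) == 1 and row.count(0) == len(row) - 1 for row in m)


def is_soft(m: Membership, tol: float = 1e-9) -> bool:
    """Soft clustering (assumed here): non-negative weights that sum to one per node."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in m)


def overlapping_nodes(m: Membership, threshold: float = 0.6) -> List[int]:
    """Indices of nodes whose weight reaches `threshold` in two or more communities."""
    return [i for i, row in enumerate(m) if sum(w >= threshold for w in row) >= 2]


if __name__ == "__main__":
    hard = [[1, 0], [0, 1], [1, 0]]
    soft = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
    # Node 0 is a full member of both communities, which no sum-to-one row allows.
    overlap = [[1, 1], [1, 0], [0, 1]]

    print(is_hard(hard), is_soft(hard))                  # True True
    print(is_soft(soft), overlapping_nodes(soft))        # True []
    print(is_soft(overlap), overlapping_nodes(overlap))  # False [0]
```

With any threshold above 0.5, a sum-to-one row can exceed the threshold in at most one community, since two such weights would already sum to more than one. This is why the soft matrix yields no overlapping nodes here, while the binary overlapping matrix lets node 0 belong fully to both communities.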

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE