CLC number: TP391
On-line Access: 2010-07-06
Received: 2009-07-25
Revision Accepted: 2009-09-30
Crosschecked: 2010-05-11
Ya-hong Han, Jian Shao, Fei Wu, Bao-gang Wei. Multiple hypergraph ranking for video concept detection[J]. Journal of Zhejiang University Science C, 2010, 11(7): 525-537.
@article{Han2010,
title="Multiple hypergraph ranking for video concept detection",
author="Ya-hong Han and Jian Shao and Fei Wu and Bao-gang Wei",
journal="Journal of Zhejiang University Science C",
volume="11",
number="7",
pages="525-537",
year="2010",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.C0910453"
}
%0 Journal Article
%T Multiple hypergraph ranking for video concept detection
%A Ya-hong Han
%A Jian Shao
%A Fei Wu
%A Bao-gang Wei
%J Journal of Zhejiang University SCIENCE C
%V 11
%N 7
%P 525-537
%@ 1869-1951
%D 2010
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.C0910453
TY - JOUR
T1 - Multiple hypergraph ranking for video concept detection
A1 - Ya-hong Han
A1 - Jian Shao
A1 - Fei Wu
A1 - Bao-gang Wei
JO - Journal of Zhejiang University Science C
VL - 11
IS - 7
SP - 525
EP - 537
SN - 1869-1951
Y1 - 2010
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.C0910453
ER -
Abstract: This paper addresses video concept detection through multi-modality fusion. Motivated by multi-view learning algorithms, the multi-modality features of videos can be represented by multiple graphs, and graph-based semi-supervised learning methods can be extended to these graphs to predict semantic labels for unlabeled video data. However, traditional graphs represent only homogeneous pairwise links, so the high-order correlations inherent in videos, such as high-order visual similarities, are ignored. In this paper we represent heterogeneous features by multiple hypergraphs, in which high-order correlated samples are associated through hyperedges. We then propose the multi-hypergraph ranking (MHR) algorithm, which defines a Markov random walk on each hypergraph and forms a mixture of the resulting Markov chains to perform transductive learning across multiple hypergraphs. In experiments on the TRECVID dataset, a triple-hypergraph built from visual features, textual features, and multiple labeled tags is used within the MHR framework to predict concept labels for unlabeled video shots. Experimental results show that our approach is effective.
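The pipeline outlined in the abstract (a random walk per hypergraph, a mixture of the resulting Markov chains, then transductive ranking) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the commonly used hypergraph random walk P = Dv^-1 H W De^-1 H^T, mixes the chains with weights proportional to each chain's stationary probability, and uses a generic iterative score propagation. All function names, the mixing weights `alphas`, the damping factor `mu`, and the toy incidence matrices are hypothetical.

```python
import numpy as np

def hypergraph_transition(H, w):
    """Row-stochastic random-walk matrix P = Dv^-1 H W De^-1 H^T.

    H is an (n_vertices x n_hyperedges) 0/1 incidence matrix and w holds
    the hyperedge weights. Every vertex must lie in at least one hyperedge.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    edge_deg = H.sum(axis=0)   # number of vertices in each hyperedge
    vert_deg = H @ w           # weighted degree of each vertex
    return (H * (w / edge_deg)) @ H.T / vert_deg[:, None]

def stationary(H, w):
    """Stationary distribution of the walk above: vertex degree / volume."""
    vert_deg = np.asarray(H, dtype=float) @ np.asarray(w, dtype=float)
    return vert_deg / vert_deg.sum()

def mixture_transition(Ps, pis, alphas):
    """Mix several chains; chain i gets weight alpha_i * pi_i(u) at vertex u."""
    weighted = np.array([a * pi for a, pi in zip(alphas, pis)])
    betas = weighted / weighted.sum(axis=0)   # normalise per vertex
    return sum(b[:, None] * P for b, P in zip(betas, Ps))

def propagate(P, y, mu=0.9, n_iter=200):
    """Generic transductive propagation: f <- mu * P f + (1 - mu) * y."""
    y = np.asarray(y, dtype=float)
    f = y.copy()
    for _ in range(n_iter):
        f = mu * (P @ f) + (1.0 - mu) * y
    return f

# Toy example: four video shots described by two hypothetical modalities.
H_visual = np.array([[1, 0], [1, 1], [0, 1], [0, 1]])
w_visual = [1.0, 1.0]
H_text = np.array([[1, 0], [1, 0], [1, 1], [0, 1]])
w_text = [1.0, 2.0]

Ps = [hypergraph_transition(H_visual, w_visual),
      hypergraph_transition(H_text, w_text)]
pis = [stationary(H_visual, w_visual), stationary(H_text, w_text)]

P_mix = mixture_transition(Ps, pis, alphas=[0.5, 0.5])
scores = propagate(P_mix, y=[1, 0, 0, 0])   # only shot 0 is labeled positive
print(scores)
```

Sorting the unlabeled shots by `scores` yields a ranking for the concept. A third hypergraph, such as one built from labeled tags, would simply add one more entry to `Ps`, `pis`, and `alphas`.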