CLC number: TP37; TP391
Received: 2008-12-11
Revision Accepted: 2009-07-13
Crosschecked: 2009-10-18
Ding-yin XIA, Fei WU, Wen-hao LIU, Han-wang ZHANG. Image interpretation: mining the visible and syntactic correlation of annotated words[J]. Journal of Zhejiang University Science A, 2009, 10(12): 1759-1768.
@article{Xia2009,
title="Image interpretation: mining the visible and syntactic correlation of annotated words",
author="Ding-yin XIA and Fei WU and Wen-hao LIU and Han-wang ZHANG",
journal="Journal of Zhejiang University Science A",
volume="10",
number="12",
pages="1759-1768",
year="2009",
publisher="Zhejiang University Press & Springer",
doi="10.1631/jzus.A0820856"
}
%0 Journal Article
%T Image interpretation: mining the visible and syntactic correlation of annotated words
%A Ding-yin XIA
%A Fei WU
%A Wen-hao LIU
%A Han-wang ZHANG
%J Journal of Zhejiang University SCIENCE A
%V 10
%N 12
%P 1759-1768
%@ 1673-565X
%D 2009
%I Zhejiang University Press & Springer
%DOI 10.1631/jzus.A0820856
TY - JOUR
T1 - Image interpretation: mining the visible and syntactic correlation of annotated words
A1 - Ding-yin XIA
A1 - Fei WU
A1 - Wen-hao LIU
A1 - Han-wang ZHANG
JO - Journal of Zhejiang University Science A
VL - 10
IS - 12
SP - 1759
EP - 1768
SN - 1673-565X
Y1 - 2009
PB - Zhejiang University Press & Springer
DO - 10.1631/jzus.A0820856
ER -
Abstract: Automatic web image annotation is a practical and effective approach to both web image retrieval and image understanding. However, current annotation techniques do not examine the statement-level syntactic correlation among the annotated words, which makes it very difficult to produce a natural language interpretation of an image, such as “pandas eat bamboo”. In this paper, we propose an approach that interprets image semantics by mining the visible and textual information hidden in images. The approach consists of two main parts. First, the annotated words of a target image are ranked according to two factors: visual correlation and pairwise co-occurrence. Then, the statement-level syntactic correlation among the annotated words is explored to obtain a natural language interpretation of the target image. Experiments conducted on real-world web images show the effectiveness of the proposed approach.
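The first stage described in the abstract, ranking annotated words by visual correlation and pairwise co-occurrence, can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything below is an assumption made for illustration: the function names `cooccurrence_counts` and `rank_words`, the weight `alpha`, the per-word visual scores, and the weighted-sum combination are all hypothetical and should not be read as the authors' method.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(annotation_sets):
    """Count how often each unordered word pair shares an annotation set."""
    counts = Counter()
    for words in annotation_sets:
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    return counts


def rank_words(visual_score, annotation_sets, alpha=0.5):
    """Rank candidate words by a weighted sum of two factors.

    visual_score: dict mapping each candidate word to a score in [0, 1]
                  (hypothetical stand-in for visual correlation).
    annotation_sets: word lists from other annotated images
                     (hypothetical co-occurrence corpus).
    alpha: weight on the visual factor (assumed, not from the paper).
    """
    counts = cooccurrence_counts(annotation_sets)
    words = list(visual_score)

    # Sum co-occurrence counts only over pairs of candidate words.
    cooc = {w: 0 for w in words}
    for (a, b), c in counts.items():
        if a in cooc and b in cooc:
            cooc[a] += c
            cooc[b] += c

    # Normalize by the largest total; fall back to 1 to avoid dividing by zero.
    max_cooc = max(cooc.values(), default=0) or 1
    combined = {
        w: alpha * visual_score[w] + (1 - alpha) * cooc[w] / max_cooc
        for w in words
    }
    # Highest combined score first; ties broken alphabetically.
    return sorted(words, key=lambda w: (-combined[w], w))


if __name__ == "__main__":
    scores = {"panda": 0.9, "bamboo": 0.7, "car": 0.8}
    corpus = [["panda", "bamboo"], ["panda", "bamboo", "zoo"], ["car", "road"]]
    print(rank_words(scores, corpus))
```

In this toy input, "car" has a higher visual score than "bamboo", but "bamboo" co-occurs with "panda" in the corpus, so the combined score places it above "car". Setting `alpha=1.0` falls back to ranking by the visual score alone.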