JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

Visual knowledge: an attempt to explore machine creativity

Author(s): Yueting Zhuang, Siliang Tang
Affiliation(s): College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): yzhuang@zju.edu.cn, siliang@zju.edu.cn


Yueting Zhuang, Siliang Tang. Visual knowledge: an attempt to explore machine creativity[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2100116

@article{FITEE.2100116,
title="Visual knowledge: an attempt to explore machine creativity",
author="Yueting Zhuang and Siliang Tang",
journal="Frontiers of Information Technology \& Electronic Engineering",
year="in press",
publisher="Zhejiang University Press \& Springer",
doi="https://doi.org/10.1631/FITEE.2100116"
}

%0 Journal Article
%T Visual knowledge: an attempt to explore machine creativity
%A Yueting Zhuang
%A Siliang Tang
%J Frontiers of Information Technology & Electronic Engineering
%P 619-624
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2100116

TY  - JOUR
T1  - Visual knowledge: an attempt to explore machine creativity
A1  - Yueting Zhuang
A1  - Siliang Tang
JO  - Frontiers of Information Technology & Electronic Engineering
SP  - 619
EP  - 624
SN  - 2095-9184
Y1  - in press
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2100116
ER  - 


Abstract: One question that has long puzzled the artificial intelligence (AI) community is whether AI can be creative or, more specifically, whether the reasoning process can be creative. Starting from noetic science, this paper discusses visual knowledge representation and its potential applications to machine creativity. We first enumerate related research on imagery-thinking-based reasoning, then focus on a special type of visual knowledge representation, the visual scene graph, and finally review in detail the problem of visual scene graph construction and its potential applications. The evidence surveyed suggests that visual knowledge and visual thinking can not only improve the performance of current AI tasks but also be used in the practice of machine creativity.
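To make the central data structure concrete, here is a minimal sketch of a visual scene graph stored as (subject, predicate, object) triples. It is illustrative only and not the authors' implementation: the class names `Relation` and `SceneGraph`, the `query` helper, and the example scene are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Relation:
    """One directed edge of the scene graph (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Objects are nodes; relations are labeled, directed edges."""
    objects: Set[str] = field(default_factory=set)
    relations: List[Relation] = field(default_factory=list)

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        # Register both endpoints as nodes, then record the edge.
        self.objects.update((subject, obj))
        self.relations.append(Relation(subject, predicate, obj))

    def query(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Relation]:
        # Return every relation matching the fields that were given;
        # a field left as None acts as a wildcard.
        return [
            r for r in self.relations
            if (subject is None or r.subject == subject)
            and (predicate is None or r.predicate == predicate)
            and (obj is None or r.obj == obj)
        ]


# A made-up scene, used only to exercise the structure.
graph = SceneGraph()
graph.add_relation("man", "riding", "horse")
graph.add_relation("man", "wearing", "hat")
graph.add_relation("horse", "on", "grass")

man_relations = graph.query(subject="man")
```

Storing relations as triples keeps the structure easy to query, which is one plausible reason such graphs are used as an intermediate representation for tasks like captioning, grounding, and image generation.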

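Visual knowledge: a preliminary exploration of intelligent creativity

Yueting Zhuang, Siliang Tang
Institute of Artificial Intelligence, College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Summary: A question that has long troubled the field of artificial intelligence is whether artificial intelligence can be creative or, put differently, whether an algorithm's reasoning process can be creative. This paper examines the question of AI creativity from the perspective of noetic science. It first enumerates related research on imagery-thinking-based reasoning; it then focuses on a special form of visual knowledge representation, the visual scene graph; finally, it describes in detail the problem of visual scene graph construction and its potential applications. All the evidence indicates that visual knowledge and visual thinking can not only improve the performance of current AI tasks but also be used in the practice of machine creativity.

Key words: Noetic science; Imagery-thinking-based reasoning; Visual knowledge representation; Visual scene graph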


