CLC number: TP183
On-line Access: 2025-07-28
Received: 2024-11-25
Revision Accepted: 2025-01-26
Crosschecked: 2025-07-30
Huilin ZHOU, Qihan REN, Junpeng ZHANG, Quanshi ZHANG. Towards the first principles of explaining DNNs: interactions explain the learning dynamics[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2401025
1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
2. School of Computer Science, Shanghai Jiao Tong University, Shanghai 200240, China

Abstract: Most current studies on the interpretability of deep learning are empirical. Whether there exists a set of first principles that can rigorously explain the inner mechanisms of deep neural networks (DNNs) from diverse perspectives has become one of the core open scientific questions in explainable artificial intelligence. This paper examines whether the theory of equivalent interactions can serve as such a first-principles framework for explaining DNNs. We argue that the strong explanatory power of this theory is reflected in the following four aspects: (1) it establishes a new axiomatic system that reformulates the decision logic of a DNN as a set of symbolic interactions; (2) it simultaneously explains multiple typical properties of deep learning, including a network's generalization power, robustness to perturbations, representation bottleneck, and learning dynamics; (3) it provides a unified mathematical tool for explaining deep learning algorithms, enabling a systematic account of the mechanisms behind various empirical attribution methods and methods for adversarial transferability; (4) it analyzes the two-phase dynamics of interaction complexity during DNN training, explaining both the complexity of the concepts a DNN models over training and the connection between generalization power and robustness, thereby revealing the inner mechanisms by which a DNN's generalization power and robustness change during learning.
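As a concrete illustration of aspect (1), the sketch below computes the Harsanyi interaction I(S) = Σ_{T⊆S} (−1)^{|S|−|T|} v(x_T) that this line of work uses to decompose a network output into symbolic interaction effects, where v(x_T) denotes the model output on an input whose variables outside T are masked by a baseline value. This is a minimal brute-force sketch, not the authors' implementation; the helper names (masked_output, harsanyi_interaction) and the toy model are illustrative assumptions.

    # Minimal sketch of the Harsanyi interaction metric (illustrative only):
    #   I(S) = sum_{T subseteq S} (-1)^{|S|-|T|} * v(x_T),
    # where v(x_T) masks every input variable outside T with a baseline.
    from itertools import chain, combinations

    def subsets(s):
        """All subsets of the index tuple s, including the empty set."""
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

    def masked_output(model_output, x, baseline, keep):
        """v(x_T): evaluate the model with variables outside `keep` masked."""
        x_masked = [xi if i in keep else baseline[i] for i, xi in enumerate(x)]
        return model_output(x_masked)

    def harsanyi_interaction(model_output, x, baseline, S):
        """I(S): the interaction effect jointly encoded by the variables in S."""
        return sum(
            (-1) ** (len(S) - len(T))
            * masked_output(model_output, x, baseline, set(T))
            for T in subsets(tuple(S))
        )

    # Toy model with an AND-like interaction between variables 0 and 1:
    f = lambda x: 2.0 * x[0] * x[1] + 0.5 * x[2]
    x, b = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    print(harsanyi_interaction(f, x, b, {0, 1}))  # 2.0: a joint (second-order) effect
    print(harsanyi_interaction(f, x, b, {2}))     # 0.5: a first-order effect

The order |S| of an interaction serves as the complexity measure referred to in aspect (4): tracking how the strength of low-order versus high-order interactions evolves over training is what exposes the two-phase dynamics.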