CLC number: TP18
On-line Access: 2022-09-21
Received: 2022-07-10
Revision Accepted: 2022-09-21
Crosschecked: 2022-07-24
Yi MA, Doris TSAO, Heung-Yeung SHUM. On the principles of Parsimony and Self-consistency for the emergence of intelligence[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2200297
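On the principles of parsimony and self-consistency for the emergence of intelligence
1 Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720, USA
2 Department of Molecular and Cell Biology, Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
3 International Digital Economy Academy, Shenzhen 518045, China
Abstract: Ten years after deep learning revived artificial intelligence, we propose a theoretical framework that helps explain the role deep neural networks play within an intelligent system as a whole. We introduce two fundamental principles, parsimony and self-consistency, which address what an intelligent system should learn and how it should learn, respectively. We believe these two principles are the cornerstones on which both artificial and natural intelligence emerge and develop. Although early forms of both principles appear in classical prior work, our restatement makes them precisely measurable and computable. Specifically, parsimony and self-consistency together lead naturally to an efficient computational framework: compressive closed-loop transcription. This framework unifies and explains the evolution of modern deep neural networks and of many artificial intelligence practices. Although this paper mainly uses the modeling of visual data as an example, we believe the two principles will help unify the understanding of a broad range of autonomous intelligent systems and provide a framework for understanding how the brain works.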