CLC number: TP391
On-line Access: 2021-05-17
Received: 2020-02-11
Revision Accepted: 2020-03-23
Crosschecked: 2020-12-29
Cited: 0
Clicked: 5128
Citations: Bibtex RefMan EndNote GB/T7714
https://orcid.org/0000-0002-1802-8197
https://orcid.org/0000-0001-7722-7172
https://orcid.org/0000-0002-3045-624X
https://orcid.org/0000-0002-8043-0312
Caixia Liu, Dehui Kong, Shaofan Wang, Zhiyong Wang, Jinghua Li, Baocai Yin. Deep 3D reconstruction: methods, data, and challenges[J]. Frontiers of Information Technology & Electronic Engineering,in press.https://doi.org/10.1631/FITEE.2000068 @article{title="Deep 3D reconstruction: methods, data, and challenges", %0 Journal Article TY - JOUR
深度三维重建:方法、数据和挑战1北京工业大学信息学部北京人工智能研究院,多媒体与智能软件技术北京市重点实验室,中国北京市,100124 2悉尼大学计算机科学学院多媒体实验室,澳大利亚新南威尔士州悉尼市,2006 摘要:三维形状重建是计算机视觉、计算机图形学、模式识别和虚拟现实等领域的重要研究课题。现有三维重建方法通常存在两个瓶颈:(1)它们涉及多个人工设计阶段,导致累积误差,且难以自动学习三维形状的语义特征;(2)它们严重依赖图像内容和质量,以及精确校准的摄像机。因此,这些方法的重建精度难以提高。基于深度学习的三维重建方法通过利用深度网络自动学习低质量图像中的三维形状语义特征,克服了这两个瓶颈。然而,这些方法具有多种体系框架,但是至今未有文献对它们作深入分析和比较。本文对基于深度学习的三维重建方法进行全面综述。首先,基于不同深度学习模型框架,将基于深度学习的三维重建方法分为4类:递归神经网络、深自编码器、生成对抗网络和卷积神经网络,并对相应方法作详细分析。其次,详细介绍上述方法常用的4个代表性数据库。再次,对基于深度学习的三维重建方法进行综合比较,包括不同方法在同一数据库、同一方法在不同数据库以及同一方法对于不同视角个数输入的结果比较。最后,讨论了基于深度学习的三维重建方法的发展趋势。 关键词组: Darkslateblue:Affiliate; Royal Blue:Author; Turquoise:Article
Reference[1]Agarwal S, Snavely N, Simon I, et al., 2009. Building Rome in a day. IEEE 12th Int Conf on Computer Vision, p.72-79. [2]Akhter I, Black MJ, 2015. Pose-conditioned joint angle limits for 3D human pose reconstruction. IEEE Conf on Computer Vision and Pattern Recognition, p.1446-1455. [3]Bansal A, Russell B, Gupta A, 2016. Marr revisited: 2D-3D alignment via surface normal prediction. IEEE Conf on Computer Vision and Pattern Recognition, p.5965-5974. [4]Bruna J, Zaremba W, Szlam A, et al., 2013. Spectral networks and locally connected networks on graphs. Int Conf on Learning Representations, p.1-14. [5]Calakli F, Taubin G, 2011. SSD: smooth signed distance surface reconstruction. Comput Graph Forum, 30(7):1993-2002. [6]Cao YP, Liu ZN, Kuang ZF, et al., 2018. Learning to reconstruct high-quality 3D shapes with cascaded fully convolutional networks. Proc 15th European Conf on Computer Vision, p.616-633. [7]Chang AX, Funkhouser T, Guibas L, et al., 2015. ShapeNet: an information-rich 3D model repository. https://arxiv.org/abs/1512.03012 [8]Chen K, Lai YK, Hu SM, 2015. 3D indoor scene modeling from RGB-D data: a survey. Comput Vis Media, 1(4):267-278. [9]Choy CB, Xu DF, Gwak J, et al., 2016. 3D-R2N2: a unified approach for single and multi-view 3D object reconstruction. Proc 14th European Conf on Computer Vision, p.628-644. [10]Cohen TS, Welling M, 2016. Group equivariant convolutional networks. Proc 33rd Int Conf on Machine Learning, p.2990-2999. [11]Cohen TS, Geiger M, Köhler J, et al., 2018. Spherical CNNs. Int Conf on Learning Representations, p.1-15. [12]Dai A, Qi CR, Niessner M, 2017. Shape completion using 3D-encoder-predictor CNNs and shape synthesis. IEEE Conf on Computer Vision and Pattern Recognition, p.6545-6554. [13]Denton E, Chintala S, Szlam A, et al., 2015. Deep generative image models using a Laplacian pyramid of adversarial networks. Proc 28th Int Conf on Neural Information Processing Systems, p.1486-1494. [14]Engel J, Schöps T, Cremers D, 2014. LSD-SLAM: large-scale direct monocular SLAM. Proc 13th European Conf on Computer Vision, p.834-849. [15]Everingham M, Eslami SMA, van Gool L, et al., 2015. The PASCAL visual object classes challenge: a retrospective. Int J Comput Vis, 111(1):98-136. [16]Fan HQ, Su H, Guibas L, 2017. A point set generation network for 3D object reconstruction from a single image. IEEE Conf on Computer Vision and Pattern Recognition, p.2463-2471. [17]Fitzgibbon A, Zisserman A, 1998. Automatic 3D model acquisition and generation of new images from video sequences. Proc 9th European Signal Processing Conf, p.129-140. [18]Furukawa Y, Ponce J, 2006. Carved visual hulls for image-based modeling. Proc 9th European Conf on Computer Vision, p.564-577. [19]Gadelha M, Maji S, Wang R, 2017. 3D shape induction from 2D views of multiple objects. Int Conf on 3D Vision, p.402-411. [20]Girdhar R, Fouhey DF, Rodriguez M, et al., 2016. Learning a predictable and generative vector representation for objects. Proc 14th European Conf on Computer Vision, p.484-499. [21]Goesele M, Snavely N, Curless B, et al., 2007. Multi-view stereo for community photo collections. IEEE 11th Int Conf on Computer Vision, p.1-8. [22]Goodfellow I, 2016. NIPS tutorial: generative adversarial networks. https://arxiv.org/abs/1701.00160 [23]Goodfellow IJ, Pouget-Abadie J, Mirza M, et al., 2014. Generative adversarial nets. Proc 27th Int Conf on Neural Information Processing Systems, p.2672-2680. [24]Graham B, 2014. Spatially-sparse convolutional neural networks. https://arxiv.org/abs/1409.6070v1 [25]Graham B, 2015. Sparse 3D convolutional neural networks. Proc British Machine Vision Conf, p.150.1-150.9. [26]Gregor K, Danihelka I, Graves A, et al., 2015. DRAW: a recurrent neural network for image generation. Proc 32nd Int Conf on Machine Learning, p.1462-1471. [27]Gulrajani I, Ahmed F, Arjovsky M, et al., 2017. Improved training of Wasserstein GANs. Advances in Neural Information Processing Systems, p.5767-5777. [28]Gwak J, Choy CB, Chandraker M, et al., 2017. Weakly supervised 3D reconstruction with adversarial constraint. Int Conf on 3D Vision, p.263-272. [29]Han XF, Laga H, Bennamoun M, 2019. Image-based 3D object reconstruction: state-of-the-art and trends in the deep learning era. IEEE Trans Patt Anal Mach Intell, 43(5):1578-1604. [30]Han XG, Li Z, Huang HB, et al., 2017. High-resolution shape completion using deep neural networks for global structure and local geometry inference. IEEE Int Conf on Computer Vision, p.85-93. [31]Häne C, Tulsiani S, Malik J, 2017. Hierarchical surface prediction for 3D object reconstruction. Int Conf on 3D Vision, p.412-420. [32]Henderson P, Ferrari V, 2019. Learning single-image 3D reconstruction by generative modelling of shape, pose and shading. Int J Comput Vis, 128:835-854. [33]Hochreiter S, Schmidhuber J, 1997. Long short-term memory. Neur Comput, 9(8):1735-1780. [34]Hu WZ, Zhu SC, 2015. Learning 3D object templates by quantizing geometry and appearance spaces. IEEE Trans Patt Anal Mach Intell, 37(6):1190-1205. [35]Huang QX, Wang H, Koltun V, 2015. Single-view reconstruction via joint analysis of image and shape collections. ACM Trans Graph, 34(4):87. [36]Kipf TN, Welling M, 2017. Semi-supervised classification with graph convolutional networks. Int Conf on Learning Representations, p.1-13. [37]Kong C, Lin CH, Lucey S, 2017. Using locally corresponding CAD models for dense 3D reconstructions from a single image. IEEE Conf on Computer Vision and Pattern Recognition, p.5603-5611. [38]Krizhevsky A, Sutskever I, Hinton GE, 2012. ImageNet classification with deep convolutional neural networks. Proc 25th Int Conf on Neural Information Processing Systems, p.1-9. [39]Laga H, 2019. A survey on deep learning architectures for image-based depth reconstruction. https://arxiv.org/abs/1906.06113 [40]Lhuillier M, Quan L, 2005. A quasi-dense approach to surface reconstruction from uncalibrated images. IEEE Trans Patt Anal Mach Intell, 27(3):418-433. [41]Li C, Wand M, 2016. Precomputed real-time texture synthesis with Markovian generative adversarial networks. Proc 14th European Conf on Computer Vision, p.702-716. [42]Li YY, Dai A, Guibas L, et al., 2015. Database-assisted object retrieval for real-time 3D reconstruction. Comput Graph Forum, 34(2):435-446. [43]Lim JJ, Pirsiavash H, Torralba A, 2014. Parsing IKEA objects: fine pose estimation. IEEE Int Conf on Computer Vision, p.2992-2999. [44]Lin CH, Kong C, Lucey S, 2018. Learning efficient point cloud generation for dense 3D object reconstruction. AAAI Conf on Artificial Intelligence, p.7114-7121. [45]Liu SC, Chen WK, Li TY, et al., 2019. Soft rasterizer: differentiable rendering for unsupervised single-view mesh reconstruction. https://arxiv.org/abs/1901.05567v1 [46]Lun ZL, Gadelha M, Kalogerakis E, et al., 2017. 3D shape reconstruction from sketches via multi-view convolutional networks. Int Conf on 3D Vision, p.67-77. [47]Nan LL, Xie K, Sharf A, 2012. A search-classify approach for cluttered indoor scene understanding. ACM Trans Graph, 31(6):137.1-137.10. [48]Nash C, Williams CKI, 2017. The shape variational autoencoder: a deep generative model of part-segmented 3D objects. Comput Graph Forum, 36(5):1-12. [49]Newell A, Yang KY, Deng J, 2016. Stacked hourglass networks for human pose estimation. Proc 14th European Conf on Computer Vision, p.483-499. [50]Niu CJ, Li J, Xu K, 2018. Im2Struct: recovering 3D shape structure from a single RGB image. IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.1-9. [51]Pontes JK, Kong C, Eriksson A, et al., 2017. Compact model representation for 3D reconstruction. Int Conf on 3D Vision, p.88-96. [52]Pontes JK, Kong C, Sridharan S, et al., 2018. Image2Mesh: a learning framework for single image 3D reconstruction. Proc 14th Asian Conf on Computer Vision, p.365-381. [53]Radford A, Metz L, Chintala S, 2015. Unsupervised representation learning with deep convolutional generative adversarial networks. Int Conf on Learning Representations, p.1-16. [54]Rezende DJ, Eslami SMA, Mohamed S, et al., 2016. Unsupervised learning of 3D structure from images. Proc 30th Conf on Neural Information Processing Systems, p.4997-5005. [55]Shao TJ, Xu WW, Zhou K, et al., 2012. An interactive approach to semantic modeling of indoor scenes with an RGBD camera. ACM Trans Graph, 31(6):136. [56]Shi YF, Long PX, Xu K, et al., 2016. Data-driven contextual modeling for 3D scene understanding. Comput Graph, 55:55-67. [57]Silberman N, Hoiem D, Kohli P, et al., 2012. Indoor segmentation and support inference from RGBD images. Proc 12th European Conf on Computer Vision, p.746-760. [58]Simonyan K, Zisserman A, 2015. Very deep convolutional networks for large-scale image recognitions. Int Conf on Learning Representations, p.1-14. [59]Smith EJ, Meger D, 2017. Improved adversarial systems for 3D object generation and reconstruction. Proc 1st Annual Conf on Robot Learning, p.87-96. [60]Sun XY, Wu JJ, Zhang XM, et al., 2018. Pix3D: dataset and methods for single-image 3D shape modeling. IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.2974-2983. [61]Sun YY, 2011. A survey of 3D reconstruction based on single image. J North China Univ Technol, 23(1):9-13 (in Chinese). [62]Sundermeyer M, Schlüter R, Ney H, 2012. LSTM neural networks for language modeling. https://core.ac.uk/display/22066040 [63]Sutskever I, Vinyals O, Le Q, 2014. Sequence to sequence learning with neural networks. Proc 27th Int Conf on Neural Information Processing Systems, p.3104-3112. [64]Tatarchenko M, Dosovitskiy A, Brox T, 2017. Octree generating networks: efficient convolutional architectures for high-resolution 3D outputs. IEEE Int Conf on Computer Vision, p.2107-2115. [65]Udayan JD, Kim H, Kim JI, 2015. An image-based approach to the reconstruction of ancient architectures by extracting and arranging 3D spatial components. Front Inform Technol Electron Eng, 16(1):12-27. [66]Varley J, DeChant C, Richardson A, et al., 2017. Shape completion enabled robotic grasping. IEEE/RSJ Int Conf on Intelligent Robots and Systems, p.2442-2447. [67]Wang LJ, Fang Y, 2017. Unsupervised 3D reconstruction from a single image via adversarial learning. https://arxiv.org/abs/1711.09312 [68]Wang NY, Zhang YD, Li ZW, et al., 2018. Pixel2Mesh: generating 3D mesh models from single RGB images. Proc 15th European Conf on Computer Vision, p.55-71. [69]Wang XL, Gupta A, 2016. Generative image modeling using style and structure adversarial networks. Proc 14th European Conf on Computer Vision, p.318-335. [70]Wu JJ, Zhang CK, Xue TF, et al., 2016a. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. Advances in Neural Information Processing Systems, p.82-90. [71]Wu JJ, Xue TF, Lim JJ, et al., 2016b. Single image 3D interpreter network. Proc 14th European Conf on Computer Vision, p.365-382. [72]Wu JJ, Wang YF, Xue TF, et al., 2017. MarrNet: 3D shape reconstruction via 2.5D sketches. Advances in Neural Information Processing Systems, p.540-550. [73]Wu ZR, Song SR, Khosla A, et al., 2015. 3D ShapeNets: a deep representation for volumetric shapes. IEEE Conf on Computer Vision and Pattern Recognition, p.1912-1920. [74]Xiang Y, Mottaghi R, Savarese S, 2014. Beyond PASCAL: a benchmark for 3D object detection in the wild. IEEE Winter Conf on Applications of Computer Vision, p.75-82. [75]Xiang Y, Kim W, Chen W, et al., 2016. ObjectNet3D: a large scale database for 3D object recognition. Proc 14th European Conf on Computer Vision, p.160-176. [76]Xiao JX, Hays J, Ehinger KA, et al., 2010. SUN database: large-scale scene recognition from abbey to zoo. IEEE Computer Society Conf on Computer Vision and Pattern Recognition, p.3485-3492. [77]Xie HZ, Yao HX, Sun XS, et al., 2019. Pix2Vox: context-aware 3D reconstruction from single and multi-view images. IEEE/CVF Int Conf on Computer Vision, p.1-9. [78]Yan XC, Yang JM, Yumer E, et al., 2016. Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision. Advances in Neural Information Processing Systems, p.1696-1704. [79]Yang B, Wen HK, Wang S, et al., 2018. 3D object reconstruction from a single depth view with adversarial learning. IEEE Int Conf on Computer Vision Workshop, p.679-688. [80]Yang B, Rosa S, Markham A, et al., 2019. 3D object dense reconstruction from a single depth view. IEEE Trans Patt Anal Mach Intell, 41(12):2820-2834. [81]Yang B, Wang S, Markham A, et al., 2020. Robust attentional aggregation of deep feature sets for multi-view 3D reconstruction. Int J Comput Vis, 128:53-73. [82]Zeiler MD, Krishnan D, Taylor GW, et al., 2010. Deconvolutional networks. IEEE Computer Society Conf on Computer Vision and Pattern Recognition, p.2528-2535. [83]Zhu CY, Byrd RH, Lu PH, et al., 1997. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans Math Softw, 23(4):550-560. [84]Zou CH, Yumer E, Yang JM, et al., 2017. 3D-PRNN: generating shape primitives with recurrent neural networks. IEEE Int Conf on Computer Vision, p.900-909. Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou
310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn Copyright © 2000 - 2024 Journal of Zhejiang University-SCIENCE |
Open peer comments: Debate/Discuss/Question/Opinion
<1>