CLC number: TP391.4
On-line Access: 2022-10-26
Received: 2021-07-29
Revision Accepted: 2022-10-26
Crosschecked: 2022-01-25
Yue LU, Xingyu CHEN, Zhengxing WU, Junzhi YU, Li WEN. A novel robotic visual perception framework for underwater operation[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2100366
A novel robotic visual perception framework for underwater operation

1 State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2 Ytech, Kuaishou Technology, Beijing 100085, China
3 State Key Laboratory of Turbulence and Complex Systems, Department of Advanced Manufacturing and Robotics, College of Engineering, Peking University, Beijing 100871, China
4 School of Mechanical Engineering and Automation, Beihang University, Beijing 100191, China

Abstract: Underwater robotic operation usually requires visual perception (e.g., object detection and tracking), but underwater scenes suffer from poor visual quality and represent a special distribution, which degrades the accuracy of visual perception. Meanwhile, the continuity and stability of detection also matter for robotic perception, yet the commonly used static-accuracy-based evaluation (i.e., average precision) cannot reflect a detector's temporal performance. To address these two problems, this paper proposes a novel robotic visual perception framework. First, the relationship between data distributions of different quality and visual restoration is investigated in terms of detection performance. The results show that although distribution quality has little effect on in-distribution detection accuracy, visual restoration benefits detection in real sea scenes by alleviating distribution drift. In addition, non-reference evaluation methods for detection continuity and stability based on object tracklets are proposed, along with an online tracklet refinement (OTR) scheme to improve a detector's temporal performance. Finally, combined with visual restoration, an accurate and stable underwater robotic visual perception framework is established. To extend video object detection (VID) methods to the single-object tracking task, a small-overlap suppression (SOS) method is proposed to achieve flexible switching between object detection and object tracking. Extensive experiments on the ImageNet VID dataset and real-world robotic tasks verify the correctness of the analysis and the superiority of the proposed methods. The code is available at https://github.com/yrqs/VisPerception.
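The small-overlap suppression (SOS) idea in the abstract can be sketched as a post-processing filter on per-frame detections: only candidates that sufficiently overlap the previously tracked target survive, so a video object detector can be switched into a single-object tracker. The sketch below is an illustrative reconstruction, not the authors' implementation; the `iou_thresh` value, the box format, and the fall-back behaviour when the target is lost are assumptions.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def small_overlap_suppression(detections, target_box, iou_thresh=0.3):
    """Suppress detections with small overlap to the tracked target.

    detections: (N, 5) array of (x1, y1, x2, y2, score) per frame.
    Returns the highest-scoring surviving detection (the new target box),
    or None when the target is lost and full-frame detection should resume.
    """
    if len(detections) == 0:
        return None
    keep = iou(np.asarray(target_box, dtype=float), detections[:, :4]) >= iou_thresh
    if not keep.any():
        return None  # target lost; fall back to plain detection
    kept = detections[keep]
    return kept[np.argmax(kept[:, 4])]
```

In tracking mode the returned box would seed the next frame's filter; a `None` result signals a switch back to detection mode, which matches the flexible detection/tracking switching the abstract describes.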