
CLC number: TP391.41

On-line Access: 2015-05-05

Received: 2015-01-20

Revision Accepted: 2015-03-23

Crosschecked: 2015-04-09


ORCID: Xun Liu, http://orcid.org/0000-0002-3045-2943


Frontiers of Information Technology & Electronic Engineering  2015 Vol.16 No.5 P.346-357

http://doi.org/10.1631/FITEE.1500026


Detection of engineering vehicles in high-resolution monitoring images


Author(s):  Xun Liu, Yin Zhang, San-yuan Zhang, Ying Wang, Zhong-yan Liang, Xiu-zi Ye

Affiliation(s):  College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China

Corresponding email(s):   star.liuxun@gmail.com, yinzh@zju.edu.cn, syzhang@zju.edu.cn, maggiewang0427@gmail.com

Key Words:  Object detection, Histogram of oriented gradient (HOG), Dense scale-invariant feature transform (dense SIFT), Saliency, Part models, Engineering vehicles


Xun Liu, Yin Zhang, San-yuan Zhang, Ying Wang, Zhong-yan Liang, Xiu-zi Ye. Detection of engineering vehicles in high-resolution monitoring images[J]. Frontiers of Information Technology & Electronic Engineering, 2015, 16(5): 346-357.



Abstract: 
This paper presents a novel formulation for detecting objects with articulated rigid bodies, particularly engineering vehicles, in high-resolution monitoring images. High-resolution monitoring images contain many pixels, most of which belong to the background. Our method first detects object patches from monitoring images using a coarse detection process. In this phase, we build a descriptor based on histograms of oriented gradients augmented with color frequency information. A linear support vector machine then rapidly detects the many image patches that may contain object parts, with a low false negative rate but a high false positive rate. In the second phase, we apply a refinement classification to determine which patches actually contain objects. In this stage, we use part models to enlarge the image patches so that they include the complete object. An accelerated and improved saliency mask then enhances the performance of the dense scale-invariant feature transform (dense SIFT) descriptor. The detection process returns the absolute positions of positive objects in the original images. We have applied our methods to three datasets to demonstrate their effectiveness.
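The two-phase pipeline described in the abstract can be illustrated in miniature. The snippet below is a toy sketch, not the paper's implementation: a 1-D "signal" stands in for the monitoring image, a plain dot product stands in for the HOG-plus-linear-SVM score, a permissive threshold keeps candidates with few misses (phase 1), and a stricter re-scoring stands in for the refinement classification (phase 2). All window sizes, weights, and thresholds are invented for illustration.

```python
# Toy coarse-to-fine detection: permissive first pass (low false-negative
# rate, many false positives), strict second pass on the survivors only.

def linear_score(window, weights):
    """Dot product of a feature window with SVM-style weights."""
    return sum(x * w for x, w in zip(window, weights))

def coarse_to_fine_detect(signal, weights, win=3,
                          coarse_thresh=0.5, fine_thresh=2.0):
    # Phase 1: fast scan over every window with a low threshold.
    candidates = [
        i for i in range(len(signal) - win + 1)
        if linear_score(signal[i:i + win], weights) > coarse_thresh
    ]
    # Phase 2: stricter re-scoring of the candidate patches only.
    detections = [
        i for i in candidates
        if linear_score(signal[i:i + win], weights) > fine_thresh
    ]
    return candidates, detections

cand, det = coarse_to_fine_detect([0, 0, 1, 3, 1, 0, 0, 2, 0],
                                  [0.5, 1.0, 0.5])
print(cand, det)  # [1, 2, 3, 5, 6] [1, 2, 3]
```

The coarse pass keeps five candidate windows; the fine pass rejects the two weak ones, mirroring how the refinement stage removes non-vehicle regions extracted in the first stage.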

This is a technically interesting and innovative paper. I believe the work presented, which combines saliency detection with feature extraction, makes a valuable contribution to the state of the art, and the key ideas deserve to be published.

Detection algorithm for engineering vehicles based on high-definition monitoring images

Objective: To design a detection algorithm based on monitoring images that quickly and efficiently recognizes and detects articulated rigid bodies with variable parts, such as engineering vehicles.
Innovation: Modeling the human visual detection process, the algorithm is divided into a coarse-extraction stage and a fine-classification stage. The first stage proposes a "color frequency" feature and uses it to improve the HOG descriptor. The second stage improves a saliency-extraction algorithm and uses it to enhance the dense SIFT descriptor. Combining the two stages yields an efficient overall detection algorithm.
Method: Monitoring images are captured from high, wide-angle viewpoints, so engineering vehicles occupy few pixels. We mimic the human visual process of picking out a target among complex, varied objects by dividing detection into coarse extraction and fine classification. When searching for a target, a person quickly scans the scene and pauses for a few seconds on anything resembling the target to confirm whether it is genuine. In the coarse-extraction stage, a HOG descriptor augmented with "color frequency" (Fig. 3) and a linear SVM classifier quickly scan the entire monitoring image and extract candidate engineering-vehicle regions; the goal of this stage is fast candidate extraction with a low miss rate. The fine-classification stage applies a saliency-masked dense SIFT descriptor (Fig. 8) to remove the non-vehicle regions extracted in the first stage, yielding a fast detection algorithm with both a low miss rate and a low false-alarm rate (Fig. 1).
Conclusion: For articulated rigid bodies with variable parts, such as engineering vehicles in monitoring images, we propose a detection algorithm that first performs wide-range coarse extraction and then small-range fine classification. The algorithm is efficient and fast, and generalizes reasonably well.

Keywords: Object detection; Histogram of oriented gradient (HOG); Dense SIFT; Saliency detection; Part models; Engineering vehicles
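The saliency-masking step of the fine-classification stage can be sketched as simple descriptor weighting. This is a hedged approximation, not the paper's actual algorithm: each descriptor cell (standing in for a dense SIFT cell) is scaled by its normalized saliency weight, so cells in low-saliency background regions contribute less to the final feature vector. All numbers below are invented for illustration.

```python
# Toy saliency masking: down-weight descriptor cells from low-saliency
# regions before concatenating them into one feature vector.

def saliency_masked_descriptor(cell_descriptors, saliency):
    """Scale each cell's descriptor by its normalized saliency weight."""
    total = sum(saliency) or 1.0          # avoid division by zero
    weights = [s / total for s in saliency]
    masked = []
    for desc, w in zip(cell_descriptors, weights):
        masked.extend(v * w for v in desc)
    return masked

cells = [[1.0, 2.0], [4.0, 0.0], [0.0, 1.0]]   # toy dense-descriptor cells
sal = [0.1, 0.8, 0.1]                          # toy per-cell saliency
print(saliency_masked_descriptor(cells, sal))
```

The highly salient middle cell dominates the output vector, which is the intended effect: background texture is suppressed and the candidate object's features are emphasized before the final classification.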



