JZUS - Journal of Zhejiang University SCIENCE

Frontiers of Information Technology & Electronic Engineering

Accepted manuscript available online (unedited version)

Aggregated context network for crowd counting

Author(s): Si-yue Yu, Jian Pu
Affiliation(s): School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
Corresponding email(s): 51174500148@stu.ecnu.edu.cn, jianpu@fudan.edu.cn
Key Words: Crowd counting, Convolutional neural network, Density estimation, Semantic segmentation, Multi-task learning


Si-yue Yu, Jian Pu. Aggregated context network for crowd counting[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.1900481

@article{Yu_Pu_FITEE1900481,
title="Aggregated context network for crowd counting",
author="Si-yue Yu and Jian Pu",
journal="Frontiers of Information Technology \& Electronic Engineering",
year="in press",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.1900481"
}

%0 Journal Article
%T Aggregated context network for crowd counting
%A Si-yue Yu
%A Jian Pu
%J Frontiers of Information Technology & Electronic Engineering
%P 1626-1638
%@ 2095-9184
%D in press
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1900481

TY  - JOUR
T1  - Aggregated context network for crowd counting
A1  - Si-yue Yu
A1  - Jian Pu
JO  - Frontiers of Information Technology & Electronic Engineering
SP  - 1626
EP  - 1638
SN  - 2095-9184
Y1  - in press
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1900481
ER  -


Abstract: Crowd counting is used in video surveillance, traffic monitoring, assembly control, and other public safety applications. Two context-related factors, perspective distortion and background interference, strongly affect counting accuracy. Whereas traditional methods address only one of these factors, in this study we aggregate context information into the crowd counting network to tackle both simultaneously. We build a fully convolutional network with two tasks: a main density map estimation task and an auxiliary semantic segmentation task. The main task extracts multi-scale and spatial context information to learn the density map. The auxiliary task provides a comprehensive view of background and foreground information, which is incorporated into the main task by late fusion. On three challenging datasets, our network achieves higher estimation accuracy and greater robustness than state-of-the-art methods.
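The two tasks named in the abstract can be made concrete with a small sketch. It is not the authors' implementation: the abstract does not say how targets are built or how the tasks are fused, so everything below is an assumption chosen for illustration. It assumes the common convention of placing one unit-mass Gaussian per annotated head (so the density map sums to the count), a hypothetical segmentation target obtained by thresholding that map, element-wise gating as a stand-in for late fusion, and a joint loss that adds a weighted cross-entropy term to the density error. The names `gaussian_density_map`, `foreground_mask`, `gate_density`, `multitask_loss`, and the parameters `sigma`, `threshold`, and `seg_weight` are all hypothetical.

```python
import math


def gaussian_density_map(points, height, width, sigma=1.5):
    """Assumed target construction: one unit-mass Gaussian per annotated head."""
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        kernel = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2.0 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        mass = sum(sum(row) for row in kernel)
        # Normalize inside the image so every person contributes exactly 1.
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / mass
    return density


def crowd_count(density):
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


def foreground_mask(density, threshold=1e-3):
    """Hypothetical segmentation target: 1 where the density exceeds a threshold."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in density]


def gate_density(density, fg_prob):
    """Assumed stand-in for late fusion: scale density by foreground probability."""
    return [[d * p for d, p in zip(d_row, p_row)]
            for d_row, p_row in zip(density, fg_prob)]


def multitask_loss(density_pred, density_gt, fg_prob, fg_gt,
                   seg_weight=0.1, eps=1e-7):
    """Mean squared density error plus weighted binary cross-entropy (assumed form)."""
    rows, cols = len(density_gt), len(density_gt[0])
    mse = 0.0
    bce = 0.0
    for y in range(rows):
        for x in range(cols):
            mse += (density_pred[y][x] - density_gt[y][x]) ** 2
            p = min(max(fg_prob[y][x], eps), 1.0 - eps)  # avoid log(0)
            t = fg_gt[y][x]
            bce -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    n = rows * cols
    return mse / n + seg_weight * bce / n


heads = [(2, 3), (5, 5)]  # (row, column) head annotations, made up for the demo
gt = gaussian_density_map(heads, height=8, width=8)
mask = foreground_mask(gt)
print(round(crowd_count(gt), 6))             # close to 2.0, one unit per head
print(multitask_loss(gt, gt, mask, mask))    # near zero for a perfect prediction
```

Gating by the mask can only remove density from pixels labelled background, which is one simple way a foreground estimate could suppress background interference; a learned fusion of feature maps would replace `gate_density` in a real network.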

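Crowd counting with aggregated context information

Si-yue Yu¹, Jian Pu¹,²
¹School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
²Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China

Summary: Crowd counting is widely used in video surveillance, traffic monitoring, assembly control, and other public safety scenarios. Perspective distortion and background interference, both related to context information, are two key factors that affect crowd counting accuracy. Unlike traditional methods that address only one of these factors, this paper proposes a crowd counting network that fully aggregates context information to address both at once. A multi-task fully convolutional network is proposed that learns crowd density estimation together with an auxiliary semantic segmentation task. The former learns the crowd density map by extracting multi-scale and spatial context information; the auxiliary semantic segmentation task learns background and foreground information, and the information it extracts is merged into the density estimation task at a late stage. Results show that the proposed network achieves good counting accuracy and, compared with other methods, higher robustness on three challenging crowd datasets.

Keywords: Crowd counting; Convolutional neural network; Density estimation; Semantic segmentation; Multi-task learning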


