CLC number: TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2021-07-22
Liang Ma, Qiaoyong Zhong, Yingying Zhang, Di Xie, Shiliang Pu. Associative affinity network learning for multi-object tracking[J]. Frontiers of Information Technology & Electronic Engineering, 2021, 22(9): 1194-1206.
@article{title="Associative affinity network learning for multi-object tracking",
author="Liang Ma, Qiaoyong Zhong, Yingying Zhang, Di Xie, Shiliang Pu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="22",
number="9",
pages="1194-1206",
year="2021",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.2000272"
}
%0 Journal Article
%T Associative affinity network learning for multi-object tracking
%A Liang Ma
%A Qiaoyong Zhong
%A Yingying Zhang
%A Di Xie
%A Shiliang Pu
%J Frontiers of Information Technology & Electronic Engineering
%V 22
%N 9
%P 1194-1206
%@ 2095-9184
%D 2021
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.2000272
TY - JOUR
T1 - Associative affinity network learning for multi-object tracking
A1 - Liang Ma
A1 - Qiaoyong Zhong
A1 - Yingying Zhang
A1 - Di Xie
A1 - Shiliang Pu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 22
IS - 9
SP - 1194
EP - 1206
SN - 2095-9184
Y1 - 2021
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.2000272
ER -
Abstract: We propose a joint feature and metric learning deep neural network architecture, the associative affinity network (AAN), as an affinity model for multi-object tracking (MOT) in videos. The AAN learns the associative affinity between tracks and detections across frames in an end-to-end manner. To account for flawed detections, the AAN jointly learns bounding box regression, classification, and affinity regression through the proposed multi-task loss. Unlike networks trained with a ranking loss, we directly train a binary classifier to learn the associative affinity of each track-detection pair, and use a matching cardinality loss to capture information among candidate pairs. The AAN learns a discriminative affinity model for data association in MOT and can also perform single-object tracking. Building on the AAN, we propose a simple multi-object tracker that achieves competitive performance on the public MOT16 and MOT17 test datasets.
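The abstract describes scoring every track-detection pair with a binary classifier and then resolving those scores into one-to-one assignments. The paper's network and matching cardinality loss are not reproduced on this page; as a rough sketch of the data-association step only, the greedy matcher below (the function name `associate`, the greedy policy, and the 0.5 threshold are illustrative assumptions, not the authors' method) turns a pairwise affinity matrix into matches:

```python
import numpy as np

def associate(affinity, threshold=0.5):
    """Greedily associate tracks with detections from pairwise affinities.

    affinity: (num_tracks, num_detections) array of scores in [0, 1],
    e.g. sigmoid outputs of a per-pair binary classifier.
    Returns matched (track, detection) index pairs plus the indices of
    unmatched tracks and detections.
    """
    matches = []
    used_tracks, used_dets = set(), set()
    # Visit candidate pairs from highest to lowest affinity.
    flat_order = np.argsort(affinity, axis=None)[::-1]
    rows, cols = np.unravel_index(flat_order, affinity.shape)
    for t, d in zip(rows, cols):
        if affinity[t, d] < threshold:
            break  # remaining pairs are all below the acceptance threshold
        if t in used_tracks or d in used_dets:
            continue  # each track and each detection is matched at most once
        matches.append((int(t), int(d)))
        used_tracks.add(int(t))
        used_dets.add(int(d))
    unmatched_tracks = [t for t in range(affinity.shape[0]) if t not in used_tracks]
    unmatched_dets = [d for d in range(affinity.shape[1]) if d not in used_dets]
    return matches, unmatched_tracks, unmatched_dets
```

For example, `associate(np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.4]]))` matches track 0 to detection 0 and track 1 to detection 1, leaving track 2 unmatched. A full tracker would repeat this per frame, spawning new tracks from unmatched detections and terminating tracks that stay unmatched; an optimal assignment solver (e.g. the Hungarian algorithm) can replace the greedy pass.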
[1]Andriyenko A, Roth S, Schindler K, 2011. An analytical formulation of global occlusion reasoning for multi-target tracking. IEEE Int Conf on Computer Vision Workshops, p.1839-1846.
[2]Bergmann P, Meinhardt T, Leal-Taixé L, 2019a. Tracking without bells and whistles. IEEE/CVF Int Conf on Computer Vision, p.941-951.
[3]Bergmann P, Meinhardt T, Leal-Taixé L, 2019b. Tracktor++_v2. Available from https://github.com/phil-bergmann/tracking_wo_bnw [Accessed on July 9, 2020].
[4]Bullinger S, Bodensteiner C, Arens M, 2017. Instance flow based online multiple object tracking. IEEE Int Conf on Image Processing, p.785-789.
[5]Chen L, Ai HZ, Zhuang ZJ, et al., 2018. Real-time multiple people tracking with deeply learned candidate selection and person re-identification. IEEE Int Conf on Multimedia and Expo, p.1-6.
[6]Chen S, Gong C, Yang J, et al., 2018. Adversarial metric learning. Proc 27th Int Joint Conf on Artificial Intelligence, p.2021-2027.
[7]Chen S, Luo L, Yang J, et al., 2019. Curvilinear distance metric learning. Proc 33rd Int Conf on Neural Information Processing Systems, p.4223-4232.
[8]Choi W, 2015. Near-online multi-target tracking with aggregated local flow descriptor. IEEE Int Conf on Computer Vision, p.3029-3037.
[9]Chu P, Ling HB, 2019. FAMNet: joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking. IEEE/CVF Int Conf on Computer Vision, p.6171-6180.
[10]Chu Q, Ouyang WL, Li HS, et al., 2017. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism. Proc IEEE Int Conf on Computer Vision, p.4846-4855.
[11]Dalal N, Triggs B, 2005. Histograms of oriented gradients for human detection. IEEE Computer Society Conf on Computer Vision and Pattern Recognition, p.886-893.
[12]Duan YQ, Lu JW, Zheng WH, et al., 2020. Deep adversarial metric learning. IEEE Trans Image Process, 29:2037-2051.
[13]Emami P, Ranka S, 2018. Learning permutations with Sinkhorn policy gradient. https://arxiv.org/abs/1805.07010
[14]Fagot-Bouquet L, Audigier R, Dhome Y, et al., 2016. Improving multi-frame data association with sparse representations for robust near-online multi-object tracking. Proc 14th European Conf on Computer Vision, p.774-790.
[15]Fang K, Xiang Y, Li XC, et al., 2018. Recurrent autoregressive networks for online multi-object tracking. IEEE Winter Conf on Applications of Computer Vision, p.466-475.
[16]Feichtenhofer C, Pinz A, Zisserman A, 2017. Detect to track and track to detect. IEEE Int Conf on Computer Vision, p.3057-3065.
[17]Felzenszwalb PF, Girshick RB, McAllester D, et al., 2010. Object detection with discriminatively trained part-based models. IEEE Trans Patt Anal Mach Intell, 32(9):1627-1645.
[18]Han XF, Leung T, Jia YG, et al., 2015. MatchNet: unifying feature and metric learning for patch-based matching. IEEE Conf on Computer Vision and Pattern Recognition, p.3279-3286.
[19]He KM, Gkioxari G, Dollár P, et al., 2017. Mask R-CNN. IEEE Int Conf on Computer Vision, p.2980-2988.
[20]Henschel R, Leal-Taixé L, Cremers D, et al., 2018. Fusion of head and full-body detectors for multi-object tracking. IEEE/CVF Conf on Computer Vision and Pattern Recognition Workshops, p.1509-1518.
[21]Hermans A, Beyer L, Leibe B, 2017. In defense of the triplet loss for person re-identification. https://arxiv.org/abs/1703.07737
[22]Ilg E, Mayer N, Saikia T, et al., 2017. FlowNet 2.0: evolution of optical flow estimation with deep networks. IEEE Conf on Computer Vision and Pattern Recognition, p.1647-1655.
[23]Keuper M, Tang SY, Yu ZJ, et al., 2016. A multi-cut formulation for joint segmentation and tracking of multiple objects. https://arxiv.org/abs/1607.06317
[24]Kim C, Li FX, Ciptadi A, et al., 2015. Multiple hypothesis tracking revisited. IEEE Int Conf on Computer Vision, p.4696-4704.
[25]Lan L, Tao DC, Gong C, et al., 2016. Online multi-object tracking by quadratic pseudo-Boolean optimization. Proc 25th Int Joint Conf on Artificial Intelligence, p.3396-3402.
[26]Leal-Taixé L, Canton-Ferrer C, Schindler K, 2016. Learning by tracking: Siamese CNN for robust target association. IEEE Conf on Computer Vision and Pattern Recognition Workshops, p.418-425.
[27]Ma C, Yang CS, Yang F, et al., 2018. Trajectory factory: tracklet cleaving and re-connection by deep Siamese Bi-GRU for multiple object tracking. IEEE Int Conf on Multimedia and Expo, p.1-6.
[28]Maksai A, Wang XC, Fleuret F, et al., 2017. Non-Markovian globally consistent multi-object tracking. IEEE Int Conf on Computer Vision, p.2563-2573.
[29]Milan A, Rezatofighi SH, Garg R, et al., 2017a. Data-driven approximations to NP-hard problems. Proc 31st AAAI Conf on Artificial Intelligence, p.1453-1459.
[30]Milan A, Rezatofighi SH, Dick A, et al., 2017b. Online multi-target tracking using recurrent neural networks. Proc 31st AAAI Conf on Artificial Intelligence, p.4225-4232.
[31]Nummiaro K, Koller-Meier E, van Gool L, 2003. An adaptive color-based particle filter. Image Vis Comput, 21(1):99-110.
[32]Ren SQ, He KM, Girshick R, et al., 2017. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Patt Anal Mach Intell, 39(6):1137-1149.
[33]Rezatofighi SH, Milan A, Zhang Z, et al., 2015. Joint probabilistic data association revisited. IEEE Int Conf on Computer Vision, p.3047-3055.
[34]Ristani E, Tomasi C, 2018. Features for multi-target multi-camera tracking and re-identification. IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.6036-6046.
[35]Ristani E, Solera F, Zou R, et al., 2016. Performance measures and a data set for multi-target, multi-camera tracking. European Conf on Computer Vision, p.17-35.
[36]Sadeghian A, Alahi A, Savarese S, 2017. Tracking the untrackable: learning to track multiple cues with long-term dependencies. IEEE Int Conf on Computer Vision, p.300-311.
[37]Schulter S, Vernaza P, Choi W, et al., 2017. Deep network flow for multi-object tracking. IEEE Conf on Computer Vision and Pattern Recognition, p.2730-2739.
[38]Shen H, Huang LC, Huang C, et al., 2018. Tracklet association tracker: an end-to-end learning-based association approach for multi-object tracking. https://arxiv.org/abs/1808.01562
[39]Shrivastava A, Gupta A, Girshick R, 2016. Training region-based object detectors with online hard example mining. IEEE Conf on Computer Vision and Pattern Recognition, p.761-769.
[40]Son J, Baek M, Cho M, et al., 2017. Multi-object tracking with quadruplet convolutional neural networks. IEEE Conf on Computer Vision and Pattern Recognition, p.3786-3795.
[41]Sun SJ, Akhtar N, Song HS, et al., 2021. Deep affinity network for multiple object tracking. IEEE Trans Patt Anal Mach Intell, 43(1):104-119.
[42]Tang SY, Andriluka M, Andres B, et al., 2017. Multiple people tracking by lifted multicut and person re-identification. IEEE Conf on Computer Vision and Pattern Recognition, p.3701-3710.
[43]Wang B, Wang L, Shuai B, et al., 2016. Joint learning of convolutional neural networks and temporally constrained metrics for tracklet association. IEEE Conf on Computer Vision and Pattern Recognition Workshops, p.386-393.
[44]Wang XY, Han TX, Yan S, 2009. An HOG-LBP human detector with partial occlusion handling. Proc IEEE 12th Int Conf on Computer Vision, p.32-39.
[45]Wojke N, Bewley A, Paulus D, 2017. Simple online and realtime tracking with a deep association metric. IEEE Int Conf on Image Processing, p.3645-3649.
[46]Xiang J, Sang N, Hou JH, et al., 2016. Hough forest-based association framework with occlusion handling for multi-target tracking. IEEE Signal Process Lett, 23(2):257-261.
[47]Xiang J, Xu GH, Ma C, et al., 2021. End-to-end learning deep CRF models for multi-object tracking. IEEE Trans Circ Syst Video Technol, 31(1):275-288.
[48]Xiang Y, Alahi A, Savarese S, 2015. Learning to track: online multi-object tracking by decision making. IEEE Int Conf on Computer Vision, p.4705-4713.
[49]Yang B, Nevatia R, 2014. Multi-target tracking by online learning a CRF model of appearance and motion patterns. Int J Comput Vis, 107(2):203-217.
[50]Yang F, Choi W, Lin YQ, 2016. Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. IEEE Conf on Computer Vision and Pattern Recognition, p.2129-2137.
[51]Yin JB, Wang WG, Meng QH, et al., 2020. A unified object motion and affinity model for online multi-object tracking. IEEE/CVF Conf on Computer Vision and Pattern Recognition, p.6767-6776.
[52]Zhang JMY, Zhou SP, Chang X, et al., 2020. Multiple object tracking by flowing and fusing. https://arxiv.org/abs/2001.11180
[53]Zhou XY, Koltun V, Krähenbühl P, 2020. Tracking objects as points. https://arxiv.org/abs/2004.01177