Open Access

Impact of Labeling Schemes on Dense Crowd Counting Using Convolutional Neural Networks with Multiscale Upsampling

Greg Olmschenk

Department of Computer Science, The Graduate Center, CUNY, 365 Fifth Avenue, New York, NY 10016, USA

NASA Goddard Space Flight Center, 8800 Greenbelt Rd, Greenbelt, MD 20771, USA

E-mail Address: golmschenk@gradcenter.cuny.edu

Corresponding authors with equal contributions.

Xuan Wang

Department of Computer Science, The Graduate Center, CUNY, 365 Fifth Avenue, New York, NY 10016, USA

E-mail Address: xwang4@gradcenter.cuny.edu

Corresponding authors with equal contributions.

Hao Tang

Department of Computer Information Systems, Borough of Manhattan Community College, CUNY, 199 Chambers Street, New York, NY 10007, USA

E-mail Address: htang@bmcc.cuny.edu

Zhigang Zhu

https://orcid.org/0000-0002-9990-1137

Department of Computer Science, The Graduate Center, CUNY, 365 Fifth Avenue, New York, NY 10016, USA

Department of Computer Science, The City College of New York, CUNY, 160 Convent Avenue, New York, NY 10031, USA

E-mail Address: zzhu@ccny.cuny.edu

https://doi.org/10.1142/S0218001421600120

This article is part of the issue:

Special Section on the 15th International Conference on Computer Vision Theory and Applications (VISAPP 2020)
Guest Editors: Petia Radeva and Giovanni Maria Farinella

Abstract

Gatherings of thousands to millions of people frequently occur for an enormous variety of educational, social, sporting, and political events, and automated counting of these high-density crowds is useful for safety, management, and measuring the significance of an event. In this work, we show that the commonly accepted labeling scheme of crowd density maps for training deep neural networks may not be the most effective one. We propose an alternative inverse $k$-nearest neighbor (i$k$NN) map mechanism that, even when used directly in existing state-of-the-art network structures, shows superior performance. We also provide new network architecture mechanisms that we demonstrate in our own MUD-i$k$NN network architecture, which uses multi-scale drop-in replacement upsampling via transposed convolutions to take full advantage of the provided i$k$NN labeling. This upsampling combined with the i$k$NN maps further improves crowd counting accuracy. We further analyze several variations of the i$k$NN labeling mechanism, which apply transformations to the $k$NN measure before generating the map, in order to consider the impact of camera perspective views, image resolutions, and the rates of change of the mapping functions. To alleviate the effects of crowd density changes within each image, we also introduce an attenuation mechanism in the i$k$NN mapping. Experimentally, we show that the inverse square root $k$NN map variation (iR$k$NN) provides the best performance. We also discuss computational complexity, label resolutions, the gains from mapping and upsampling, and details of critical cases such as various crowd counts, uneven crowd densities, and crowd occlusions.
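The abstract does not state the i$k$NN formula, so the following is only a minimal sketch of how such a label map could be built from head annotations. It assumes the map takes the form 1 / (g(d) + 1), where d is the mean distance from a pixel to its $k$ nearest annotated heads and g is an optional transformation (the identity for i$k$NN, a square root for the iR$k$NN variation). That form, the function names, and the toy annotations are illustrative assumptions, not the authors' implementation.

```python
import math


def knn_distance_map(heads, height, width, k=1):
    """Mean Euclidean distance from every pixel to its k nearest head annotations."""
    if not heads:
        raise ValueError("at least one head annotation is required")
    k = min(k, len(heads))  # cannot use more neighbors than there are heads
    dist_map = []
    for y in range(height):
        row = []
        for x in range(width):
            dists = sorted(math.hypot(x - hx, y - hy) for hx, hy in heads)
            row.append(sum(dists[:k]) / k)
        dist_map.append(row)
    return dist_map


def inverse_knn_map(heads, height, width, k=1, transform=lambda d: d):
    """Assumed label form: 1 / (transform(d) + 1).

    The value is 1 at an annotated head and decays with distance.
    The "+ 1" keeps the label finite where d == 0.
    """
    return [
        [1.0 / (transform(d) + 1.0) for d in row]
        for row in knn_distance_map(heads, height, width, k)
    ]


# Hypothetical (x, y) head annotations on a tiny 4 x 8 image.
heads = [(1, 1), (6, 2)]
iknn = inverse_knn_map(heads, height=4, width=8, k=1)
# Square-root transform applied to the kNN measure before inversion.
irknn = inverse_knn_map(heads, height=4, width=8, k=1, transform=math.sqrt)
```

Under this assumed form, the square-root variant decays more slowly than the plain inverse for pixels more than one unit from the nearest head, which is one way a transformation changes the rate of the mapping function. The attenuation mechanism and the multi-scale upsampling described in the abstract are not covered by this sketch.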

Vol. 35, No. 16

History

Received 21 June 2020

Accepted 3 February 2021

Published: 15 September 2021

Information

This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC) License which permits use, distribution and reproduction in any medium, provided that the original work is properly cited and is used for non-commercial purposes.
