Pattern Recognition

School of Information and Communication Engineering, Hainan University, No. 58 People’s Road, Haikou 570228, P. R. China

Naimeng Cang

https://orcid.org/0000-0003-2024-3897

School of Information and Communication Engineering, Hainan University, No. 58 People’s Road, Haikou 570228, P. R. China

E-mail Address: nmcang@hainanu.edu.cn

Wenbo Zhang

https://orcid.org/0009-0007-2549-2549

School of Information and Communication Engineering, Hainan University, No. 58 People’s Road, Haikou 570228, P. R. China

E-mail Address: zhangwenbo@hainanu.edu.cn

Chan Zhang

https://orcid.org/0009-0005-7932-0115

School of Information and Communication Engineering, Hainan University, No. 58 People’s Road, Haikou 570228, P. R. China

E-mail Address: czhbhgxs@163.com

Weidong Zhang

https://orcid.org/0000-0002-4700-1276

School of Information and Communication Engineering, Hainan University, No. 58 People’s Road, Haikou 570228, P. R. China

Department of Automation, Shanghai Jiaotong University, No. 800 Dongchuan Road, Shanghai 200240, P. R. China

E-mail Address: wdzhang@sjtu.edu.cn

Corresponding author.

Dongsheng Guo

https://orcid.org/0000-0002-1700-1996

School of Information and Communication Engineering, Hainan University, No. 58 People’s Road, Haikou 570228, P. R. China

E-mail Address: gdongsh2022@hainanu.edu.cn

Corresponding author.

https://doi.org/10.1142/S0218001424500071

Abstract

Object detection on unmanned aerial vehicle (UAV) images is an important branch of object detection and, broadly speaking, a form of small object detection. It is particularly challenging because UAVs capture images from varying heights and angles, so most objects are small and many are densely occluded. To address these problems, we propose RSRT-YOLOv5, a Residual Spatial Reduced Transformer based on YOLOv5. First, a Slice Aided Enhancement Module (SAEM) is introduced to enhance the feature quality of small objects. Second, a Global attention-based Bi-directional Feature Fusion (GBFF) module is proposed. Third, an efficient Residual Spatial Reduced Transformer (RSRT) module is integrated into the Neck to obtain more efficient feature representations and richer global contextual associations. Finally, we evaluate our method on the VisDrone2019 dataset; the experimental results show that RSRT-YOLOv5 outperforms the baseline model (YOLOv5) and improves detection performance on UAV images.
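The abstract does not describe the internals of the RSRT module. As an illustration only, the sketch below shows the generic spatial-reduction attention idea popularized by the Pyramid Vision Transformer, which the module's name suggests: every token still issues a query, but keys and values come from a spatially downsampled token grid, so each of the N queries attends to N/R² tokens instead of N. The residual connection, the use of R×R average pooling as the reduction step, the single head, and the identity Q/K/V projections (no learned weights) are simplifying assumptions, not the authors' design; the function names are hypothetical.

```python
import math
from typing import List

Tokens = List[List[float]]  # N tokens, each a C-dimensional feature vector


def avg_pool_tokens(x: Tokens, h: int, w: int, r: int) -> Tokens:
    """Spatial reduction (assumed): average each non-overlapping r x r block of an h x w token grid."""
    c = len(x[0])
    pooled = []
    for bi in range(0, h, r):
        for bj in range(0, w, r):
            acc = [0.0] * c
            for i in range(bi, bi + r):
                for j in range(bj, bj + r):
                    for k, val in enumerate(x[i * w + j]):  # row-major token index
                        acc[k] += val
            pooled.append([a / (r * r) for a in acc])
    return pooled


def spatial_reduction_attention(x: Tokens, h: int, w: int, r: int) -> Tokens:
    """Single-head attention: queries use all h*w tokens, keys/values use the
    (h/r)*(w/r) pooled tokens, followed by a residual connection (assumed)."""
    assert len(x) == h * w and h % r == 0 and w % r == 0
    c = len(x[0])
    kv = avg_pool_tokens(x, h, w, r)  # identity projections: K = V = pooled tokens
    scale = 1.0 / math.sqrt(c)
    out = []
    for q in x:  # identity projection: Q = x
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in kv]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        attended = [sum(wt * v[j] for wt, v in zip(weights, kv)) for j in range(c)]
        out.append([xi + ai for xi, ai in zip(q, attended)])  # residual: x + Attn(x)
    return out


if __name__ == "__main__":
    h, w, r, c = 8, 8, 2, 4
    grid = [[float((i * c + k) % 5) for k in range(c)] for i in range(h * w)]
    y = spatial_reduction_attention(grid, h, w, r)
    print(len(y), "queries attend to", len(avg_pool_tokens(grid, h, w, r)), "pooled keys/values")
    # → 64 queries attend to 16 pooled keys/values
```

With R = 2 the attention score matrix shrinks from 64 × 64 to 64 × 16 entries, a factor of R² = 4. On the larger feature maps processed in a detector's Neck, this reduction is what keeps global attention affordable.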
