Residual Spatial Reduced Transformer Based on YOLOv5 for UAV Images Object Detection
Abstract
Object detection in unmanned aerial vehicle (UAV) images is an important branch of object detection and, in a broad sense, a small object detection problem. It is particularly challenging because UAVs capture images from varying heights and angles, so scenes are dominated by small objects and dense occlusion. To address these problems, we propose the Residual Spatial Reduced Transformer based on YOLOv5 (RSRT-YOLOv5). First, a Slice Aided Enhancement Module (SAEM) is introduced to enhance the feature quality of small objects. Second, a Global attention-based Bi-directional Feature Fusion (GBFF) module is proposed. Third, an efficient Residual Spatial Reduced Transformer (RSRT) module is integrated into the Neck to achieve more efficient feature representation and richer global contextual associations. We evaluate our method on the VisDrone2019 dataset, and the experimental results show that RSRT-YOLOv5 outperforms the baseline model (YOLOv5) and improves detection performance on UAV images.
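
The abstract does not specify how SAEM slices its input, so the following is only an illustrative sketch of one common slicing scheme for small objects: covering a large image with overlapping fixed-size tiles. The function names (`slice_windows`, `_axis_starts`) and the parameters `tile` and `overlap` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# A window is (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]


def _axis_starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that the tiles cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    # If the stride leaves a gap at the far edge, add one tile flush with it.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def slice_windows(width: int, height: int,
                  tile: int = 640, overlap: int = 128) -> List[Box]:
    """Hypothetical helper: overlapping tile windows covering an image.

    `tile` is the tile side length and `overlap` the number of pixels
    shared by neighbouring tiles; both are assumed values for illustration.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in _axis_starts(height, tile, stride)
        for x in _axis_starts(width, tile, stride)
    ]


if __name__ == "__main__":
    for window in slice_windows(1920, 1080):
        print(window)
```

In a scheme of this kind, each tile would be processed at a higher effective resolution than the full frame, and any boxes predicted on a tile would be shifted by that tile's `(x_min, y_min)` offset before being merged with the others. Whether SAEM follows this pattern is an assumption here.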