Detecting small objects is a significant challenge when evaluating images captured from UAVs. Although YOLO models have been successful in detecting conventional objects, they still face challenges in detecting small objects in traffic monitoring scenarios. Applying the Slicing-Aided Hyper Inference (SAHI) framework to YOLO models can expand the pixel regions containing small objects, thereby improving the detection of tiny and small objects. However, slicing can also crop larger vehicles, which reduces accuracy. This paper proposes a method that combines the results of a YOLO model with those of an SAHI-enhanced YOLO variant, after adjusting the confidence scores of bounding boxes with a Gaussian Mixture Model-based approach, to improve accuracy on small vehicles without compromising the detection of larger objects. The combined approach enhances detection coverage by post-processing the model outputs, namely removing duplicates and normalizing scores, and then merging the two detection sets. We conduct a thorough evaluation of the proposed method with different versions of YOLO: YOLOv5, YOLOv6, YOLOv7, and YOLOv8. Experimental results on two aerial surveillance datasets, VisDrone and DroneVehicle, demonstrate improved mAP across all tested models.
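The fusion step described in this abstract can be sketched roughly as follows, as a minimal illustration only: it assumes detections are given as NumPy arrays of (x1, y1, x2, y2) boxes with confidence scores, uses scikit-learn's GaussianMixture for the score adjustment, and invents the helper names (gmm_normalize, merge_detections); it is not the authors' code.

# Hypothetical sketch: adjust confidence scores of full-image and SAHI-sliced
# detections with a 1-D Gaussian Mixture Model, then de-duplicate and merge.
# All names and thresholds are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_normalize(scores, n_components=2):
    """Fit a 1-D GMM to confidence scores and map each score to the
    posterior probability of the higher-mean ("reliable") component."""
    gmm = GaussianMixture(n_components=n_components, random_state=0)
    gmm.fit(scores.reshape(-1, 1))
    reliable = int(np.argmax(gmm.means_.ravel()))
    return gmm.predict_proba(scores.reshape(-1, 1))[:, reliable]

def iou(a, b):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2)."""
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def merge_detections(boxes_full, scores_full, boxes_sahi, scores_sahi, iou_thr=0.5):
    """Merge full-image and SAHI detections after GMM score normalization,
    keeping the higher-scored box whenever two boxes overlap strongly."""
    boxes = np.vstack([boxes_full, boxes_sahi])
    scores = np.concatenate([gmm_normalize(scores_full), gmm_normalize(scores_sahi)])
    order = np.argsort(-scores)          # greedy NMS over the combined set
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) < iou_thr]
    return boxes[keep], scores[keep]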
Road detection in aerial images is a crucial technique for visual navigation and scene understanding with unmanned aerial vehicles (UAVs). This paper proposes a shape-aware road detection method for aerial images. It first employs the stroke width transform (SWT) and geodesic-distance-based superpixel clustering to generate proposal regions. A shape classifier then selects from the proposal regions all potential road segments, i.e., regions that appear long and of consistent width. The selected road segments are clustered into several groups based on width and color features. Finally, a global graph-based labeling model is applied to each group to remove potential background clutter and to generate the final output. Experiments on two public datasets demonstrate that the proposed method handles more diverse and challenging road scenes and needs less pre-training, leading to better performance than conventional methods.
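The "long and of consistent width" shape test could be approximated along the following lines; this is a sketch under assumptions (width estimated from the distance transform sampled along the region skeleton, with invented thresholds), not the paper's classifier.

# Rough sketch of the shape test: a proposal region is kept as a road
# candidate only if it is elongated and its width is roughly constant.
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def is_road_like(mask, min_elongation=4.0, max_width_cv=0.3):
    """Return True if the binary region mask looks long and of consistent width."""
    skeleton = skeletonize(mask.astype(bool))
    if skeleton.sum() < 2:
        return False
    # The distance transform gives the half-width of the region at each pixel.
    widths = 2.0 * distance_transform_edt(mask)[skeleton]
    length = skeleton.sum()                    # skeleton pixel count ~ segment length
    mean_w = widths.mean()
    elongation = length / (mean_w + 1e-9)      # long, thin regions score high
    width_cv = widths.std() / (mean_w + 1e-9)  # coefficient of variation of width
    return elongation >= min_elongation and width_cv <= max_width_cv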
The main challenge for object detection in aerial images is small object detection. Most existing methods use feature fusion strategies to enhance small object features in shallow layers but ignore the inconsistency of small object local region responses between feature layers, namely the semantic gap, which may leave small object information in multiple feature layers underutilized. To address these limitations, we propose a scale enhancement module that adaptively passes valuable small object features from different feature layers to shallow layers to alleviate the semantic gap. In particular, the module includes a novel fine-coarse self-attention mechanism that captures global contextual information through strong interaction of pixel-level information at the local scale and weak interaction of region-level information at the global scale. In addition, anchor assignment strategies based on the Intersection over Union (IoU) metric are unfavorable for small objects, because IoU tolerates far less position deviation for small objects than for large ones. For this reason, we design a dynamic anchor assignment strategy with a scale-insensitive metric to assign sufficient anchors to small objects. Extensive experiments on three aerial datasets demonstrate the effectiveness and adaptability of our method.
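The IoU sensitivity argument can be made concrete numerically. The sketch below contrasts IoU with the normalized Gaussian Wasserstein distance (NWD), one published scale-insensitive metric; whether this paper's metric is NWD specifically is an assumption made only for illustration. Boxes are (cx, cy, w, h).

# A 4-pixel center shift collapses IoU for a tiny box but not for a large one,
# while NWD responds identically at both scales.
import numpy as np

def iou(a, b):
    ax1, ay1, ax2, ay2 = a[0]-a[2]/2, a[1]-a[3]/2, a[0]+a[2]/2, a[1]+a[3]/2
    bx1, by1, bx2, by2 = b[0]-b[2]/2, b[1]-b[3]/2, b[0]+b[2]/2, b[1]+b[3]/2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    return inter / (a[2]*a[3] + b[2]*b[3] - inter + 1e-9)

def nwd(a, b, c=12.8):
    """Normalized Wasserstein distance between boxes modeled as 2-D Gaussians."""
    w2 = (a[0]-b[0])**2 + (a[1]-b[1])**2 \
       + ((a[2]-b[2])**2 + (a[3]-b[3])**2) / 4.0
    return np.exp(-np.sqrt(w2) / c)

small, small_shifted = (50, 50, 8, 8), (54, 50, 8, 8)
large, large_shifted = (50, 50, 80, 80), (54, 50, 80, 80)
print(iou(small, small_shifted), iou(large, large_shifted))  # ~0.33 vs ~0.90
print(nwd(small, small_shifted), nwd(large, large_shifted))  # ~0.73 for both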
Object detection in aerial images is a crucial and challenging task in computer vision. Previous CNN-based methods struggle with the extreme variation of object scales and the complex backgrounds in aerial images, which differ significantly from natural scenes. Moreover, many existing detectors rely heavily on computational resources and cannot handle real-time tasks. To address these problems, we propose a lightweight real-time object detection network named VC-YOLO. In the backbone part, we introduce a receptive-field-extended backbone with a limited number of convolution layers to learn the features and context information of various objects. In the detection part, a channel attention module and a spatial attention module generate discriminative feature representations. To make full use of the semantic feature maps in the backbone network, we improve the feature pyramid network (FPN) with additional lateral connections that reuse the features from each convolution stage. We evaluate VC-YOLO on the NWPU VHR-10 and VisDrone benchmark datasets. Experimental results show that VC-YOLO achieves superior detection accuracy with high efficiency compared with existing methods.
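A common PyTorch realization of the channel and spatial attention modules mentioned above is the CBAM-style pair below (Woo et al., 2018); whether VC-YOLO uses this exact design is an assumption, and the sketch only illustrates the mechanism.

# Channel attention reweights feature channels from pooled global statistics;
# spatial attention then learns a per-pixel map from channel-pooled statistics.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        return x * torch.sigmoid(avg + mx)[:, :, None, None]

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Pool across channels, then learn a per-pixel attention map.
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(stats))

# Applied sequentially to a detection-head feature map:
feat = torch.randn(1, 256, 40, 40)
feat = SpatialAttention()(ChannelAttention(256)(feat))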