VC-YOLO: Towards Real-Time Object Detection in Aerial Images
Abstract
Object detection in aerial images is a crucial and challenging task in computer vision. Previous CNN-based methods struggle with the extreme variation in object scale and the complex backgrounds of aerial images, which differ significantly from natural scenes. Moreover, many existing detectors rely heavily on computational resources and cannot handle real-time tasks. To address these problems, we propose a lightweight real-time object detection network named VC-YOLO. In the backbone, we introduce a receptive-field-extended design with a limited number of convolution layers to learn the features and context information of various objects. In the detection part, a channel attention module and a spatial attention module are used to generate discriminative feature representations. To make full use of the semantic feature maps in the backbone network, we improve the feature pyramid network (FPN) with additional lateral connections that reuse the features from each convolution stage. We evaluate VC-YOLO on the NWPU VHR-10 and VisDrone benchmark datasets. Experimental results show that VC-YOLO achieves superior detection accuracy with high efficiency compared with existing methods.
This paper was recommended by Regional Editor Takuro Sato.
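
The abstract names a channel attention module and a spatial attention module but does not specify their design. As a rough illustration of the general idea only, the following sketch applies a simplified, parameter-free form of channel attention followed by spatial attention to a small feature map stored as nested lists (channels x height x width). The function names, the pooling-plus-sigmoid weighting, and the omission of any learned layers are assumptions made for this example, not the VC-YOLO implementation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by sigmoid(avg_pool + max_pool) over its pixels.

    Assumption: a real module would pass the pooled descriptors through
    learned layers first; they are omitted here for brevity.
    """
    out = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values) + max(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each pixel by sigmoid(avg + max) taken across the channels.

    Assumption: a real module would typically apply a learned convolution
    to the pooled maps; it is omitted here.
    """
    n_ch = len(fmap)
    height, width = len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(n_ch)]
    for i in range(height):
        for j in range(width):
            column = [fmap[c][i][j] for c in range(n_ch)]
            weight = sigmoid(sum(column) / n_ch + max(column))
            for c in range(n_ch):
                out[c][i][j] = fmap[c][i][j] * weight
    return out


def attend(fmap):
    """Apply channel attention, then spatial attention (hypothetical ordering)."""
    return spatial_attention(channel_attention(fmap))


# Toy input: 2 channels of 2x2 pixels, with one active pixel in channel 0.
feature_map = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
refined = attend(feature_map)
```

In this toy input the single active pixel keeps a positive response while the inactive positions stay at zero, which mirrors the intent of re-weighting features by where and in which channel the evidence is strongest.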