High-precision traffic sign detection plays an important role in enhancing road traffic safety, ensuring smooth traffic flow, supporting the development of intelligent transport systems and promoting the standardization of traffic facilities. To overcome the limitations of traditional methods, this study proposes an enhanced YOLOv5 algorithm for complex road environments. First, the K-Means clustering algorithm is used to cluster the bounding boxes of traffic signs in the training dataset, obtaining anchor box sizes that better match the distribution of the dataset and thus improve detection accuracy. Then, a genetic algorithm further optimizes the initial anchor boxes, using a global search strategy to find the optimal anchor box configuration. In terms of model structure, a bidirectional feature pyramid network (Bi-FPN) is introduced, which effectively exploits multi-scale feature information and enhances the model's adaptability to traffic signs of different sizes through top-down and bottom-up feature fusion paths as well as cross-scale connections. In addition, a global attention mechanism (GAM) recalibrates the feature maps along the channel and spatial attention dimensions, improving the model's robustness and detection accuracy in complex environments. Finally, the Focal-EIoU loss function is used to address class imbalance and sample imbalance, improving the stability and performance of the object detection model. Experiments on a Chinese traffic sign dataset demonstrate significant improvements: mAP increases by 10.03%, precision by 4.7%, recall by 2.6% and F1-score by 3.48%. The results confirm the validity of the proposed traffic sign recognition method, especially in complex road environments.
This study provides new ideas and methods for the field of traffic sign detection, with both theoretical significance and practical application value.
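The anchor-clustering step described above follows the standard YOLO recipe: cluster the (width, height) pairs of the training boxes using 1 − IoU as the distance metric. The sketch below is illustrative only, not the authors' code; the function names are ours.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, treating boxes as sharing a common corner."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with 1 - IoU as the distance."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)  # nearest = highest IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sort by area
```

The genetic-algorithm refinement the abstract mentions would then mutate these anchors and keep configurations that raise the mean best-possible IoU over the dataset.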
In response to the challenges of detecting damage features in X-ray images of steel wire rope-cores in conveyor belts, such as complex damage shapes, small sizes, low detection precision, and poor generalization ability, an improved YOLOv5 algorithm is proposed. The aim of the model is to accurately and efficiently identify and locate damage in X-ray images of steel wire rope-cores in conveyor belts. First, the Adaptive Histogram Equalization (AHE) method is used to preprocess the images, reducing the interference of harsh mining environments and improving the quality of the dataset. Second, to better retain image details and enhance the detection of damage features, transposed convolutional upsampling is adopted and the C3 module in the backbone network is replaced with C2f, which keeps the network lightweight while obtaining richer gradient-flow information; the loss function is also optimized. Finally, the improved algorithm is compared with four classical detection algorithms on the damage feature dataset of steel wire rope-core conveyor belts. The experimental results show that the proposed algorithm achieves an average detection precision of 91.8% and a detection speed of 40 frames per second (FPS) on images collected in harsh mining environments. The designed detection model provides a reference for the automatic recognition and detection of damage to steel wire rope-core conveyor belts.
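The AHE preprocessing mentioned above remaps pixel intensities through a cumulative histogram computed over local tiles; the global variant below is a simplified stand-in, not the paper's implementation, but it illustrates the core contrast-stretching idea.

```python
import numpy as np

def equalize_hist(img):
    """Global histogram equalization for an 8-bit grayscale image.
    (AHE applies the same remapping per local tile; this global version
    illustrates the principle on the whole image at once.)"""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]            # first nonzero CDF value
    denom = max(img.size - cdf_min, 1)   # guard against constant images
    # Map each gray level through the normalized CDF.
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[img]
```

A low-contrast input whose values occupy a narrow band is stretched across the full 0–255 range, which is what makes faint damage features easier for the detector to pick up.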
Object detection on unmanned aerial vehicle (UAV) images is an important branch of object detection, belonging to small object detection in a broad sense. Detecting objects in UAV images poses a greater challenge due to the predominance of small objects and dense occlusion caused by UAVs capturing images from varying heights and angles. To solve these problems, we propose the Residual Spatial Reduced Transformer based on YOLOv5 (RSRT-YOLOv5). First, a Slice Aided Enhancement Module (SAEM) is introduced to enhance the feature quality of small objects. Second, a Global attention-based Bi-directional Feature Fusion (GBFF) module is proposed. In the Neck architecture, an efficient Residual Spatial Reduced Transformer (RSRT) module is integrated to achieve more efficient feature representation and richer global contextual associations. Finally, our method is evaluated on the VisDrone2019 dataset, and the experimental results show that RSRT-YOLOv5 outperforms the baseline model (YOLOv5) and successfully improves detection performance on UAV images.
Printed circuit boards (PCBs) play a critical role in electronic products. Ensuring these products’ long-term reliability and consistent performance requires effective PCB defect detection. Although existing deep learning models for PCB defect detection achieve high accuracy, they often neglect capability considerations such as model size and inference speed. This paper introduces a precise, fast, and lightweight defect detection model, CCG-YOLO, based on an enhanced YOLOv5 model, to address this issue. The enhancements in CCG-YOLO can be summarized as follows: (1) Improved Backbone network: the feature extraction ability of the Backbone network is enhanced by introducing a C3HB module, which fosters spatial interaction capabilities. (2) Lightweight feature fusion network: a lightweight convolution structure called Ghost-Shuffle Convolution is incorporated in the feature fusion network, markedly reducing model parameters while maintaining performance. (3) Efficient residual network: to further enhance model performance, a CNeB module based on the ConvNeXt network is introduced, replacing the C3 module in the Neck; CNeB improves detection accuracy and reduces the number of model parameters. The combination of these enhancements results in impressive performance: CCG-YOLO achieves a mean average precision (mAP@0.5) of 99.5% and 88.75% in mAP@0.5:0.95 on the TDD-Net public dataset. Compared with the original YOLOv5s algorithm, CCG-YOLO offers a 4.24% improvement in mAP@0.5:0.95, a 1 MB reduction in model size, a 0.472M decrease in the number of parameters, a 0.6 GFLOPs reduction in computational complexity, and a real-time inference speed of 120 frames per second. These experimental results underscore that the proposed model excels in accuracy and speed and has a compact size for PCB defect detection. Moreover, CCG-YOLO is easily deployable on low-end devices, making it well-suited for meeting the real-time requirements of industrial defect detection.
As a natural, intuitive and easy-to-learn mode of interaction, gesture plays an important role in communication. Hand detection, which involves multimodal information, includes static and dynamic detection and poses intricate spatial problems such as different hand sizes, complex joints, occlusion and self-occlusion. This study focuses on a multimodal hand gesture recognition system based on YOLOv5 and MediaPipe with fused spatio-temporal features. First, the MediaPipe and OpenCV libraries were employed to implement hand keypoint detection. Subsequently, human–computer interaction (HCI) for volume control was realized by measuring the distance between the thumb and index finger. Finally, model training was conducted based on the YOLOv5 algorithm, and the recognition of different gesture categories was realized. Performance was evaluated and compared across YOLOv5s, YOLOv5m, and YOLOv5l, and the gesture recognition system interface visualization was achieved with PyQt5. Experiments show that the average detection accuracy of the model is 99.4% and the recognition speed is approximately 0.2 s.
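The thumb–index volume control described above can be sketched as follows, assuming MediaPipe's standard hand-landmark indexing (thumb tip = 4, index tip = 8); the calibration bounds `d_min`/`d_max` are our illustrative assumptions, not values from the paper.

```python
import math

# Landmark indices following MediaPipe's 21-point hand model.
THUMB_TIP, INDEX_TIP = 4, 8

def pinch_to_volume(landmarks, d_min=0.03, d_max=0.25):
    """Map the thumb-index fingertip distance (normalized image coords)
    linearly onto a 0-100 volume level. d_min/d_max are assumed
    calibration bounds for a fully pinched vs. fully spread hand."""
    d = math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP])
    vol = (d - d_min) / (d_max - d_min) * 100.0
    return max(0.0, min(100.0, vol))
```

In the full system this value would be fed to an OS volume API each frame, with the landmarks refreshed by MediaPipe's hand tracker.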
Wheat spike detection is a crucial component of wheat yield prediction. In this study, a lightweight and efficient wheat spike detection model is proposed. The model employs a novel Wheat Spike Net Block (WSNB) within a lightweight network architecture, integrating Depth-Wise Convolution (DW-Conv) and Efficient Window Multi-Head Self-Attention (EW-MHSA) to rapidly process images and accurately identify wheat spikes, even under compact small-target conditions. The model is equipped with four detection heads to effectively handle targets of varying scales and incorporates the innovative EMF-IOU loss function for refined bounding box estimation. Tested on a self-constructed Shangluo winter wheat dataset, the model achieves a detection speed of 96.1 FPS on an NVIDIA Tesla V100 and mAP@0.5 of 95.3%, surpassing YOLOv5, EfficientV2, YOLOX, Transformer, and MobileViT3 in terms of accuracy and efficiency. The model’s performance across diverse hardware platforms highlights its potential for practical implementation in real-time wheat yield estimation and precision agriculture.
In the realm of precise management, artificial intelligence has garnered significant attention and adoption, particularly within the domain of smart agriculture. In modern animal husbandry, animal face detection is conducive to individual identification, expression detection and behavior analysis, and this technological advancement holds immense importance in fostering intelligent farming practices. To address the challenge of face detection posed by similar appearance features (color, texture, etc.) and the lack of obvious feature differences between solid-color goats and sheep in natural environments, this research introduces a novel approach for face detection that combines the capabilities of YOLOv5 and a convolutional block attention module (CBAM). First, datasets of goats and sheep with different angles, scales and densities were constructed. Second, the basic framework of YOLOv5 was used for object detection. To overcome the obstacle posed by the limited presence of distinguishing features on the faces of goats and sheep, the CBAM block was introduced to construct the YOLOv5-CBAM model and improve its feature extraction ability. Finally, 2412 images were selected and divided into a training set and a validation set at an 8:1 ratio. The experimental results on this dataset show that the proposed YOLOv5-CBAM model yielded remarkable results, with a precision of 0.970, a recall of 0.890, a mAP@0.5 of 0.935, a frame rate of 140.845 frames per second (FPS), and a model size of 14.680 MB. In comparison to other approaches such as Faster R-CNN, SSD, YOLOv3, and YOLOv5, the proposed model demonstrated superior performance in some respects.
In addition, it excelled in both lightweight design and overall effectiveness, and it is well-suited for real-time detection of animal faces in real-world farming settings, ensuring efficient identification and monitoring of animals within practical agricultural environments.
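CBAM, as used in YOLOv5-CBAM above, recalibrates features sequentially along the channel and then the spatial dimension. The toy NumPy sketch below conveys the structure; real CBAM learns its MLP weights and uses a 7×7 convolution for the spatial map, where we substitute a simple two-element weighting.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). A shared two-layer MLP (w1: (C, C//r), w2: (C//r, C))
    is applied to average- and max-pooled channel descriptors, as in CBAM."""
    avg = x.mean(axis=(1, 2))                        # (C,)
    mx = x.max(axis=(1, 2))                          # (C,)
    att = sigmoid(np.maximum(avg @ w1, 0) @ w2 +
                  np.maximum(mx @ w1, 0) @ w2)       # (C,)
    return x * att[:, None, None]

def spatial_attention(x, kernel):
    """Channel-wise mean and max maps are stacked; a (2,) weighted sum
    stands in here for CBAM's learned 7x7 convolution."""
    m = np.stack([x.mean(axis=0), x.max(axis=0)])    # (2, H, W)
    att = sigmoid(np.tensordot(kernel, m, axes=1))   # (H, W)
    return x * att[None, :, :]
```

Applying `spatial_attention(channel_attention(x, w1, w2), kernel)` reproduces CBAM's channel-then-spatial ordering.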
Apple-picking robots can promote the development of smart agriculture, and accurate object recognition in complex natural environments using deep learning algorithms is critical. However, research has shown that changes in illumination and object occlusion remain significant challenges for recognition. To improve the accuracy with which an apple-picking robot identifies and positions apples in natural environments, a method combining YOLOv5 (You Only Look Once) with a fast-guided filter is proposed. By introducing a fast-guided filtering module, the ability to extract image features is improved and the problem of inaccurate detection of occluded targets and edges is alleviated. A K-means clustering algorithm is introduced into the improved YOLOv5 to automatically adjust image size and step size, and a BiFPN structure is introduced into the Neck network to add weighted feature fusion that highlights detailed features. The results show that the proposed algorithm can effectively remove noise such as blurred occlusion edges from apple images under natural light. In a real orchard environment, the apple recognition accuracy reached 97.8%, the recall was 97.3% and the detection speed was about 26.84 FPS. These results show that this approach, based on YOLOv5 and fast-guided filtering, can achieve fast and accurate identification of apple fruits in natural environments and meets the practical requirements of real-time target detection.
Wheat head identification and localization are crucial for field management and yield estimation. Traditional wheat head detection relies on labor-intensive manual counting. To address this issue, this study establishes a densely populated wheat head dataset and proposes an efficient, rapid real-time model, CB-YOLO, which incorporates a Convolutional Block Attention Module (CBAM), a fusion of spatial and channel attention mechanisms, to integrate channel and spatial features. In addition, by incorporating a Bidirectional Feature Pyramid Network (BiFPN) into the neck, crowded wheat head detection is enhanced through connections from the backbone to the bottom-up path. These enhancements surpass the performance of the basic YOLOv5L model, achieving AP50 of 94.3% and AP50:95 of 53.9%. Among 10 classical detection models, CB-YOLO has a relatively small parameter size of 44 MB and a computational cost of 108 GFLOPs. Additionally, ablation experiments demonstrate that, compared to YOLOv5L without CBAM and BiFPN, the feature fusion module improves AP50 from 93.2% to 94.3%, indicating its effectiveness in enhancing detection performance. In conclusion, CB-YOLO exhibits outstanding identification performance, showing its feasibility for practical agricultural applications; future work will focus on improving its localization ability.
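The BiFPN used by CB-YOLO fuses same-resolution feature maps with learnable, non-negative weights normalized on the fly ("fast normalized fusion"). A minimal sketch of that fusion rule (illustrative, not the authors' code):

```python
import numpy as np

def bifpn_fuse(features, weights, eps=1e-4):
    """BiFPN fast normalized fusion: a weighted sum of same-shape feature
    maps whose learnable weights are kept non-negative (ReLU) and
    normalized to sum to ~1."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps w >= 0
    w = w / (w.sum() + eps)
    return sum(wi * f for wi, f in zip(w, features))
```

In a full BiFPN, each node of the top-down and bottom-up paths applies this rule to its resized inputs before a convolution, so the network learns how much each scale contributes.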
Computer Vision (CV) has become an essential field in Artificial Intelligence applications. Object detection and recognition (ODR) is one of the fundamental tasks of computer vision implementations. However, developing an efficient ODR model remains a significant problem: execution time and speed during inference are the most critical features, and they need to be improved using the latest object detection architectures. In this paper, a handwritten detection and recognition (HDR) model is first developed based on previously known, efficient algorithms such as Faster R-CNN and YOLOv4. Then, two new models capable of detecting and recognizing handwritten digits with the latest ODR algorithms are proposed: one based on the latest YOLO family architecture (YOLOv5-HDR), offering high speed and accuracy, and the other using the transformer architecture (DETR). To the best of our knowledge, this is the first study to provide a detailed comparison between YOLOv5- and transformer-based models for handwritten digit detection. The detailed performance analysis shows that the YOLOv4-based model ran testing inference 13% faster than Faster R-CNN, while the proposed YOLOv5-based model outperformed both the YOLOv4- and transformer-based models, running testing 25% faster than YOLOv4 and three times faster than the DETR model. A further adversarial attack test was conducted to verify the robust performance of the proposed model. Furthermore, numerical experiment results and their analyses demonstrate the robustness and effectiveness of the proposed YOLOv5-based model, which proved the most stable for handwritten digit detection and recognition tasks.
Automatic scoring of students’ physical experimental operations is a very practical application that has not yet been researched in depth. The common method for automatic scoring of students’ experimental operations is to infer the behavior of experimental operations from the state of the experimental instruments. Video object detection is the basic task for detecting the state of experimental instruments, and missed or false detections in video multi-object detection are among the main causes of error in automatic scoring results. However, existing video object detection methods mainly improve model accuracy on public datasets and do not correct false detections. Therefore, an efficient video object detection method composed of YOLOv5 and a logical-reasoning post-processing step is proposed to fill this gap. We compared our method with other state-of-the-art methods on three independent datasets of physical experimental instruments. We established a pipeline for automatic scoring of students’ experimental operations, designed flow charts and state score tables for three physics experiments, and compared the automatic scoring results with the average scores given by six experimental teachers. The results show that our method is more robust and efficient in this application scenario. We hope this report can promote the application of logical reasoning methods in video object detection.
Deep networks have achieved great success in forest fire detection by exploiting visible light images. However, visible light images are susceptible to interference from strong light, smoke, and obstructions. Infrared images are highly sensitive to temperature changes in targets, which can compensate for the deficiencies of visible light images. Due to the significant distribution shift between visible light and infrared images, directly using a visible light-based pre-trained network for infrared forest fire detection results in a significant decrease in performance. To resolve this issue, this paper proposes an infrared image forest fire detection system based on domain adaptive learning. We adopt two YOLOv5 frameworks to extract features from visible light images (source domain) and infrared images (target domain). To align the features of the two domains, we construct a novel adaptation learning mechanism based on Kullback–Leibler (KL) loss and feature maximum mean discrepancy (FMMD) loss. We conducted extensive comparative experiments on two publicly available datasets to verify the effectiveness of the proposed model. All experimental results indicate that our proposed domain adaptive learning mechanism effectively improves the performance of infrared forest fire detection.
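The alignment losses named above can be illustrated with simple stand-ins: a linear-kernel MMD between batch-mean features and a discrete KL divergence. These are sketches under our own simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def linear_mmd(feat_src, feat_tgt):
    """Squared MMD with a linear kernel: the distance between the mean
    feature embeddings of the source (visible) and target (infrared)
    batches. feat_*: (N, D) arrays of pooled features."""
    diff = feat_src.mean(axis=0) - feat_tgt.mean(axis=0)
    return float(diff @ diff)

def kl_div(p, q, eps=1e-8):
    """KL(p || q) between two discrete distributions, e.g. softened
    class predictions from the two domain branches."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)))
```

During adaptation, both terms would be added (with weighting coefficients) to the detection loss, pulling the two YOLOv5 branches toward a shared feature space.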
Cracking is one of the most common bridge diseases. If bridge cracks are not repaired in time, they may cause gradual changes to the concrete structure, which can seriously affect its strength. We propose a network called YOLOv5-TS to intelligently detect bridge surface cracks in images. To improve network performance, we integrate SPPCSPC into the original YOLOv5 network to ensure adaptive image output and obtain receptive fields of various sizes. Meanwhile, transposed convolution is incorporated to improve the network's capacity to learn weights on its own and reduce the loss of characteristic information. In response to the diverse morphology of bridge cracks, cracks are classified according to their mechanical causes and inclination into four categories: horizontal cracks (0∘–20∘), low-angle cracks (20∘–45∘), high-angle cracks (45∘–70∘) and vertical cracks (70∘–90∘). Experiments on the ZJU SYG crack dataset confirm that the proposed YOLOv5-TS identifies cracks in bridge surface images better than the compared baselines, with its best performance in mAP@0.5 (0.752), mAP@0.5:0.95 (0.518), and recall (0.794), demonstrating the model's practical value.
Multiple-object tracking in a video sequence can be performed by detecting and distinguishing the objects that appear in the sequence. In the context of computer vision, robust multi-object tracking (MOT) is a difficult problem to solve, and visual tracking of multiple objects is a vital part of an autonomous driving vehicle's vision technology. Wide-area video surveillance increasingly uses advanced imaging devices with higher megapixel resolutions and frame rates, driving a huge increase in demand for high-performance computation in video surveillance systems that process high-resolution videos in real time. In this paper, we therefore use a single-stage framework to solve the MOT problem. We propose a novel architecture that allows one or more GPUs to be used efficiently to process Full High Definition video in real time. For high-resolution video and images, the suggested approach is real-time multi-object detection based on Enhanced YOLOv5-7S on Multi-GPU Vertex. We added one more layer at the top of the backbone to increase the resolution of the extracted feature map, improving small-object detection and model accuracy. In terms of speed and accuracy, our proposed approach outperforms state-of-the-art techniques.
The rapid development of computer vision raises a new research area involving patient care robots. Such robotic systems require fast target recognition at long ranges, where detecting smaller objects is notoriously challenging due to the cameras' low resolution and noise. Spurred by these concerns, this paper develops a novel object recognition algorithm that solves these problems. Specifically, we amend YOLOv5 with our proposed sparse detection algorithm, which improves detection efficiency by separating the most significant context features and constructing smaller, less computationally expensive models. Furthermore, we extend FReLU and suggest a novel activation function with extended nonlinearity, increasing the expressiveness of the activation function and improving recognition accuracy. Finally, we propose sine annealing, which affords a trajectory that tends to cross over barriers and escape from local minima during the training phase, addressing the challenging small object detection problem. The experimental results highlight that our algorithm has a memory consumption (Mem) roughly 5% lower than traditional YOLOv5. Additionally, our method runs twice as fast as traditional YOLOv5 while preserving accuracy, achieving more than 14.5 FPS on a medium-capability CPU. Overall, the detection results prove that our method can classify and localize most small-scale objects faster and more accurately.
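The paper's exact "sine annealing" schedule is not specified in the abstract; the generic sinusoidal (cosine-form) learning-rate decay below conveys the underlying idea of a smooth periodic trajectory that lets training momentarily climb out of shallow minima.

```python
import math

def sine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Illustrative sinusoidal annealing schedule (cosine-annealing form):
    decays the learning rate smoothly from lr_max at step 0 to lr_min at
    total_steps. This is a stand-in, not the paper's exact schedule."""
    cos = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos)
```

Restarting the schedule every `total_steps` (warm restarts) produces the barrier-crossing behavior the abstract alludes to: each restart briefly raises the learning rate again.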
Medical X-rays are one of the primary choices for diagnosis because of their potential to disclose previously undetected pathologic changes, their non-invasive nature, low radiation dosage, and low cost. There are several advantages to creating computer-aided detection (CAD) technologies for X-ray analysis, and with the advancement of technology, researchers have lately used deep learning approaches to obtain highly accurate CAD systems. With CAD, the computer output may be utilized as a second opinion for radiologists, assisting doctors in making the best decisions. Chest X-rays (CXRs) are commonly used to diagnose heart and lung problems, and automatically recognizing these problems with high accuracy might considerably improve real-world diagnosis processes. However, the lack of standard publicly available datasets and benchmark research makes comparing and establishing the best detection algorithms challenging. To overcome these difficulties, we use the VinDr-CXR dataset, one of the latest public datasets, comprising 18,000 expert-annotated images labeled with 22 local position-specific abnormalities and 6 globally suspected diseases. To improve the identification of chest abnormalities, we propose a data preparation procedure and a novel model based on YOLOv5 and ResNet50. YOLOv5 is the most recent of the YOLO series and is more adaptable than previous one-stage detection algorithms. In our paper, the role of YOLOv5 is to locate the abnormality. On the other hand, we employ ResNet for classification, avoiding gradient explosion concerns in deep learning. We then filter the YOLOv5 results against the ResNet output: the YOLOv5 detection result is updated if ResNet determines that the image is not anomalous.
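The final filtering step, in which the ResNet classifier vetoes or confirms YOLOv5's boxes, might look like the following hypothetical rule; the threshold names and values are our assumptions, not the paper's.

```python
def fuse_results(detections, p_abnormal, tau=0.5, strict=0.9):
    """Hypothetical detector-classifier fusion: `detections` is a list of
    (box, score, label) tuples from YOLOv5; `p_abnormal` is the image-level
    abnormality probability from the ResNet classifier. When the classifier
    considers the image normal (p_abnormal < tau), only near-certain boxes
    (score >= strict) are kept; otherwise all boxes pass through."""
    if p_abnormal >= tau:
        return detections
    return [d for d in detections if d[1] >= strict]
```

This kind of rule suppresses the detector's false positives on healthy images while still letting high-confidence findings override a borderline classifier verdict.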
Nowadays, YOLOv5 is one of the most popular object detection network architectures used in real-time and industrial systems, with traffic management and regulation as typical applications. To take advantage of the YOLOv5 network and develop a parking management tool, this paper proposes a car detection network based on a redesign of the YOLOv5 network architecture. This research focuses on network parameter optimization using lightweight modules from the EfficientNet and PP-LCNet architectures. In addition, this work presents an aerial-view dataset for car detection in parking lots, named AVPL. The proposed network is trained and evaluated on two benchmark datasets, the Car Parking Lot Dataset and the Pontifical Catholic University of Parana+ Dataset, as well as the proposed dataset. The experiments are reported in mAP@0.5 and mAP@0.5:0.95. The network achieves its best performance at 95.8%, 97.4%, and 97.0% mAP@0.5 on the Car Parking Lot Dataset, the Pontifical Catholic University of Parana+ Dataset, and the proposed AVPL dataset, respectively. A set of demonstration videos and the proposed dataset are available here: https://bit.ly/3YUoSwi.
As the population of India keeps growing day by day, so does the traffic. At present, the backbone of toll booths in India is the FASTag, which requires vehicles to come to a complete halt. This results in clogged traffic, which wastes both time and fuel. The growth of smart cities and smart transportation systems has led to a growing need for sustainable automatic license plate recognition systems. It is crucial that such a system not only accurately identifies number plates but also performs at high speed so as not to disrupt the flow of traffic. This article presents an approach built on deep learning techniques, namely object detection and optical character recognition, which offer both speed and accuracy. The proposed license plate identification system is an end-to-end solution that works on moving vehicles in real time. The system has been tested thoroughly, and the findings indicate that the pipeline can accurately predict the number plates of vehicles traveling at speeds of 35–40 km/h. Further testing will be conducted for deployment on edge devices and GPUs.