To improve the detection of small and weak ground targets in remote sensing images, a detection method based on Faster R-CNN improved with a multi-scale module is proposed. By adjusting the Region Proposal Network (RPN) and Region of Interest (RoI) pooling layers, the model is better suited for feature extraction and classification of small targets. Feature maps are generated at multiple scales by introducing additional convolutional layers that process inputs at different resolutions, with each layer focusing on targets within a specific size range. These multi-scale features are integrated and fed into the RPN to provide richer target proposal characteristics. Image transformation techniques such as scaling, cropping, and rotation are used to expand the training dataset and simulate different small-target scenes, enhancing the model’s adaptability to changes in target size, direction, and environment. The experimental results show that the proposed method not only outperforms other models in accuracy, recall, and F1 score, but also maintains high consistency and stability across different datasets. Its significant advantage in mean average precision (mAP) further supports its reliability and effectiveness in remote sensing applications.
In recent decades, time-series analysis and value forecasting have grown as research fields. Making significant decisions in a stock market prediction strategy requires knowledge that can be gained by forecasting with time-series data. Various prediction methods have been developed to forecast future stock prices, but accurate prediction from time-series data in the presence of external factors remains a difficult task. In this paper, an effective prediction approach is designed using a Competitive Swarm Feedback Algorithm-based Deep Long Short-Term Memory (CSFA-based Deep LSTM) classifier for predicting stock market data. The CSFA integrates the Competitive Swarm Optimizer (CSO) and the Feedback Artificial Tree (FAT) algorithm. Technical indicator extraction, feature fusion, and data augmentation steps are then carried out. The Deep LSTM achieved better prediction results than the other traditional classifiers. The proposed method achieved lower MAE, lower MSE, and minimum RMSE, with values of 0.1418, 0.1119, and 0.2557, respectively.
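The three error measures reported in this abstract can be sketched as follows. This is a minimal illustration using the standard definitions of MAE, MSE, and RMSE; the function name `regression_errors` and the sample values are hypothetical and do not come from the paper.

```python
import math


def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    rmse = math.sqrt(mse)                      # root of the MSE
    return mae, mse, rmse


# Hypothetical toy values, not data from the paper.
mae, mse, rmse = regression_errors([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
```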
This paper introduces a novel hill-climbing technique aimed at maximizing the prediction accuracy in skin cancer classification. It highlights the efficacy of epoch–baseline termination management coupled with data augmentation, especially when dealing with the severely imbalanced HAM10000 data set. The method alters the hill-climbing function’s termination condition, enabling the system to evade local minima and find superior solutions. Termination conditions using baseline and epochs significantly enhance prediction accuracy, while data augmentation balances the data set. The method achieves a prediction accuracy of over 0.99 in classifying seven types of skin cancers using the HAM10000 data set, as validated by repetitive random cross-validation and the confusion matrix. The paper concludes that the proposed method, when combined with data augmentation, can enhance deep learning and is applicable to oncology classification. The advanced skin classification algorithm proposed herein has achieved the highest prediction accuracy in the benchmark with the HAM10000 data set, which can be used for cancer research classification in general.
Autism spectrum disorder is a neurodevelopmental disorder typically characterized by abnormalities in social interaction and stereotyped and repetitive behaviors. Diagnosis of autism is mainly based on behavioral tests and interviews. In recent years, studies involving the diagnosis of autism based on analysis of EEG signals have increased. In this paper, signals recorded from people with autism and from healthy individuals are divided into non-overlapping windows that are treated as images, and these images are classified using a two-dimensional Deep Convolution Neural Network (2D-DCNN). Deep learning models require a lot of data to extract appropriate features and automate data classification. However, in most neurological studies, collecting a large number of measurements is difficult (a few thousand, compared with millions of natural images) because of the cost, time, and difficulty of recording these signals. Therefore, to obtain an adequate amount of data, our proposed method uses several data augmentation methods. These methods were mainly introduced for image databases and must be generalized to EEG-as-an-image databases. In this paper, a nonlinear image mixing method that mixes the rows of two images is used. Because each row of our image is one channel of the EEG signal, this method is named channel combination. In the best case, i.e., augmentation by channel combination, an average accuracy of 88.29% is achieved in classifying short signals of healthy people versus people with ASD, and 100% for ASD versus epilepsy, using the 2D-DCNN. After combining the decisions on the windows belonging to each subject, we achieved 100% accuracy in detecting ASD subjects from long EEG signals.
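The row-mixing idea described in this abstract can be sketched roughly as below. The abstract does not give the exact mixing rule, so this sketch assumes that each channel (row) of the new window is copied from one of two windows of the same class, chosen at random. The function name `channel_combination` and the toy windows are hypothetical.

```python
import random


def channel_combination(window_a, window_b, rng):
    """Build a new EEG window by taking each channel (row) from one of two windows.

    Assumption: rows are picked at random with equal probability; the paper's
    actual mixing rule may differ.
    """
    if len(window_a) != len(window_b):
        raise ValueError("windows must have the same number of channels")
    mixed = []
    for row_a, row_b in zip(window_a, window_b):
        if len(row_a) != len(row_b):
            raise ValueError("channels must have the same number of samples")
        source = row_a if rng.random() < 0.5 else row_b
        mixed.append(list(source))  # copy so the originals stay untouched
    return mixed


# Two hypothetical 4-channel windows with 3 samples per channel.
a = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
b = [[-1, -1, -1], [-2, -2, -2], [-3, -3, -3], [-4, -4, -4]]
mixed = channel_combination(a, b, random.Random(0))
```

Both source windows would need to carry the same class label for the mixed window to inherit it.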
A necessary step in the diagnosis of leukemia is the classification of white blood cells in the bone marrow, which requires the attending physician to have a wealth of clinical experience. Deep learning is well suited to image recognition and classification, but directly applying well-known convolutional neural network (CNN) models, such as AlexNet, GoogleNet, and VGGFace, does not give good enough results. In this paper, we construct a new CNN model called WBCNet that can fully extract features of microscopic white blood cell images by combining batch normalization, a residual convolution architecture, and an improved activation function. WBCNet has a 33-layer network architecture; its training speed is greatly improved compared with traditional CNN models, and it can quickly identify the category of white blood cell images. The accuracy is 77.65% for Top-1 and 98.65% for Top-5 on the training set, and 83% for Top-1 on the test set. This study can help doctors diagnose leukemia and reduce the misdiagnosis rate.
Conventional pattern recognition methods employed for differentiating the types of insulation defects in power cables usually rely on the manual extraction of partial discharge features, which is inefficient and easily affected by subjective uncertainty. This work addresses this problem by proposing a new framework based on the automatic features extraction of partial discharge signal. The method first applies a sliding time window to convert partial discharge signals in the time domain into two-dimensional images that serve directly as the input to the convolutional neural networks (CNNs). Then a nonlinear encoder is employed to automatically extract the features of the partial discharge image data as the input of CNNs for classification. In addition, we address the overfitting problem associated with the few-shot by applying a deep convolutional generative adversarial network (DCGAN) to augment the original training dataset. Experimental results demonstrate the validity of the proposed algorithm; it increases the classification accuracy by 4.18% relative to that achieved with manually extracted features; the overall accuracy of the proposed algorithm training with the augmented dataset is 3.175% higher than that with the original experimental dataset.
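The sliding-window conversion of a one-dimensional signal into a two-dimensional array can be sketched as follows. The abstract does not state the window length, the stride, or how the image is laid out, so this sketch assumes that each window becomes one row of the image. The name `sliding_window_image` and both parameters are hypothetical.

```python
def sliding_window_image(signal, window, stride):
    """Stack successive windows of a 1-D signal as the rows of a 2-D array.

    Assumption: one window per row; the paper's actual image layout may differ.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    last_start = len(signal) - window
    return [list(signal[start:start + window])
            for start in range(0, last_start + 1, stride)]


# Toy signal of 10 samples, windows of 4 samples moved 2 samples at a time.
image = sliding_window_image(list(range(10)), window=4, stride=2)
```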
Fingerprints, being the most widely employed biometric trait owing to their high acceptability and low sensing cost, have replaced traditional methods of human authentication. Although the deployment of these biometrics-based recognition systems is accelerating, they are still susceptible to spoofing attacks in which an attacker presents a fake artifact made from silicone, candle wax, gelatin, etc. To safeguard sensor modules from these attacks, an anti-deception mechanism is required, known as a fingerprint spoof detector (FSD) or anti-spoofing mechanism. Much research has been carried out over the past decades to design fingerprint anti-spoofing techniques, and it is currently oriented towards deep learning (DL)-based modeling. Since 2014, the paradigm in fingerprint anti-spoofing has shifted from manually crafted features to deep feature engineering. Hence, in this study, we present a detailed analysis of recent developments in DL-based FSDs. Additionally, we provide a brief comparative study of standard evaluation protocols, including benchmark anti-spoofing datasets and performance evaluation metrics. Although significant progress has been made in DL-based FSDs, many challenges remain. Therefore, we critically examine these techniques to list open research issues along with viable remedies that may suggest future directions for the research community. The majority of the research shows that deep feature extraction for fingerprint liveness detection demonstrates promising performance in cross-sensor scenarios. Although convolutional neural network (CNN) models extract deep-level features to improve classification accuracy, this comes at the cost of increased complexity and training overhead. Furthermore, enhancing the performance of presentation attack detection (PAD) techniques in the cross-material scenario is still an open challenge for researchers.
Human facial expressions are an essential and fundamental component for expressing the state of the human mind. The automatic analysis of these nonverbal facial expressions has become a fascinating and quite challenging problem in computer vision, with its application in different areas, such as psychology, human–machine interaction, health, and augmented reality. Recently, deep learning (DL) has become a widespread technique for studying human nonverbal facial sentiment expressions, and some research attempts have been made to propose a certain model on this topic. The purpose of this paper is to apply the appropriate convolutional neural network (CNN) approach by adding several layers of different dimensions, which allows the CNN approach to efficiently classify human facial sentiment expressions with data augmentation capable of recognizing seven basic human facial expressions: anger, sadness, fear, disgust, happiness, surprise, and neutral. In particular, this study mainly proposes a convolution neural network architecture, as well as learning factors that minimize the memory space and total training time of the proposed network due to the shallow architecture of the model. Following that, we demonstrated our proposed model’s network complexity, computational cost, and classification accuracy on the three benchmark datasets: FER2013, KDEF, and JAFFE. As a result, our proposed approach achieves accuracy of 67.5%, 79.5%, 90.0% in the FER2013, KDEF, and JAFFE, respectively, which is better compared to other state-of-the-art approaches.
The timely and precise detection of traffic signs is an important part of automated vehicle driving. However, traffic signs occupy only a small fraction of the input image, which makes detection more difficult. Hence, a new model based on the faster region-based convolutional neural network (faster R-CNN) is devised for detecting traffic signs. Region of Interest (RoI) extraction and median filtering are performed to discard superfluous data from the dataset. The method extracts Pyramid Histogram of Oriented Gradient (PHoG), local vector pattern (LVP), CNN, and ResNet features to generate useful information. This reduces the loss of contextual data, and data augmentation is further applied to make the training of the model more stable and time-saving. Traffic sign recognition is performed with faster R-CNN, wherein hyperparameters such as batch normalization rate, epoch, and learning rate are tuned by the proposed pelican cuckoo search (PCS). The method showed improved efficacy without adding computational cost to the model. Moreover, faster R-CNN is identified as the best technique for improving traffic sign detection. The proposed PCS-based faster R-CNN achieved the highest precision of 92.7%, specificity of 93.7%, and F-measure of 93.2%.
Counting small pixel-sized vehicles and crowds in unmanned aerial vehicles (UAV) images is crucial across diverse fields, including geographic information collection, traffic monitoring, item delivery, communication network relay stations, as well as target segmentation, detection, and tracking. This task poses significant challenges due to factors such as varying view angles, non-fixed drone cameras, small object sizes, changing illumination, object occlusion, and image jitter. In this paper, we introduce a novel multi-data-augmentation and multi-deep-learning framework designed for counting small vehicles and crowds in UAV images. The framework harnesses the strengths of specific deep-learning detection models, coupled with the convolutional block attention module and data augmentation techniques. Additionally, we present a new method for detecting cars, motorcycles, and persons with small pixel sizes. Our proposed method undergoes evaluation on the test dataset v2 of the 2022 AI Cup competition, where we secured the first place on the private leaderboard by achieving the highest harmonic mean. Subsequent experimental results demonstrate that our framework outperforms the existing YOLOv7-E6E model. We also conducted comparative experiments using the publicly available VisDrone datasets, and the results show that our model outperforms the other models with the highest AP50 score of 52%.
The style of the underwater images varies according to the region of the sea. However, Tidal Stream Turbine (TST) rotor images captured in the laboratory environment cannot reflect the real underwater environment in image style, resulting in poor generalization of image signal-based fault detection algorithms. Due to the fixed capture position of the camera, the TST rotor image dataset has a high semantic similarity between images, resulting in content loss in conventional image-to-image translation networks. Meanwhile, the one-to-one translation feature in other works cannot meet our requirements. In this work, a Domain Variable Prior-based Multi-style Transfer Network (DVP-MSTN) is proposed to achieve TST rotor image style augmentation. First, the backbone network is trained using a public paired dataset to acquire prior knowledge of domain variable (Knowledge Acquiring, KA). Next, a Multi-domain Transfer Unit (MDT unit) is introduced to enable the conversion of style representations in low-dimensional space. Finally, the prior knowledge is shared to train the MDT unit by fixing the parameters of the backbone network optimized from the KA process (Knowledge Sharing, KS). In addition, an algorithm based on the dark channel of the image is proposed to improve the transfer of low-contrast features. Specifically, a discriminator is used to discriminate the image dark channel to guide the MDT unit to generate low-contrast style representation conditionally. Meanwhile, color loss is employed to preserve the color feature of the image. By controlling the weights of the style code, this method enables control over the image style transfer process, thereby expanding the variety of image styles in the dataset for the purpose of data augmentation.
Medical information systems such as the Internet of Medical Things (IoMT) have gained increasing attention in recent decades. Disease diagnosis is an important function of the healthcare system. Wearable devices have become popular in a wide range of health monitoring applications, which has stimulated the growth of IoMT. Recently, smart healthcare systems have become more effective, and various methods have been developed to classify disease at an early stage. To capture patient information and detect disease, a new framework is designed using the developed Conditional Auto regressive Mayfly Algorithm (CAMA)-based Deep Residual Network (DRN). First, pre-processing is done with the T2FCS filtering technique to improve image quality by removing noise. The second step is segmentation, in which the brain tumor is segmented using U-Net. After that, data augmentation is performed to enhance image dimensions, using techniques such as flipping, shearing, and translation, to address the issues of data samples. After data augmentation, the next step is brain tumor detection, which is done using the DRN. The DRN is trained by the proposed CAMA, which integrates conditional auto regressive value at risk (CAViaR) with the mayfly algorithm (MA). The developed model reduces computational complexity and increases effectiveness and robustness. The proposed CAMA-based DRN achieved a maximum testing accuracy of 0.921, sensitivity of 0.931, specificity of 0.928, distance of 52.842, and trust of 0.697.
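Two of the augmentation operations named in this abstract, flipping and translation, can be sketched on a small grid of pixel values as below. This is only an illustration: the function names are hypothetical, shearing is omitted, and the paper's own parameters are not given in the abstract.

```python
def flip_horizontal(image):
    """Mirror every row of a 2-D image (list of rows)."""
    return [list(reversed(row)) for row in image]


def translate_right(image, shift, fill=0):
    """Shift pixels to the right by `shift` columns, padding the left with `fill`.

    Assumption: vacated pixels are filled with a constant; other padding
    strategies are equally plausible.
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    width = len(image[0]) if image else 0
    shift = min(shift, width)
    return [[fill] * shift + list(row[:width - shift]) for row in image]


# Hypothetical 2x3 image.
image = [[1, 2, 3], [4, 5, 6]]
flipped = flip_horizontal(image)
shifted = translate_right(image, 1)
```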
With the development of artificial intelligence, computer vision technology has been widely used in security monitoring, autonomous driving, and smart cities. However, there has been no research on detecting the meridians of the human body using computer vision. To promote the use of computer vision in human meridian detection, this paper first releases a dataset of human meridians, which fills the gap in the field of human meridian detection using image processing technology. The dataset is manually annotated and proofread by experienced Traditional Chinese Medicine (TCM) practitioners according to the position and direction of the human meridians, so that the annotations are as accurate as possible. The released dataset labels 12 meridians: the spleen meridian, pericardium meridian, stomach meridian, lung meridian, heart meridian, kidney meridian, gallbladder meridian, liver meridian, triple energizer meridian, bladder meridian, large intestine meridian, and small intestine meridian. A total of 296 acupoints were labeled. Finally, this paper proposes a data augmentation method, intended especially for small datasets, in which the amount of data is increased by enhancing the underlying edge visual features of the data. Experimental results show that human meridians can be detected using image processing technology, and that the proposed data augmentation method can effectively improve the detection accuracy of human meridians. The dataset can be downloaded from https://www.zksylf.com/col.jsp?id=127.
To address the issue of low accuracy in speech fatigue recognition (SFR) under small samples, a method for small-sample SFR based on generative adversarial networks (GANs) is proposed. First, we enable the generator and discriminator to adversarially train and learn the features of the samples, and use the generator to generate high-quality simulated samples to expand our dataset. Then, we transfer discriminator parameters to fatigue identification network to accelerate network training speed. Furthermore, we use a bidirectional long short-term memory network (BLSTM) to further learn temporal fatigue features and improve the recognition rate of fatigue. 720 speech samples from a self-made Chinese speech database (SUSP-SFD) were chosen for training and testing. The results indicate that compared with traditional SFR methods, like convolutional neural networks (CNNs) and long short-term memory network (LSTM), our method improved the SFR rate by about 2.3–6.7%, verifying the effectiveness of the method.
Speech emotion recognition with deep networks on small samples is often a very challenging problem in natural language processing. The massive number of parameters of a deep network is difficult to train reliably on a small number of speech samples. To address this problem, we propose a new method based on the systematic cooperation of a Generative Adversarial Network (GAN) and Long Short Term Memory (LSTM). The method uses adversarial training of the GAN’s generator and discriminator on speech spectrogram images to achieve sufficient sample augmentation. A six-layer convolution neural network (CNN), followed in series by a two-layer LSTM, is designed to extract features from speech spectrograms. To accelerate network training, the parameters of the discriminator are transferred to our feature extractor. With the sample augmentation, a well-trained feature extraction network and an efficient classifier can be obtained. Tests and comparisons on two publicly available datasets, EMO-DB and IEMOCAP, show that our new method is effective and often superior to some state-of-the-art methods.
Being a critical factor affecting the maintainability and reusability of the software, code readability is growing crucial in modern software development, where a metric for classifying code readability levels is both applicable and desired. However, most prior research has treated code readability classification as a binary classification task due to the lack of labeled data. To support the training of multi-class code readability classification models, we propose an enhanced data augmentation approach that could be used to generate sufficient readability data and well train a multi-class code readability model. The approach includes the use of domain-specific data transformation and GAN-based data augmentation. We conduct a series of experiments to verify our augmentation approach and gain a state-of-the-art multi-class code readability classification performance with 69.5% Micro-F1, 54.0% Macro-F1 and 67.7% Macro-AUC. Compared to the results where no augmented data is used, the improvements on Micro-F1, Macro-F1 and Macro-AUC are significant with 6.9%, 11.3% and 11.2%, respectively. As an innovative work of proposing multi-class code readability classification and an enhanced code readability data augmentation approach, our method is proved to be effective.
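The difference between the Micro-F1 and Macro-F1 scores reported in this abstract can be sketched as follows, using the standard definitions for single-label multi-class classification. The function name `f1_scores` and the three readability labels are hypothetical.

```python
def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}
    fp = {c: 0 for c in labels}
    fn = {c: 0 for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical readability levels, not data from the paper.
y_true = ["low", "low", "mid", "mid", "high", "high"]
y_pred = ["low", "mid", "mid", "mid", "high", "low"]
micro, macro = f1_scores(y_true, y_pred)
```

Macro-F1 weights every class equally, so it tends to fall below Micro-F1 when rare classes are predicted poorly.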
This paper proposes a high-performance framework for accurate multi-stage object detection in low-altitude based UAV images. The proposed system employs a cascade style architecture with increasing thresholds for achieving accurate detection. The framework makes use of highly efficient Feature Pyramid Networks (FPNs) to detect objects of small sizes, and various scales which are the main challenge in low-altitude aerial images. FPNs aim to resolve scale variation problems in object detection by combining features of multiple levels. The experiments have been performed on a complex low-altitude aerial dataset VisDrone which has multiple categories of classes. The FPN-Cascade detector has been supported by slicing the data horizontally and vertically that resulted in an advancement of 8% mAP when compared with the base detector. The experiments compare the FPN-Cascade performance on the standard as well as augmented VisDrone dataset. A concrete methodology about the training process, hyperparameter tuning, and performance evaluation methods for Cascade RCNN on the VisDrone dataset is highlighted. The proposed framework achieves state of the art 30.04% mAP value on the VisDrone dataset.
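Slicing an image horizontally and vertically, as mentioned in this abstract, can be sketched as computing the pixel boxes of a regular grid of tiles. The abstract does not give the grid size or whether tiles overlap, so this sketch assumes non-overlapping tiles. The function name `slice_boxes` and the image size are hypothetical.

```python
def slice_boxes(width, height, cols, rows):
    """Return (left, top, right, bottom) boxes that tile an image in a grid.

    Assumption: tiles do not overlap; the paper's slicing scheme may differ.
    """
    if cols <= 0 or rows <= 0:
        raise ValueError("cols and rows must be positive")
    boxes = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


# A hypothetical 1920x1080 frame cut once horizontally and once vertically.
tiles = slice_boxes(1920, 1080, cols=2, rows=2)
```

In a detection setting, the ground-truth boxes would also have to be clipped and shifted into each tile's coordinate frame.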
The tracking of underwater targets is a basic unit of underwater surveillance that supports effective search and rescue operations. However, tracking the target is a major constraint in finding the underwater target. This paper devises a model for underwater target tracking using radar signals. The radar signals are subjected to an image reconstruction process to make them suitable for further processing. Gridding is the next step, which generates grids so that each region can be examined precisely. Features such as the Ridgelet transform and Local Gabor XOR Pattern (LGXP) are extracted. Data augmentation is the next process, which helps to reduce the size of the data by resizing. Finally, targets are tracked in each grid with a Deep neuro fuzzy network (DNFN). The DNFN is trained with Exponential-Competitive Swarm Optimization (E-CSO), which is derived by integrating the Exponential Weighted Moving Average (EWMA) and Competitive Swarm Optimization (CSO). The proposed E-CSO-based DNFN offered high efficiency, with a maximal detection rate of 98.8%, a minimal mean square error of 0.012, and a Mean Absolute Percentage Error (MAPE) of 17.883.
Data augmentation in Natural Language Processing (NLP) faces various challenges that hinder its widespread adoption, unlike its ever-present usage in the field of vision. It is even more the case for the text summarization task where one should focus on both article and summary. In this paper, we review the effect of back translation augmentation, present the diverse beam search decoding strategy, and masking as a method to generate synthetic data for text summarization. The approaches will be evaluated by ROUGE score, novelty, summary length, and GPT-4 to analyze their effectiveness. Our proposed framework presents multiple combinations of back translation and masking for articles, along with diverse augmentation for summaries. Although applicable to networks of any size, we decided to use BART-large, a relatively smaller model, in order to conduct a larger number of experiments. The experiments demonstrated superior performance across all specified metrics when compared to fine-tuning BART-large on the CNN/Dailymail dataset. Specifically, we showed a significant improvement in novelty; 158% and 56% increase rate for bigrams and unigrams, respectively. It could eliminate some copyright concerns around generating content similar to human writing. Additionally, the GPT-4 assessment indicates that models trained using the augmentation technique tend to capture important information more effectively than the baseline model.
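The n-gram novelty measure mentioned in this abstract can be sketched as below. The abstract does not define novelty, so this sketch assumes a common definition: the fraction of summary n-grams that do not appear in the article. The function names, the whitespace tokenization, and the example sentences are hypothetical.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def novelty(article, summary, n):
    """Fraction of summary n-grams that are absent from the article.

    Assumption: lowercase whitespace tokenization; the paper may tokenize
    and count differently.
    """
    article_ngrams = set(ngrams(article.lower().split(), n))
    summary_ngrams = ngrams(summary.lower().split(), n)
    if not summary_ngrams:
        return 0.0
    novel = sum(1 for gram in summary_ngrams if gram not in article_ngrams)
    return novel / len(summary_ngrams)


# Hypothetical article and summary.
article = "the cat sat on the mat near the door"
summary = "a cat rested on the mat"
unigram_novelty = novelty(article, summary, 1)
bigram_novelty = novelty(article, summary, 2)
```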
Diabetes is a chronic disease that poses a serious threat to health, and its early risk prediction has been a hot research topic in the field of medical artificial intelligence. Routine medical checkups are the most common way to monitor people’s health status, and the data from medical checkups contain rich diagnostic information, which is valuable for diabetes risk prediction. Currently, most available studies on diabetes risk prediction are based on publicly available datasets, and the models and algorithms do not work well on real clinical datasets. Real routine checkup data are characterized by complex information, diverse features, high redundancy, and poor balance, which pose great challenges for diabetes risk prediction. To address this problem, this paper proposes a diabetes risk prediction method based on multi-strategy data augmentation. After data pre-processing and feature selection, a counterfactual-based data balancing strategy is used to augment the minority class of instances, and a supplemental counterfactual data augmentation strategy based on density clustering is proposed to address the insufficient representation of instances generated by the counterfactual method. Moreover, an uncertainty-weighted method is used in the model training phase. On the real checkup dataset, five machine learning methods, namely Logistic Regression (LR), SVM, Decision Tree, Random Forest, and Gradient Boosting, are used for modeling, and 5-fold cross-validation is used to carry out diabetes risk assessment and prediction. The experimental results showed that the sensitivity and precision of the models were significantly improved compared with existing methods, and the sensitivity of the LR model for diabetes risk prediction on the real routine checkup dataset exceeded 90%, which meets the requirements of clinical application.
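The 5-fold cross-validation used for evaluation in this abstract can be sketched as an index split. This is a plain, non-stratified split for illustration only; the abstract does not say whether folds were stratified, and the function name `k_fold_indices` and the sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train_indices, test_indices) pairs covering every sample once.

    Assumption: samples are shuffled and split without stratification; for
    imbalanced data a stratified split may be preferable.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Ten hypothetical samples split into five folds.
splits = k_fold_indices(10, 5)
```

Each model would be trained on the `train` indices and scored on the `test` indices of every split, and the scores averaged.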