Computer-Aided Diagnosis (CAD) constitutes an important tool for the early diagnosis of Alzheimer's Disease (AD), which in turn allows the application of treatments that can be simpler and more likely to be effective. This paper explores the construction of classification methods based on deep learning architectures applied to brain regions defined by the Automated Anatomical Labeling (AAL) atlas. Gray Matter (GM) images from each brain area are split into 3D patches according to the regions defined by the AAL atlas, and these patches are used to train different deep belief networks. An ensemble of deep belief networks is then composed, where the final prediction is determined by a voting scheme. Two deep learning based structures and four different voting schemes are implemented and compared, yielding a powerful classification architecture in which discriminative features are computed in an unsupervised fashion. The resulting method has been evaluated using a large dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Classification results assessed by cross-validation show that the proposed method is not only valid for differentiating between controls (NC) and AD images, but also performs well on the more challenging task of classifying Mild Cognitive Impairment (MCI) subjects. In particular, the classification architecture provides accuracy up to 0.90 with an AUC of 0.95 for NC/AD classification, accuracy of 0.84 with an AUC of 0.91 for stable MCI/AD classification, and accuracy of 0.83 with an AUC of 0.95 for NC/MCI-converter classification.
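A minimal sketch of the voting stage described above, assuming per-region feature arrays; scikit-learn has no full deep belief network, so an RBM-plus-logistic-regression pipeline stands in for each per-region DBN classifier:

```python
# Region-wise ensemble with majority voting. `region_features` is assumed to
# map each AAL region name to an (n_subjects, n_voxels) array of GM values
# scaled to [0, 1] (BernoulliRBM expects inputs in that range).
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

def train_region_ensemble(region_features, y):
    models = {}
    for region, X in region_features.items():
        clf = Pipeline([
            ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05)),
            ("logreg", LogisticRegression(max_iter=1000)),
        ])
        models[region] = clf.fit(X, y)
    return models

def majority_vote(models, region_features_test):
    votes = np.stack([m.predict(region_features_test[r])
                      for r, m in models.items()])
    # Each region casts one vote per subject; ties go to the lower label.
    return (votes.mean(axis=0) > 0.5).astype(int)
```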
A philosophically consistent axiomatic approach to classical and quantum mechanics is given. The approach realizes a strong formal implementation of Bohr's correspondence principle. In all instances, classical and quantum concepts are fully parallel: the same general theory has a classical realization and a quantum realization. Extending the "probability via expectation" approach of Whittle to noncommuting quantities, this paper defines quantities, ensembles, and experiments as mathematical concepts and shows how to model complementarity, uncertainty, probability, nonlocality and dynamics in these terms. The approach carries no connotation of unlimited repeatability; hence it can be applied to unique systems such as the universe. Consistent experiments provide an elegant solution to the reality problem, confirming the insistence of the orthodox Copenhagen interpretation that there is nothing but ensembles, while avoiding its elusive reality picture. The weak law of large numbers explains the emergence of classical properties for macroscopic systems.
The reasons for superconductivity enhancement in small metallic grains containing hundreds of thousands of electrons are investigated by solving the generalized gap equation based on BCS mean-field theory. The analysis suggests that the superconductivity enhancement in small metallic grains results from the pairing correlation combined with the level statistics of the Gaussian orthogonal ensemble (GOE) and the Gaussian unitary ensemble (GUE).
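As an illustration of the level statistics invoked above (not code from the paper), the following sketch samples GOE and GUE spectra and compares their nearest-neighbor spacing fluctuations:

```python
# GOE: real symmetric random matrices; GUE: complex Hermitian. GUE levels
# repel more strongly (beta = 2 vs beta = 1), visible in the spacing variance.
import numpy as np

rng = np.random.default_rng(0)

def goe_levels(n):
    a = rng.normal(size=(n, n))
    h = (a + a.T) / 2.0               # real symmetric -> GOE
    return np.sort(np.linalg.eigvalsh(h))

def gue_levels(n):
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    h = (a + a.conj().T) / 2.0        # complex Hermitian -> GUE
    return np.sort(np.linalg.eigvalsh(h))

# Nearest-neighbor spacings near the band center, normalized to unit mean.
for name, levels in [("GOE", goe_levels(500)), ("GUE", gue_levels(500))]:
    s = np.diff(levels[200:300])
    print(name, "normalized spacing variance:", np.var(s / s.mean()))
```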
We analyze two theoretical approaches to ensemble averaging for integrable systems in quantum chaos: spectral averaging (SA) and parametric averaging (PA). For SA, we introduce a new procedure, namely rescaled spectral averaging (RSA). Unlike traditional SA, it can describe the correlation function of the spectral staircase (CFSS) and produce persistent oscillations of the interval level number variance (IV). PA, while not as accurate as RSA for the CFSS and IV, can also produce persistent oscillations of the global level number variance (GV) and better describes the saturation level rigidity as a function of the running energy. Overall, it is the most reliable method for a wide range of statistics.
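For readers unfamiliar with the statistics involved, a sketch of the level number variance computed from an unfolded spectrum is given below; the windowing procedure and parameters are generic assumptions, not the paper's RSA or PA prescriptions:

```python
# Level number variance Sigma^2(L) = <(N(E, L) - L)^2> over random windows
# of length L in an unfolded spectrum (unit mean spacing).
import numpy as np

def number_variance(levels, L, n_windows=2000, rng=None):
    """levels: sorted unfolded spectrum; returns the sampled Sigma^2(L)."""
    rng = rng or np.random.default_rng(0)
    starts = rng.uniform(levels[0], levels[-1] - L, size=n_windows)
    counts = (np.searchsorted(levels, starts + L)
              - np.searchsorted(levels, starts))
    return np.mean((counts - L) ** 2)

# Sanity check: a Poisson (uncorrelated) spectrum gives Sigma^2(L) ~ L.
poisson = np.cumsum(np.random.default_rng(1).exponential(size=20000))
print(number_variance(poisson, L=5.0))   # close to 5
```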
Speaker recognition is the task of identifying a person from statistical features extracted from speech signals. Many speaker recognition techniques have been designed and implemented to recognize the speaker efficiently. A review of the literature shows that existing speaker recognition techniques suffer from over-fitting. Therefore, to overcome the over-fitting issue, in this paper we design a novel ensemble-based quantum neural network. It selects one base model (i.e., expert) for each query and concentrates on inductive bias reduction. A set of quantum neural networks is trained on different kinds of quantum features and afterwards used to recognize the speaker. Finally, ensembling is used to combine the classification results. Extensive experiments have been carried out comparing the proposed technique with existing competitive machine learning-based speaker recognition techniques on speaker recognition data. The proposed technique is observed to outperform existing speaker recognition techniques in terms of accuracy and sensitivity by 1.371% and 1.291%, respectively.
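A hedged sketch of per-query expert selection follows, with ordinary MLPs standing in for the quantum neural networks and cluster membership standing in for whatever routing the paper actually uses:

```python
# One expert per feature-space cluster; each query is routed to the expert
# trained on its nearest cluster. Assumes every cluster contains utterances
# from every speaker class, so each expert sees all labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

def fit_experts(X, y, n_experts=3, seed=0):
    km = KMeans(n_clusters=n_experts, random_state=seed).fit(X)
    experts = []
    for k in range(n_experts):
        mask = km.labels_ == k
        experts.append(MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                     random_state=seed).fit(X[mask], y[mask]))
    return km, experts

def predict(km, experts, X):
    owner = km.predict(X)                  # route each query to one expert
    return np.array([experts[owner[i]].predict(X[i:i + 1])[0]
                     for i in range(len(X))])
```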
Designing an efficient fingerprint recognition technique is an ill-posed problem. Recently, many researchers have utilized machine learning techniques to improve the fingerprint recognition rate. The random forest (RF) is one of the most extensively used machine learning techniques for fingerprint recognition. Although it provides good recognition results at significant computational speed, there is still room for improvement: RF is not very effective for high-dimensional features, or when features contain both discrete and continuous values at the same time. Therefore, in this paper, a novel similarity measure-based random forest (NRF) is proposed. The proposed technique initially computes both mutual information and conditional entropy. Thereafter, it uses three designed if-then rules to obtain the final information measure. Additionally, to obtain the feature set for the fingerprint dataset, the dual-tree complex wavelet transform is used to evaluate complex detail coefficients. Thereafter, ring projection is used to compute significant moments from these complex detail coefficients. Finally, an information gain-based feature selection technique is used to select potential features. To prevent over-fitting, 20-fold cross-validation is also used. Extensive experiments are conducted to evaluate the effectiveness of the proposed technique. The comparative analyses reveal that the proposed technique outperforms existing techniques in terms of accuracy, F-measure, sensitivity, specificity, kappa statistics and computational speed.
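The two information quantities named above can be computed from a discretized contingency table as sketched below; the paper's three if-then combination rules are unspecified in the abstract, so the final combination shown is a hypothetical placeholder:

```python
# Mutual information I(X;Y) and conditional entropy H(Y|X) from a joint
# contingency table, both in nats.
import numpy as np

def mutual_info_and_cond_entropy(x, y):
    """x, y: integer-coded arrays of equal length."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1.0)
    p = joint / joint.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    mi = np.sum(p[nz] * np.log(p[nz] / (px @ py)[nz]))
    h_y_given_x = -np.sum(p[nz] * np.log((p / px)[nz]))
    return mi, h_y_given_x

x = np.array([0, 0, 1, 1, 2, 2]); y = np.array([0, 0, 1, 1, 1, 0])
mi, ce = mutual_info_and_cond_entropy(x, y)
measure = mi if mi > ce else (mi + ce) / 2   # hypothetical combination rule
```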
This paper describes the results of an investigation into the use of shallow (only a few layers) convolutional neural networks (CNNs) to solve the video-based gender classification problem. Different shallow CNN architectures are proposed, trained and tested using balanced and unbalanced static image datasets. The influence of various confidence-voting methods, applied to frame-by-frame gender classification of the video stream, is investigated as a possible way to enhance classification accuracy. The grouping of shallow networks into ensembles is also investigated; it is shown that accuracy can be further improved by voting over the classification results of the separate shallow CNNs in an ensemble, on a single frame or across different frames.
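The difference between the voting schemes can be made concrete with a short sketch (shapes and labels are assumptions): hard majority voting over per-frame labels versus soft voting over averaged per-frame confidences, across an ensemble of shallow CNNs:

```python
# frame_probs: (n_models, n_frames, 2) per-frame softmax outputs for the
# two gender classes, one slice per ensemble member.
import numpy as np

def hard_vote(frame_probs):
    labels = frame_probs.argmax(axis=-1)      # per-model, per-frame label
    return np.bincount(labels.ravel(), minlength=2).argmax()

def soft_vote(frame_probs):
    # Average confidences over models and frames, then take the argmax.
    return frame_probs.mean(axis=(0, 1)).argmax()

probs = np.random.default_rng(0).dirichlet([2, 2], size=(3, 30))  # 3 CNNs, 30 frames
print(hard_vote(probs), soft_vote(probs))
```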
Icons and screenshots are important media displayed on game distribution platforms that give customers a brief understanding of game content. In this study, we develop ensemble convolutional neural networks for icon and screenshot analysis in three applications: automatic genre classification, similar game searching, and recognition quality assessment. First, the genre classifier is developed using 154,358 images from 18,030 games in 17 genres. The proposed genre classifiers achieve 40.5% and 47.6% accuracy for classifying a single icon and a single screenshot, respectively, outperforming the average performance of the human testers. The accuracy can be boosted to 54.2% by aggregating results from every image of a game. Grad-CAM is applied to analyze what the models learned. The feature extraction part trained on this task is then transferred to the other two applications. For similar game searching, the dissimilarity of two images is computed directly as the Euclidean distance in the feature space. We define a dissimilarity between two games, each a set of multiple images, based on their image-pairwise dissimilarity. The results show that the features are successfully transferred, and the model appears able to cluster games with similar gameplay and differentiate them from other gameplays even within the same genre. For the third application, we develop a system for quality assessment of game images based on the correctness of viewers' understanding of game content, combining multiple models from three different problem definitions. Our system can identify good genre-representing game images, whose genre most human testers recognize correctly, with 75.0% accuracy for icons and 76.2% accuracy for screenshots.
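A sketch of the set-level dissimilarity follows; averaging all image-pairwise Euclidean distances is one simple aggregation and is an assumption here, since the abstract does not pin down the exact definition:

```python
# Game-to-game dissimilarity from image-pairwise distances in CNN feature space.
import numpy as np

def game_dissimilarity(feats_a, feats_b):
    """feats_a, feats_b: (n_images, d) feature arrays for two games."""
    d = np.linalg.norm(feats_a[:, None, :] - feats_b[None, :, :], axis=-1)
    return d.mean()   # assumed aggregation over all image pairs

game_a = np.random.default_rng(0).normal(size=(5, 128))   # 5 images, d = 128
game_b = np.random.default_rng(1).normal(size=(8, 128))   # 8 images
print(game_dissimilarity(game_a, game_b))
```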
An intruder in a company's network may use stolen login credentials to silently collect sensitive data. Such malicious user behavior is difficult to detect as long as it does not trigger an access violation or data leak alert. In this paper, we propose to use an ensemble of three unsupervised anomaly detection algorithms, namely OCSVM, RNN and Isolation Forest, to detect abnormal user behavior patterns. In addition, a User Behavior Analytics (UBA) platform is proposed to collect logs, extract features and conduct experiments. The experimental results indicate that our algorithm outperforms each individual algorithm, with recall of 96.55% and precision of 91.24% on average, while both OCSVM and RNN suffer from anomalies in the training set, and iForest produces more false positives and false negatives in prediction.
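A minimal sketch of such a voting ensemble is shown below; OneClassSVM and IsolationForest are scikit-learn implementations, while the recurrent/replicator network ("RNN") is replaced by a local outlier factor detector as a stand-in, since its architecture is not given in the abstract:

```python
# Unsupervised anomaly-detection ensemble with majority voting over
# per-session feature vectors.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

def ensemble_detect(X_train, X_test):
    detectors = [
        OneClassSVM(nu=0.05, gamma="scale").fit(X_train),
        IsolationForest(contamination=0.05, random_state=0).fit(X_train),
        LocalOutlierFactor(novelty=True, contamination=0.05).fit(X_train),
    ]
    # Each detector returns +1 (normal) / -1 (anomaly); flag on majority vote.
    votes = np.stack([d.predict(X_test) for d in detectors])
    return (votes.sum(axis=0) < 0).astype(int)   # 1 = anomalous
```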
This research presents a new learning model, the Parallel Decision DAG (PDDAG), and shows how to use it to represent an ensemble of decision trees while using significantly less storage. Ensembles such as Bagging and Boosting have a high probability of encoding redundant data structures, and PDDAGs provide a way to remove this redundancy in decision tree-based ensembles. When trained by encoding an ensemble, the new model behaves similarly to the original ensemble, and can be made to perform identically to it. The reduced storage requirements allow an ensemble approach to be used in cases where storage limits would otherwise be exceeded, and the smaller model can potentially execute faster by reducing redundant computation.
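The storage argument can be illustrated with a hash-consing sketch (an analogy, not the PDDAG construction itself): structurally identical subtrees across trees are interned once and shared as DAG nodes:

```python
# Shared-subtree storage for decision trees: identical subtrees become a
# single interned node, turning a forest into a DAG.
_node_cache = {}

def make_node(feature, threshold, left, right):
    """Leaves are plain labels; internal nodes are interned tuples."""
    key = (feature, threshold, id(left), id(right))
    if key not in _node_cache:
        _node_cache[key] = (feature, threshold, left, right)
    return _node_cache[key]          # identical subtrees share one object

def predict(node, x):
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node

# Two trees sharing a subtree: the cache stores 3 nodes instead of 5.
shared = make_node(1, 0.5, "A", "B")
tree1 = make_node(0, 1.0, shared, "B")
tree2 = make_node(0, 2.0, shared, "A")
print(predict(tree1, [0.5, 0.7]), len(_node_cache))   # B 3
```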
Class imbalance is very common in the real world. However, conventional advanced methods do not work well on imbalanced data due to the imbalanced class distribution. This paper proposes a simple but effective Hybrid-based Ensemble (HE) to deal with two-class imbalance problems. HE learns a hybrid ensemble in two stages: (1) learning several projection matrices from rebalanced data obtained by under-sampling the original training set, and constructing new training sets by projecting the original training set into the different spaces defined by these matrices; and (2) under-sampling several subsets from each new training set and training a model on each subset. Here, the feature projection aims to improve diversity between ensemble members, and the under-sampling technique improves the generalization ability of individual members on the minority class. Experimental results show that, compared with other state-of-the-art methods, HE shows significantly better performance on AUC, G-mean, F-measure and recall.
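A minimal sketch of the two-stage construction follows, with random Gaussian projections standing in for the learned projection matrices:

```python
# Stage 1: project the data into several random feature spaces (diversity).
# Stage 2: repeatedly under-sample the majority class per view and train a
# member on each balanced subset. Minority class is coded 1, majority 0.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def undersample(X, y, rng):
    minority = np.flatnonzero(y == 1)
    majority = rng.choice(np.flatnonzero(y == 0), size=len(minority),
                          replace=False)
    idx = np.concatenate([minority, majority])
    return X[idx], y[idx]

def fit_hybrid_ensemble(X, y, n_views=5, n_subsets=3, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_views):
        P = rng.normal(size=(X.shape[1], X.shape[1]))   # projection matrix
        for _ in range(n_subsets):
            Xs, ys = undersample(X @ P, y, rng)
            members.append((P, DecisionTreeClassifier().fit(Xs, ys)))
    return members

def predict(members, X):
    votes = np.stack([clf.predict(X @ P) for P, clf in members])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```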
Currently, social media networks such as Facebook and Twitter have evolved into valuable platforms for global communication. However, owing to its extensive user base, Twitter is often misused by illegitimate users engaging in illicit activities. While numerous research papers delve into combating illegitimate users on Twitter, a common shortcoming of most of these works is the failure to address class imbalance, which significantly impacts the effectiveness of spam detection. The few works that have addressed class imbalance have not yet applied bio-inspired algorithms to balance the dataset. Therefore, we introduce PSOB-U, a particle swarm optimization-based undersampling technique designed to balance the Twitter dataset. In PSOB-U, various classifiers and metrics are employed to select and rank majority samples. Furthermore, an ensemble learning approach is implemented to combine the base classifiers in three stages. During the training phase of the base classifiers, undersampling techniques and a cost-sensitive random forest (CS-RF) are utilized to address the imbalanced data at both the data and algorithmic levels. In the first stage, imbalanced datasets are balanced using random undersampling, particle swarm optimization-based undersampling, and random oversampling. In the second stage, a classifier is constructed for each of the balanced datasets obtained through these sampling techniques. In the third stage, a majority voting method is introduced to aggregate the predicted outputs from the three classifiers. The evaluation results demonstrate that our proposed method significantly enhances the detection of illegitimate users in the imbalanced Twitter dataset. Additionally, we compare our proposed work with existing models, and the predicted results highlight the superiority of our spam detection model over state-of-the-art spam detection models that address the class imbalance problem. The combination of particle swarm optimization-based undersampling and the ensemble learning approach using majority voting results in more accurate spam detection.
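A sketch of the three-stage pipeline is given below; the PSO-based undersampler is replaced by a second random under-sampler as a placeholder, since the PSO ranking is the paper's own contribution. The samplers come from the imbalanced-learn package:

```python
# Three-stage ensemble: (1) three resampled datasets, (2) one classifier per
# dataset, (3) majority voting over the three predictions.
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import RandomForestClassifier

def fit_three_stage(X, y, seed=0):
    samplers = [
        RandomUnderSampler(random_state=seed),       # stage 1a: random US
        RandomUnderSampler(random_state=seed + 1),   # stage 1b: PSOB-U placeholder
        RandomOverSampler(random_state=seed),        # stage 1c: random OS
    ]
    models = []
    for s in samplers:                               # stage 2
        Xb, yb = s.fit_resample(X, y)
        # Cost-sensitive random forest via class weighting, as in CS-RF.
        models.append(RandomForestClassifier(
            class_weight="balanced", random_state=seed).fit(Xb, yb))
    return models

def majority_vote(models, X):                        # stage 3
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```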
Symbolically representing the knowledge acquired by a neural network is a profound endeavor aimed at illuminating the latent information embedded within the network. The literature offers a multitude of algorithms dedicated to extracting symbolic classification rules from neural networks. While some excel at producing highly accurate rules, others specialize in generating rules that are easily comprehensible. Nevertheless, only a scant few algorithms manage to strike a harmonious balance between comprehensibility and accuracy. One such exemplary technique is the Rule Extraction from Neural Network Using Classified and Misclassified Data (RxNCM) algorithm, which adeptly generates straightforward and precise rules outlining input data ranges with commendable accuracy. This article endeavors to enhance the classification performance of the RxNCM algorithm by leveraging an ensemble technique. Ensembles, a burgeoning field, focus on augmenting classifier performance by harnessing the strengths of individual classifiers. Rule extraction through neural network ensembles is relatively underexplored; this paper bridges the gap by introducing the Rule extraction using Neural Network Ensembles (RENNE) algorithm. RENNE is designed to refine the classification rules derived from the RxNCM algorithm through an ensemble strategy. Specifically, RENNE leverages patterns correctly predicted by an ensemble of neural networks during the rule generation process. The efficacy of the algorithm is validated using seven datasets sourced from the UCI repository. The outcomes indicate that the proposed RENNE algorithm outperforms the RxNCM algorithm in terms of performance.
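The ensemble-filtering step can be sketched as follows, under assumed details: small MLPs as ensemble members, all-correct filtering, and simple per-class min/max range rules in place of the full RxNCM/RENNE pruning procedure:

```python
# Keep only patterns the whole ensemble classifies correctly, then derive
# attribute-range rules per class from the surviving patterns.
import numpy as np
from sklearn.neural_network import MLPClassifier

def ensemble_filtered_range_rules(X, y, n_nets=5):
    nets = [MLPClassifier(hidden_layer_sizes=(16,), max_iter=800,
                          random_state=i).fit(X, y) for i in range(n_nets)]
    preds = np.stack([n.predict(X) for n in nets])
    keep = np.all(preds == y, axis=0)        # correctly predicted by all nets
    rules = {}
    for c in np.unique(y):
        Xc = X[keep & (y == c)]              # assumes each class keeps >= 1 pattern
        # Rule: predict class c if every attribute lies in [min, max] observed.
        rules[c] = list(zip(Xc.min(axis=0), Xc.max(axis=0)))
    return rules
```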
In this study, AdaBoost-Bi-LSTM ensemble models are developed to predict the number of COVID-19 confirmed cases by effectively learning volatile and unstable data using a nonparametric method. The performance of the developed models in terms of prediction accuracy is compared with that of existing deep learning models such as GRU, LSTM, and Bi-LSTM. The COVID-19 outbreak in 2019 resulted in a global pandemic with a significant number of deaths worldwide. There have long been ongoing efforts to prevent the spread of infectious diseases, and a number of models have been developed to predict the number of confirmed cases. However, many variables continuously mutate the virus and thereby affect the number of confirmed cases, which makes accurate prediction of COVID-19 confirmed cases difficult. The goal of this study is to develop a model with a lower error rate and higher predictive accuracy than existing models, so as to monitor and handle endemic diseases more effectively. To this end, this study predicts COVID-19 confirmed cases from April to October 2022 based on the analysis of confirmed-case data from 16 December 2020 to 27 September 2022 using the developed models. As a result, the AdaBoost-Bi-LSTM model shows the best performance, even though data from a period of high variability in the number of confirmed cases were used for model training. The AdaBoost-Bi-LSTM model achieves improved predictive power, with a performance gain of 17.41% over the simple GRU/LSTM models and 15.62% over the Bi-LSTM model.
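A hedged sketch of boosting Bi-LSTM regressors in the spirit of AdaBoost.R2 follows; the window length, layer sizes, and the weighted-mean combination (AdaBoost.R2 proper uses a weighted median) are assumptions, as the abstract does not give the exact configuration. Uses TensorFlow/Keras:

```python
import numpy as np
from tensorflow.keras import layers, models

def make_bilstm(window):
    m = models.Sequential([
        layers.Input(shape=(window, 1)),
        layers.Bidirectional(layers.LSTM(32)),
        layers.Dense(1),
    ])
    m.compile(optimizer="adam", loss="mse")
    return m

def fit_adaboost_bilstm(X, y, n_rounds=5, window=14):
    """X: (n, window, 1) sliding windows of case counts; y: (n,) targets."""
    n = len(X)
    w = np.full(n, 1.0 / n)
    learners, betas = [], []
    for _ in range(n_rounds):
        m = make_bilstm(window)
        m.fit(X, y, sample_weight=w * n, epochs=20, verbose=0)
        err = np.abs(m.predict(X, verbose=0).ravel() - y)
        loss = err / (err.max() + 1e-12)            # linear loss in [0, 1]
        avg_loss = float(np.sum(w * loss))
        if avg_loss >= 0.5:                         # AdaBoost.R2 stop rule
            break
        beta = avg_loss / (1.0 - avg_loss)
        w = w * beta ** (1.0 - loss)                # down-weight easy points
        w /= w.sum()
        learners.append(m); betas.append(np.log(1.0 / beta))
    return learners, np.array(betas)

def predict(learners, betas, X):
    preds = np.stack([m.predict(X, verbose=0).ravel() for m in learners])
    return np.average(preds, axis=0, weights=betas)
```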
Accurate forecasting of stock market volatility is an important issue in portfolio risk management. In this paper, an ensemble system for stock market volatility forecasting is presented. It is composed of three different models that hybridize the exponential generalized autoregressive conditional heteroscedasticity (EGARCH) process and an artificial neural network trained with the backpropagation algorithm (BPNN) to forecast stock market volatility under the normal, Student-t, and generalized error distribution (GED) assumptions separately. The goal is to design an ensemble system in which each single hybrid model is capable of capturing normality, excess skewness, or excess kurtosis in the data, so as to achieve complementarity. The performance of each EGARCH-BPNN model and of the ensemble system is evaluated by the closeness of the volatility forecasts to realized volatility. Based on mean absolute error and mean squared error, the experimental results show that the proposed ensemble model, which captures normality, skewness, and kurtosis in the data, is more accurate than the individual EGARCH-BPNN models in forecasting S&P 500 intra-day volatility based on one- and five-minute time horizon data.
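A sketch of the hybrid design under stated assumptions is given below: EGARCH filters are fitted under the three error distributions with the `arch` package, a small network maps each model's conditional volatility to realized volatility, and a simple average combines the three hybrids; the network settings and averaging combiner are illustrative:

```python
import numpy as np
from arch import arch_model
from sklearn.neural_network import MLPRegressor

def egarch_bpnn_ensemble(returns, realized_vol):
    forecasts = []
    for dist in ("normal", "t", "ged"):     # normality / kurtosis / tails
        res = arch_model(returns, vol="EGARCH", p=1, o=1, q=1,
                         dist=dist).fit(disp="off")
        cv = np.asarray(res.conditional_volatility).reshape(-1, 1)
        nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                          random_state=0).fit(cv, realized_vol)
        forecasts.append(nn.predict(cv))
    return np.mean(forecasts, axis=0)       # ensemble of the three hybrids
```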
Classification algorithms that help to identify software defects or faults play a crucial role in software risk management. Experimental results have shown that ensembles of classifiers are often more accurate and more robust to the effects of noisy data, and achieve lower average error rates, than any of the constituent classifiers. However, inconsistencies exist across studies, and the performance of learning algorithms may vary with different performance measures and under different circumstances. Therefore, more research is needed to evaluate the performance of ensemble algorithms in software defect prediction. The goal of this paper is to assess the quality of ensemble methods in software defect prediction with the analytic hierarchy process (AHP), a multicriteria decision-making approach that prioritizes decision alternatives based on pairwise comparisons. Through the application of the AHP, this study experimentally compares the performance of several popular ensemble methods using 13 different performance metrics over 10 public-domain software defect datasets from the NASA Metrics Data Program (MDP) repository. The results indicate that ensemble methods can improve the classification results of software defect prediction in general, and that AdaBoost gives the best results. In addition, tree- and rule-based classifiers perform better in software defect prediction than the other types of classifiers included in the experiment. Among single classifiers, K-nearest-neighbor, C4.5, and Naïve Bayes tree rank higher than the others.
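The AHP step is standard and can be shown with a small worked example: priority weights are the principal eigenvector of a pairwise comparison matrix, checked with Saaty's consistency ratio. The matrix values below are illustrative, not taken from the study:

```python
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 3.0],
              [1/5, 1/3, 1.0]])          # A[i, j]: preference of i over j

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                             # priority weights, sum to 1

n = A.shape[0]
ci = (eigvals.real[k] - n) / (n - 1)     # consistency index
cr = ci / 0.58                           # random index RI = 0.58 for n = 3
print(w, cr)                             # judgments acceptable if CR < 0.1
```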
One of the most popular approaches to Collaborative Filtering is based on Matrix Factorization (MF). In this paper, we focus on improving the accuracy of MF-based recommenders through homogeneous ensemble methods. To build such ensembles, we investigate a series of methods in two main aspects: (i) manipulating the training examples, including bagging, AdaBoost, and Forward Stepwise Additive Regression; and (ii) injecting randomness into the base models' training settings, including randomizing the initialization parameters and randomizing the training sequences. Each method is evaluated on two large, real datasets, and the effective methods are then combined to form a cascade MF ensemble scheme. The validation results on the experimental datasets demonstrate that, compared to a single MF-based recommender, our ensemble scheme obtains a significant improvement in prediction accuracy.
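A minimal sketch of one homogeneous-ensemble ingredient, randomized initialization (aspect ii above), follows: several SGD-trained factorizations differing only in their random seed are averaged at prediction time; the hyperparameters are illustrative assumptions:

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05,
             epochs=30, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:              # (user, item, rating) triples
            e = r - P[u] @ Q[i]
            pu = P[u].copy()                 # update with pre-step factors
            P[u] += lr * (e * Q[i] - reg * P[u])
            Q[i] += lr * (e * pu - reg * Q[i])
    return P, Q

def ensemble_predict(models, u, i):
    return np.mean([P[u] @ Q[i] for P, Q in models])

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
models = [train_mf(ratings, n_users=2, n_items=3, seed=s) for s in range(5)]
print(ensemble_predict(models, 0, 2))
```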
The paper continues the development of the multiple 1D-embedding-based (or 1D multi-embedding) methods for semi-supervised learning, preliminarily introduced by the author in [J. Wang, Semi-supervised learning using multiple one-dimensional embedding based adaptive interpolation, Int. J. Wavelets Multiresolut. Inf. Process. 14(2) (2016) 11 pp.]. This paper puts the development in a more general framework and creates a new method that employs the ensemble technique to integrate multiple 1D embedding-based regularization and label boosting for semi-supervised learning (SSL). It combines parallel ensembling and serial ensembling. In each stage of the parallel ensemble, the dataset is first smoothly mapped onto multiple 1D sequences. On each 1D embedded dataset, a classical regularization method is applied to construct a weak classifier. All of these weak classifiers are then integrated into an ensemble of 1D labelers (E1DL), which, together with a nearest-neighbor cluster (NNC) algorithm, extracts a newborn labeled subset from the unlabeled set. This subset is believed to be correctly labeled with high confidence, so it joins the original labeled set for the next learning stage. Repeating this process, we gradually obtain a boosted labeled set; the process does not stop until the updated labeled set reaches a certain size. Finally, we use the E1DL to build the target classifier, which labels all points of the dataset. In this paper, we also set universal parameters for all experiments to make the algorithm parameter-free. The validity of our method for the classification of handwritten digits is confirmed by several experiments. Compared to several other popular SSL methods, our results are very promising.
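The label-boosting loop can be sketched as below, with generic weak learners on random feature views standing in for the 1D embedding-based regularizers (the smooth 1D embeddings are the paper's construction and are not reproduced here); labels are assumed to be coded 0..C-1:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def label_boosting(X_lab, y_lab, X_unlab, n_views=5, target_size=200,
                   confidence=0.95, seed=0):
    rng = np.random.default_rng(seed)
    while len(X_lab) < target_size and len(X_unlab) > 0:
        # Parallel stage: one weak classifier per random feature "view".
        probs = []
        for _ in range(n_views):
            cols = rng.choice(X_lab.shape[1],
                              size=max(1, X_lab.shape[1] // 2), replace=False)
            clf = LogisticRegression(max_iter=500).fit(X_lab[:, cols], y_lab)
            probs.append(clf.predict_proba(X_unlab[:, cols]))
        avg = np.mean(probs, axis=0)              # ensemble of weak labelers
        conf, labels = avg.max(axis=1), avg.argmax(axis=1)
        take = conf >= confidence                 # high-confidence "newborns"
        if not take.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[take]])
        y_lab = np.concatenate([y_lab, labels[take]])
        X_unlab = X_unlab[~take]                  # serial stage: repeat
    return X_lab, y_lab
```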
The aim of direct marketing is to find the right customers, those most likely to respond to marketing campaign messages. In order to detect which customers are most valuable, response modeling is used to classify customers as respondents or non-respondents using their purchase history or other behavioral characteristics. Data mining techniques, including effective classification methods, can be used to predict responsive customers. However, the inherent problem of imbalanced data in response modeling makes response prediction difficult, as the prediction models become biased towards non-respondent customers. Another problem is that single models cannot provide the desired high accuracy due to their internal limitations. In this paper, we propose an ensemble classification method that removes the imbalance in the data using a combination of clustering and under-sampling. The predictions of multiple classifiers are combined in order to achieve better results. Using data from a bank's marketing campaigns, this ensemble method is implemented on different classification techniques and the results are evaluated. We also evaluate the performance of this ensemble method against two alternative ensembles. The experimental results demonstrate that our proposed method can improve the performance of response models for bank direct marketing by raising prediction accuracy and increasing the response rate.
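One plausible reading of the clustering-plus-under-sampling combination is sketched below (an assumption, not the paper's exact recipe): the majority class of non-respondents is clustered, each cluster contributes proportionally to a balanced subset, and per-subset classifiers vote at prediction time:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def cluster_undersample(X_maj, n_keep, n_clusters=5, seed=0):
    km = KMeans(n_clusters=n_clusters, random_state=seed).fit(X_maj)
    rng, keep = np.random.default_rng(seed), []
    for c in range(n_clusters):
        idx = np.flatnonzero(km.labels_ == c)
        quota = max(1, int(round(n_keep * len(idx) / len(X_maj))))
        keep.extend(rng.choice(idx, size=min(quota, len(idx)), replace=False))
    return X_maj[np.array(keep)]          # diverse, proportional majority sample

def fit_ensemble(X_maj, X_min, n_models=5):
    models = []
    for s in range(n_models):
        Xm = cluster_undersample(X_maj, n_keep=len(X_min), seed=s)
        X = np.vstack([Xm, X_min])
        y = np.concatenate([np.zeros(len(Xm)), np.ones(len(X_min))])
        models.append(DecisionTreeClassifier(random_state=s).fit(X, y))
    return models

def predict(models, X):
    return (np.mean([m.predict(X) for m in models], axis=0) >= 0.5).astype(int)
```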
A common challenge in classification analysis is class imbalance, as in bank credit collectibility, where performing loans (PL) may account for 5% and non-performing loans (NPL) for 95%. The purpose of this research is to develop a classification model for imbalanced collectibility data on Bank X mortgage credit. The analysis developed is an Ensemble of Discriminant Analysis and Logistic Regression. The ensemble used in this study is Bagging (Bootstrap Aggregating). The data used are secondary data on Bank X mortgage credit collectibility with a sample of n = 100, together with simulated data of n = 1000 under two scenarios: imbalanced classes (50:950) and balanced classes (500:500). The classification models are evaluated by accuracy, sensitivity, and specificity. On both the secondary and simulated data, the results of classification analysis with Bagging Discriminant Analysis and Bagging Logistic Regression are better than those of ordinary Discriminant Analysis and Logistic Regression; the sensitivity and specificity of the credit collectibility classification are also higher. The originality of this study lies in the Ensemble model using Bagging Discriminant Analysis and Bagging Logistic Regression, which improves the performance of the classification analysis and has implications for reducing risk both for banks and for Bank X mortgage (KPR) customers.
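A minimal sketch of the bagging setup follows, wrapping scikit-learn's BaggingClassifier around Logistic Regression and Linear Discriminant Analysis base learners on simulated data mirroring the imbalanced scenario (the `estimator=` keyword assumes scikit-learn >= 1.2):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced scenario: roughly 50 PL vs 950 NPL, as in the simulation.
X, y = make_classification(n_samples=1000, weights=[0.95], flip_y=0.02,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for base in (LogisticRegression(max_iter=1000), LinearDiscriminantAnalysis()):
    bagged = BaggingClassifier(estimator=base, n_estimators=50,
                               random_state=0).fit(X_tr, y_tr)
    print(type(base).__name__, "accuracy:", bagged.score(X_te, y_te))
```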