To address the low efficiency of public opinion influence analysis on social media, this paper proposes a new public opinion influence algorithm, K-adaboost, built on the AdaBoost and K-means algorithms. We first group the training samples and compute the cluster center of each user type within a group using the K-means algorithm, then train weak classifiers on the public opinion data and estimate the influence of public opinion on each user type using the AdaBoost algorithm, thereby obtaining the total influence of a public opinion. Finally, we compare and analyze the performance of the K-adaboost, K-means, and AdaBoost algorithms through simulation experiments. The results show that K-adaboost adapts well in terms of both convergence time and accuracy.
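The abstract does not give implementation details, but the two-stage structure it describes can be sketched as follows, assuming scikit-learn; the grouping strategy, the per-group influence score, and the aggregation by group size are illustrative assumptions, not the authors' exact method.

```python
# Hedged sketch of the two-stage K-adaboost idea: K-means grouping,
# then one AdaBoost ensemble per group. Details are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Stage 1: group users and find cluster centers with K-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Stage 2: train an AdaBoost weak-classifier ensemble per user group.
influence = {}
for k in range(kmeans.n_clusters):
    mask = kmeans.labels_ == k
    if len(np.unique(y[mask])) < 2:
        continue  # degenerate group; skipped in this sketch
    clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X[mask], y[mask])
    # Mean positive-class probability serves as a per-group influence score here.
    influence[k] = clf.predict_proba(X[mask])[:, 1].mean()

# Total influence: weight each group's score by its share of users.
weights = np.bincount(kmeans.labels_) / len(X)
total = sum(weights[k] * influence[k] for k in influence)
print(total)
```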
In this paper, we propose a regularization technique for AdaBoost. The method implements a bias-variance control strategy to avoid overfitting in classification tasks on noisy data. It is based on a notion of easy and hard training patterns that emerges from an analysis of the dynamical evolution of AdaBoost weights. The procedure consists of sorting the training data points by a hardness measure and progressively eliminating the hardest, stopping at an automatically selected threshold. The effectiveness of the method is tested and discussed on synthetic as well as real data.
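A minimal sketch of the filtering procedure, assuming scikit-learn: a plain AdaBoost loop over decision stumps tracks each pattern's weight, the final weight serves as the hardness measure, and a fixed 10% cut stands in for the paper's automatically selected threshold.

```python
# Hardness-based pattern filtering via AdaBoost sample weights.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, flip_y=0.1, random_state=0)
y = 2 * y - 1  # labels in {-1, +1}

w = np.full(len(X), 1 / len(X))
for _ in range(50):  # plain AdaBoost loop over decision stumps
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    w *= np.exp(-alpha * y * pred)
    w /= w.sum()

# Patterns with the largest final weights are the "hard" ones.
keep = np.argsort(w)[: int(0.9 * len(X))]  # drop the hardest 10%
X_clean, y_clean = X[keep], y[keep]
```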
BrownBoost is an adaptive, continuous-time boosting algorithm based on the Boost-by-Majority (BBM) algorithm. Although it has received little study at the time of writing, it is believed to be especially robust with respect to noisy data sets, which would make it a very useful boosting algorithm for real-world applications. More familiar algorithms such as AdaBoost, or its successor LogitBoost, are known to be especially susceptible to overfitting the training examples. This can lead to poor generalization error in the presence of class noise, since weak hypotheses induced at later iterations to fit the noisy examples tend to be given undue influence in the final combined hypothesis. BrownBoost allows us to specify an expected baseline error rate in advance, corresponding to our prior beliefs about the proportion of noise in the training data, and thus avoid overfitting. The original derivation of BrownBoost is restricted to binary classification problems. In this paper we propose a natural multiclass extension of the basic algorithm, incorporating error-correcting output codes and a multiclass gain measure. We test two-class and multiclass versions of the algorithm on a number of real and simulated data sets with artificial class noise, and show that BrownBoost consistently outperforms AdaBoost in these situations.
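For intuition, BrownBoost's weighting can be sketched as below, following Freund's formulation in which an example with accumulated margin r_i receives weight exp(-(r_i + s)^2 / c), where s is the remaining time and c encodes the expected baseline error rate; the helper names are ours, and the full algorithm's per-round step-size solving is omitted.

```python
# Sketch of BrownBoost's example weighting and its noise-tolerance knob.
import numpy as np
from scipy.special import erf

def brownboost_weights(margins, s, c):
    """Weights w_i = exp(-(r_i + s)^2 / c) for accumulated margins r_i:
    examples with large margins (already noisy or hopeless) fade out."""
    return np.exp(-((margins + s) ** 2) / c)

def implied_target_error(c):
    """Approximate final error rate implied by the parameter c."""
    return 1.0 - erf(np.sqrt(c))

print(brownboost_weights(np.array([-0.5, 0.0, 2.0]), s=0.3, c=1.0))
print(implied_target_error(1.0))  # smaller c tolerates more noise
```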
Handwritten text recognition is one of the most difficult problems in the field of pattern recognition. In this paper, we describe our efforts towards improving the performance of state-of-the-art handwriting recognition systems through the use of classifier ensembles. There are many examples of classification problems in the literature where multiple classifier systems outperform single classifiers. Normally, one of the following two approaches is used to create a multiple classifier system: (1) several classifiers are developed completely independently of each other and combined in a final step, or (2) several classifiers are created from one prototype classifier using so-called classifier ensemble creation methods. In this paper, an algorithm that combines both approaches is introduced and used to increase the recognition rate of a hidden Markov model (HMM) based handwritten word recognizer.
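The two standard strategies can be sketched as follows, assuming scikit-learn and substituting generic classifiers for the HMM recognizer, which this sketch does not model.

```python
# The two multiple-classifier-system strategies, side by side.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(random_state=0)

# Approach 1: independently developed classifiers, combined at the end.
independent = VotingClassifier([("lr", LogisticRegression(max_iter=1000)),
                                ("nb", GaussianNB())]).fit(X, y)

# Approach 2: many variants of one prototype classifier (here, bagging).
from_prototype = BaggingClassifier(DecisionTreeClassifier(),
                                   n_estimators=10).fit(X, y)
```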
This paper proposes a Content-Based Image Retrieval (CBIR) system applicable to mobile devices. Because different queries to a CBIR system emphasize different subsets of a large collection of features, most CBIR systems that use only a few features are suitable only for retrieving certain types of images. In this research we combine a wide range of features, including edge information, texture energy, and HSV color distributions, forming a feature space of up to 1053 dimensions in which the system can search for the features most desired by the user. Through a training process using the AdaBoost algorithm, our system can efficiently search for important features in a large feature set, as indicated by the user, and effectively retrieve images according to these features. The characteristics of the system meet the requirements of mobile devices for performing image retrieval. The experimental results show that the performance of the proposed system is sufficient for mobile devices to retrieve images from a huge database.
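A hedged sketch of the feature-search step, assuming scikit-learn; the 1053-dimensional feature vectors (edge, texture-energy, and HSV color features) are simulated with random data, and using feature_importances_ as the selection signal is our simplification.

```python
# AdaBoost-driven selection of the most important CBIR feature dimensions.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1053))      # stand-in image feature vectors
y = rng.integers(0, 2, size=200)      # relevant / not relevant to the query

ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
top = np.argsort(ada.feature_importances_)[::-1][:20]  # most-used dimensions
print(top)
```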
Novel features and weak classifiers are proposed for face detection within the AdaBoost learning framework. The features are histograms computed from a set of spatial templates in filtered images. The filter banks consist of intensity, Laplacian of Gaussian (Difference of Gaussians), and Gabor filters, aiming to capture spatial and frequency properties of faces at different scales and orientations. Features selected by AdaBoost learning, each of which corresponds to a histogram paired with a filter and a template, can thus be interpreted as boosted marginal distributions of faces. As the weak classifier, we fit a Gaussian distribution to each histogram feature using only the positives (faces) in the sample set. The experimental results demonstrate that classifiers with the corresponding features are more powerful in describing the face pattern than the Haar-like rectangle features introduced by Viola and Jones.
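A minimal sketch of such a weak classifier, fitting a Gaussian to a single histogram feature over the positive (face) samples only and thresholding the resulting likelihood; the threshold and function names are illustrative.

```python
# One-feature Gaussian weak classifier fitted on positives only.
import numpy as np

def fit_gaussian_weak_classifier(feature_pos):
    mu, sigma = feature_pos.mean(), feature_pos.std() + 1e-8
    def classify(x, thresh=0.5):
        # Likelihood relative to the Gaussian peak; high means "face-like".
        score = np.exp(-0.5 * ((x - mu) / sigma) ** 2)
        return np.where(score > thresh, 1, -1)
    return classify

rng = np.random.default_rng(0)
face_vals = rng.normal(1.0, 0.2, 500)   # histogram feature on face samples
clf = fit_gaussian_weak_classifier(face_vals)
print(clf(np.array([1.05, 2.5])))       # -> [ 1 -1]
```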
This paper presents an efficient IrisCode classifier, built from phase features, which uses AdaBoost to select Gabor wavelet bandwidths. The final iris classifier consists of a weighted combination of weak classifiers. As weak classifiers we use three-split decision trees that identify a candidate based on the Levenshtein distance between the phase vectors of the respective iris images. Our experiments show that the Levenshtein distance discriminates better than the Hamming distance when comparing IrisCodes. Our process also differs from existing methods in that the wavelengths of the Gabor filters used, and their final weights in the decision function, are chosen by the robust final classifier instead of being fixed and/or limited by the programmer, thus yielding higher iris recognition rates. A pyramidal strategy that cascades filters of increasing complexity makes the system suitable for real-time operation. We have designed a processor array to accelerate the computation of the Levenshtein distance. The processing elements are simple basic cells, interconnected by relatively short paths, which makes the design suitable for VLSI implementation.
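For reference, the Levenshtein distance that the weak classifiers rely on can be computed with the standard dynamic program:

```python
# Levenshtein (edit) distance between two phase-bit strings.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("0110", "0101"))  # 2
```

Unlike the Hamming distance, this tolerates small shifts between codes, which is the property the paper exploits.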
This paper proposes a new approach to ensemble construction based on restricting the set of example weights in the training data to avoid overfitting. The algorithm, called EPIBoost (Extreme Points Imprecise Boost), applies imprecise statistical models to restrict the set of weights. The weights within the restricted set are updated using its extreme points. The approach allows us to construct various algorithms by applying different imprecise statistical models to produce the restricted set. Various numerical experiments with real data sets show that the EPIBoost algorithm may outperform standard AdaBoost for some parameters of the imprecise statistical models.
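The abstract leaves the imprecise model unspecified; the sketch below assumes an eps-contamination model, one common choice, whose extreme points place all of the contaminating mass on a single example, and the nearest-extreme-point update is an illustrative simplification.

```python
# Restricting boosted example weights to an eps-contamination set.
import numpy as np

def restrict_weights(w_unrestricted, p, eps=0.1):
    """Admissible weights are (1 - eps) * p + eps * q, q in the simplex.
    Pick the extreme point (all of q's mass on one example) closest to
    the unrestricted AdaBoost weights."""
    n = len(p)
    extremes = (1 - eps) * p + eps * np.eye(n)   # one extreme point per row
    dists = np.linalg.norm(extremes - w_unrestricted, axis=1)
    return extremes[np.argmin(dists)]

p = np.full(5, 0.2)                          # base (uniform) distribution
w = np.array([0.05, 0.05, 0.6, 0.2, 0.1])    # overgrown AdaBoost weights
print(restrict_weights(w, p))                # noisy example 2 is capped
```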
In a learning process, features play a fundamental role. In this paper, we propose a boosting-based feature selection algorithm called BoostFS. It extends AdaBoost, which is designed for classification problems, to feature selection. BoostFS maintains a distribution over training samples that is initialized to the uniform distribution. In each iteration, a decision stump is trained under the sample distribution, and the distribution is then adjusted so that it is orthogonal to the classification results of all generated stumps. Because a decision stump can also be regarded as a single selected feature, BoostFS is capable of selecting a subset of features that are as mutually irrelevant as possible. Experimental results on synthetic datasets, five UCI datasets, and a real malware detection dataset all show that the features selected by BoostFS help to improve learning algorithms in classification problems, especially when the original feature set contains redundant features.
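A minimal sketch of the described loop, assuming scikit-learn: each round trains a stump under the current distribution, records its split feature, and reweights AdaBoost-style so the new distribution is orthogonal to that stump's predictions (its weighted error becomes 1/2); stopping criteria are omitted.

```python
# BoostFS-style feature selection via boosted decision stumps.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

d = np.full(len(X), 1 / len(X))
selected = []
for _ in range(5):  # pick 5 features
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=d)
    selected.append(int(stump.tree_.feature[0]))  # the stump's split feature
    pred = stump.predict(X)
    err = d[pred != y].sum()
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    d *= np.exp(-alpha * (2 * (pred == y) - 1))   # +1 correct, -1 wrong
    d /= d.sum()   # stump's weighted error under d is now exactly 1/2
print(selected)
```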
In this paper, we investigate incorporating feature computation time (CT) measures during feature selection in a boosted cascade people detector that utilizes a heterogeneous pool of features. We present several approaches based on Pareto-front analysis, CT-weighted AdaBoost, and Binary Integer Programming (BIP), with comparative evaluations. The novel feature selection method based on BIP, the main contribution, mines heterogeneous features taking both detection performance and CT explicitly into consideration. The results demonstrate that a detector using this feature selection scheme exhibits low miss rates (MRs) with a significant boost in frame rate. For example, it achieves a 2.6% lower MR at 10^-4 false positives per window (FPPW) compared to the Dalal and Triggs HOG detector, with a 9.22x speed improvement. Extensive experimental results clearly highlight the improvements the proposed framework provides.
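The exact BIP formulation is not given in the abstract; the sketch below illustrates only the underlying trade-off, in the style of a CT-weighted selection rule, with the linear cost and the lambda value as assumptions.

```python
# Computation-time-aware weak-classifier selection at one boosting round.
import numpy as np

def select_weak(errors, comp_times, lam=0.5):
    """errors, comp_times: arrays indexed by candidate feature."""
    cost = errors + lam * comp_times / comp_times.max()
    return int(np.argmin(cost))

errs = np.array([0.20, 0.22, 0.35])
cts = np.array([8.0, 1.0, 0.5])    # e.g., HOG vs. LBP vs. Haar-like cost
print(select_weak(errs, cts))      # slightly worse but much cheaper wins
```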
Computing performance is one of the key problems in embedded systems for high-resolution face detection applications. To improve the computing performance of embedded high-resolution face detection systems, a novel parallel implementation was developed based on a low-power CPU-accelerator heterogeneous many-core architecture. First, a basic CPU version of the face detection prototype was implemented based on a cascade classifier and the Local Binary Patterns operator. Second, the prototype was ported to a specific embedded parallel computing platform called Parallella, which consists of a Xilinx Zynq and an Adapteva Epiphany. Third, the face detection algorithm was optimized for the Parallella architecture to improve detection speed and the utilization of computing resources. Finally, a face detection experiment was conducted to evaluate the computing performance of the proposed implementation. The experimental results show that it attains accuracy consistent with that of the dual-core ARM while achieving a 7.8x speedup, demonstrating significant advantages in computing performance.
DNA-binding proteins are essential to DNA function and are integral components of various life processes in organisms, such as DNA recombination and replication. Recognizing such proteins helps medical researchers pinpoint the causes of disease. Traditional techniques for identifying DNA-binding proteins are expensive and time-consuming, whereas machine learning methods can identify these proteins quickly and efficiently. However, the accuracies of existing methods are not high enough. In this paper, we propose a framework to identify DNA-binding proteins. The framework first combines three feature extraction algorithms: PseKNC (ps), MomoKGap (mm), and MomoDiKGap (md). We then apply AdaBoost weight ranking to select optimal feature subsets from these three types of features. Based on the selected features, three algorithms, k-nearest neighbor (kNN), Support Vector Machine (SVM), and Random Forest (RF), are applied for classification. Finally, three predictors for identifying DNA-binding proteins are established: ps+mm, ps+md, and ps+mm+md. We use benchmark and independent datasets to train and evaluate the framework, performing three tests: the jackknife test, 10-fold cross-validation, and an independent test. Among the predictors, ps+md achieves the highest accuracy. We named the best-performing model psmdDBPs and applied it to identify DNA-binding proteins.
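A hedged sketch of the framework's selection and classification steps, assuming scikit-learn; the sequence-derived features (PseKNC and the others) are replaced by random stand-in data, and the subset size of 30 is illustrative.

```python
# AdaBoost weight ranking for feature selection, then three classifiers.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 120))    # stand-in for ps+mm+md feature vectors
y = rng.integers(0, 2, size=300)   # binding / non-binding labels

ada = AdaBoostClassifier(random_state=0).fit(X, y)
rank = np.argsort(ada.feature_importances_)[::-1]
X_sel = X[:, rank[:30]]            # keep a top subset (size is illustrative)

for clf in (KNeighborsClassifier(), SVC(), RandomForestClassifier()):
    clf.fit(X_sel, y)              # kNN, SVM, RF on the selected features
```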
Background: Tuberculosis (TB) diagnosis plays a significant role in the current world since, according to the Global Tuberculosis Report of 2019, more than one million cases are reported per year in India. Although various tests are available, the chest X-ray remains the most significant, without which the diagnosis is incomplete. Computationally designed algorithms have provided various clinical and diagnostic functions for archived posteroanterior chest radiographs. The digital X-ray image is an essential medium for examining and annotating patients' demographics in TB screening via chest radiography. Results: Even though several medicines are available to cure TB, accurate diagnosis remains a major challenge. We therefore introduce a fast technique that merges Adaptive Boosting (AdaBoost) and learning vector quantization (LVQ) to detect TB from an input chest X-ray image with the aid of computer-aided diagnosis, with high accuracy, precision, recall, and F1 values. This technique achieved an accuracy of 94.73%, compared with prior conventional methods such as SVM and convolutional neural networks. Conclusions: Tuberculosis detection can be performed meaningfully with the aid of MATLAB simulation using computer-aided diagnosis. The AdaBoost and LVQ algorithms work best on datasets of around 400 chest X-ray images for classifying normal and abnormal images of a patient suspected of having TB, in a fraction of a second.
This article introduces a novel ensemble method named eAdaBoost (Effective Adaptive Boosting), a meta-classifier developed by enhancing the existing AdaBoost algorithm to handle time complexity and produce the best classification accuracy. eAdaBoost reduces the error rate compared with existing methods and achieves the best accuracy by reweighting each feature for further processing. The results of an extensive experimental evaluation of the proposed method are reported on UCI machine learning repository datasets, with classifier accuracies and statistical test comparisons against various boosting algorithms. The proposed eAdaBoost has also been implemented with different decision tree classifiers: C4.5, Decision Stump, NB Tree, and Random Forest. The algorithm has been evaluated on various datasets with different weight thresholds, and its performance analyzed. The proposed method produces better results with Random Forest and NB Tree as base classifiers than with Decision Stump and C4.5 for some datasets. Overall, eAdaBoost gives better classification and prediction accuracy with lower execution time than the other classifiers.
In this study, AdaBoost-Bi-LSTM ensemble models are developed to predict the number of COVID-19 confirmed cases by effectively learning volatile and unstable data using a nonparametric method. The performance of the developed models in terms of prediction accuracy is compared with those of existing deep learning models such as GRU, LSTM, and Bi-LSTM. The COVID-19 outbreak in 2019 has resulted in a global pandemic with a significant number of deaths worldwide. There have long been ongoing efforts to prevent the spread of infectious diseases, and a number of prediction models have been developed for the number of confirmed cases. However, there are many variables that continuously mutate the virus and therefore affect the number of confirmed cases, which makes it difficult to accurately predict the number of COVID-19 confirmed cases. The goal of this study is to develop a model with a lower error rate and higher predictive accuracy than existing models to more effectively monitor and handle endemic diseases. To this end, this study predicts COVID-19 confirmed cases from April to October 2022 based on the analysis of COVID-19 confirmed cases data from 16 December 2020 to 27 September 2022 using the developed models. As a result, the AdaBoost-Bi-LSTM model shows the best performance, even though the data from the period of high variability in the number of confirmed cases was used for model training. The AdaBoost-Bi-LSTM model achieved improved predictive power and shows an increased performance of 17.41% over the simple GRU/LSTM model and of 15.62% over the Bi-LSTM model.
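A compact sketch of boosting Bi-LSTM regressors in the spirit of AdaBoost.R2, assuming TensorFlow/Keras; the tiny network, the three rounds, the linear loss, and the weighted resampling are illustrative choices rather than the paper's setup, and the weighted-median combination of the boosted models is omitted.

```python
# AdaBoost.R2-style boosting of small Bi-LSTM regressors on time windows.
import numpy as np
import tensorflow as tf

def make_bilstm(window):
    model = tf.keras.Sequential([
        tf.keras.Input((window, 1)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(16)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def boost(X, y, rounds=3):
    # X: (n, window, 1) windows of past case counts; y: (n,) next counts.
    n = len(X)
    w = np.full(n, 1.0 / n)
    models, model_weights = [], []
    for _ in range(rounds):
        idx = np.random.choice(n, n, p=w)            # weighted resampling
        m = make_bilstm(X.shape[1])
        m.fit(X[idx], y[idx], epochs=5, verbose=0)
        abs_err = np.abs(m.predict(X, verbose=0).ravel() - y)
        L = abs_err / (abs_err.max() + 1e-12)        # linear loss in [0, 1]
        ebar = float((w * L).sum())
        beta = ebar / (1.0 - ebar)
        w *= beta ** (1.0 - L)                       # AdaBoost.R2 update
        w /= w.sum()
        models.append(m)
        model_weights.append(np.log(1.0 / beta))
    return models, model_weights
```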
In metal-cutting operations, the surface roughness of the end product plays a significant role. It not only affects the aesthetic appearance of the end product but also determines the product's performance in the long run. Products with a high surface finish have higher endurance limits and negligible local stresses. On the other hand, products with rough surfaces are subjected to high stresses when engaged in various mechanical operations under varying loads. Surface roughness depends on machining factors such as feed rate, depth of cut, and cutting or spindle speed. It is therefore desirable to predict surface roughness for given machining parameters to reduce cost and increase the life of the end product. In this work, an attempt has been made to evaluate the surface roughness of AZ91 alloy during the end milling operation. Various state-of-the-art ensemble learning models have been adopted and compared with the proposed hybrid ensemble model, which integrates random forest, gradient boosting, and a deep multi-layered neural network. To evaluate the performance of the proposed model, comparative analyses have been made in terms of mean squared error, mean absolute error, and R2 score. The results show that the proposed hybrid model yields the lowest error for surface roughness.
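One plausible reading of the hybrid integration, sketched with scikit-learn: the three learners' predictions are averaged with a VotingRegressor. The mean combination and the network size are assumptions, since the abstract does not specify the integration scheme, and the machining data are simulated.

```python
# Hybrid RF + gradient boosting + deep MLP ensemble for roughness regression.
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Stand-in data: machining parameters -> surface roughness.
X, y = make_regression(n_samples=300, n_features=4, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

hybrid = VotingRegressor([
    ("rf", RandomForestRegressor(random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("nn", MLPRegressor(hidden_layer_sizes=(64, 64, 32), max_iter=2000,
                        random_state=0)),
]).fit(X_tr, y_tr)

y_hat = hybrid.predict(X_te)
print(mean_squared_error(y_te, y_hat),
      mean_absolute_error(y_te, y_hat), r2_score(y_te, y_hat))
```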
This paper investigates adult image detection based on the shape features of skin regions. To accurately detect skin regions, we propose a skin detection method using multiple Bayes classifiers. Based on the skin color detection results, shape features are extracted and fed into a boosted classifier to decide whether or not the skin regions represent nudity. We evaluate adult image detection performance using different boosted classifiers and different shape descriptors. Experimental results show that classification using a boosted C4.5 classifier with a combination of different shape descriptors outperforms the other classification schemes.
A new adaptive weighted deep forest algorithm, which can be viewed as a modification of the confidence screening mechanism, is proposed. The main idea underlying the algorithm is the adaptive weighting of every training instance at each cascade level of the deep forest. The confidence screening mechanism for the deep forest proposed by Pang et al. strictly removes instances from the training and testing processes, in accordance with the obtained random forest class probability distributions, to simplify the whole algorithm. This strict removal may leave very few training instances at the later levels of the deep forest cascade. The presented modification is more flexible: it assigns weights to instances in order to differentiate their use in building decision trees at every level of the cascade, overcoming the main disadvantage of the confidence screening mechanism. The proposed modification is similar to the AdaBoost algorithm to some extent. Numerical experiments illustrate that the proposed modification outperforms the original deep forest. It is also shown how the proposed algorithm can be extended to transfer learning and distance metric learning problems.
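A minimal sketch of the weighting idea, assuming scikit-learn and ignoring the deep forest's class-vector feature augmentation: instead of removing high-confidence instances between cascade levels, they are down-weighted for the next level's forests. The specific weighting function is an illustrative choice, not the authors' rule.

```python
# Down-weighting confident instances across a tiny 3-level cascade.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
w = np.full(len(X), 1.0)

for level in range(3):
    rf = RandomForestClassifier(n_estimators=50, random_state=level)
    rf.fit(X, y, sample_weight=w)
    conf = rf.predict_proba(X).max(axis=1)
    w = 1.0 - conf + 1e-3   # confident instances get small, nonzero weight
    w /= w.mean()           # keep the weights on a comparable scale
```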
Recommender systems are often employed in fields such as music, travel, and movies. They are broadly utilised nowadays due to the emergence of social activities, in which group recommender systems offer recommendations to a set of users together based on their preferences. User preferences are drawn from the behavioural and social aspects of group members to enhance the quality of products recommended to various groups when generating group recommendations. Group recommender systems also address the cold start problem that arises in individual recommender systems. The ultimate aim of this paper is to design and develop a new Improved Deep Ensemble Learning Model (ID-ELM) for group recommender systems across different application-oriented datasets. Initially, datasets from applications such as healthcare, e-commerce, and e-learning are gathered from benchmark sources and split into various groups. The data are then pre-processed, with steps such as stop word removal, stemming, and punctuation removal, to make them fit for further processing. Features are extracted using the Continuous Bag of Words Model (CBOW), and Principal Component Analysis (PCA) is used for dimension reduction. These features are fed to the ID-ELM, in which an optimised Convolutional Neural Network (CNN) extracts significant features from the pooling layer and the fully connected layer is replaced by a set of classifiers: Neural Networks (NN), AdaBoost, and Logistic Regression (LR). Finally, ranking by the ensemble learning model based on group reviews yields the recommendation outcome. The CNN is optimised by Adaptive Seeking Range-based Cat Swarm Optimisation (ASR-CSO) for better results. The model is validated on benchmark datasets to show its efficiency against different meta-heuristic-based and classification algorithms.
Eye location is an important step in automatic visual interpretation and face recognition. In this paper, we present a novel eye location algorithm based on Dempster–Shafer evidential reasoning. Four eye detectors are trained by AdaBoost with different combinations of feature spaces and samples. Each detects the face region and produces a set of eye candidates; two of the four detectors then calculate a confidence for every candidate. The confidence is converted to belief and plausibility, and Dempster's combination rule yields the combined belief and plausibility of each candidate, representing the fused information of the two detectors. The centers of the eye candidates with the greatest plausibility are taken as the actual eye centers. Experimental results on several open face databases demonstrate that our method is precise, robust, and less computationally complex than other recently proposed methods.
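For concreteness, Dempster's rule for two detectors over the frame {eye, not-eye}, with mass also allowed on the ignorance set {eye, not-eye}; the numeric masses are illustrative.

```python
# Dempster's combination rule for two eye-candidate mass assignments.
def combine(m1, m2):
    # Conflict: mass the two sources commit to contradictory singletons.
    K = m1["eye"] * m2["not"] + m1["not"] * m2["eye"]
    m = {}
    m["eye"] = (m1["eye"] * m2["eye"] + m1["eye"] * m2["both"]
                + m1["both"] * m2["eye"]) / (1 - K)
    m["not"] = (m1["not"] * m2["not"] + m1["not"] * m2["both"]
                + m1["both"] * m2["not"]) / (1 - K)
    m["both"] = m1["both"] * m2["both"] / (1 - K)
    return m

m = combine({"eye": 0.6, "not": 0.1, "both": 0.3},
            {"eye": 0.5, "not": 0.2, "both": 0.3})
belief, plausibility = m["eye"], m["eye"] + m["both"]
print(belief, plausibility)  # candidates ranked by plausibility
```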