Foreign exchange reserves are used by governments to balance international payments and stabilize the exchange rate. Numerous works have developed models to predict foreign exchange reserves; however, the existing models have limitations, and the literature demands more research on the subject, given that the accuracy of the models is still poor and they have only been applied to emerging countries. This paper presents a new prediction model of foreign exchange reserves for both emerging and developed countries, applying a Bayesian model averaging-Naïve Bayes method, which shows better precision than the individual classifier. Our model has great potential impact on the adequacy of macroeconomic policy against the risks derived from balance-of-payments crises, providing tools that help achieve financial stability at a global level.
Naïve Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining. Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based is rarely true in real-world applications. An open question is: what is the true reason for the surprisingly good performance of Naïve Bayes in classification?
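For reference, the classifier under discussion scores each class by combining the class prior with per-attribute conditional probabilities under the independence assumption. A minimal sketch in Python for discrete attributes, with Laplace smoothing (illustrative helper names, not taken from any of the papers summarized here):

```python
from collections import Counter
from math import log

def train_nb(X, y, alpha=1.0):
    """Fit a discrete naive Bayes model: class counts plus, for each class,
    per-attribute value counts (alpha is the Laplace smoothing constant)."""
    classes = Counter(y)
    n_features = len(X[0])
    counts = {c: [Counter() for _ in range(n_features)] for c in classes}
    values = [set() for _ in range(n_features)]
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            counts[yi][j][v] += 1
            values[j].add(v)
    return classes, counts, values, alpha

def predict_nb(model, x):
    """Pick the class maximizing log P(c) + sum_j log P(x_j | c)."""
    classes, counts, values, alpha = model
    total = sum(classes.values())
    best, best_lp = None, float("-inf")
    for c, nc in classes.items():
        lp = log(nc / total)
        for j, v in enumerate(x):
            # Smoothed conditional probability of attribute value v given class c.
            lp += log((counts[c][j][v] + alpha) / (nc + alpha * len(values[j])))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

The product of smoothed conditionals is exactly where the (rarely true) independence assumption enters.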
In this paper, we propose a novel explanation for the good classification performance of Naïve Bayes. We show that, essentially, the dependence distribution plays a crucial role. Here, dependence distribution means how the local dependence of an attribute distributes in each class, evenly or unevenly, and how the local dependences of all attributes work together, consistently (supporting a certain classification) or inconsistently (canceling each other out). Specifically, we show that no matter how strong the dependences among attributes are, Naïve Bayes can still be optimal if the dependences distribute evenly in classes, or if the dependences cancel each other out. We propose and prove a necessary and sufficient condition for the optimality of Naïve Bayes. Further, we investigate the optimality of Naïve Bayes under the Gaussian distribution, presenting and proving a sufficient condition for optimality even when dependences among attributes exist. This provides evidence that dependences may cancel each other out.
Our theoretical analysis can be used in designing learning algorithms. In fact, a major class of learning algorithms for Bayesian networks is conditional independence-based (or CI-based), which is essentially based on dependence. We design a dependence distribution-based algorithm by extending the Chow-Liu algorithm, a widely used CI-based algorithm. Our experiments show that the new algorithm outperforms the Chow-Liu algorithm, which also provides empirical evidence to support our new explanation.
Automatic reusability appraisal is helpful in evaluating the quality of developed or developing reusable software components and in identifying reusable components from existing legacy systems, which can save the cost of developing software from scratch. However, the issue of how to identify reusable components from existing systems has remained relatively unexplored. In this paper, we present a two-tier approach that studies the structural attributes of a component as well as its usability or relevancy to a particular domain. We evaluate the Probabilistic Latent Semantic Analysis (PLSA) approach, LSA's Singular Value Decomposition (SVD) technique, LSA's Semi-Discrete Matrix Decomposition (SDD) technique, and the Naïve Bayes approach to determine the domain relevancy of software components. This exploits the fact that feature vector codes can be seen as documents containing terms (the identifiers present in the components), so text modeling methods that capture co-occurrence information in low-dimensional spaces can be used. In this work, the structural attributes of software components are explored using software metrics, and software quality is inferred by a Neuro-Fuzzy (NF) inference engine, taking the metric values as input. The influence of different factors on reusability is studied, and the condition for the optimum reusability index is derived using Taguchi analysis. The NF system is optimized by selecting an initial rule base through a modified ID3 decision tree algorithm in combination with the results of the Taguchi analysis. The calculated reusability value enables good-quality code to be identified automatically, and it is found to be close to the manual analysis previously performed by programmers or repository managers. The system developed can therefore be used to enhance the productivity and quality of software development.
Naïve Bayes is a simple and efficient classification algorithm which performs well on text classification, also known as text categorization. Much research has been done to improve the performance of the naïve Bayes classifier by weighting correlated terms, in order to relax the strong assumption of independence between terms. In this paper, we first introduce a new χ² statistic, denoted by Rw,c, which can measure positive term-class dependency accurately, and then propose a new weighted naïve Bayes classifier using Rw,c at the training phase. Experimental results on real data sets show that our weighted naïve Bayes classifier performs much better than the basic naïve Bayes classifier in most cases.
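The exact definition of Rw,c is given in the paper; for orientation, the textbook χ² term-class statistic that such weights build on is computed from the 2x2 document contingency table of a term and a class. A minimal sketch, assuming documents are represented as sets of terms:

```python
def chi2_term_class(docs, labels, term, cls):
    """Textbook chi-square statistic for the dependence between a term and a
    class, from the 2x2 contingency table: A/C = docs in the class with/without
    the term, B/D = docs outside the class with/without the term."""
    A = B = C = D = 0
    for doc, y in zip(docs, labels):
        has = term in doc
        if y == cls:
            A += has
            C += not has
        else:
            B += has
            D += not has
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return 0.0 if denom == 0 else N * (A * D - C * B) ** 2 / denom
```

A term concentrated in one class scores high; a term spread evenly scores near zero. The plain χ² is symmetric between positive and negative dependency, which is the limitation Rw,c is designed to address.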
The naïve Bayes model is a simple but often satisfactory supervised classification method. The original naïve Bayes scheme does, however, have a serious weakness: the harmful effect of redundant predictors. In this paper, we study how to apply a regularization technique to learn a computationally efficient classifier inspired by naïve Bayes. The proposed formulation, combined with an L1 penalty, is capable of discarding harmful, redundant predictors. A modification of the LARS algorithm is devised to solve this problem. We tackle both real-valued and discrete predictors, ensuring that our method is applicable to a wide range of data. In the experimental section, we empirically study the effect of redundant and irrelevant predictors. We also test the method on a high-dimensional data set from the neuroscience field, where there are many more predictors than data cases. Finally, we run the method on a real data set that combines categorical with numeric predictors. Our approach is compared with several naïve Bayes variants and other classification algorithms (SVM and kNN), and is shown to be competitive.
The issue of privacy in clinical data restricts data sharing among organizations because of legal and ethical concerns. Every medical organization (hospital, research center, testing lab, etc.) needs to protect personal and medical data privacy and confidentiality while also sharing data with efficient and accurate learning models for various diseases. This paper presents a method that combines machine learning approaches on a blockchain network to classify patients as having liver disease or not, while maintaining privacy and enabling data sharing. The contribution of this paper is five-fold: (i) to deal with the liver disease dataset, data preprocessing phases are utilized (i.e., missing data treatment, data normalization, and data transformation); (ii) for feature reduction, two feature selection models are used, correlation feature selection (CFS) and a wrapper metaheuristic (PSO+KNN), and for training the proposed global model, the selected features (10, 4, and 3) from the three models (all features, CFS, and PSO+KNN) are distributed to the shared blockchain network; (iii) to detect liver disease in the preprocessed dataset, three machine learning methods are applied: random forest, decision tree (J48), and Naïve Bayes; (iv) the distributed network (blockchain) publishes the results of the liver disease diagnosis on the network; and (v) the smart contract shares the data and diagnosis between the medical organizations for future processing. An empirical study validates the relevance of the suggested framework for early diagnosis of liver disease using blockchain and machine learning models. The experimental results showed that feature selection using the wrapper metaheuristic (PSO+KNN) with a random forest model achieves the best performance with the fewest features, detecting liver disease with an accuracy of 99.9%, compared to 99.2% for Naïve Bayes and 89.2% for J48.
Social media platforms have become vast repositories of user-generated content, offering an abundant data source for sentiment analysis (SA). SA is a natural language processing (NLP) technique that determines the sentiment or emotional tone expressed in a given text. It involves utilizing computational techniques to automatically detect and categorize sentiment as negative, positive, or neutral. Aspect-based SA (ABSA) systems leverage machine learning (ML) approaches to discern nuanced opinions within the text, breaking down sentiment by particular attributes or aspects of the subject matter. Businesses and researchers can gain deep insights into brand perception, public opinion, and product feedback by integrating social media data with ABSA methodologies. This enables the extraction of sentiment polarity and more actionable and targeted insights. By applying ML approaches trained on the abundance of social media data, organizations can identify areas for improvement, tailor their strategies to meet their audience's evolving needs and preferences, and better understand customer sentiments. In this view, this study develops a new Fractal Snow Ablation Optimizer with Bayesian Machine Learning for Aspect-Level Sentiment Analysis (SAOBML-ALSA) technique on social media. The SAOBML-ALSA approach examines social media content to classify sentiments into distinct classes. In the primary stage, the SAOBML-ALSA technique preprocesses the input social media content to transform it into a meaningful format. This is followed by a LeBERT-based word embedding process. The SAOBML-ALSA technique applies a Naïve Bayes (NB) classifier for ALSA. Eventually, the parameter selection of the NB classifier is done using the SAO technique. The performance of the SAOBML-ALSA methodology was evaluated on a benchmark database. The experimental results show that the SAOBML-ALSA technique exhibits promising performance compared to other models.
Within the field of supervised classification, the naïve Bayes (NB) classifier is a very simple and fast classification method that obtains good results, even comparable with much more complex models. It has been shown that the NB model is strongly dependent on the estimation of conditional probabilities. In the literature, it has been shown that the classical and Laplace estimations of probabilities have drawbacks, and an NB model was proposed that takes the a priori probabilities into account in order to estimate the conditional probabilities, called m-probability-estimation. In limited experiments, this m-probability-estimation approach was shown to provide better results than NB with classical and Laplace estimations of probabilities. In this research, a new naïve Bayes variation is proposed, based on the m-probability-estimation version, which takes imprecise probabilities into account in order to calculate the a priori probabilities. An exhaustive experimental study is carried out, with a large number of data sets and different levels of class noise. From this experimentation, we conclude that the proposed NB model and the m-probability-estimation approach provide better results than NB with classical and Laplace estimation of probabilities. It is also shown that the proposed NB implies an improvement over the m-probability-estimation model, especially when there is some class noise.
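The m-estimate referenced above is commonly written as (n_xc + m * p) / (n_c + m): the empirical frequency of value x in class c is blended with a prior probability p, weighted as m virtual samples. A minimal sketch, assuming this standard form (Laplace smoothing is the special case p = 1/k, m = k for k attribute values):

```python
def m_estimate(n_xc, n_c, prior, m=2.0):
    """m-probability-estimate of P(x | c): blends the empirical frequency
    n_xc / n_c with a prior probability, weighted as m virtual samples."""
    return (n_xc + m * prior) / (n_c + m)
```

With zero observed counts the estimate falls back toward the prior rather than to zero, which is what makes the resulting NB model more robust than the classical frequency estimate.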
This paper proposes a novel algorithm for segmentation of the diseased part in apple leaf images. In agriculture-based image processing, leaf disease segmentation is the main processing task for region-of-interest extraction. It is also extremely important to segment the plant leaf from the background in the case of live images. Automated segmentation of plant leaves from the background is a common challenge in the processing of plant images. Although numerous methods have been proposed, it is still difficult to accurately segment the diseased part of the leaf from live leaf images with any one particular method. In the proposed work, apple leaves with different backgrounds are segmented. First, the leaves are enhanced using the Brightness-Preserving Dynamic Fuzzy Histogram Equalization technique, and then the diseased part of the apple leaf is extracted using a novel extraction algorithm. A real-time plant leaf database is used to validate the proposed approach. The proposed methodology gives better results than existing segmentation algorithms. From the segmented apple leaves, color and texture features are extracted, which are then classified as Marssonina coronaria or apple scab using different machine learning classifiers. The best accuracy of 96.4% is achieved using the K-nearest neighbor classifier.
A major concern for diabetic patients in recent years is foot ulcers. According to surveys, 15 out of 100 diabetic patients suffer from foot ulcers. Wounds or ulcers in diabetic patients take more time to heal and require more careful treatment. Foot ulcers may lead to dangerous conditions and may even cause loss of a limb. Recognizing this grim situation, this paper proposes the Fractional-Order Darwinian Particle Swarm Optimization (FO-DPSO) technique for analyzing 2D color images of foot ulcers. The paper follows a standard image processing pipeline: efficient segmentation using the FO-DPSO algorithm and extraction of textural features using the Gray Level Co-occurrence Matrix (GLCM) technique. The work achieved an accuracy of 91.2%, sensitivity of 100%, and specificity of 96.7% for the Naïve Bayes classifier, and an accuracy of 91.2%, sensitivity of 100%, and specificity of 79.6% for the Hoeffding tree classifier.
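As an illustration of the GLCM step, a minimal NumPy sketch for one offset with Haralick-style texture features (the fixed offset and gray-level quantization here are chosen for brevity, not the paper's exact configuration):

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Gray-level co-occurrence matrix for one pixel offset (dx, dy),
    normalized to sum to 1. `img` must hold integer values in [0, levels)."""
    g = np.zeros((levels, levels))
    h, w = img.shape
    for i in range(h - dy):
        for j in range(w - dx):
            g[img[i, j], img[i + dy, j + dx]] += 1
    return g / g.sum()

def glcm_features(g):
    """Contrast, energy, and homogeneity computed from a normalized GLCM."""
    i, j = np.indices(g.shape)
    return {
        "contrast": float((g * (i - j) ** 2).sum()),
        "energy": float((g ** 2).sum()),
        "homogeneity": float((g / (1.0 + np.abs(i - j))).sum()),
    }
```

These per-offset feature values (often averaged over several offsets and angles) form the textural feature vector fed to the classifiers.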
Electronic mail (email) is nowadays an essential communication service widely used by most Internet users. One of the main problems affecting this service is the proliferation of unsolicited messages (usually denoted spam) which, despite the efforts made by the research community, remains an inherent problem of this Internet service. In this perspective, this work proposes and explores a novel symbiotic feature selection approach that allows the exchange of relevant features among distinct collaborating users, in order to improve the behavior of anti-spam filters. For this purpose, several Evolutionary Algorithms (EAs) are explored as optimization engines able to enhance feature selection strategies within the anti-spam area. The proposed mechanisms are tested using a realistic incremental retraining evaluation procedure and a novel corpus based on the well-known Enron datasets mixed with recent spam data. The obtained results show that the proposed symbiotic approach is competitive while also having the advantage of preserving end-users' privacy.
The main goal of this paper is to introduce a new procedure for a naïve Bayes classifier, namely alpha-skew Gaussian naïve Bayes (ASGNB), which is based on a flexible generalization of the Gaussian distribution applied to continuous variables. As a direct advantage, this method can handle asymmetry in unimodal or bimodal behavior. We provide the estimation procedure of this method, and we check its predictive performance against other traditional classification methods using simulation studies and many real datasets from different application fields. The ASGNB is a powerful alternative for classification tasks when asymmetry or bimodality is present in the data, and it outperforms other traditional classification methods in most of the cases analyzed.
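Assuming the alpha-skew-normal density of Elal-Olivero (2010), which generalizes the standard normal by the factor ((1 - a*z)^2 + 1) / (2 + a^2) and is the kind of flexible uni/bimodal form described above, the class-conditional density used by such a classifier could be sketched as:

```python
from math import exp, pi, sqrt

def alpha_skew_normal_pdf(x, alpha=0.0, mu=0.0, sigma=1.0):
    """Alpha-skew-normal density (Elal-Olivero, 2010):
    f(z) = ((1 - a*z)**2 + 1) / (2 + a**2) * phi(z), z = (x - mu) / sigma.
    Reduces to the Gaussian when alpha = 0; asymmetric and possibly
    bimodal otherwise."""
    z = (x - mu) / sigma
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)
    return ((1.0 - alpha * z) ** 2 + 1.0) / (2.0 + alpha * alpha) * phi / sigma
```

In an ASGNB-style classifier this density would replace the Gaussian in each attribute's conditional term of the naïve Bayes product; the exact estimation procedure is the paper's contribution and is not reproduced here.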
Recently, multivariate analysis methods have been widely used for predicting human cognitive states from fMRI data. Here, we explore the possibility of predicting human cognitive states using patterns of brain activity associated with thinking about concrete objects. fMRI signals in conjunction with pattern recognition methods were used to analyze the cognitive functions associated with viewing 60 object pictures named by words in 12 categories. Important steps in Multi-Voxel Pattern Analysis (MVPA) are feature extraction and feature selection. In this study, a new feature selection method (the accuracy method) was developed for the multi-class fMRI dataset to select the informative voxels corresponding to the object category from the whole-brain voxels. The results of three multivariate classifiers, namely Naïve Bayes, K-nearest neighbor, and support vector machine, were compared for predicting the category of presented objects from activation BOLD patterns in the whole human brain. We investigated whether the multivariate classifiers are capable of finding the regions of the brain associated with the visual presentation of various object categories. Overall, the Naïve Bayes classifier performed best, and the accuracy method was the best for extracting features from the whole-brain data. In addition, the results of this study indicate that thinking about different semantic categories of objects has an effect on different spatial patterns of neural activation, so it is possible to identify the category of objects with high accuracy based on the patterns of neural activation recorded while participants view object line drawings. Finally, we demonstrate that the brain regions selected as informative for object categorization were similar across subjects, and this distribution of selected voxels on the cortex may neurally represent the various objects' category properties.
Bayesian methods have been successfully used for feature selection in many supervised learning tasks. In this paper, the adaptation of such methods for unsupervised learning (clustering) is investigated. We adopt an algorithm that iterates between clustering (assuming that the number of clusters is unknown a priori) and feature selection. From this standpoint, two Bayesian approaches for feature selection are addressed: (i) the Naïve Bayes Wrapper (NBW), and (ii) the Markov Blanket Filter (MBF) obtained from the construction of Bayesian networks. Experiments on ten datasets illustrate the performance of each method. The advantages of feature selection are demonstrated by comparing the results obtained with Bayesian feature selection against those achieved without any feature selection, i.e., using all the available features. In most of the experiments, NBW and MBF reduced the number of features while providing good-quality partitions relative to those found with the full set of features. NBW also outperformed its Bayesian feature selection counterpart (MBF) on most of the assessed datasets, mainly when the cardinality of the selected feature subset is taken into consideration.
Email has become one of the fastest and most economical forms of communication. However, the increase in email users has resulted in a dramatic increase in suspicious emails during the past few years. This paper proposes to apply classification data mining to the task of suspicious email detection based on deception theory. Email data was classified using four different classifiers (Neural Network, SVM, Naïve Bayes, and Decision Tree). The experiment was performed using Weka on different data sizes, by which suspicious emails are detected from the email corpus. Experimental results show that the simple ID3 classifier, which builds a binary tree, gives promising detection rates.
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have a certain redundancy, and we study methods to minimize it. We propose a minimum redundancy-maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on six gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among four classification methods: Naïve Bayes, linear discriminant analysis, logistic regression, and support vector machines.
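A minimal sketch of greedy MRMR in its mutual-information difference form (each step maximizes relevance to the class minus mean redundancy with already-selected features), for discrete features; illustrative only, not the authors' implementation:

```python
import numpy as np

def mutual_info(a, b):
    """Mutual information (in nats) between two discrete sequences,
    estimated from empirical joint and marginal frequencies."""
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for va in np.unique(a):
        for vb in np.unique(b):
            pxy = np.mean((a == va) & (b == vb))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(a == va) * np.mean(b == vb)))
    return mi

def mrmr(X, y, k):
    """Greedy MRMR: start with the single most relevant feature, then
    repeatedly add the feature maximizing I(f; y) - mean_g I(f; g)
    over already-selected features g."""
    n_feat = X.shape[1]
    selected = [int(np.argmax([mutual_info(X[:, j], y) for j in range(n_feat)]))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            score = mutual_info(X[:, j], y) - np.mean(
                [mutual_info(X[:, j], X[:, s]) for s in selected])
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Unlike plain top-k ranking, a near-duplicate of an already-selected gene is heavily penalized by the redundancy term, which is exactly the effect the framework exploits.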
Supplementary: The top 60 MRMR genes for each of the datasets are listed in the supplementary material, where more information related to the MRMR methods can also be found.
Author identification algorithms attempt to ascribe documents to authors, with an eye towards diverse application areas including forensic evidence, authenticating communications, and intelligence gathering. We view author identification as a single-label classification problem, where 2000 authors would imply 2000 possible categories to assign to a post. Experiments with a naive Bayes classifier on a blog author identification task demonstrate a remarkable tendency to over-predict the most prolific authors. A literature search confirms that the class imbalance phenomenon is a challenge for author identification as well as other machine learning tasks. We develop a vector projection method to remove this hazard, achieving a 63% improvement in accuracy over the baseline on the same task. Our method adds no additional asymptotic computational complexity to naive Bayes and has no free parameters to set. The projection technique will likely prove useful for other natural language tasks exhibiting class imbalance.
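The vector projection method itself is specific to the paper, but the over-prediction effect it targets is easy to see: in naive Bayes, the log-prior term log P(c) systematically favors prolific classes. A much cruder illustrative mitigation (not the authors' method) is simply scoring with a uniform prior:

```python
from math import log

def nb_score(word_counts, doc, total_words, vocab_size, prior=None, alpha=1.0):
    """Multinomial-NB log-score of `doc` for one class, given that class's
    word counts. Pass `prior=None` to score with a uniform prior, a crude
    way to stop prolific classes from dominating the argmax."""
    score = log(prior) if prior is not None else 0.0
    for w in doc:
        # Laplace-smoothed per-word likelihood for this class.
        score += log((word_counts.get(w, 0) + alpha) / (total_words + alpha * vocab_size))
    return score
```

With two authors whose writing is indistinguishable but whose priors are 0.99 vs. 0.01, the prior term alone decides every prediction; dropping it restores the tie, which illustrates why the imbalance hazard needs some correction, even if a more principled one.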
Chronic kidney disease (CKD) is listed among the top 20 leading causes of death worldwide, highlighting the urgent need for effective methods of early disease prediction for chronic conditions. This paper presents a novel strategy for the early diagnosis of CKD, leveraging machine learning (ML) methods to support researchers in their pursuit of preventative approaches. The study incorporates data from 400 patients, analyzing a comprehensive set of 25 attributes. Missing values in the dataset were handled using mean and mode statistics for numerical and nominal attributes, respectively. The recursive feature elimination (RFE) with cross-validation (CV) technique was utilized to identify the most crucial attributes. The resulting predictive model achieved positive outcomes in terms of accuracy, precision, recall, and F-measure across all implemented classification algorithms (random forest (RF), Naive Bayes (NB), decision tree (DT), multi-layer neural network (MLNN), partial decision tree algorithm (PART), repeated incremental pruning to produce error reduction (RIPPER), KStar, and extreme gradient boosting (XGBoost)). Notably, the XGBoost classification algorithm demonstrated an impressive accuracy of up to 99.5%. Predictive models are indispensable for healthcare professionals, as they play a pivotal role in enabling early detection of CKD and mitigating the risk of kidney failure. This research offers valuable insights into the development of reliable methods for the early identification of CKD, providing a promising avenue for enhancing patient outcomes and reducing the burden of this prevalent chronic condition.
Sentiment analysis (SA), also known as opinion mining, is a natural language processing (NLP) technique used to determine the sentiment or emotional tone behind a piece of text. It involves analyzing the text to identify whether it expresses a positive, negative, or neutral sentiment. SA can be applied to various types of text data, such as social media posts, customer reviews, news articles, and more. This experiment is based on the Internet Movie Database (IMDB) dataset, which comprises movie reviews and the positive or negative labels attached to them. Our experiment's objective is to identify the model with the best accuracy and the most generality. Text preprocessing is the first and most critical phase in an NLP system, since it significantly impacts the overall accuracy of the classification algorithms. The experiment implements unsupervised sentiment classification algorithms, including Valence Aware Dictionary and sEntiment Reasoner (VADER) and TextBlob. We also examine supervised sentiment classification methods, namely Naïve Bayes (Bernoulli NB and Multinomial NB). The Term Frequency-Inverse Document Frequency (TF-IDF) model is used for feature selection and extraction. In our experiments, the combination of Multinomial NB and TF-IDF achieves the highest accuracy, 87.63%, in both classification reports.
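A minimal sketch of the TF-IDF weighting used as features here, in one common smoothed variant (the experiment presumably uses a library implementation; the helper name is illustrative):

```python
from collections import Counter
from math import log

def tfidf(docs):
    """TF-IDF weights per document, for tokenized documents (lists of terms).
    Uses the smoothed idf(t) = log((1 + N) / (1 + df(t))) + 1 convention,
    with term frequency normalized by document length."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    out = []
    for d in docs:
        tf = Counter(d)
        out.append({t: c / len(d) * (log((1 + n) / (1 + df[t])) + 1)
                    for t, c in tf.items()})
    return out
```

Terms appearing in every document keep only their term-frequency weight, while rarer (and typically more sentiment-bearing) terms are up-weighted; the resulting sparse vectors are what a Multinomial NB classifier would be trained on.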
This study examines the effectiveness of the Naïve Bayes Rule relative to that of five other popular algorithms in constructing scorecards that correctly discriminate between good-risk and bad-risk credit applicants. Scorecard performance is assessed on a real-world data sample by both the percentage of correctly classified cases and the more relevant criterion of bad rate among accepts. Naive Bayes is found to produce the worst-performing scorecard under both measures used.