  • articleNo Access

    Multi-Layered Algorithm for Chinese Cultural Dissemination Power Based on Feature Selection and Weight Learning

    This paper presents an innovative approach to analyzing and enhancing the effectiveness of cultural dissemination using a multi-layered algorithm framework based on feature selection and weight learning. We first employ the Least Absolute Shrinkage and Selection Operator (Lasso) regularization technique for feature selection, identifying the most informative features crucial for predicting the power of cultural transmission. Following this, a reinforcement learning framework based on Deep Q-Networks (DQN) is established, incorporating a reward mechanism that favors feature combinations promoting cultural dissemination. Through interaction with the environment, the model learns the weights of these features, reflecting their contribution to successful cultural transmission. The identified features and learned weights are then integrated into a multi-layered algorithmic framework. Each layer of this framework represents a different aspect of cultural transmission, such as content creation, dissemination channels, and audience feedback, ensuring effective interaction between layers. Finally, the model is applied to real-world cultural dissemination cases, like popular music, movies, or literary works, to validate its effectiveness. The results demonstrate the potential of this approach in providing insightful strategies for optimizing cultural dissemination.
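
    The Lasso feature-selection step described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the coordinate-descent solver, the function names (`soft_threshold`, `lasso_coordinate_descent`), the synthetic data, and the penalty `alpha = 0.2` are all assumptions made for illustration. Features whose weight is driven exactly to zero are dropped; the remaining ones are kept.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic data (assumption): only features 0 and 2 influence the target.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
y = [2.0 * row[0] - 3.0 * row[2] + rng.gauss(0.0, 0.1) for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.2)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected feature indices:", selected)
```

    A larger `alpha` zeroes out more weights and therefore selects fewer features; in practice the penalty would be tuned, for example by cross-validation.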

  • articleNo Access

    SHAP as a Data Reduction Technique for Highly Imbalanced Big Data

    Fraud detection through the classification of highly imbalanced Big Data is an exciting area of Machine Learning research. On the one hand, in certain fraud detection application domains, the use of One-Class classifiers is an overlooked opportunity. On the other hand, for researchers faced with the task of building Machine Learning models for identifying fraud, when only legitimate transaction data is available, One-Class Classifiers are indispensable. We investigate the efficacy of SHapley Additive exPlanations (SHAP) as a feature selection technique for One-Class classification tasks. In this study we utilize authentic data from the Credit Card fraud and Medicare insurance fraud application domains. Our contribution is to show that researchers can use SHAP in conjunction with One-Class Classifiers to do feature selection on highly imbalanced datasets, and then build models, with the selected features, that yield performance similar to, or better than, models built using all features. Our results in Big Medicare data fraud detection show that an over 90% data reduction through feature selection can nevertheless coincide with the best performance in terms of Area under the Precision Recall Curve.
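
    The evaluation metric named in this abstract, Area under the Precision Recall Curve, can be approximated by the step-wise sum commonly called average precision. The sketch below is only an illustration under stated assumptions: the function name `average_precision`, the toy labels, and the toy scores are hypothetical, and tied scores are not given any special treatment.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            area += hits / rank  # precision at the rank where recall increases
    return area / total_pos


# Toy, highly imbalanced example (assumption): 2 positives among 10 cases.
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.10, 0.40, 0.90, 0.20, 0.30, 0.05, 0.35, 0.15, 0.25, 0.45]
ap = average_precision(labels, scores)
print(ap)
```

    Here the two positives are ranked first and fourth, so the precision values at the two recall steps are 1/1 and 2/4, giving an average precision of 0.75.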

  • articleNo Access

    Optimization-Based Feature Selection and Classification with Modified Activation-Tuned Deep BiLSTM for Attack Detection in IoT

    With the increasing demand for automated network systems in the Internet of Things (IoT), the models are becoming more complex and undergoing a tremendous change. Since the gadgets broadcast data wirelessly, they are easily targeted for attacks. Every day, thousands of attacks arise as a result of the addition of new protocols to the Internet of Things. This frequently makes the computing process less reliable and less effective. The majority of these assaults are scaled-down versions of recognized cyberattacks from the past. This suggests that over time, even sophisticated systems, like conventional ones, will have trouble identifying minute variations in attacks. However, Deep Learning (DL) has shown tremendous promise among attack detection techniques because of its early detection capability. Nevertheless, the efficacy of these DL methods is contingent upon the ability to gather vast amounts of labeled data from IoT sensors; they also require more training time and suffer from inaccuracies in detection. Hence, this research presents a modified activation function-based deep bidirectional long short-term memory (Deep BiLSTM) model, which captures the temporal dependencies and detects attacks effectively. Here, the modified activation function addresses the vanishing gradient problem and high computational requirements. Specifically, the efficient features are extracted through the Ant-Chase optimization (AnChO), which assists in optimizing the BiLSTM model by tuning its parameters to attain the best solution and to detect attacks precisely with less computational time. Therefore, the accuracy, specificity, precision and recall of the proposed attack detection model attain the values of 96.46%, 97.40%, 97.91%, 95.05% and 97.465% correspondingly and the proposed system enhances IoT security by effectively detecting attacks.

  • articleNo Access

    Student Performance Prediction with Decision Tree Ensembles and Feature Selection Techniques

    The prevalence of student dropout in academic settings is a serious issue that affects individuals and society as a whole. Timely intervention and support can be provided to such students if we get an accurate prediction of student performance. However, class imbalance and data complexity in education data are major challenges for traditional predictive analytics. Our research focusses on utilising machine learning techniques to predict student performance while handling imbalanced datasets. To address the imbalanced class problem, we employed both oversampling and undersampling techniques in our decision tree ensemble methods for the risk classification of prospective students. The effectiveness of classifiers was evaluated by varying the sizes of the ensembles and the oversampling and undersampling ratios. Additionally, we conducted experiments to integrate the feature selection processes with the best ensemble classifiers to further enhance the prediction. Based on the extensive experimentation, we concluded that ensemble methods such as Random Forest, Bagging, and Random Undersampling Boosting perform well in terms of performance measures such as Recall, Precision, F1-score, Area Under the Receiver Operating Characteristic Curve, and Geometric Mean. The F1-score of 0.849 produced by the Random Undersampling Boost classifier in conjunction with the Least Absolute Shrinkage and Selection Operator feature selection method indicates that this ensemble produces the best results.

  • articleNo Access

    A Particle Swarm Optimizer with Biased Exploration and Exploitation for High-Dimensional Optimization and Feature Selection

    Unmanned Systems, 29 Nov 2024

    Autonomous intelligent systems have been widely implemented in a broad range of applications. These applications often involve data-dense tasks where an efficient feature selection process is required to eliminate redundant features and improve model performance. However, high-dimensional feature selection tasks remain challenging. To address high-dimensional feature selection problems with greater effectiveness and efficiency, this paper proposes a particle swarm optimizer variant named PSO-BEE that allows the swarm to take full advantage of exploration and exploitation at the appropriate evolutionary stages. At the early stage, a large candidate exemplar group is formed for diversity enhancement, attempting to find as many new combinations of features as possible. At the later stage, a very small candidate exemplar group is constructed, allowing updated particles to learn from the few best elite exemplars to continuously refine feature selection solutions with a lower classification error rate. Three PSO-BEE variants with distinct preferences for exploration and exploitation are proposed based on different parameter adjustment strategies. Experiments are executed on 500-, 1000-, and 2000-dimensional benchmark suites presented by CEC and real-world feature selection datasets from the Machine Learning Repository. Experimental results demonstrate the competitive performance of PSO-BEE in high-dimensional global optimization and feature selection when compared with several state-of-the-art approaches.

  • articleNo Access

    HYBRIDIZATION OF GENETIC AND QUANTUM ALGORITHM FOR GENE SELECTION AND CLASSIFICATION OF MICROARRAY DATA

    In this work, we hybridize the Genetic Quantum Algorithm with the Support Vector Machines classifier for gene selection and classification of high-dimensional microarray data. We named our algorithm GQASVM. Its purpose is to identify a small subset of genes that could be used to separate two classes of samples with high accuracy. The approach was compared with different methods from the literature, in particular GASVM and PSOSVM [2], on six different datasets from microarray experiments dealing with cancer (leukemia, breast, colon, ovarian, prostate, and lung) and available on the Web. The experiments demonstrated the very good performance of the method. The first contribution shows that the algorithm GQASVM is able to find genes of interest and improve the classification in a meaningful way. The second important contribution consists in the actual discovery of new and challenging results on the datasets used.

  • articleNo Access

    A NEURAL NETWORK MODEL WITH FEATURE SELECTION FOR KOREAN SPEECH ACT CLASSIFICATION

    A speech act is a linguistic action intended by a speaker. Speech act classification is an essential part of a dialogue understanding system because the speech act of an utterance is closely tied with the user's intention in the utterance. We propose a neural network model for Korean speech act classification. In addition, we propose a method that extracts morphological features from surface utterances and selects effective ones among the morphological features. Using the feature selection method, the proposed neural network can partially increase precision and decrease training time. In the experiment, the proposed neural network showed better results than other models using comparatively high-level linguistic features. Based on the experimental result, we believe that the proposed neural network model is suitable for real field applications because it is easy to expand the neural network model into other domains. Moreover, we found that neural networks can be useful in speech act classification if we can convert surface sentences into vectors with fixed dimensions by using an effective feature selection method.

  • articleNo Access

    VISUAL APPROACH TO SUPERVISED VARIABLE SELECTION BY SELF-ORGANIZING MAP

    Practical data analysis often encounters data sets with both relevant and useless variables. Supervised variable selection is the task of selecting the relevant variables based on some predefined criterion. We propose a robust method for this task. The user manually selects a set of target variables and trains a Self-Organizing Map with these data. This sets a criterion to variable selection and is an illustrative description of the user's problem, even for multivariate target data. The user also defines another set of variables that are potentially related to the problem. Our method returns a subset of these variables, which best corresponds to the description provided by the Self-Organizing Map and, thus, agrees with the user's understanding about the problem. The method is conceptually simple and, based on experiments, allows an accessible approach to supervised variable selection.

  • articleNo Access

    FRANKSUM: NEW FEATURE SELECTION METHOD FOR PROTEIN FUNCTION PREDICTION

    In the study of in silico functional genomics, improving the performance of protein function prediction is the ultimate goal for identifying proteins associated with defined cellular functions. The classical prediction approach is to employ pairwise sequence alignments. However, this method often faces difficulties when no statistically significant homologous sequences are identified. An alternative way is to predict protein function from sequence-derived features using machine learning. In this case, the choice of possible features which can be derived from the sequence is of vital importance to ensure adequate discrimination to predict function. In this paper we have successfully selected biologically significant features for protein function prediction. This was performed using a new feature selection method (FrankSum) that avoids data distribution assumptions, uses a data independent measurement (p-value) within the feature, identifies redundancy between features and uses an appropriate ranking criterion for feature selection. We have shown that classifiers generated from features selected by FrankSum outperform classifiers generated from full feature sets, randomly selected features and features selected from the Wrapper method. We have also shown the features are concordant across all species and top ranking features are biologically informative. We conclude that feature selection is vital for successful protein function prediction and FrankSum is one of the feature selection methods that can be applied successfully to such a domain.

  • articleNo Access

    DECODING GRATING ORIENTATION FROM MICROELECTRODE ARRAY RECORDINGS IN MONKEY CORTICAL AREA V4

    We propose an invasive brain-machine interface (BMI) that decodes the orientation of a visual grating from spike train recordings made with a 96-microelectrode array chronically implanted into the prelunate gyrus (area V4) of a rhesus monkey. The orientation is decoded irrespective of the grating's spatial frequency. Since pyramidal cells are less prominent in visual areas, compared to (pre)motor areas, the recordings contain spikes with smaller amplitudes, compared to the noise level. Hence, rather than performing spike decoding, feature selection algorithms are applied to extract the required information for the decoder. Two types of feature selection procedures are compared, filter and wrapper. The wrapper is combined with a linear discriminant analysis classifier, and the filter is followed by a radial-basis function support vector machine classifier. In addition, since we have a multiclass classification problem, different methods for combining pairwise classifiers are compared.

  • articleNo Access

    COMBINATION OF HETEROGENEOUS EEG FEATURE EXTRACTION METHODS AND STACKED SEQUENTIAL LEARNING FOR SLEEP STAGE CLASSIFICATION

    This work proposes a methodology for sleep stage classification based on two main approaches: the combination of features extracted from electroencephalogram (EEG) signal by different extraction methods, and the use of stacked sequential learning to incorporate predicted information from nearby sleep stages in the final classifier. The feature extraction methods used in this work include three representative ways of extracting information from EEG signals: Hjorth features, wavelet transformation and symbolic representation. Feature selection was then used to evaluate the relevance of individual features from this set of methods. Stacked sequential learning uses a second-layer classifier to improve the classification by using previous and posterior first-layer predicted stages as additional features providing information to the model. Results show that both approaches enhance the sleep stage classification accuracy rate, thus leading to a closer approximation to the experts' opinion.
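
    Of the three feature families listed in this abstract, the Hjorth features have a compact closed form: activity is the signal variance, mobility is the square root of the ratio between the variance of the first difference and the variance of the signal, and complexity is the mobility of the first difference divided by the mobility of the signal. The sketch below computes them for a synthetic sine wave; the function names, the 256 Hz sampling rate, and the test signal are assumptions for illustration and are not taken from the paper.

```python
import math


def variance(values):
    """Population variance of a sequence."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def first_difference(values):
    """Discrete approximation of the time derivative."""
    return [b - a for a, b in zip(values, values[1:])]


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) of a 1-D signal."""
    d1 = first_difference(signal)
    d2 = first_difference(d1)
    activity = variance(signal)
    mobility = math.sqrt(variance(d1) / activity)
    complexity = math.sqrt(variance(d2) / variance(d1)) / mobility
    return activity, mobility, complexity


# Synthetic test signal (assumption): 5 Hz sine sampled at 256 Hz for 2 seconds.
fs = 256
signal = [math.sin(2 * math.pi * 5 * n / fs) for n in range(2 * fs)]
activity, mobility, complexity = hjorth_parameters(signal)
print(activity, mobility, complexity)
```

    For a pure sinusoid the complexity is close to 1, which makes it a convenient sanity check before applying the function to real EEG epochs.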

  • articleNo Access

    ASSESSMENT OF FEATURE SELECTION AND CLASSIFICATION APPROACHES TO ENHANCE INFORMATION FROM OVERNIGHT OXIMETRY IN THE CONTEXT OF APNEA DIAGNOSIS

    This study is aimed at assessing the usefulness of different feature selection and classification methodologies in the context of sleep apnea hypopnea syndrome (SAHS) detection. Feature extraction, selection and classification stages were applied to analyze blood oxygen saturation (SaO2) recordings in order to simplify polysomnography (PSG), the gold standard diagnostic methodology for SAHS. Statistical, spectral and nonlinear measures were computed to compose the initial feature set. Principal component analysis (PCA), forward stepwise feature selection (FSFS) and genetic algorithms (GAs) were applied to select feature subsets. Fisher's linear discriminant (FLD), logistic regression (LR) and support vector machines (SVMs) were applied in the classification stage. Optimum classification algorithms from each combination of these feature selection and classification approaches were prospectively validated on datasets from two independent sleep units. FSFS + LR achieved the highest diagnostic performance using a small feature subset (4 features), reaching 83.2% accuracy in the validation set and 88.7% accuracy in the test set. Similarly, GAs + SVM also achieved high generalization capability using a small number of input features (7 features), with 84.2% accuracy on the validation set and 84.5% accuracy in the test set. Our results suggest that reduced subsets of complementary features (25% to 50% of total features) and classifiers with high generalization ability could provide high-performance screening tools in the context of SAHS.
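
    Forward stepwise feature selection, as named in this abstract, starts from an empty subset and greedily adds the feature that most improves a wrapped classifier until no candidate helps. The sketch below illustrates that loop only. As a loudly stated substitution, a leave-one-out nearest-centroid classifier stands in for the logistic regression used in the study, and the function names and the toy data are hypothetical.

```python
def nearest_centroid_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier on chosen features."""
    correct = 0
    for held_out in range(len(X)):
        sums, counts = {}, {}
        for i, (row, label) in enumerate(zip(X, y)):
            if i == held_out:
                continue
            acc = sums.setdefault(label, [0.0] * len(features))
            for k, f in enumerate(features):
                acc[k] += row[f]
            counts[label] = counts.get(label, 0) + 1

        def squared_distance(label):
            return sum((X[held_out][f] - sums[label][k] / counts[label]) ** 2
                       for k, f in enumerate(features))

        predicted = min(sorted(sums), key=squared_distance)
        correct += predicted == y[held_out]
    return correct / len(X)


def forward_stepwise_selection(X, y, score, max_features):
    """Greedily add the feature that most improves score; stop when none does."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        candidates = [(score(X, y, selected + [f]), f) for f in remaining]
        # Highest score wins; ties go to the lower feature index.
        top_score, top_feature = max(candidates, key=lambda t: (t[0], -t[1]))
        if top_score <= best_score:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Toy data (assumption): feature 1 separates the classes, features 0 and 2 do not.
X = [[5, 1.0, 0], [1, 1.2, 0], [4, 0.9, 1], [2, 1.1, 0],
     [3, 3.0, 1], [5, 3.2, 1], [1, 2.9, 0], [4, 3.1, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, best_score = forward_stepwise_selection(
    X, y, nearest_centroid_accuracy, max_features=3)
print(selected, best_score)
```

    The stopping rule (no strict improvement) is one common choice; a significance test or a fixed subset size could be used instead.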

  • articleNo Access

    SENSOR (GROUP FEATURE) SELECTION WITH CONTROLLED REDUNDANCY IN A CONNECTIONIST FRAMEWORK

    For many applications, to reduce the processing time and the cost of decision making, we need to reduce the number of sensors, where each sensor produces a set of features. This sensor selection problem is a generalized feature selection problem. Here, we first present a sensor (group-feature) selection scheme based on Multi-Layered Perceptron Networks. This scheme sometimes selects redundant groups of features. So, we propose a selection scheme which can control the level of redundancy between the selected groups. The idea is general and can be used with any learning scheme. We have demonstrated the effectiveness of our scheme on several data sets. In this context, we define different measures of sensor dependency (dependency between groups of features). We have also presented an alternative learning scheme which is more effective than our old scheme. The proposed scheme is also adapted to radial basis function (RBF) networks. The advantages of our scheme are threefold. It looks at all the groups together and hence can exploit nonlinear interaction between groups, if any. Our scheme can simultaneously select useful groups as well as learn the underlying system. The level of redundancy among groups can also be controlled.

  • articleNo Access

    Improving Human Activity Recognition and its Application in Early Stroke Diagnosis

    The development of efficient stroke-detection methods is of significant importance in today's society due to the effects and impact of stroke on health and economy worldwide. This study focuses on Human Activity Recognition (HAR), which is a key component in developing an early stroke-diagnosis tool. An overview of the proposed global approach able to discriminate normal resting from stroke-related paralysis is detailed. The main contributions include an extension of the Genetic Fuzzy Finite State Machine (GFFSM) method and a new hybrid feature selection (FS) algorithm involving Principal Component Analysis (PCA) and a voting scheme putting the cross-validation results together. Experimental results show that the proposed approach is a well-performing HAR tool that can be successfully embedded in devices.

  • articleNo Access

    Discrimination of Schizophrenia Auditory Hallucinators by Machine Learning of Resting-State Functional MRI

    Auditory hallucinations (AH) are a symptom that is most often associated with schizophrenia, but patients with other neuropsychiatric conditions, and even a small percentage of healthy individuals, may also experience AH. Elucidating the neural mechanisms underlying AH in schizophrenia may offer insight into the pathophysiology associated with AH more broadly across multiple neuropsychiatric disease conditions. In this paper, we address the problem of classifying schizophrenia patients with and without a history of AH, and healthy control (HC) subjects. To this end, we performed feature extraction from resting state functional magnetic resonance imaging (rsfMRI) data and applied machine learning classifiers, testing two kinds of neuroimaging features: (a) functional connectivity (FC) measures computed by lattice auto-associative memories (LAAM), and (b) local activity (LA) measures, including regional homogeneity (ReHo) and fractional amplitude of low frequency fluctuations (fALFF). We show that it is possible to perform classification within each pair of subject groups with high accuracy. Discrimination between patients with and without lifetime AH was highest, while discrimination between schizophrenia patients and HC participants was worst, suggesting that classification according to the symptom dimension of AH may be more valid than discrimination on the basis of traditional diagnostic categories. FC measures seeded in right Heschl's gyrus (RHG) consistently showed stronger discriminative power than those seeded in left Heschl's gyrus (LHG), a finding that appears to support AH models focusing on right hemisphere abnormalities. The cortical brain localizations derived from the features with strong classification performance are consistent with proposed AH models, and include left inferior frontal gyrus (IFG), parahippocampal gyri, the cingulate cortex, as well as several temporal and prefrontal cortical brain regions. 
    Overall, the observed findings suggest that computational intelligence approaches can provide robust tools for uncovering subtleties in complex neuroimaging data, and have the potential to advance the search for more neuroscience-based criteria for classifying mental illness in psychiatry research.

  • articleNo Access

    A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets

    Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition, when instances are located in overlapping areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by combining feature and instance selection. Feature selection will allow simplifying the overlapping areas, easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the search for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme offers a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been compared against several state-of-the-art solutions on imbalanced classification, showing excellent results in both binary and multi-class problems.

  • articleNo Access

    Classification of Diffusion Tensor Metrics for the Diagnosis of a Myelopathic Cord Using Machine Learning

    In this study, we propose an automated framework that combines diffusion tensor imaging (DTI) metrics with machine learning algorithms to accurately classify control groups and groups with cervical spondylotic myelopathy (CSM) in the spinal cord. A comparison between selected voxel-based classification and mean value-based classification was performed. A support vector machine (SVM) classifier using a selected voxel-based dataset produced an accuracy of 95.73%, sensitivity of 93.41% and specificity of 98.64%. The efficacy of each index of diffusion for classification was also evaluated. Using the proposed approach, myelopathic areas in CSM are detected to provide an accurate reference to assist spine surgeons in surgical planning in complicated cases.

  • articleNo Access

    Multi-Objective Genetic Algorithms to Find Most Relevant Volumes of the Brain Related to Alzheimer's Disease and Mild Cognitive Impairment

    Computer-Aided Diagnosis (CAD) represents a relevant instrument to automatically classify between patients with and without Alzheimer's Disease (AD) using several current imaging techniques. This study analyzes the optimization of volumes of interest (VOIs) to extract three-dimensional (3D) textures from Magnetic Resonance Image (MRI) in order to diagnose AD, Mild Cognitive Impairment converter (MCIc), Mild Cognitive Impairment nonconverter (MCInc) and Normal subjects. A relevant feature of the proposed approach is the use of 3D features instead of traditional two-dimensional (2D) features, by using a 3D discrete wavelet transform (3D-DWT) approach for performing feature extraction from T1-weighted MRI. Due to the high number of coefficients when applying 3D-DWT to each of the VOIs, a feature selection algorithm based on mutual information is used, as is the minimum Redundancy Maximum Relevance (mRMR) algorithm. Region optimization has been performed in order to discover the most relevant regions (VOIs) in the brain with the use of Multi-Objective Genetic Algorithms, with one of the objectives being to optimize the accuracy of the system. The error index of the system is computed from the confusion matrix obtained by the multi-class support vector machine (SVM) classifier. Principal Component Analysis (PCA) is used with the purpose of reducing the number of features to the classifier. The cohort of subjects used in the study consisted of 296 different patients. A first group of 206 patients was used to optimize VOI selection and another group of 90 independent subjects (that did not belong to the first group) was used to test the solutions yielded by the genetic algorithm. The proposed methodology obtains excellent results in multi-class classification achieving accuracies of 94.4% and also extracting significant information on the location of the most relevant points of the brain. 
    This suggests that the proposed method could aid in the research of other neurodegenerative diseases, improving the accuracy of the diagnosis and finding the most relevant regions of the brain associated with them.
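
    The minimum Redundancy Maximum Relevance (mRMR) criterion mentioned in this abstract can be sketched as a greedy loop: at each step, pick the feature whose mutual information with the target, minus its mean mutual information with the already selected features, is largest. The code below is a sketch under stated assumptions rather than the study's pipeline: it uses the difference form of the criterion, discrete-valued features, and hypothetical function names and toy data.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in count_ab.items())


def mrmr(features, target, k):
    """Greedy mRMR: maximise relevance minus mean redundancy (difference form)."""
    relevance = [mutual_information(f, target) for f in features]
    selected = []
    while len(selected) < min(k, len(features)):
        best, best_score = None, -math.inf
        for j in range(len(features)):
            if j in selected:
                continue
            if selected:
                redundancy = sum(mutual_information(features[j], features[s])
                                 for s in selected) / len(selected)
            else:
                redundancy = 0.0
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected


# Toy data (assumption): features are stored column-wise.
# Feature 1 duplicates feature 0; feature 2 carries independent information.
feature_a = [0, 0, 0, 0, 1, 1, 1, 1]
feature_b = [0, 0, 1, 1, 0, 0, 1, 1]
target = [2 * a + b for a, b in zip(feature_a, feature_b)]
selected = mrmr([feature_a, list(feature_a), feature_b], target, k=2)
print(selected)
```

    On this toy input, the duplicate feature is penalised by its full redundancy with feature 0, so the second pick is the independent feature 2.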

  • articleNo Access

    Revealing False Positive Features in Epileptic EEG Identification

    Feature selection plays a vital role in the detection and discrimination of epileptic seizures in electroencephalogram (EEG) signals. The state-of-the-art EEG classification techniques commonly entail the extraction of multiple features that are fed into classifiers. For some techniques, feature selection strategies have been used to reduce the dimensionality of the entire feature space. However, most of these approaches focus on the performance of classifiers while neglecting the association between the feature and the EEG activity itself. To enhance the inner relationship between the feature subset and the epileptic EEG task with a promising classification accuracy, we propose a machine learning-based pipeline using a novel feature selection algorithm built upon a knockoff filter. First, a number of temporal, spectral, and spatial features are extracted from the raw EEG signals. Second, the proposed feature selection algorithm is exploited to obtain the optimal subgroup of features. Afterwards, three classifiers including k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM) are used. The experimental results on the Bonn dataset demonstrate that the proposed approach outperforms the state-of-the-art techniques, with accuracy as high as 99.93% for normal and interictal EEG discrimination and 98.95% for interictal and ictal EEG classification. Meanwhile, it has achieved satisfactory sensitivity (95.67% on average), specificity (98.83% on average), and accuracy (98.89% on average) over the Freiburg dataset.

  • articleNo Access

    Personalized EEG Feature Selection for Low-Complexity Seizure Monitoring

    Approximately one third of patients with epilepsy are refractory to medical therapy and thus can be at high risk of injuries and sudden unexpected death. A low-complexity electroencephalography (EEG)-based seizure monitoring algorithm is critically important for daily use, especially for wearable monitoring platforms. This paper presents a personalized EEG feature selection approach, which is key to achieving reliable seizure monitoring with a low computational cost. We advocate a two-step, personalized feature selection strategy to enhance monitoring performance for each patient. In the first step, linear discriminant analysis (LDA) is applied to find a few seizure-indicative channels. Then, in the second step, the least absolute shrinkage and selection operator (LASSO) method is employed to select a discriminative subset of both frequency and time domain features (spectral powers and entropy). A personalization strategy is further customized to find the best settings (number of channels and features) that yield the highest classification scores for each subject. Experimental results of analyzing 23 subjects in the CHB-MIT database are quite promising. We have achieved an average F1 score of 88% with excellent sensitivity and specificity using not more than 7 features extracted from at most 3 channels.