
R-GEFS: Condorcet Rank Aggregation with Graph Theoretic Ensemble Feature Selection Algorithm for Classification

    https://doi.org/10.1142/S021800142250032X
    Cited by: 7 (Source: Crossref)

    In recent years, ensemble learning has received growing interest, primarily for classification tasks. It rests on the premise that combining the outputs of multiple experts is better than relying on any individual expert. Ensemble feature selection may improve the performance of learning algorithms and can yield more stable and robust results. However, during feature aggregation and selection, the selected feature subset may contain high levels of inter-feature redundancy. To address this issue, a novel ensemble feature selection algorithm based on feature rank aggregation and a graph-theoretic technique (R-GEFS), fusing the Pearson and Spearman correlation metrics, is proposed. The method first aggregates the preference profiles of five feature rankers used as base feature selectors. Similar features are then grouped into clusters using a graph-theoretic approach, and the representative feature most strongly correlated with the target decision classes is drawn from each cluster. The efficiency and effectiveness of the R-GEFS algorithm are evaluated through an empirical study: extensive experiments on 15 diverse benchmark datasets compare R-GEFS with seven state-of-the-art feature selection models under four popular classifiers, namely decision tree, k-nearest neighbor, random forest, and support vector machine. The proposed method proves effective, selecting smaller feature subsets at lower computational cost while helping to increase classification accuracy.
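The pipeline the abstract describes (rank aggregation over several base rankers, a redundancy graph built from fused Pearson/Spearman correlations, clustering, and picking one representative per cluster) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Borda-style rank sum stands in for the paper's Condorcet aggregation, the `redundancy_threshold` and `top_k` parameters are assumptions, and connected components play the role of the graph-theoretic clustering step.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def rgefs_sketch(X, y, rankers, redundancy_threshold=0.8, top_k=20):
    """Hypothetical sketch of an R-GEFS-style pipeline.

    X: (n_samples, n_features) array; y: (n_samples,) target.
    rankers: callables mapping (X, y) -> per-feature scores
             (stand-ins for the paper's five base feature rankers).
    """
    n_features = X.shape[1]

    # 1) Ensemble ranking: aggregate each ranker's preference profile.
    #    Borda-style sum of ranks approximates Condorcet aggregation.
    rank_sum = np.zeros(n_features)
    for ranker in rankers:
        scores = ranker(X, y)
        # Higher score -> better (smaller) rank; rank 0 is best.
        rank_sum += np.argsort(np.argsort(-scores))
    candidates = np.argsort(rank_sum)[:top_k]

    # 2) Redundancy graph: connect features whose fused correlation
    #    (max of |Pearson| and |Spearman|) exceeds the threshold.
    adj = {f: set() for f in candidates}
    for i, fi in enumerate(candidates):
        for fj in candidates[i + 1:]:
            p = abs(pearsonr(X[:, fi], X[:, fj])[0])
            s = abs(spearmanr(X[:, fi], X[:, fj])[0])
            if max(p, s) >= redundancy_threshold:
                adj[fi].add(fj)
                adj[fj].add(fi)

    # 3) Cluster similar features (connected components) and keep, from
    #    each cluster, the member most correlated with the target.
    selected, seen = [], set()
    for f in candidates:
        if f in seen:
            continue
        component, stack = [], [f]
        seen.add(f)
        while stack:  # DFS over the component containing f
            node = stack.pop()
            component.append(node)
            for nb in adj[node] - seen:
                seen.add(nb)
                stack.append(nb)
        best = max(component,
                   key=lambda g: max(abs(pearsonr(X[:, g], y)[0]),
                                     abs(spearmanr(X[:, g], y)[0])))
        selected.append(best)
    return selected
```

On synthetic data with two nearly identical informative features and one noise feature, the two redundant features collapse into a single cluster, so only one of them survives alongside the noise feature's singleton cluster; this mirrors the redundancy-removal behaviour the abstract attributes to R-GEFS.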