Artificial Intelligence

Synthetic Minority Oversampling Technique Based on Adaptive Noise Optimization and Fast Search for Local Sets for Random Forest

College of Intelligent Information Engineering, Chongqing Aerospace Polytechnic, Chongqing 400021, P. R. China

E-mail Address: luoshaopu2021@163.com

https://doi.org/10.1142/S0218001422590388

Abstract

Classification performance usually degrades when the class distribution is imbalanced. The synthetic minority oversampling technique (SMOTE) has been successful in improving imbalanced classification and has been widely praised. Overgeneralization is one of the main challenges in SMOTE. Although multiple SMOTE-based variants have been proposed to counter overgeneralization, they still have the following shortcomings: (a) they create too many synthetic samples in high-density regions; (b) they remove suspicious noise directly instead of modifying it; (c) they rely on many parameters. This paper proposes a new SMOTE based on adaptive noise optimization and fast search for local sets (SMOTE-ANO-FLS) to overcome overgeneralization and the shortcomings of existing work. First, SMOTE-ANO-FLS uses the $k$-D tree to quickly search the local set of each sample. Second, a new noise detection method based on local sets and the imbalance ratio is proposed to detect suspicious noise. Third, a new adaptive noise optimization method is proposed to modify the detected suspicious noise instead of removing it. Finally, a new probability weight based on local sets is proposed to help create more synthetic minority class samples in borderline and sparse regions. The effectiveness of SMOTE-ANO-FLS is demonstrated in experiments involving 7 oversampling methods and a random forest classifier on extensive synthetic and real data sets.
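The first step, searching local sets with a $k$-D tree, can be illustrated with a minimal sketch. It assumes the usual definition of a local set: the samples that lie strictly closer to a given sample than its nearest enemy, that is, the nearest sample of a different class. The function names (`build_kdtree`, `nearest_enemy`, `within`, `local_sets`) and the toy data are hypothetical and chosen only for illustration; this is not the authors' implementation, and it does not cover the noise detection, noise optimization, or probability weighting steps.

```python
import math

def build_kdtree(points, indices=None, depth=0):
    """Build a k-d tree as nested tuples: (index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_kdtree(points, indices[:mid], depth + 1),
            build_kdtree(points, indices[mid + 1:], depth + 1))

def nearest_enemy(node, points, labels, q, best=(math.inf, None)):
    """Return (distance, index) of the nearest sample whose label differs from labels[q]."""
    if node is None:
        return best
    i, axis, left, right = node
    if labels[i] != labels[q]:
        d = math.dist(points[i], points[q])
        if d < best[0]:
            best = (d, i)
    diff = points[q][axis] - points[i][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_enemy(near, points, labels, q, best)
    if abs(diff) < best[0]:  # the far side may still hold a closer enemy
        best = nearest_enemy(far, points, labels, q, best)
    return best

def within(node, points, q, radius, out):
    """Collect indices of samples strictly closer than radius to points[q]."""
    if node is None:
        return out
    i, axis, left, right = node
    if i != q and math.dist(points[i], points[q]) < radius:
        out.append(i)
    diff = points[q][axis] - points[i][axis]
    if diff < radius:    # left subtree can intersect the search ball
        within(left, points, q, radius, out)
    if diff > -radius:   # right subtree can intersect the search ball
        within(right, points, q, radius, out)
    return out

def local_sets(points, labels):
    """Map each sample index to the sorted indices in its local set."""
    tree = build_kdtree(points)
    result = {}
    for q in range(len(points)):
        radius, _ = nearest_enemy(tree, points, labels, q)
        result[q] = sorted(within(tree, points, q, radius, []))
    return result

# Toy data (hypothetical): three majority samples (label 0), three minority samples (label 1).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (2.5, 2.5)]
labels = [0, 0, 0, 1, 1, 1]
sets = local_sets(points, labels)
```

In this toy data the minority sample at index 5 sits closer to the majority cluster than to any other minority sample, so its local set is empty. A very small local set is the kind of signal a local-set-based noise detector could use, although the exact detection rule of SMOTE-ANO-FLS is not given in the abstract.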

