World Scientific

Analyzing the Role of Class Rebalancing Techniques in Software Defect Prediction

    https://doi.org/10.1142/S0219622023500724

    Predicting software defects is an important task during the software testing phase, especially for allocating appropriate resources and prioritizing testing tasks. Typically, classification algorithms accomplish this task using previously collected datasets. However, these datasets suffer from an imbalanced label distribution in which clean modules outnumber defective modules. Traditional classification algorithms cannot handle this imbalance because they assume that the datasets are balanced. If the problem is not addressed, the classification algorithm will produce predictions biased towards the majority label. The literature offers several techniques designed to address this problem, most of which focus on data rebalancing. Recently, ensemble class imbalance techniques have emerged as an alternative to data rebalancing approaches. In software defect prediction, however, there are no studies examining the performance of ensemble class imbalance learning against data rebalancing approaches. This paper investigates the efficiency of ensemble class imbalance learning for software defect prediction. We conducted a comprehensive experiment involving 12 datasets, six classifiers, nine class imbalance techniques, and 10 evaluation metrics. The experiments showed that ensemble approaches, particularly the Under Bagging technique, outperform traditional data rebalancing approaches, especially on datasets with high defect ratios.
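
To make the Under Bagging idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the common formulation in which each ensemble member is trained on every minority-class sample plus an equally sized random subset of the majority class, and predictions are combined by majority vote. The nearest-centroid base learner, the function names, and the synthetic two-feature data are all illustrative assumptions; the abstract does not specify them, and some Under Bagging variants sample with replacement instead.

```python
import random
from collections import Counter

def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector (centroid) per class label."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroids(model, row):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(row, model[label])))

def under_bagging_fit(X, y, n_estimators=11, seed=0):
    """Train n_estimators base learners, each on a class-balanced subsample."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())  # minority class size
    models = []
    for _ in range(n_estimators):
        bag = []
        for idx in by_class.values():
            # Sampling without replacement: the minority class is kept whole,
            # the majority class is randomly undersampled to the same size.
            bag.extend(rng.sample(idx, n_min))
        models.append(fit_centroids([X[i] for i in bag], [y[i] for i in bag]))
    return models

def under_bagging_predict(models, row):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroids(m, row) for m in models)
    return votes.most_common(1)[0][0]

# Synthetic, made-up data: 90 clean modules (label 0) and 10 defective (label 1).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(90)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

models = under_bagging_fit(X, y)
```

Because every bag is balanced, no single base learner sees the 9:1 skew of the full dataset, while the ensemble as a whole still draws on most of the majority-class samples across bags. A call such as `under_bagging_predict(models, [3.0, 3.0])` returns the voted label for a new module.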