Class imbalance is a fundamental problem in data mining and knowledge discovery, encountered in a wide array of application domains. Random undersampling is widely used to mitigate the harmful effects of imbalance; however, it often discards a substantial amount of information. Repetitive undersampling techniques, which build an ensemble of models, each trained on a different undersampled subset of the training data, have been proposed to alleviate this loss. This work reviews three repetitive undersampling methods currently used to handle imbalance and presents a detailed and comprehensive empirical study using four different learners, four performance metrics, and 15 datasets from various application domains. To our knowledge, this is the most thorough study of repetitive undersampling techniques to date.
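The core idea of repetitive undersampling can be sketched as follows. This is a minimal illustration, not any of the three specific methods reviewed in the paper: the `centroid_classifier` base learner, the data, and the majority-vote combination are all simplifying assumptions made for the sketch.

```python
import random
from collections import Counter

def undersample(X, y, seed):
    """Randomly undersample each class down to the minority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(v) for v in by_class.values())
    Xs, ys = [], []
    for label, items in by_class.items():
        for xi in rng.sample(items, n_min):
            Xs.append(xi)
            ys.append(label)
    return Xs, ys

def centroid_classifier(X, y):
    """Toy base learner: assign a point to the nearest class centroid
    (1-D features); stands in for the four learners used in the study."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        sums[yi] = sums.get(yi, 0.0) + xi
        counts[yi] = counts.get(yi, 0) + 1
    centroids = {c: sums[c] / counts[c] for c in sums}
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))

def repetitive_undersampling_ensemble(X, y, n_models=5):
    """Train one model per undersampled subset; predict by majority vote."""
    models = []
    for seed in range(n_models):
        Xs, ys = undersample(X, y, seed)
        models.append(centroid_classifier(Xs, ys))
    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict

# Imbalanced toy data: 12 majority points (class 0, near 0.0)
# and 3 minority points (class 1, near 5.0).
X = [0.1, 0.2, -0.1, 0.3, 0.0, 0.15, -0.2, 0.25, 0.05, -0.05, 0.1, 0.2,
     5.0, 5.2, 4.8]
y = [0] * 12 + [1] * 3
clf = repetitive_undersampling_ensemble(X, y, n_models=5)
print(clf(4.9), clf(0.0))  # → 1 0
```

Because each subset is balanced before training, no single model is biased toward the majority class, while the ensemble as a whole still sees most of the majority-class data that a single undersampled model would have discarded.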
Software defect prediction is a classification technique that utilizes software metrics and fault data collected during the software development process to identify fault-prone modules before the testing phase. It aims to optimize project resource allocation and ultimately improve the quality of software products. However, two factors, high dimensionality and class imbalance, can degrade the quality of the training data and, in turn, of the classification models built from it. Feature (software metric) selection and data sampling are frequently used to overcome these problems. Feature selection (FS) is the process of choosing a subset of relevant features so that the quality of prediction models can be maintained or improved. Data sampling alters the dataset to change its balance level, thereby alleviating the tendency of traditional classification models to be biased toward the overrepresented (majority) class. A recent study shows that another method, called boosting (building multiple models, with each model tuned to work better on instances misclassified by previous models), is also effective for addressing the class imbalance problem. In this paper, we present a technique that uses FS followed by a boosting algorithm in the context of software quality estimation. We investigate four FS approaches: individual FS, repetitive sampled FS, sampled ensemble FS, and repetitive sampled ensemble FS, and study the impact of the four approaches on the quality of the prediction models. Ten base feature ranking techniques are examined in the case study. We also employ the boosting algorithm to construct classification models with no FS and use the results as the baseline for further comparison.
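One of the four approaches, repetitive sampled FS, can be sketched as ranking features on several undersampled subsets and aggregating the ranks. The `mean_diff_scores` filter below is a simple stand-in invented for this sketch, not one of the ten base rankers examined in the case study, and the rank-averaging aggregation is likewise an assumption for illustration.

```python
import random

def undersample(X, y, seed):
    """Randomly undersample each class down to the minority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(X, y):
        by_class.setdefault(lab, []).append(row)
    n_min = min(len(v) for v in by_class.values())
    Xs, ys = [], []
    for lab, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            Xs.append(row)
            ys.append(lab)
    return Xs, ys

def mean_diff_scores(X, y):
    """Toy filter ranker: score each feature by the absolute difference
    of its per-class means (higher = more discriminative)."""
    n_feat = len(X[0])
    scores = []
    for j in range(n_feat):
        c0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        c1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(c1) / len(c1) - sum(c0) / len(c0)))
    return scores

def repetitive_sampled_fs(X, y, k, n_rounds=5):
    """Repetitive sampled FS: rank features on several undersampled
    subsets, then keep the k features with the best average rank."""
    n_feat = len(X[0])
    rank_sum = [0.0] * n_feat
    for seed in range(n_rounds):
        Xs, ys = undersample(X, y, seed)
        scores = mean_diff_scores(Xs, ys)
        order = sorted(range(n_feat), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sum[j] += rank
    return sorted(sorted(range(n_feat), key=lambda j: rank_sum[j])[:k])

# Toy imbalanced data: feature 0 separates the classes, feature 1 is noise.
rng = random.Random(42)
X = [[0.0 + rng.random(), rng.random()] for _ in range(12)] \
  + [[5.0 + rng.random(), rng.random()] for _ in range(3)]
y = [0] * 12 + [1] * 3
print(repetitive_sampled_fs(X, y, k=1))  # → [0]
```

Ranking on balanced subsets prevents the majority class from dominating the filter scores, and repeating the process over several subsets stabilizes the final feature subset against the randomness of any single sample. In the full technique described in the paper, the selected features would then be passed to the boosting algorithm to build the prediction models.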
The empirical results demonstrate that (1) FS is important and necessary prior to the learning process; (2) the repetitive sampled FS method generally performs similarly to the individual FS technique; and (3) the ensemble filters (both the sampled ensemble filter and the repetitive sampled ensemble filter) perform better than or similarly to the average of the corresponding individual base rankers.