[Author: Gao, Kehan] AND [Keyword: Software Metrics] : Search

Anywhere

Advanced Search

SEARCH GUIDE

Results: 1 - 5of5

Follow results:

refine search

Filters

per page:

Sort: Relevance

Context for search term 1Search term 1*

All Dates

LastSelect static range

Custom Range

Select starting monthSelect starting year

Select ending monthSelect ending year

Advanced

Search name	Searched On	Run search
[Author: Gao, Kehan] AND [Keyword: Software Metrics] (5)	15 Mar 2025	Run
[Author: Carvalho, P] AND [Keyword: Cardiovascular Diseases] AND [All Categories: C... (1)	15 Mar 2025	Run
[Author: Aliev, F G] AND [Keyword: Edge States] AND [All Categories: Nanofabricatio... (1)	15 Mar 2025	Run
[Author: Allen, E B] AND [Keyword: Software Quality] (1)	15 Mar 2025	Run
[Keyword: Cardiovascular Diseases] AND [All Categories: Neurology / Neuroscience] (1)	15 Mar 2025	Run

articleNo Access
AN EMPIRICAL STUDY OF FEATURE RANKING TECHNIQUES FOR SOFTWARE QUALITY PREDICTION
- TAGHI M. KHOSHGOFTAAR,
- KEHAN GAO, and
- AMRI NAPOLITANO
International Journal of Software Engineering and Knowledge Engineering01 Mar 2012
Preview Abstract
The primary goal of software quality engineering is to produce a high quality software product through the use of some specific techniques and processes. One strategy is applying data mining techniques to software metric and defect data collected during the software development process to identify potential low-quality program modules. In this paper, we investigate the use of feature selection in the context of software quality estimation (also referred to as software defect prediction), where a classification model is used to predict whether program modules (instances) are fault-prone or not-fault-prone. Seven filter-based feature ranking techniques are examined. Among them, six are commonly used, and the other one, named signal to noise ratio (SNR), is rarely employed. The objective of the paper is to compare these seven techniques for various software data sets and assess their effectiveness for software quality modeling. A case study is performed on 16 software data sets, and classification models are built with five different learners and evaluated with two performance metrics. Our experimental results are summarized based on statistical tests for significance. The main conclusion is that the SNR technique performs as well as the best performer of the six commonly used techniques.
articleNo Access
The Use of Ensemble-Based Data Preprocessing Techniques for Software Defect Prediction
- Kehan Gao,
- Taghi M. Khoshgoftaar, and
- Amri Napolitano
International Journal of Software Engineering and Knowledge Engineering01 Nov 2014
Preview Abstract
Software defect prediction models that use software metrics such as code-level measurements and defect data to build classification models are useful tools for identifying potentially-problematic program modules. Effectiveness of detecting such modules is affected by the software measurements used, making data preprocessing an important step during software quality prediction. Generally, there are two problems affecting software measurement data: high dimensionality (where a training dataset has an extremely large number of independent attributes, or features) and class imbalance (where a training dataset has one class with relatively many more members than the other class). In this paper, we present a novel form of ensemble learning based on boosting that incorporates data sampling to alleviate class imbalance and feature (software metric) selection to address high dimensionality. As we adopt two different sampling methods (Random Undersampling (RUS) and Synthetic Minority Oversampling (SMOTE)) in the technique, we have two forms of our new ensemble-based approach: selectRUSBoost and selectSMOTEBoost. To evaluate the effectiveness of these new techniques, we apply them to two groups of datasets from two real-world software systems. In the experiments, four learners and nine feature selection techniques are employed to build our models. We also consider versions of the technique which do not incorporate feature selection, and compare all four techniques (the two different ensemble-based approaches which utilize feature selection and the two versions which use sampling only). The experimental results demonstrate that selectRUSBoost is generally more effective in improving defect prediction performance than selectSMOTEBoost, and that the techniques with feature selection do help for getting better prediction than the techniques without feature selection.
articleNo Access
Investigating Two Approaches for Adding Feature Ranking to Sampled Ensemble Learning for Software Quality Estimation
- Kehan Gao,
- Taghi M. Khoshgoftaar, and
- Amri Napolitano
International Journal of Software Engineering and Knowledge Engineering01 Feb 2015
Preview Abstract
Defect prediction is very challenging in software development practice. Classification models are useful tools that can help for such prediction. Classification models can classify program modules into quality-based classes, e.g. fault-prone (fp) or not-fault-prone (nfp). This facilitates the allocation of limited project resources. For example, more resources are assigned to program modules that are of poor quality or likely to have a high number of faults based on the classification. However, two main problems, high dimensionality and class imbalance, affect the quality of training datasets and therefore classification models. Feature selection and data sampling are often used to overcome these problems. Feature selection is a process of choosing the most important attributes from the original dataset. Data sampling alters the dataset to change its balance level. Another technique, called boosting (building multiple models, with each model tuned to work better on instances misclassified by previous models), is found to also be effective for resolving the class imbalance problem.
In this study, we investigate an approach for combining feature selection with this ensemble learning (boosting) process. We focus on two different scenarios: feature selection performed prior to the boosting process and feature selection performed inside the boosting process. Ten individual base feature ranking techniques, as well as an ensemble ranker based on the ten, are examined and compared over the two scenarios. We also employ the boosting algorithm to construct classification models without performing feature selection and use the results as the baseline for further comparison. The experimental results demonstrate that feature selection is important and needed prior to the learning process. In addition, the ensemble feature ranking method generally has better or similar performance than the average of the base ranking techniques, and more importantly, the ensemble method exhibits better robustness than most base ranking techniques. As for the two scenarios, the results show that applying feature selection inside boosting performs better than using feature selection prior to boosting.
articleNo Access
A RULE-BASED SOFTWARE QUALITY CLASSIFICATION MODEL
International Journal of Reliability, Quality and Safety Engineering01 Jun 2008
Preview Abstract
A rule-based classification model is presented to identify high-risk software modules. It utilizes the power of rough set theory to reduce the number of attributes, and the equal frequency binning algorithm to partition the values of the attributes. As a result, a set of conjuncted Boolean predicates are formed. The model is inherently influenced by the practical needs of the system being modeled, thus allowing the analyst to determine which rules are to be used for classifying the fault-prone and not fault-prone modules. The proposed model also enables the analyst to control the number of rules that constitute the model. Empirical validation of the model is accomplished through a case study of a large legacy telecommunications system. The ease of rule interpretation and the transparency of the functional aspects of the model are clearly demonstrated. It is concluded that the new model is effective in achieving the software quality classification.
articleNo Access
WRAPPER-BASED FEATURE RANKING TECHNIQUES FOR DETERMINING RELEVANCE OF SOFTWARE ENGINEERING METRICS
- WILKER ALTIDOR,
- TAGHI M. KHOSHGOFTAAR, and
- KEHAN GAO
International Journal of Reliability, Quality and Safety Engineering01 Oct 2010
Preview Abstract
Classification, an important data mining function that assigns class label to items in a collection, is of practical applications in various domains. In software engineering, for instance, a common classification problem is to determine the quality of a software item. In such a problem, software metrics represent the independent features while the fault proneness represents the class label. With many classification problems, one must often deal with the presence of irrelevant features in the feature space. That, coupled with class imbalance, renders the task of discriminating one class from another rather difficult. In this study, we empirically evaluate our proposed wrapper-based feature ranking where nine performance metrics aided by a particular learner and a methodology are considered. We examine five learners and take three different approaches, each in conjunction with one of three different methodologies: 3-fold Cross-Validation, 3-fold Cross-Validation Risk Impact, and a combination of the two. In this study, we consider two sets of software engineering datasets. To evaluate the classifier performance after feature selection has been applied, we use Area Under Receiver Operating Characteristic curve as the performance evaluator. We investigate the performance of feature selection as we vary the three factors that form the foundation of the wrapper-based feature ranking. We show that the performance is conditioned by not only the choice of methodology but also the learner. We also evaluate the effect of sampling on wrapper-based feature ranking. Finally, we provide guidance as to which software metrics are relevant in software defect prediction problems and how the number of software metrics can be selected when using wrapper-based feature ranking.

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Narrow Results

Filters Applied

Publication Type

Article Type

Publication Date

Author

Publication/Book Series

Subjects

Access

AN EMPIRICAL STUDY OF FEATURE RANKING TECHNIQUES FOR SOFTWARE QUALITY PREDICTION

The Use of Ensemble-Based Data Preprocessing Techniques for Software Defect Prediction

Investigating Two Approaches for Adding Feature Ranking to Sampled Ensemble Learning for Software Quality Estimation

A RULE-BASED SOFTWARE QUALITY CLASSIFICATION MODEL

WRAPPER-BASED FEATURE RANKING TECHNIQUES FOR DETERMINING RELEVANCE OF SOFTWARE ENGINEERING METRICS