Highly reliable software is becoming an essential ingredient in many systems. However, assuring reliability often entails time-consuming, costly development processes. One cost-effective strategy is to target reliability-enhancement activities to those modules that are likely to have the most problems. Software quality prediction models can predict the number of faults expected in each module early enough for reliability enhancement to be effective.
This paper introduces a case-based reasoning technique for the prediction of software quality factors. Case-based reasoning is a technique that seeks to answer new problems by identifying similar "cases" from the past. A case-based reasoning system can function as a software quality prediction model. To our knowledge, this study is the first to use case-based reasoning systems for predicting quantitative measures of software quality.
A case study applied case-based reasoning to software quality modeling of a family of full-scale industrial software systems. The case-based reasoning system's accuracy was much better than a corresponding multiple linear regression model in predicting the number of design faults. When predicting faults in code, its accuracy was significantly better than a corresponding multiple linear regression model for two of three test data sets and statistically equivalent for the third.
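The abstract does not give implementation details, but the heart of a case-based reasoning fault predictor of this kind can be sketched as a nearest-case lookup over modules from past releases. The metric choices, the Euclidean similarity, and the averaging of the retrieved cases below are illustrative assumptions, not the authors' exact design.

```python
import numpy as np

def cbr_predict_faults(case_base, case_faults, new_module, k=3):
    """Predict faults for a new module by averaging the k most similar
    past cases; a minimal CBR sketch, not the paper's exact model.

    case_base   : (n_cases, n_metrics) array of past module metrics
    case_faults : (n_cases,) array of observed fault counts
    new_module  : (n_metrics,) metric vector for the module under study
    """
    # Retrieval: Euclidean distance in (ideally standardized) metric space
    distances = np.linalg.norm(case_base - new_module, axis=1)
    nearest = np.argsort(distances)[:k]      # indices of the most similar cases
    return case_faults[nearest].mean()       # reuse: average their fault counts

# Hypothetical usage with three metrics per module (e.g., LOC, fan-out, churn)
past = np.array([[120, 4, 10], [300, 9, 25], [80, 2, 3], [500, 12, 40]], dtype=float)
faults = np.array([1, 5, 0, 9], dtype=float)
print(cbr_predict_faults(past, faults, np.array([280, 8, 20], dtype=float)))
```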
Embedded systems have become ubiquitous and essential entities in our ever-growing high-tech world. Embedded systems, such as telecommunication systems, form the backbone of today's information-highway infrastructure. They demand high reliability, so as to prevent the severe consequences of failures, including costly repairs at remote sites. Technology changes mandate that embedded systems evolve, creating a demand for techniques that improve the reliability of their future system releases. Reliability models based on software metrics can be effective tools for software engineering of embedded systems, because quality improvements are so resource-consuming that it is not feasible to apply them to all modules. Identifying the likely fault-prone modules before system testing can be effective in reducing the likelihood of faults discovered during operations.
A software quality classification model is calibrated using software metrics from a past release, and is then applied to modules currently under development to estimate which modules are likely to be fault-prone. This paper presents and demonstrates an effective case-based reasoning approach for calibrating such classification models. It is attractive for software engineering of embedded systems, because it can be used to develop software reliability models using a faster, cheaper, and easier method. We illustrate our approach with two large-scale case studies obtained from embedded systems, involving data collected from telecommunication systems, including wireless systems. The results indicate that the level of classification accuracy observed in both case studies would be beneficial in achieving high software reliability in subsequent releases of the embedded systems.
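For the classification variant described here, the retrieved cases can be combined by a vote over their known classes rather than by averaging fault counts. The voting rule and the value of k below are illustrative assumptions, not the calibrated model from the case studies.

```python
import numpy as np

def cbr_classify(case_base, case_labels, new_module, k=5):
    """Classify a module as fault-prone (1) or not-fault-prone (0) by
    majority vote of the k nearest past cases; a sketch, not the
    calibrated model described in the paper."""
    distances = np.linalg.norm(case_base - new_module, axis=1)
    nearest = np.argsort(distances)[:k]
    # Majority vote over the retrieved cases' known classes
    return int(case_labels[nearest].sum() * 2 > k)
```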
Estimating software fault-proneness early, i.e., predicting the probability that software modules will be faulty, can help in reducing costs and increasing the effectiveness of software analysis and testing. The many available static metrics provide important information, but none of them can be deterministically related to software fault-proneness. Fault-proneness models seem to be an interesting alternative, but work on them is still hampered by a lack of experimental validation.
This paper discusses barriers and problems in using software fault-proneness in industrial environments, proposes a method for building software fault-proneness models based on logistic regression and cross-validation that meets industrial needs, and provides some experimental evidence of the validity of the proposed approach.
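A minimal sketch of the general approach, logistic regression with cross-validation over static code metrics, is shown below using scikit-learn. The pipeline, fold count, and scaling step are assumptions for illustration; the paper's model-building method and validation protocol are more detailed.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict

def fault_proneness_model(X, y):
    """Fit a logistic-regression fault-proneness model and obtain
    cross-validated probability estimates.

    X : (n_modules, n_metrics) static code metrics
    y : (n_modules,) binary labels, 1 = faulty module
    """
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    # 10-fold cross-validated probabilities of each module being faulty
    proba = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
    model.fit(X, y)           # final model trained on all available data
    return model, proba
```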
This paper presents a diagnostic framework for assessing the significance of inconsistencies (i.e., violations of consistency rules) in software design models expressed in the Unified Modeling Language (UML). The assessment is based on significance criteria that software designers can specify and associate with specific consistency rules. These criteria define characteristics that the model elements involved in the violation of a rule should have for the inconsistency to be significant, and they are specified in a formal language derived from the Object Constraint Language (OCL). The satisfiability of the criteria by individual model elements is measured by belief functions defined by the framework. The measures generated by these functions are used to rank the inconsistencies caused by different model elements. The presented framework has been evaluated through a set of experiments. The results of these experiments show that criteria definable in the framework can be used to produce relatively elaborate significance-rankings of inconsistencies.
The performance of a classification model is invariably affected by the characteristics of the measurement data it is built upon. If the quality of the data is generally poor, then the classification model will demonstrate poor performance. The detection and removal of noisy instances will improve the quality of the data and, consequently, the performance of the classification model. We investigate a noise handling technique that attempts to improve the quality of datasets for classification purposes by eliminating instances that are likely to be noise. Our approach uses twenty-five different classification techniques to create an ensemble filter for eliminating likely noise. The basic assumption is that if a given majority of classifiers in the ensemble misclassify an instance, then it is likely to be a noisy instance. Using a relatively large number of base-level classifiers in the ensemble filter helps achieve the desired level of noise-removal conservativeness, with several possible levels of filtering. It also provides a higher degree of confidence in the noise-elimination procedure, since the results are less likely to be influenced by the (possibly) inappropriate learning bias of a few algorithms with twenty-five base-level classifiers than with a relatively smaller number of base-level classifiers. Empirical case studies of two high-assurance software projects demonstrate the effectiveness of our noise-elimination approach through the significant improvement achieved in classification accuracy at various levels of noise filtering.
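The majority-vote filtering idea can be sketched as follows: each base classifier labels every instance via cross-validation, and instances misclassified by at least a chosen fraction of the ensemble are flagged as likely noise. The four classifiers and the 0.75 threshold below are placeholders; the study used twenty-five base-level classifiers and several filtering levels.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def ensemble_noise_filter(X, y, classifiers=None, vote_threshold=0.75, cv=5):
    """Flag likely-noisy instances: an instance is suspect if at least
    `vote_threshold` of the base classifiers misclassify it under
    cross-validation.  A small sketch of the ensemble-filter idea."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    if classifiers is None:
        classifiers = [GaussianNB(), DecisionTreeClassifier(),
                       LogisticRegression(max_iter=1000), KNeighborsClassifier()]
    misclassified = np.zeros(len(y))
    for clf in classifiers:
        pred = cross_val_predict(clf, X, y, cv=cv)    # out-of-fold predictions
        misclassified += (pred != y)
    noisy = misclassified / len(classifiers) >= vote_threshold
    # Return the cleaned data plus the indices of the flagged instances
    return X[~noisy], y[~noisy], np.where(noisy)[0]
```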
Software metric models predict the target software metric(s), e.g., the development work effort or defect rates, for any future software project based on the project's predictor software metric(s), e.g., the project team size. Obviously, the construction of such a software metric model makes use of a data sample of such metrics from analogous past projects. However, incomplete data often appear in such data samples. Moreover, the decision on whether a particular predictor metric should be included is most likely based on an intuitive or experience-based assumption that the predictor metric has a statistically significant impact on the target metric. However, this assumption is usually not verifiable "retrospectively" after the model is constructed, leading to redundant predictor metric(s) and/or unnecessary predictor metric complexity. To solve these problems, we derived a methodology consisting of the k-nearest neighbors (k-NN) imputation method, statistical hypothesis testing, and a "goodness-of-fit" criterion. This methodology was tested on software effort metric models and software quality metric models, the latter usually suffering from far more severe data incompleteness. This paper documents this methodology and the tests on these two types of software metric models.
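One way to realize the k-NN imputation step is sketched below with scikit-learn's KNNImputer; the data values, the number of neighbors, and the use of this particular library are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Hypothetical project-metric sample with missing values (np.nan)
data = np.array([
    [5.0, 120.0, 3.0],
    [8.0, np.nan, 5.0],
    [np.nan, 300.0, 9.0],
    [4.0, 90.0, np.nan],
])

# k-NN imputation: each missing entry is replaced by the mean of that
# metric over the k most similar projects with observed values.
imputer = KNNImputer(n_neighbors=2)
completed = imputer.fit_transform(data)
print(completed)
```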
Maintainability is an important factor that developers should be concerned with, because two-thirds of software costs involve maintenance. The aspect-oriented programming (AOP) paradigm aims to increase software maintainability. It solves the code tangling and code scattering problems by introducing a new modular unit called the "aspect". Much research supports measuring object-oriented software, but only a few studies support measuring aspect-oriented software. This paper proposes aspect-oriented software maintainability metrics and a set of aspect-oriented design guidelines to support the metrics. By combining the proposed guidelines, object-oriented design principles, and aspect-oriented design principles, the metrics are constructed according to the Factor-Strategy (FS) quality model and the Factor-Criteria-Metric (FCM) quality model. Principle violation check definitions, in the form of Boolean expressions, are also defined to conduct software measurement and to fulfill the metrics. Finally, the aspect-oriented software maintainability metrics are applied to detect design principle violations in fifty AspectJ systems. The results show that the hidden flaws of all systems are exposed. Moreover, the proposed metrics are used to compare the maintainability of two versions of systems written in Java and AspectJ.
The primary goal of software quality engineering is to produce a high-quality software product through the use of some specific techniques and processes. One strategy is applying data mining techniques to software metric and defect data collected during the software development process to identify potential low-quality program modules. In this paper, we investigate the use of feature selection in the context of software quality estimation (also referred to as software defect prediction), where a classification model is used to predict whether program modules (instances) are fault-prone or not-fault-prone. Seven filter-based feature ranking techniques are examined. Among them, six are commonly used, and the other one, named signal-to-noise ratio (SNR), is rarely employed. The objective of the paper is to compare these seven techniques for various software data sets and assess their effectiveness for software quality modeling. A case study is performed on 16 software data sets, and classification models are built with five different learners and evaluated with two performance metrics. Our experimental results are summarized based on statistical tests for significance. The main conclusion is that the SNR technique performs as well as the best performer of the six commonly used techniques.
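A common formulation of signal-to-noise ratio ranking for a binary target is the absolute difference of the class means divided by the sum of the class standard deviations per feature; the paper's exact definition may differ slightly. A sketch under that assumption:

```python
import numpy as np

def snr_ranking(X, y):
    """Rank software metrics by signal-to-noise ratio for a binary target.
    Uses |mu_fp - mu_nfp| / (sigma_fp + sigma_nfp) per metric, one common
    SNR formulation (an assumption here, not necessarily the paper's).

    X : (n_modules, n_metrics) array, y : binary labels (0 = nfp, 1 = fp)
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    fp, nfp = X[y == 1], X[y == 0]
    snr = np.abs(fp.mean(axis=0) - nfp.mean(axis=0)) / (
        fp.std(axis=0) + nfp.std(axis=0) + 1e-12)
    return np.argsort(snr)[::-1]   # metric indices, best first
```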
We present an analysis of the evolution of a Web application project developed with object-oriented technology and an agile process. During the development we systematically performed measurements on the source code, using software metrics that have been proved to be correlated with software quality, such as the Chidamber and Kemerer suite and Lines of Code metrics. We also computed metrics derived from the class dependency graph, including metrics derived from Social Network Analysis. The application development evolved through phases, characterized by a different level of adoption of some key agile practices — namely pair programming, test-based development and refactoring. The evolution of the metrics of the system, and their behavior related to the agile practices adoption level, is presented and discussed. We show that, in the reported case study, a few metrics are enough to characterize with high significance the various phases of the project. Consequently, software quality, as measured using these metrics, seems directly related to agile practices adoption.
The term "software entropy" refers to the tendency for software, over time, to become difficult and costly to maintain. A software system that undergoes continuous change, such as having new functionality added to its original design, will eventually become more complex and can become disorganized as it grows, losing its original design structure.
A recent study shows that software degradation may be measured using WMC expressed in terms of Shannon entropy. In this paper we extend the empirical analysis to RFC and CBO as well, since these CK metrics have been shown to be correlated with the fault-proneness of OO classes.
We analyzed various releases of the publicly available Eclipse and Netbeans software systems, calculating the entropy of some CK metrics for every release analyzed. Validity is assessed against a direct measure of software quality, the number of detected defects. Our results show a strong correlation between the entropy of CBO and RFC and the number of bugs for Eclipse and Netbeans.
Complexity and quality metrics are generally computed for every system module, whereas the entropy is a single scalar that characterizes a whole system; this result suggests that the entropy of some CK metrics could be considered a global quality metric for large software systems. Our results need, however, to be confirmed on other large software systems.
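One plausible way to obtain such a per-release scalar is to treat each class's share of a CK metric's total as a probability and compute the Shannon entropy of that distribution; the exact formulation used in the cited study may differ, so the sketch below is only an illustration of the idea.

```python
import numpy as np

def metric_entropy(metric_values):
    """Shannon entropy of a CK metric (e.g., CBO, RFC, WMC) over the
    classes of one release.  Each class's share of the metric total is
    treated as a probability; this is one plausible formulation."""
    m = np.asarray(metric_values, dtype=float)
    p = m / m.sum()                    # normalize to a probability distribution
    p = p[p > 0]                       # ignore zero-probability classes
    return -(p * np.log2(p)).sum()     # entropy in bits

# Hypothetical CBO values for the classes of two releases
print(metric_entropy([2, 3, 3, 4, 2]))    # more uniform distribution
print(metric_entropy([1, 1, 2, 15, 1]))   # more skewed distribution, lower entropy
```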
Software defect prediction models that use software metrics such as code-level measurements and defect data to build classification models are useful tools for identifying potentially-problematic program modules. Effectiveness in detecting such modules is affected by the software measurements used, making data preprocessing an important step during software quality prediction. Generally, there are two problems affecting software measurement data: high dimensionality (where a training dataset has an extremely large number of independent attributes, or features) and class imbalance (where a training dataset has one class with relatively many more members than the other class). In this paper, we present a novel form of ensemble learning based on boosting that incorporates data sampling to alleviate class imbalance and feature (software metric) selection to address high dimensionality. As we adopt two different sampling methods (Random Undersampling (RUS) and the Synthetic Minority Oversampling Technique (SMOTE)) in the technique, we have two forms of our new ensemble-based approach: selectRUSBoost and selectSMOTEBoost. To evaluate the effectiveness of these new techniques, we apply them to two groups of datasets from two real-world software systems. In the experiments, four learners and nine feature selection techniques are employed to build our models. We also consider versions of the technique which do not incorporate feature selection, and compare all four techniques (the two different ensemble-based approaches which utilize feature selection and the two versions which use sampling only). The experimental results demonstrate that selectRUSBoost is generally more effective in improving defect prediction performance than selectSMOTEBoost, and that the techniques with feature selection help achieve better prediction than the techniques without feature selection.
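A rough sketch of the selectRUSBoost idea follows: software-metric selection is combined with random undersampling of the majority class inside an AdaBoost-style loop. The abstract does not specify where the selection step sits or which ranker, learner, and weighting scheme are used, so the choices below (ANOVA F-score selection applied once up front, shallow decision trees, standard AdaBoost weights) are assumptions, not the algorithm evaluated in the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_selection import SelectKBest, f_classif

def select_rus_boost(X, y, n_rounds=10, k_features=6, rng=None):
    """Sketch of a selectRUSBoost-style ensemble: feature (metric)
    selection plus random undersampling of the majority class in each
    boosting round before fitting a weak learner."""
    rng = rng or np.random.default_rng(0)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    selector = SelectKBest(f_classif, k=min(k_features, X.shape[1])).fit(X, y)
    Xs = selector.transform(X)                      # reduced metric set
    w = np.full(len(y), 1 / len(y))                 # instance weights
    minority = int(y.sum() < len(y) / 2)            # label of the minority class
    learners, alphas = [], []
    for _ in range(n_rounds):
        # Random undersampling: keep all minority instances, sample equal majority
        min_idx = np.where(y == minority)[0]
        maj_idx = rng.choice(np.where(y != minority)[0], size=len(min_idx), replace=False)
        idx = np.concatenate([min_idx, maj_idx])
        clf = DecisionTreeClassifier(max_depth=3).fit(Xs[idx], y[idx], sample_weight=w[idx])
        pred = clf.predict(Xs)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)       # AdaBoost-style learner weight
        w *= np.exp(alpha * np.where(pred == y, -1, 1))
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)
    return selector, learners, np.array(alphas)
```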
The basic measurements for software quality control and management are the various project and software metrics collected at various stages of a software development life cycle. The software metrics may not all be relevant for predicting the fault proneness of software components, modules, or releases, which creates the need for feature (software metric) selection. The goal of feature selection is to find a minimum subset of attributes that can characterize the underlying data with results as good as, or even better than, those obtained when all available features are considered. As an example of inter-disciplinary research (between data science and software engineering), this study is unique in presenting a large comparative study of wrapper-based feature (or attribute) selection techniques for building defect predictors. In this paper, we investigated thirty wrapper-based feature selection methods to remove irrelevant and redundant software metrics used for building defect predictors. In this study, these thirty wrappers vary based on the choice of search method (Best First or Greedy Stepwise), learner (Naïve Bayes, Support Vector Machine, and Logistic Regression), and performance metric (Overall Accuracy, Area Under ROC (Receiver Operating Characteristic) Curve, Area Under the Precision-Recall Curve, Best Geometric Mean, and Best Arithmetic Mean) used in the defect prediction model evaluation process. The models are trained using the three learners and evaluated using the five performance metrics. The case study is based on software metrics and defect data collected from a real-world software project.
The results demonstrate that Best Arithmetic Mean is the best performance metric to use within the wrapper. Naïve Bayes performed significantly better than Logistic Regression and Support Vector Machine as a wrapper learner on the slightly and less imbalanced datasets. We also recommend Greedy Stepwise as a search method for wrappers. Moreover, compared to models built with full datasets, the performance of defect prediction models can be improved when metric subsets are selected through a wrapper-based subset selector.
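To make the wrapper idea concrete, one of the thirty configurations can be sketched as a greedy forward (stepwise) search that repeatedly adds the metric giving the largest gain in a cross-validated score of the chosen learner. The use of Naive Bayes, AUC as the score, and the simple stopping rule below are assumptions for illustration; the study's wrappers follow the WEKA-style search and evaluation setup described above.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

def greedy_wrapper(X, y, scoring="roc_auc", cv=5):
    """Greedy forward (stepwise) wrapper: repeatedly add the metric that
    most improves the cross-validated score of a Naive Bayes learner."""
    X = np.asarray(X, dtype=float)
    remaining, selected = list(range(X.shape[1])), []
    best_score = -np.inf
    while remaining:
        scores = [(cross_val_score(GaussianNB(), X[:, selected + [f]], y,
                                   scoring=scoring, cv=cv).mean(), f)
                  for f in remaining]
        score, feat = max(scores)
        if score <= best_score:        # stop when no remaining metric improves the score
            break
        best_score = score
        selected.append(feat)
        remaining.remove(feat)
    return selected, best_score
```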
Defect prediction is very challenging in software development practice. Classification models are useful tools that can help with such prediction. Classification models can classify program modules into quality-based classes, e.g., fault-prone (fp) or not-fault-prone (nfp). This facilitates the allocation of limited project resources. For example, more resources are assigned to program modules that are of poor quality or likely to have a high number of faults based on the classification. However, two main problems, high dimensionality and class imbalance, affect the quality of training datasets and therefore of classification models. Feature selection and data sampling are often used to overcome these problems. Feature selection is a process of choosing the most important attributes from the original dataset. Data sampling alters the dataset to change its balance level. Another technique, called boosting (building multiple models, with each model tuned to work better on instances misclassified by previous models), has also been found to be effective for resolving the class imbalance problem.
In this study, we investigate an approach for combining feature selection with this ensemble learning (boosting) process. We focus on two different scenarios: feature selection performed prior to the boosting process and feature selection performed inside the boosting process. Ten individual base feature ranking techniques, as well as an ensemble ranker based on the ten, are examined and compared over the two scenarios. We also employ the boosting algorithm to construct classification models without performing feature selection and use the results as the baseline for further comparison. The experimental results demonstrate that feature selection is important and needed prior to the learning process. In addition, the ensemble feature ranking method generally performs better than or similarly to the average of the base ranking techniques, and more importantly, the ensemble method exhibits better robustness than most base ranking techniques. As for the two scenarios, the results show that applying feature selection inside boosting performs better than using feature selection prior to boosting.
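The ensemble ranker can be illustrated by one simple aggregation scheme: average each metric's position across the base rankings and re-rank by that average. The paper's ensemble ranker may aggregate the ten base rankings differently; the sketch below only shows the general idea.

```python
import numpy as np

def ensemble_rank(rank_lists):
    """Combine several feature rankings into one ensemble ranking by
    averaging each feature's position across the base rankers.

    rank_lists : list of sequences, each giving feature indices best-first
    """
    n_features = len(rank_lists[0])
    positions = np.zeros(n_features)
    for ranking in rank_lists:
        for pos, feat in enumerate(ranking):
            positions[feat] += pos
    positions /= len(rank_lists)
    return np.argsort(positions)          # ensemble ranking, best first

# Hypothetical rankings of 5 metrics from three base rankers
print(ensemble_rank([[0, 2, 1, 4, 3], [2, 0, 1, 3, 4], [0, 1, 2, 4, 3]]))
```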
Software quality modeling is the process of using software metrics from previous iterations of development to locate potentially faulty modules in current under-development code. This has become an important part of the software development process, allowing practitioners to focus development efforts where they are most needed. One difficulty encountered in software quality modeling is the problem of high dimensionality, where the number of available software metrics is too large for a classifier to work well. In this case, many of the metrics may be redundant or irrelevant to defect prediction results, so selecting the subset of software metrics that are the best predictors becomes important. This process is called feature (metric) selection. There are three major forms of feature selection: filter-based feature rankers, which use statistical measures to assign a score to each feature and present the user with a ranked list; filter-based feature subset evaluation, which uses statistical measures on feature subsets to find the best feature subset; and wrapper-based subset selection, which builds classification models using different subsets to find the one which maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of the feature selection methods. We evaluate the stability of feature selection methods on a pair of subsamples generated by our fixed-overlap partitions algorithm. Four different levels of overlap are considered in this study. Thirteen software metric datasets from two real-world software projects are used in this study. Results demonstrate that ReliefF (RF) is the most stable feature selection method, while wrapper-based feature subset selection shows the least stability. In addition, as the overlap of partitions increased, the stability of the feature selection strategies increased.
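The Tanimoto index between two selected-metric sets is equivalent to the Jaccard similarity for binary membership, and the APTI averages this similarity over the subsets obtained from different subsamples. The sketch below assumes that reading; any details of the paper's fixed-overlap partitioning are not reproduced here.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two selected-metric sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def average_pairwise_tanimoto(subsets):
    """Average Pairwise Tanimoto Index over feature subsets selected from
    different (partially overlapping) subsamples; higher values indicate
    a more stable feature selection method."""
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Hypothetical subsets chosen by one selection method on three subsamples
print(average_pairwise_tanimoto([{1, 3, 5, 8}, {1, 3, 6, 8}, {1, 3, 5, 7}]))
```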
Unnecessarily repeated code, also known as code clones, is often poorly documented and difficult to maintain. Code clones can become an important problem in the software development cycle, since any detected error must be fixed in all occurrences. This condition significantly increases software maintenance costs and the effort and time required to understand the code. This research introduces a novel methodology to minimize or prevent the code cloning problem in software projects. In particular, this manuscript focuses on the detection of structural code clones, which are defined as similarity in software structure, such as design patterns. Our proposed methodology provides a solution to the class-level structural code clone detection problem. We introduce a novel software architecture that unifies different software quality analysis tools that take measurements of software metrics for structural code clone detection. We present an empirical evaluation of our approach and investigate its practical usefulness. We conduct a user study using human judges to detect structural code clones in three different open-source software projects. We apply our methodology to the same projects and compare results. The results show that our proposed solution achieves high consistency with the results reached by the human judges. The outcome of this study also indicates that a uniform structural code clone detection system can be built on top of different software quality tools, where each tool takes measurements of different object-oriented software metrics.
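The abstract does not describe the detection rules, but one way to illustrate class-level structural similarity over metric measurements is to compare each pair of classes' normalized metric vectors and flag near-identical pairs as clone candidates. The metrics, normalization, and threshold below are illustrative assumptions only.

```python
import numpy as np

def structural_clone_pairs(class_names, metric_vectors, threshold=0.05):
    """Flag pairs of classes whose standardized object-oriented metric
    vectors are nearly identical, as candidate class-level structural
    clones (a rough sketch, not the paper's detection method)."""
    M = np.asarray(metric_vectors, dtype=float)
    M = (M - M.mean(axis=0)) / (M.std(axis=0) + 1e-12)   # standardize each metric
    pairs = []
    for i in range(len(M)):
        for j in range(i + 1, len(M)):
            if np.linalg.norm(M[i] - M[j]) / M.shape[1] < threshold:
                pairs.append((class_names[i], class_names[j]))
    return pairs
```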
Object-oriented software systems contain a large number of modules, which makes unit testing, integration testing, and system testing very difficult and challenging. While the aim of unit testing is to show that individual modules are working properly, and the aim of system testing is to determine whether the whole system meets its specifications, the aim of integration testing is to uncover errors in the interactions between system modules. However, it is generally impossible to test all connections between modules because of time and budget constraints. Thus, it is important to focus the testing on the connections presumed to be more error-prone.
The goal of this work is to guide software testers on where to focus in a software system when performing integration testing, in order to save time and resources. This paper proposes a new approach to predict and rank error-prone connections. We use method-level metrics that capture both dependencies and the internal complexity of methods. We performed experiments on several Java applications and used error-seeding techniques for evaluation. The experimental results showed that our approach is effective for selecting the test focus in integration testing.
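A minimal sketch of ranking caller-callee connections by a score that combines dependency and internal-complexity metrics of the two methods is shown below. The specific metrics, the linear weighting, and the names used are assumptions; the paper's metrics and ranking scheme are not spelled out in the abstract.

```python
def rank_connections(connections, method_metrics, w_dep=0.5, w_cc=0.5):
    """Rank caller-callee connections for integration-test focus by a
    weighted sum of dependency counts and cyclomatic complexity of the
    two methods involved (an illustrative scoring scheme).

    connections    : list of (caller, callee) method names
    method_metrics : dict name -> {"fan_out": int, "cyclomatic": int}
    """
    def score(edge):
        a, b = (method_metrics[m] for m in edge)
        dep = a["fan_out"] + b["fan_out"]
        cc = a["cyclomatic"] + b["cyclomatic"]
        return w_dep * dep + w_cc * cc
    return sorted(connections, key=score, reverse=True)   # most error-prone first

# Hypothetical usage
metrics = {"A.save": {"fan_out": 6, "cyclomatic": 9},
           "B.load": {"fan_out": 2, "cyclomatic": 4},
           "C.log":  {"fan_out": 1, "cyclomatic": 1}}
print(rank_connections([("A.save", "B.load"), ("B.load", "C.log")], metrics))
```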
Testing is a vital part of achieving good-quality software. Deploying untested code can cause system crashes and unexpected behavior. To reduce these problems, testing should evolve with coding. In addition, test suites should not remain static across software versions: whenever the software is updated, new functionality is added, or existing functionality is changed, the test suites should be updated along with the software. Software repositories contain valuable information about software systems. Access to older versions and differentiating adjacent versions' source code and acceptance test changes can provide information about the evolution process of the software. This research proposes a method and implementation to analyze 21 open-source real-world projects hosted on GitHub regarding the co-evolution of the software and its acceptance test suites. Related projects are retrieved from repositories, their versions are analyzed, graphs are created, and analyses related to the co-evolution process are performed. Observations show that the source code is updated more frequently than the acceptance tests, indicating a pattern in which source code and acceptance tests do not evolve together. Moreover, the analysis showed that a small number of acceptance tests cover most of the functionality and account for a significant number of lines of code.
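The differencing step between adjacent versions can be sketched with plain git: list the files changed between two tags and split them into source-code and acceptance-test changes. The path convention used to recognize acceptance tests below ("acceptance" or a "features" directory in the path) is an illustrative assumption; real projects vary and the paper's tooling is more elaborate.

```python
import subprocess

def changed_files(repo_path, old_tag, new_tag):
    """List files changed between two adjacent versions of a repository."""
    out = subprocess.run(
        ["git", "-C", repo_path, "diff", "--name-only", old_tag, new_tag],
        capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if line]

def co_evolution_counts(repo_path, old_tag, new_tag):
    """Split the changes between two versions into source-code changes
    and acceptance-test changes (a rough sketch of the differencing
    step, using an assumed path convention for acceptance tests)."""
    files = changed_files(repo_path, old_tag, new_tag)
    tests = [f for f in files
             if "acceptance" in f.lower() or "/features/" in f.lower()]
    source = [f for f in files if f not in tests]
    return len(source), len(tests)
```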
Association rule mining is a highly effective approach to data analysis for datasets of varying sizes, accommodating diverse feature values. Nevertheless, deriving practical rules from datasets with numerical variables presents a challenge, as these variables must be discretized beforehand. Quantitative association rule mining addresses this issue, allowing the extraction of valuable rules. This paper introduces an extension to quantitative association rules, incorporating a two-variable function in their consequent part. The use of correlation functions, statistical test functions, and error functions is also introduced. We illustrate the utility of this extension through three case studies employing software engineering datasets. In case study 1, we successfully pinpointed the conditions that result in either a high or low correlation between effort and software size, offering valuable insights for software project managers. In case study 2, we effectively identified the conditions that lead to a high or low correlation between the number of bugs and source lines of code, aiding in the formulation of software test planning strategies. In case study 3, we applied our approach to the two-step software effort estimation process, uncovering the conditions most likely to yield low effort estimation errors.
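The extension described here, a two-variable function in the consequent, can be illustrated by evaluating a correlation between two numerical attributes on the subset of records that satisfy each candidate antecedent condition. The candidate conditions, support threshold, attribute names, and the use of Pearson correlation below are assumptions for illustration; the paper's mining procedure and discretization are more elaborate.

```python
import numpy as np
from scipy.stats import pearsonr

def correlation_rules(records, conditions, x_key, y_key, min_support=10):
    """For each candidate antecedent condition, evaluate a two-variable
    consequent: the correlation between x_key and y_key over the records
    matching the condition (a sketch of the extended-rule idea).

    records    : list of dicts (e.g., one dict per software project)
    conditions : dict name -> predicate over a record
    """
    rules = []
    for name, pred in conditions.items():
        subset = [r for r in records if pred(r)]
        if len(subset) < min_support:
            continue                                   # skip low-support rules
        x = np.array([r[x_key] for r in subset], dtype=float)
        y = np.array([r[y_key] for r in subset], dtype=float)
        corr, pval = pearsonr(x, y)
        rules.append((name, len(subset), corr, pval))
    return sorted(rules, key=lambda r: abs(r[2]), reverse=True)

# Hypothetical usage: under which conditions are effort and size strongly correlated?
# conditions = {"team_size<=5": lambda r: r["team_size"] <= 5,
#               "team_size>5":  lambda r: r["team_size"] > 5}
# print(correlation_rules(projects, conditions, "size_kloc", "effort_pm"))
```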
The use of empirical data to understand and improve software products and software engineering processes is gaining ever-increasing attention. Empirical data from products and processes is needed to help an organization understand and improve its way of doing business in the software domain. Additional motivation for collecting and using data is provided by the need to conform to guidelines and standards which mandate measurement, specifically the SEI's Capability Maturity Model and ISO 9000-3. Some software engineering environments (SEEs) offer automated support for collecting and, in a few cases, using empirical data. Measurement will clearly play a significant role in future SEEs. The paper surveys the trend towards supporting measurement in SEEs and gives details about several existing research and commercial software systems.
Software testing methodologies and metrics are key issues in improving and assuring software quality, and they are important areas of research in software engineering. However, not many focus on the testing criteria and metrics evaluation of a class hierarchy. This paper introduces an important concept named URI (Unit Repeated Inheritance) to realize integrated object-oriented testing and object-oriented metrics. The approach describes an ILT (Inheritance Level Technique) method as a guide to test object-oriented software errors and measure the software complexity of a class hierarchy. The proposed technique shows that inheritance has a close relation to object-oriented software complexity and reveals that overuse of repeated (multiple) inheritance increases software complexity and is prone to implicit software errors. Also, two test criteria, intra-level first and inter-level first, are presented.
Software metrics are a significant means of improving software quality and an important research area in software engineering. As object-oriented analysis and design methodologies have become widely adopted, many software metrics techniques have been proposed. However, not many focus on the metrics evaluation of an inheritance hierarchy. In this paper, we propose a novel concept named unit repeated inheritance (URI), expressed in Z notation, to realize object-oriented software metrics. The approach describes an inheritance level technique (ILT) method as a guide to measure the software complexity of an inheritance hierarchy. The measures of the inheritance metrics are formed based on the proposed mechanism. Also, we use Lex and Yacc to construct a windowing tool, used in conjunction with a conventional C++ programming environment, to assist programmers in analyzing and measuring their C++ programs.
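The situation the URI concept captures, a class that inherits the same ancestor along more than one path, together with the notion of inheritance level, can be sketched over a simple class-hierarchy graph. The sketch below is a rough illustration in Python, not the paper's Z-notation definitions, and the example hierarchy is hypothetical.

```python
def inheritance_paths(hierarchy, cls, ancestor):
    """Count distinct inheritance paths from `cls` up to `ancestor`.
    More than one path indicates repeated (multiple) inheritance of
    that ancestor.

    hierarchy : dict class -> list of direct parent classes
    """
    if cls == ancestor:
        return 1
    return sum(inheritance_paths(hierarchy, parent, ancestor)
               for parent in hierarchy.get(cls, []))

def inheritance_level(hierarchy, cls):
    """Depth of `cls` in the hierarchy (root classes have level 0)."""
    parents = hierarchy.get(cls, [])
    return 0 if not parents else 1 + max(inheritance_level(hierarchy, p) for p in parents)

# Hypothetical C++-style diamond: D inherits B and C, both of which inherit A
h = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(inheritance_paths(h, "D", "A"))   # 2 -> A is repeatedly inherited by D
print(inheritance_level(h, "D"))        # 2
```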