Highly reliable software is becoming an essential ingredient in many systems. However, assuring reliability often entails time-consuming, costly development processes. One cost-effective strategy is to target reliability-enhancement activities to those modules that are likely to have the most problems. Software quality prediction models can predict the number of faults expected in each module early enough for reliability enhancement to be effective.
This paper introduces a case-based reasoning technique for the prediction of software quality factors. Case-based reasoning is a technique that seeks to solve new problems by identifying similar "cases" from the past. A case-based reasoning system can function as a software quality prediction model. To our knowledge, this study is the first to use case-based reasoning systems for predicting quantitative measures of software quality.
A case study applied case-based reasoning to software quality modeling of a family of full-scale industrial software systems. The case-based reasoning system's accuracy was much better than a corresponding multiple linear regression model in predicting the number of design faults. When predicting faults in code, its accuracy was significantly better than a corresponding multiple linear regression model for two of three test data sets and statistically equivalent for the third.
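For readers unfamiliar with the mechanics, the following is a minimal sketch of a case-based reasoning predictor of this kind, assuming modules from a past release are described by metric vectors with known fault counts. The similarity function (Euclidean distance over standardized metrics), the value of k, and the data are illustrative choices, not those of the study.

```python
import numpy as np

def cbr_predict_faults(case_metrics, case_faults, new_metrics, k=3):
    """Predict a fault count for a new module from the k most similar past cases.

    case_metrics : (n_cases, n_metrics) array of metrics from a past release
    case_faults  : (n_cases,) array of observed fault counts for those cases
    new_metrics  : (n_metrics,) metric vector of the module under development
    """
    X = np.asarray(case_metrics, dtype=float)
    y = np.asarray(case_faults, dtype=float)
    # Standardize metrics so that no single metric dominates the distance.
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-12
    Xz = (X - mu) / sigma
    qz = (np.asarray(new_metrics, dtype=float) - mu) / sigma
    # Similarity = (negative) Euclidean distance in standardized metric space.
    dist = np.linalg.norm(Xz - qz, axis=1)
    nearest = np.argsort(dist)[:k]
    # Solution step: average the fault counts of the retrieved cases.
    return y[nearest].mean()

# Hypothetical usage: three metrics per module, five past cases.
past = [[120, 4, 10], [300, 9, 25], [80, 2, 5], [450, 15, 40], [200, 6, 18]]
faults = [1, 4, 0, 9, 3]
print(cbr_predict_faults(past, faults, [310, 10, 22], k=2))
```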
Embedded systems have become ubiquitous and essential entities in our ever-growing high-tech world. The backbone of today's information-highway infrastructure is embedded systems such as telecommunication systems. They demand high reliability, so as to prevent the severe consequences of failures, including costly repairs at remote sites. Technology changes mandate that embedded systems evolve, resulting in a demand for techniques for improving the reliability of their future system releases. Reliability models based on software metrics can be effective tools for software engineering of embedded systems, because quality improvements are so resource-consuming that it is not feasible to apply them to all modules. Identification of the likely fault-prone modules before system testing can be effective in reducing the likelihood of faults discovered during operations.
A software quality classification model is calibrated using software metrics from a past release, and is then applied to modules currently under development to estimate which modules are likely to be fault-prone. This paper presents and demonstrates an effective case-based reasoning approach for calibrating such classification models. It is attractive for software engineering of embedded systems, because it can be used to develop software reliability models using a faster, cheaper, and easier method. We illustrate our approach with two large-scale case studies obtained from embedded systems, involving data collected from telecommunication systems, including wireless systems. The results indicate that the level of classification accuracy observed in both case studies would be beneficial in achieving high software reliability in subsequent releases of the embedded systems.
Estimating software fault-proneness early, i.e., predicting the probability that software modules will be faulty, can help in reducing costs and increasing the effectiveness of software analysis and testing. The many available static metrics provide important information, but none of them can be deterministically related to software fault-proneness. Fault-proneness models seem to be an interesting alternative, but work on such models is still limited by a lack of experimental validation.
This paper discusses barriers and problems in using software fault-proneness in industrial environments, proposes a method for building software fault-proneness models based on logistic regression and cross-validation that meets industrial needs, and provides some experimental evidence of the validity of the proposed approach.
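A minimal sketch of this kind of fault-proneness model is shown below, assuming scikit-learn, placeholder metric data X, and a binary "module was faulty" label y; the paper's metrics, folds, and validation protocol are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # placeholder static metrics
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=200) > 0.8).astype(int)

model = LogisticRegression(max_iter=1000)
# Cross-validated predictions approximate how the model would behave on unseen modules.
pred = cross_val_predict(model, X, y, cv=10)
print(classification_report(y, pred, target_names=["not fault-prone", "fault-prone"]))
```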
This paper presents a diagnostic framework for assessing the significance of inconsistencies (i.e., violations of consistency rules) in software design models expressed in the Unified Modeling Language (UML). The assessment is based on significance criteria that software designers can specify and associate with specific consistency rules. These criteria define characteristics that the model elements involved in the violation of a rule should have for the inconsistency to be significant, and they are specified in a formal language derived from the Object Constraint Language (OCL). The satisfiability of the criteria by individual model elements is measured by belief functions defined by the framework. The measures generated by these functions are used to rank the inconsistencies caused by different model elements. The presented framework has been evaluated through a set of experiments. The results of these experiments show that criteria definable in the framework can be used to produce relatively elaborate significance-rankings of inconsistencies.
The performance of a classification model is invariably affected by the characteristics of the measurement data it is built upon. If the quality of the data is generally poor, then the classification model will demonstrate poor performance. The detection and removal of noisy instances will improve the quality of the data and, consequently, the performance of the classification model. We investigate a noise handling technique that attempts to improve the quality of datasets for classification purposes by eliminating instances that are likely to be noise. Our approach uses twenty-five different classification techniques to create an ensemble filter for eliminating likely noise. The basic assumption is that if a given majority of classifiers in the ensemble misclassify an instance, then it is likely to be a noisy instance. Using a relatively large number of base-level classifiers in the ensemble filter helps achieve the desired level of noise-removal conservativeness with several possible levels of filtering. It also provides a higher degree of confidence in the noise elimination procedure, as the results are less likely to be influenced by the (possibly) inappropriate learning bias of a few algorithms when twenty-five base-level classifiers are used rather than a relatively small number. Empirical case studies of two high-assurance software projects demonstrate the effectiveness of our noise elimination approach through the significant improvement achieved in classification accuracy at various levels of noise filtering.
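The filtering idea can be sketched as follows, assuming scikit-learn; only a handful of base classifiers are used here for brevity (the study uses twenty-five), but the majority-vote logic is the same. X and y are numpy arrays of metrics and class labels.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def ensemble_noise_filter(X, y, threshold=None, cv=5):
    classifiers = [GaussianNB(), DecisionTreeClassifier(random_state=0),
                   LogisticRegression(max_iter=1000), KNeighborsClassifier()]
    if threshold is None:                       # default: simple majority
        threshold = len(classifiers) // 2 + 1
    # Count how many base classifiers misclassify each instance under cross-validation.
    wrong = np.zeros(len(y), dtype=int)
    for clf in classifiers:
        pred = cross_val_predict(clf, X, y, cv=cv)
        wrong += (pred != y).astype(int)
    noisy = wrong >= threshold                  # flagged as likely noise
    return X[~noisy], y[~noisy], noisy          # cleaned data plus the noise mask
```

Raising the threshold toward the full ensemble size yields the more conservative filtering levels discussed above.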
Software metric models predict the target software metric(s), e.g., the development work effort or defect rates, for a future software project based on the project's predictor software metric(s), e.g., the project team size. Obviously, the construction of such a software metric model makes use of a data sample of such metrics from analogous past projects. However, incomplete data often appear in such data samples. Moreover, the decision on whether a particular predictor metric should be included is most likely based on an intuitive or experience-based assumption that the predictor metric has a statistically significant impact on the target metric. However, this assumption is usually not verifiable "retrospectively" after the model is constructed, leading to redundant predictor metric(s) and/or unnecessary predictor metric complexity. To solve these problems, we derived a methodology consisting of the k-nearest neighbors (k-NN) imputation method, statistical hypothesis testing, and a "goodness-of-fit" criterion. This methodology was tested on software effort metric models and software quality metric models, the latter of which usually suffer from far more serious incomplete-data problems. This paper documents this methodology and the tests on these two types of software metric models.
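A minimal sketch of the k-NN imputation step, assuming scikit-learn's KNNImputer and a toy metric sample with missing values marked as NaN; the hypothesis-testing and goodness-of-fit parts of the methodology are not shown.

```python
import numpy as np
from sklearn.impute import KNNImputer

data = np.array([[120.0,  4.0,   10.0],
                 [300.0,  np.nan, 25.0],     # missing metric value
                 [ 80.0,  2.0,    5.0],
                 [310.0, 10.0,   np.nan]])
imputer = KNNImputer(n_neighbors=2)          # each gap filled from the 2 nearest projects
completed = imputer.fit_transform(data)
print(completed)
```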
Maintainability is an important factor that developers should be concerned with, because two-thirds of software costs involve maintenance. The aspect-oriented programming (AOP) paradigm aims to increase software maintainability. It solves the code tangling and code scattering problems by introducing a new modular unit called the "aspect". Much research supports measuring object-oriented software, but only a few studies support measuring aspect-oriented software. This paper proposes aspect-oriented software maintainability metrics and a set of aspect-oriented design guidelines to support the metrics. By combining the proposed guidelines, object-oriented design principles, and aspect-oriented design principles, the metrics are constructed according to the Factor-Strategy (FS) quality model and the Factor-Criteria-Metric (FCM) quality model. Principle-violation check definitions in the form of Boolean expressions are also defined to conduct software measurement and to realize the metrics. Finally, the aspect-oriented software maintainability metrics are applied to detect design-principle violations in fifty AspectJ systems. The results show that hidden flaws are exposed in all systems. Moreover, the proposed metrics are used to compare the maintainability between the Java and AspectJ versions of the systems.
The primary goal of software quality engineering is to produce a high-quality software product through the use of some specific techniques and processes. One strategy is applying data mining techniques to software metric and defect data collected during the software development process to identify potential low-quality program modules. In this paper, we investigate the use of feature selection in the context of software quality estimation (also referred to as software defect prediction), where a classification model is used to predict whether program modules (instances) are fault-prone or not fault-prone. Seven filter-based feature ranking techniques are examined. Among them, six are commonly used, and the other one, the signal-to-noise ratio (SNR), is rarely employed. The objective of the paper is to compare these seven techniques across various software data sets and assess their effectiveness for software quality modeling. A case study is performed on 16 software data sets, and classification models are built with five different learners and evaluated with two performance metrics. Our experimental results are summarized based on statistical tests for significance. The main conclusion is that the SNR technique performs as well as the best performer of the six commonly used techniques.
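As an illustration, a filter-based SNR ranker can be sketched as below, using the common per-metric definition SNR = |mean_fp - mean_nfp| / (std_fp + std_nfp); the paper's exact formulation may differ, and X and y are assumed to be a metric matrix and binary fault-proneness labels.

```python
import numpy as np

def snr_ranking(X, y):
    """Rank metrics by signal-to-noise ratio between the two classes, best first."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    snr = np.abs(pos.mean(axis=0) - neg.mean(axis=0)) \
          / (pos.std(axis=0) + neg.std(axis=0) + 1e-12)
    return np.argsort(snr)[::-1]        # metric indices in ranked order
```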
We present an analysis of the evolution of a Web application project developed with object-oriented technology and an agile process. During the development we systematically performed measurements on the source code, using software metrics that have been proved to be correlated with software quality, such as the Chidamber and Kemerer suite and Lines of Code metrics. We also computed metrics derived from the class dependency graph, including metrics derived from Social Network Analysis. The application development evolved through phases, characterized by a different level of adoption of some key agile practices — namely pair programming, test-based development and refactoring. The evolution of the metrics of the system, and their behavior related to the agile practices adoption level, is presented and discussed. We show that, in the reported case study, a few metrics are enough to characterize with high significance the various phases of the project. Consequently, software quality, as measured using these metrics, seems directly related to agile practices adoption.
The term "software entropy" refers to the tendency for software, over time, to become difficult and costly to maintain. A software system that undergoes continuous change, such as having new functionality added to its original design, will eventually become more complex and can become disorganized as it grows, losing its original design structure.
A recent study shows that software degradation can be measured using the WMC metric expressed in terms of Shannon entropy. In this paper we extend the empirical analysis to RFC and CBO as well, since these CK metrics have been shown to be correlated with the fault-proneness of OO classes.
We analyzed various releases of the publicly available Eclipse and Netbeans software systems, calculating the entropy of some CK metrics for every release analyzed. Validity is assessed against a direct measure of software quality, the number of detected defects. Our results show a very good correlation between the entropy of CBO and RFC and the number of bugs for Eclipse and Netbeans.
Complexity and quality metrics are in general computed on every system module, while the entropy is just a scalar number that characterizes a whole system. This result suggests that the entropy of some CK metrics could be considered as a global quality metric for large software systems. Our results need, however, to be confirmed for other large software systems.
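A minimal sketch of the entropy computation discussed above: the values of one CK metric (e.g., CBO or RFC) over all classes of a release are treated as an empirical distribution and summarized with Shannon entropy. The binning choice here is an assumption, and the sample values are hypothetical.

```python
import numpy as np

def metric_entropy(metric_values, bins=20):
    """Shannon entropy H = -sum p_i * log2(p_i) of one metric's value distribution."""
    counts, _ = np.histogram(metric_values, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Hypothetical CBO values for the classes of one release:
print(metric_entropy([1, 2, 2, 3, 5, 8, 8, 9, 13, 21]))
```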
Software defect prediction models that use software metrics such as code-level measurements and defect data to build classification models are useful tools for identifying potentially problematic program modules. The effectiveness of detecting such modules is affected by the software measurements used, making data preprocessing an important step during software quality prediction. Generally, there are two problems affecting software measurement data: high dimensionality (where a training dataset has an extremely large number of independent attributes, or features) and class imbalance (where a training dataset has one class with relatively many more members than the other class). In this paper, we present a novel form of ensemble learning based on boosting that incorporates data sampling to alleviate class imbalance and feature (software metric) selection to address high dimensionality. As we adopt two different sampling methods (Random Undersampling (RUS) and Synthetic Minority Oversampling (SMOTE)) in the technique, we have two forms of our new ensemble-based approach: selectRUSBoost and selectSMOTEBoost. To evaluate the effectiveness of these new techniques, we apply them to two groups of datasets from two real-world software systems. In the experiments, four learners and nine feature selection techniques are employed to build our models. We also consider versions of the technique which do not incorporate feature selection, and compare all four techniques (the two different ensemble-based approaches which utilize feature selection and the two versions which use sampling only). The experimental results demonstrate that selectRUSBoost is generally more effective at improving defect prediction performance than selectSMOTEBoost, and that the techniques with feature selection yield better predictions than the techniques without feature selection.
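The following is a simplified, hedged sketch of the idea behind selectRUSBoost, not the algorithm itself: the paper integrates sampling and feature selection inside each boosting iteration, while here the three ingredients (feature selection, random undersampling, boosting) are composed sequentially using scikit-learn. The undersampler rus and the data are hand-rolled placeholders.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, f_classif

def rus(X, y, rng):
    """Randomly undersample the majority class down to the minority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    maj_idx = rng.choice(np.where(y == majority)[0], size=counts.min(), replace=False)
    keep = np.concatenate([np.where(y == minority)[0], maj_idx])
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                      # placeholder software metrics
y = (X[:, 0] + X[:, 1] > 2.3).astype(int)           # imbalanced fault labels (~5% fp)

X_sel = SelectKBest(f_classif, k=5).fit_transform(X, y)   # address high dimensionality
X_bal, y_bal = rus(X_sel, y, rng)                          # address class imbalance
model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_bal, y_bal)
```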
The basic measurements for software quality control and management are the various project and software metrics collected at various stages of the software development life cycle. The software metrics may not all be relevant for predicting the fault-proneness of software components, modules, or releases, thus creating the need for feature (software metric) selection. The goal of feature selection is to find a minimum subset of attributes that can characterize the underlying data with results as good as, or even better than, those obtained when all available features are considered. As an example of interdisciplinary research (between data science and software engineering), this study is unique in presenting a large comparative study of wrapper-based feature (or attribute) selection techniques for building defect predictors. In this paper, we investigated thirty wrapper-based feature selection methods to remove irrelevant and redundant software metrics used for building defect predictors. These thirty wrappers vary based on the choice of search method (Best First or Greedy Stepwise), learner (Naïve Bayes, Support Vector Machine, and Logistic Regression), and performance metric (Overall Accuracy, Area Under the ROC (Receiver Operating Characteristic) Curve, Area Under the Precision-Recall Curve, Best Geometric Mean, and Best Arithmetic Mean) used in the defect prediction model evaluation process. The models are trained using the three learners and evaluated using the five performance metrics. The case study is based on software metrics and defect data collected from a real-world software project.
The results demonstrate that Best Arithmetic Mean is the best performance metric to use within the wrapper. Naïve Bayes performed significantly better than Logistic Regression and Support Vector Machine as a wrapper learner on slightly and less imbalanced datasets. We also recommend Greedy Stepwise as a search method for wrappers. Moreover, compared to models built with the full datasets, the performance of defect prediction models can be improved when metric subsets are selected through a wrapper subset selector.
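As an illustration of the wrapper components compared above (search method, learner, and performance metric), here is a minimal greedy stepwise (forward) wrapper sketch, assuming scikit-learn; the learner, scoring metric, and stopping rule are configurable placeholders rather than the exact settings of the study.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def greedy_stepwise_wrapper(X, y, learner=None, scoring="roc_auc", cv=3):
    """Forward selection: add the metric that most improves the cross-validated score."""
    learner = learner or GaussianNB()
    selected, best_score = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {f: cross_val_score(learner, X[:, selected + [f]], y,
                                     scoring=scoring, cv=cv).mean()
                  for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:      # stop when no candidate improves the score
            break
        best_score = scores[f_best]
        selected.append(f_best)
        remaining.remove(f_best)
    return selected, best_score
```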
Defect prediction is very challenging in software development practice. Classification models are useful tools that can help with such prediction. Classification models can classify program modules into quality-based classes, e.g., fault-prone (fp) or not fault-prone (nfp). This facilitates the allocation of limited project resources. For example, more resources are assigned to program modules that are of poor quality or likely to have a high number of faults based on the classification. However, two main problems, high dimensionality and class imbalance, affect the quality of training datasets and therefore of classification models. Feature selection and data sampling are often used to overcome these problems. Feature selection is a process of choosing the most important attributes from the original dataset. Data sampling alters the dataset to change its balance level. Another technique, called boosting (building multiple models, with each model tuned to work better on instances misclassified by previous models), has also been found to be effective for resolving the class imbalance problem.
In this study, we investigate an approach for combining feature selection with this ensemble learning (boosting) process. We focus on two different scenarios: feature selection performed prior to the boosting process and feature selection performed inside the boosting process. Ten individual base feature ranking techniques, as well as an ensemble ranker based on the ten, are examined and compared over the two scenarios. We also employ the boosting algorithm to construct classification models without performing feature selection and use the results as the baseline for further comparison. The experimental results demonstrate that feature selection is important and needed prior to the learning process. In addition, the ensemble feature ranking method generally performs better than or similarly to the average of the base ranking techniques, and more importantly, the ensemble method exhibits better robustness than most base ranking techniques. As for the two scenarios, the results show that applying feature selection inside boosting performs better than using feature selection prior to boosting.
Software quality modeling is the process of using software metrics from previous iterations of development to locate potentially faulty modules in the code currently under development. This has become an important part of the software development process, allowing practitioners to focus development efforts where they are most needed. One difficulty encountered in software quality modeling is the problem of high dimensionality, where the number of available software metrics is too large for a classifier to work well. In this case, many of the metrics may be redundant or irrelevant to defect prediction results, so selecting the subset of software metrics that are the best predictors becomes important. This process is called feature (metric) selection. There are three major forms of feature selection: filter-based feature rankers, which use statistical measures to assign a score to each feature and present the user with a ranked list; filter-based feature subset evaluation, which uses statistical measures on feature subsets to find the best feature subset; and wrapper-based subset selection, which builds classification models using different subsets to find the one which maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of the feature selection methods. We evaluate the stability of feature selection methods on pairs of subsamples generated by our fixed-overlap partitions algorithm. Four different levels of overlap are considered in this study. Thirteen software metric datasets from two real-world software projects are used in this study. Results demonstrate that ReliefF (RF) is the most stable feature selection method and that wrapper-based feature subset selection shows the least stability. In addition, as the overlap of the partitions increases, the stability of the feature selection strategies increases.
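A hedged sketch of the stability computation: for the feature subsets selected on different subsamples, a pairwise Tanimoto (Jaccard-style) index |A ∩ B| / |A ∪ B| is averaged over all pairs. The exact APTI definition belongs to the paper; this is only the natural reading of the name, with hypothetical subsets.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto index of two feature subsets: intersection size over union size."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def average_pairwise_tanimoto(subsets):
    pairs = list(combinations(subsets, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Hypothetical subsets selected on three overlapping subsamples:
print(average_pairwise_tanimoto([{1, 2, 5}, {1, 2, 7}, {2, 5, 7}]))
```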
Unnecessarily repeated code, also known as code clones, is typically not well documented and is difficult to maintain. Code clones can become an important problem in the software development cycle, since any detected error must be fixed in all occurrences. This condition significantly increases software maintenance costs and the effort and time required to understand the code. This research introduces a novel methodology to minimize or prevent the code cloning problem in software projects. In particular, this manuscript focuses on the detection of structural code clones, which are defined as similarity in software structure, such as design patterns. Our proposed methodology provides a solution to the class-level structural code clone detection problem. We introduce a novel software architecture that unifies different software quality analysis tools that take measurements of software metrics for structural code clone detection. We present an empirical evaluation of our approach and investigate its practical usefulness. We conduct a user study using human judges to detect structural code clones in three different open-source software projects. We apply our methodology to the same projects and compare results. The results show that our proposed solution achieves high consistency with the results reached by the human judges. The outcome of this study also indicates that a uniform structural code clone detection system can be built on top of different software quality tools, where each tool takes measurements of different object-oriented software metrics.
Object-oriented software systems contain a large number of modules, which makes unit testing, integration testing, and system testing very difficult and challenging. While the aim of unit testing is to show that individual modules work properly, and the aim of system testing is to determine whether the whole system meets its specifications, the aim of integration testing is to uncover errors in the interactions between system modules. However, it is generally impossible to test all connections between modules because of time and budget constraints. Thus, it is important to focus the testing on the connections presumed to be more error-prone.
The goal of this work is to guide software testers on where in a software system to focus when performing integration testing, in order to save time and resources. This paper proposes a new approach to predict and rank error-prone connections. We use method-level metrics that capture both dependencies and the internal complexity of methods. We performed experiments on several Java applications and used error-seeding techniques for evaluation. The experimental results showed that our approach is effective for selecting the test focus in integration testing.
Testing is a vital part of achieving good-quality software. Deploying untested code can cause system crashes and unexpected behavior. To reduce these problems, testing should evolve with coding, and test suites should not remain static across software versions. Whenever software is updated, new functionality is added, or existing functionality is changed, the test suites should be updated along with the software. Software repositories contain valuable information about software systems. Access to older versions, and comparing the source code and acceptance-test changes of adjacent versions, can provide information about the evolution process of the software. This research proposes a method and implementation to analyze 21 open-source real-world projects hosted on GitHub with regard to the co-evolution of the software and its acceptance test suites. The projects are retrieved from their repositories, their versions are analyzed, graphs are created, and analysis related to the co-evolution process is performed. Observations show that the source code is updated more frequently than the acceptance tests. They indicate a pattern in which source code and acceptance tests do not evolve together. Moreover, the analysis showed that a few acceptance tests cover most of the functionality that accounts for a significant number of lines of code.
Association rule mining is a highly effective approach to data analysis for datasets of varying sizes, accommodating diverse feature values. Nevertheless, deriving practical rules from datasets with numerical variables presents a challenge, as these variables must be discretized beforehand. Quantitative association rule mining addresses this issue, allowing the extraction of valuable rules. This paper introduces an extension to quantitative association rules, incorporating a two-variable function in their consequent part. The use of correlation functions, statistical test functions, and error functions is also introduced. We illustrate the utility of this extension through three case studies employing software engineering datasets. In case study 1, we successfully pinpointed the conditions that result in either a high or low correlation between effort and software size, offering valuable insights for software project managers. In case study 2, we effectively identified the conditions that lead to a high or low correlation between the number of bugs and source lines of code, aiding in the formulation of software test planning strategies. In case study 3, we applied our approach to the two-step software effort estimation process, uncovering the conditions most likely to yield low effort estimation errors.
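A minimal sketch of the proposed extension under illustrative assumptions: a rule's antecedent selects a subgroup of projects, and the consequent reports a two-variable function, here the Pearson correlation between size and effort, computed on that subgroup. The column names, data, and rule below are hypothetical.

```python
import pandas as pd

projects = pd.DataFrame({
    "team_size": [3, 4, 9, 10, 12, 5, 8, 11],
    "kloc":      [10, 12, 80, 95, 120, 20, 60, 110],
    "effort":    [30, 35, 400, 380, 700, 70, 260, 650],
})

antecedent = projects["team_size"] >= 8          # IF team_size >= 8 ...
subgroup = projects[antecedent]
# ... THEN corr(kloc, effort) = r on the matching projects.
r = subgroup["kloc"].corr(subgroup["effort"])
print(f"rule support = {len(subgroup)}, corr(kloc, effort) = {r:.2f}")
```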
The use of empirical data to understand and improve software products and software engineering processes is gaining ever increasing attention. Empirical data from products and processes is needed to help an organization understand and improve its way of doing business in the software domain. Additional motivation for collecting and using data is provided by the need to conform to guidelines and standards which mandate measurement, specifically the SEI’s Capability Maturity Model and ISO 9000–3. Some software engineering environments (SEEs) offer automated support for collecting and, in a few cases, using empirical data. Measurement will clearly play a significant role in future SEEs. The paper surveys the trend towards supporting measurement in SEEs and gives details about several existing research and commercial software systems.
Software testing methodologies and metrics are key to improving and assuring software quality, and they are important areas of software engineering research. However, not many focus on the testing criteria and metrics evaluation of a class hierarchy. This paper introduces an important concept named URI (Unit Repeated Inheritance) to realize integrated object-oriented testing and object-oriented metrics. The approach describes an ILT (Inheritance Level Technique) method as a guide to test object-oriented software errors and measure the software complexity of a class hierarchy. The proposed technique shows that inheritance has a close relation to object-oriented software complexity and reveals that overuse of repeated (multiple) inheritance will increase software complexity and be prone to implicit software errors. Also, two test criteria, intra-level first and inter-level first, are presented.
Software metrics play a significant role in improving software quality and are an important research area of software engineering. As the methodologies of object-oriented analysis and design have become widely adopted, many software metrics techniques have been proposed. However, not many focus on the metrics evaluation of an inheritance hierarchy. In this paper, we propose a novel concept named unit repeated inheritance (URI), in Z notation, to realize object-oriented software metrics. The approach describes an inheritance level technique (ILT) method as a guide to measure the software complexity of an inheritance hierarchy. The measures of the inheritance metrics are formed based on the proposed mechanism. Also, we use Lex and Yacc to construct a windowing tool which is used in conjunction with a conventional C++ programming environment to assist programmers in analyzing and measuring their C++ programs.
Predicting the quality of system modules prior to software testing and operations can benefit the software development team. Such a timely reliability estimation can be used to direct cost-effective quality improvement efforts to the high-risk modules. Tree-based software quality classification models based on software metrics are used to predict whether a software module is fault-prone or not fault-prone. They are white box quality estimation models with good accuracy, and are simple and easy to interpret.
An in-depth study of calibrating classification trees for software quality estimation using the SPRINT decision tree algorithm is presented. Many classification algorithms have memory limitations including the requirement that datasets be memory resident. SPRINT removes all of these limitations and provides a fast and scalable analysis. It is an extension of a commonly used decision tree algorithm, CART, and provides a unique tree pruning technique based on the Minimum Description Length (MDL) principle. Combining the MDL pruning technique and the modified classification algorithm, SPRINT yields classification trees with useful accuracy. The case study used consists of software metrics collected from a very large telecommunications system. It is observed that classification trees built by SPRINT are more balanced and demonstrate better stability than those built by CART.
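SPRINT itself is not available in common libraries, so the sketch below uses a CART-style tree from scikit-learn with cost-complexity pruning (ccp_alpha) as a stand-in for the MDL-based pruning described above; the metric names and data are placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                          # placeholder software metrics
y = (X[:, 0] + X[:, 2] + rng.normal(scale=0.5, size=300) > 1.0).astype(int)

# Build and prune a classification tree, then print its rules for inspection.
tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["loc", "cyclomatic", "fan_out", "churn"]))
```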
Biggerstaff and Richter suggest that there are four fundamental subtasks associated with operationalizing the reuse process [1]: finding reusable components, understanding these components, modifying these components, and composing components. Each of these sub-problems can be re-expressed as a knowledge acquisition sub-problem relative to producing a new representation for the components that makes them more suitable for future reuse.
In this paper, we express the first two subtasks for the software reuse activity, as described by Biggerstaff and Richter, as a problem in Machine Learning. From this perspective, the goal of software reuse is to learn to recognize reusable software in terms of code structure, run-time behavior, and functional specification. The Partial Metrics (PM) System supports the acquisition of reusable software at three different levels of granularity: the system level, the procedural level, and the code segment level. Here, we describe how the system extracts procedural knowledge from an example Pascal software system that satisfies a set of structural, behavioral, and functional constraints. These constraints are extracted from a set of positive and negative examples using inductive learning techniques. The constraints are expressed quantitatively in terms of various quality models and metrics. The general characteristics of learned constraints that were extracted from a variety of applications libraries are discussed.
This paper presents a discussion of significant issues in selection of a standardized set of the “best” software metrics to support a software reuse program. This discussion illustrates the difficulty in selection of a standardized set of reuse metrics because the “best” reuse metrics are determined by unique characteristics of each reuse application. An example of the selection of a single set of reuse metrics for a specific management situation is also presented.
Broadly, software metrics play a vital role in attribute assessment, which in turn drives software projects. Metric measurements capture many crucial facets of a system and help enhance the quality of the software developed. Maintenance is the corrective work carried out on a software system after it is initially built. The noteworthy characteristic of any software is change, and as a result additional care ought to be taken in developing software so that it can be modified effortlessly (i.e., is maintainable). Predicting software maintainability is still challenging, and accurate prediction models with low error rates are required. With many modern programming languages on the horizon, new techniques have to be introduced to measure software maintainability accurately. This paper proposes a maintainability index (MI) that considers various software metrics and minimizes the prediction error. It also adopts a renowned optimization algorithm, Firefly (FF), to obtain the optimum result. The proposed Base Model-FF is compared to other traditional models, namely BM-Differential Evolution (BM-DE), BM-Artificial Bee Colony (BM-ABC), BM-Particle Swarm Optimization (BM-PSO), and BM-Genetic Algorithm (BM-GA), in terms of performance metrics such as the differential ratio, the correlation coefficient, and the root mean square error (RMSE).
Embedded-computer systems have become essential to life in modern society. For example, the backbone of society's information infrastructure is telecommunications. Embedded systems must have highly reliable software, so that we avoid the severe consequences of failures, intolerable down-time, and expensive repairs in remote locations. Moreover, today's fast-moving technology marketplace mandates that embedded systems evolve, resulting in multiple software releases embedded in multiple products.
Software quality models can be valuable tools for software engineering of embedded systems, because some software-enhancement techniques are so expensive or time-consuming that it is not practical to apply them to all modules. Targeting such enhancement techniques is an effective way to reduce the likelihood of faults discovered in the field. Research has shown software metrics to be useful predictors of software faults. A software quality model is developed using measurements and fault data from a past release. The calibrated model is then applied to modules currently under development. Such models yield predictions on a module-by-module basis.
This paper examines the Classification And Regression Trees (CART) algorithm for building tree-based models that predict which software modules have high risk of faults to be discovered during operations. CART is attractive because it emphasizes pruning to achieve robust models. This paper presents details on the CART algorithm in the context of software engineering of embedded systems. We illustrate this approach with a case study of four consecutive releases of software embedded in a large telecommunications system. The level of accuracy achieved in the case study would be useful to developers of an embedded system. The case study indicated that this model would continue to be useful over several releases as the system evolves.
Inspection is widely believed to be one of the most cost-effective methods for detecting defects in the work products produced during software development. However, the inspection process, by its very nature, is labor-intensive, and to deliver value it has to be properly executed and controlled. In controlling the inspection process, the inspection module size is a key control parameter. A larger module size can lead to increased leakage of defects, which increases cost, since rework in subsequent phases is more expensive. A smaller module size reduces defect leakage but increases the number of inspections. In this paper, we formulate a cost model for an inspection process with which the total cost can be minimized. We then use the technique of Design of Experiments to study how the optimum module size varies with some of the key parameters of the inspection process, and determine the optimum module size for different situations.
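The trade-off can be made concrete with an illustrative cost model (not the paper's): smaller modules mean more inspections, each with a fixed overhead, while larger modules leak more defects into later, more expensive phases. All parameter values and the leakage function below are hypothetical; a simple grid search finds the module size that minimizes total cost.

```python
import numpy as np

total_size      = 10000.0   # lines of code to inspect
overhead        = 4.0       # fixed cost per inspection meeting
insp_rate_cost  = 0.01      # inspection cost per line
defect_density  = 0.02      # defects per line entering inspection
rework_cost     = 8.0       # cost of one defect leaking to a later phase

def leakage_fraction(m):
    # Assumed: leakage grows with module size (reviewer fatigue), saturating at 1.
    return 1.0 - np.exp(-m / 400.0)

def total_cost(m):
    n_inspections = total_size / m
    inspection_cost = n_inspections * overhead + total_size * insp_rate_cost
    leaked = total_size * defect_density * leakage_fraction(m)
    return inspection_cost + leaked * rework_cost

sizes = np.arange(50, 1000, 10)
best = sizes[np.argmin([total_cost(m) for m in sizes])]
print(f"optimum module size (under these assumptions): {best} lines")
```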
A rule-based classification model is presented to identify high-risk software modules. It utilizes the power of rough set theory to reduce the number of attributes, and the equal-frequency binning algorithm to partition the values of the attributes. As a result, a set of conjoined Boolean predicates is formed. The model is inherently influenced by the practical needs of the system being modeled, thus allowing the analyst to determine which rules are to be used for classifying the fault-prone and not fault-prone modules. The proposed model also enables the analyst to control the number of rules that constitute the model. Empirical validation of the model is accomplished through a case study of a large legacy telecommunications system. The ease of rule interpretation and the transparency of the functional aspects of the model are clearly demonstrated. It is concluded that the new model is effective for software quality classification.
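A minimal sketch of the equal-frequency binning step, assuming pandas: each attribute's values are split into bins holding roughly equal numbers of modules, and rules are then conjunctions of Boolean predicates over those bins. The rough-set attribute reduction is not shown, and the data and the example rule are hypothetical.

```python
import pandas as pd

loc = pd.Series([12, 45, 70, 110, 150, 200, 260, 400, 820, 1500], name="loc")
# Three equal-frequency bins; qcut places the cut points at the 1/3 and 2/3 quantiles.
bins = pd.qcut(loc, q=3, labels=["low", "medium", "high"])
print(pd.concat([loc, bins.rename("loc_bin")], axis=1))

# Example rule predicate: IF loc_bin is "high" AND churn_bin is "high" THEN fault-prone.
```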
Classification, an important data mining function that assigns class labels to items in a collection, has practical applications in various domains. In software engineering, for instance, a common classification problem is to determine the quality of a software item. In such a problem, software metrics represent the independent features while fault proneness represents the class label. With many classification problems, one must often deal with the presence of irrelevant features in the feature space. That, coupled with class imbalance, renders the task of discriminating one class from another rather difficult. In this study, we empirically evaluate our proposed wrapper-based feature ranking, in which nine performance metrics are considered, each aided by a particular learner and methodology. We examine five learners and take three different approaches, each in conjunction with one of three different methodologies: 3-fold Cross-Validation, 3-fold Cross-Validation Risk Impact, and a combination of the two. We consider two sets of software engineering datasets. To evaluate classifier performance after feature selection has been applied, we use the Area Under the Receiver Operating Characteristic curve as the performance evaluator. We investigate the performance of feature selection as we vary the three factors that form the foundation of the wrapper-based feature ranking. We show that performance is conditioned not only by the choice of methodology but also by the learner. We also evaluate the effect of sampling on wrapper-based feature ranking. Finally, we provide guidance as to which software metrics are relevant in software defect prediction problems and how the number of software metrics can be selected when using wrapper-based feature ranking.
This paper presents a proposed method for estimating software reliability before the implementation phase. The method assumes that a formal specification technique is used and that it is possible to develop a tool performing dynamic analysis, i.e., locating semantic faults in the design. The analysis is performed both by applying a usage profile as input and by doing a full analysis, i.e., locating all faults that the tool can find. The tool must provide failure data in terms of the time since the last failure was detected. The mapping of the dynamic failures to the failures encountered during statistical usage testing and operation is discussed. The method can be applied either to the software specification or, as a step in the development process, to the software design. The proposed method allows for software reliability estimates that can be used both as a quality indicator and for planning and controlling resources, development times, etc., at an early stage in the development of software systems.
Software measurement and modeling is intended to improve quality by predicting quality factors, such as reliability, early in the life cycle. The field of software measurement generally assumes that attributes of software products early in the life cycle are somehow related to the amount of information in those products, and thus, are related to the quality that eventually results from the development process.
Kolmogorov complexity and information theory offer a way to quantify the amount of information in a finite object, such as a program, in a unifying framework. Based on these principles, we propose a new synthetic measure of information composed from a set of conventional primitive metrics in a module. Since not all information is equally relevant to fault-insertion, we also consider components of the overall information content. We present a model for fault-insertion based on a nonhomogeneous Poisson process and Poisson regression. This approach is attractive, because the underlying assumptions are appropriate for software quality data. This approach also gives insight into design attributes that affect fault insertion.
A validation case study of a large sample of modules from a very large telecommunications system provides empirical evidence that the components of synthetic module complexity can be useful in software quality modeling. A large telecommunications system is an example of a computer system with rigorous software quality requirements.
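A minimal sketch of Poisson regression for fault counts, assuming statsmodels; the information-theoretic predictors and the nonhomogeneous Poisson process formulation of the paper are not reproduced, and the data are simulated placeholders.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
info_content = rng.uniform(1, 10, size=200)            # placeholder information measure
X = sm.add_constant(info_content)
faults = rng.poisson(lam=np.exp(-1.0 + 0.3 * info_content))   # simulated fault counts

# Poisson regression: log of the expected fault count is linear in the predictor.
model = sm.GLM(faults, X, family=sm.families.Poisson()).fit()
print(model.summary())
```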
Reliable software is mandatory for complex mission-critical systems. Classifying modules as fault-prone, or not, is a valuable technique for guiding development processes, so that resources can be focused on those parts of a system that are most likely to have faults.
Logistic regression offers advantages over other classification modeling techniques, such as interpretable coefficients. There are few prior applications of logistic regression to software quality models in the literature, and none that we know of account for prior probabilities and costs of misclassification. A contribution of this paper is the application of prior probabilities and costs of misclassification to a logistic regression-based classification rule for a software quality model.
This paper also contributes an integrated method for using logistic regression in software quality modeling, including examples of how to interpret coefficients, how to use prior probabilities, and how to use costs of misclassifications. A case study of a major subsystem of a military, real-time system illustrates the techniques.
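A hedged sketch of how prior probabilities and misclassification costs can enter a logistic classification rule: the log-odds are shifted from the training sample's class proportions to the expected field proportions, and the decision threshold follows the standard minimum-expected-cost rule. The paper's exact rule may differ, and all numbers below are hypothetical.

```python
import numpy as np

def classify_fault_prone(p_sample, sample_prior_fp, field_prior_fp, cost_I, cost_II):
    """p_sample: model's estimated P(fp | x) under the training-sample priors.
    cost_I : cost of a Type I error (nfp module flagged as fp, wasted effort)
    cost_II: cost of a Type II error (fp module missed, field failure)."""
    # Adjust the log-odds for the difference between sample and field priors.
    logit = np.log(p_sample / (1 - p_sample))
    logit += np.log(field_prior_fp / sample_prior_fp)
    logit -= np.log((1 - field_prior_fp) / (1 - sample_prior_fp))
    p_field = 1 / (1 + np.exp(-logit))
    # Minimum expected cost: flag as fp when the odds exceed cost_I / cost_II.
    return p_field / (1 - p_field) > cost_I / cost_II

print(classify_fault_prone(p_sample=0.40, sample_prior_fp=0.5,
                           field_prior_fp=0.15, cost_I=1.0, cost_II=10.0))
```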
Due to the central role that conceptual data models play in the design of databases, it is crucial to assure their quality from the early phases of the database life cycle. To assess (and, if necessary, improve) conceptual data model quality, quantitative and objective measures are needed in order to avoid bias in the quality evaluation process. It is in this context that software measurement can help IS designers make better decisions during design activities. The main contribution of this article is to provide a state of the art of measures for conceptual data models.
This article provides an overview of the basic concepts and state of the art of software measurement. Software measurement is an emerging field of software engineering, since it may provide support for planning, controlling, and improving the software development process, as needed in any industrial development process. Due to the human-intensive nature of software development and its relative novelty, some aspects of software measurement are probably closer to measurement for the social sciences than measurement for the hard sciences. Therefore, software measurement faces a number of challenges whose solution requires both innovative techniques and borrowings from other disciplines. Over the years, a number of techniques and measures have been proposed and assessed via theoretical and empirical analyses. This shows the theoretical and practical interest of the software measurement field, which is constantly evolving to provide new, better techniques to support existing and more recent software engineering development methods.
Improving the field performance of telecommunication systems is a key objective of both telecom suppliers and operators, as an increasing number of business-critical systems worldwide rely on dependable telecommunication. Early defect detection improves field performance in terms of reduced field failure rates and reduced intrinsic downtime. Cost-effective software project management will focus resources on intensive validation of those areas with the highest criticality. This article outlines techniques for identifying such critical areas in software systems. It concentrates on the practical application of criticality-based predictions in industrial development projects, namely the selection of a classification technique and the use of the results in directing management decisions. The first part comprehensively compares and evaluates five common classification techniques (Pareto classification, classification trees, factor-based discriminant analysis, fuzzy classification, and neural networks) for identifying critical components. Results from a large-scale industrial switching project are included to show the practical benefits. Once it is known which technique should be applied, the second area gains even more attention: what are the impacts for practical project management within given resource and time constraints? Several selection criteria based on the results of a combined criticality and history analysis are provided, together with concrete implementation decisions.
Because highly reliable software is becoming an essential ingredient in many systems, software developers apply various techniques to discover faults early in development, such as more rigorous reviews, more extensive testing, and strategic assignment of key personnel. Our goal is to target reliability enhancement activities to those modules that are most likely to have problems. This paper presents a methodology that incorporates genetic programming for predicting the order of software modules based on the expected number of faults. This is the first application of genetic programming to software engineering that we know of. We found that genetic programming can be used to generate software quality models whose inputs are software metrics collected earlier in development, and whose output is a prediction of the number of faults that will be discovered later in development or during operations. We established ordinal evaluation criteria for models, and conducted an industrial case study of software from a military communications system. Case study results were sufficiently good to be useful to a project for choosing modules for extra reliability enhancement treatment.
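A compact, illustrative genetic-programming sketch, not the paper's system: arithmetic expressions over module metrics are evolved so that their output tracks observed fault counts (and hence the module ordering). The function set, genetic operators, fitness, and data are simplified choices made here.

```python
import random
import operator

FUNCS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
N_METRICS = 3

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        # Terminal: either a metric index or a small constant.
        return ("x", random.randrange(N_METRICS)) if random.random() < 0.7 \
               else ("c", random.uniform(-2, 2))
    f = random.choice(FUNCS)
    return ("f", f, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, metrics):
    kind = tree[0]
    if kind == "x":
        return metrics[tree[1]]
    if kind == "c":
        return tree[1]
    _, (fn, _), left, right = tree
    return fn(evaluate(left, metrics), evaluate(right, metrics))

def fitness(tree, X, y):
    # Lower is better: squared error between the evolved output and fault counts.
    return sum((evaluate(tree, row) - target) ** 2 for row, target in zip(X, y))

def mutate(tree):
    # Occasionally replace the individual with a fresh random tree.
    return random_tree(depth=2) if random.random() < 0.2 else tree

def crossover(a, b):
    # Very simple crossover: combine one parent's left subtree with the other's right.
    if a[0] == "f" and b[0] == "f" and random.random() < 0.9:
        return ("f", a[1], a[2], b[3])
    return a

def evolve(X, y, pop_size=200, generations=30):
    pop = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda t: fitness(t, X, y))
        survivors = scored[: pop_size // 4]                 # truncation selection
        pop = survivors + [mutate(crossover(random.choice(survivors),
                                            random.choice(survivors)))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda t: fitness(t, X, y))

# Hypothetical data: three metrics per module and observed fault counts.
random.seed(1)
X = [[10, 2, 1], [50, 8, 3], [25, 4, 2], [80, 12, 5], [5, 1, 0]]
y = [1, 6, 2, 10, 0]
best = evolve(X, y)
print([evaluate(best, row) for row in X])
```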