"Knowledge discovery in data bases" (KDD) for software engineering is a process for finding useful information in the large volumes of data that are a byproduct of software development, such as data bases for configuration management and for problem reporting. This paper presents guidelines for extracting innovative process metrics from these commonly available data bases. This paper also adapts the Classification And Regression Trees algorithm, CART, to the KDD process for software engineering data. To our knowledge, this algorithm has not been used previously for empirical software quality modeling. In particular, we present an innovative way to control the balance between misclassification rates. A KDD case study of a very large legacy telecommunications software system found that variables derived from source code, configuration management transactions, and problem reporting transactions can be useful predictors of software quality. The KDD process discovered that for this software development environment, out of forty software attributes, only a few of the predictor variables were significant. This resulted in a model that predicts whether modules are likely to have faults discovered by customers. Software developers need such predictions early in development to target software enhancement techniques to the modules that need improvement the most.
Most complex systems today contain software, and systems failures activated by software faults can provide lessons for software development practices and software quality assurance. This paper presents an analysis of software-related failures of medical devices that caused no death or injury but led to recalls by the manufacturers. The analysis categorizes the failures by their symptoms and faults, and discusses methods of preventing and detecting faults in each category. The nature of the faults provides lessons about the value of generally accepted quality practices for prevention and detection methods applied prior to system release. It also provides some insight into the need for formal requirements specification and for improved testing of complex hardware-software systems.
Building on our earlier work in detecting high risk software modules in object-oriented systems, we extend the two group discriminant classification model to three risk groups. First, we give an overview of the discriminant modeling methodology. Using traditional and object-oriented software product measures collected from a commercial system, we develop two discriminant fault models. One model incorporates only traditional measures while the other model includes both traditional and object-oriented measures. The independent variables of both models are principal components derived from the observed software measures. The models are used to classify the modules comprising the system into three groups: high, medium, and low risk. Quality of fit and classification performance of both models are reported.
We show that for this case study, the addition of the object-oriented measures enhances the model by reducing the overall misclassification rate and significantly reducing the misclassifications in the medium group. Last of all, we tender a cost based method to determine under what condition a three group model is superior to the simpler two group model. Our results suggest that additional case studies are needed to help develop a clearer picture of three group discriminant models and the utility of object-oriented software measures in general.
Software refactoring is the process of changing the structure of a program's source code without changing its functionality. The purpose of refactoring is to make the source code easier to understand and maintain, which in turn should mean that, in the long term, the code has fewer errors (is more reliable). In recent years many works have described refactoring, but so far no research has assessed the long-term influence of refactoring on reliability. In this work we describe our fundamental study on improving the reliability of software systems in the context of refactoring. We tried to answer the question: what are the benefits of refactoring as far as reliability is concerned?
In this paper, we investigate a dynamic software quality model that incorporates software process and software product measures as covariates. Furthermore, the model is not based on execution time between failures. Instead, the method relies on data commonly available from simple problem tracking and source code control systems. Fault counts, testing effort, and code churn measures are collected from each build during the system test phase of a large telecommunications software system. We use this data to predict the number of faults to expect from one build to the next. The technique we use is called time series analysis and forecasting. The methodology assumes that future predictions are based on the history of past failures and related covariates. We show that the quality model incorporating testing effort as a covariate is better than the quality model derived from fault counts alone.
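As a rough illustration of forecasting per-build fault counts from the fault history plus a covariate, the sketch below fits a lag-1 regression by ordinary least squares. This is an assumption made for illustration: the abstract does not specify the time-series model, and the function names (`fit_lagged_model`, `forecast_next`) and the choice of testing effort as the single covariate are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_lagged_model(faults, effort):
    """Least-squares fit of faults[t] = a + b * faults[t-1] + c * effort[t].

    `faults` and `effort` are per-build sequences of equal length.
    Returns the coefficients [a, b, c].
    """
    rows = [[1.0, faults[t - 1], effort[t]] for t in range(1, len(faults))]
    targets = [faults[t] for t in range(1, len(faults))]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def forecast_next(coeffs, last_faults, next_effort):
    """One-step-ahead fault forecast for the next build."""
    a, b, c = coeffs
    return a + b * last_faults + c * next_effort
```

Dropping the effort column from `rows` gives the faults-only baseline, so the two models can be compared on held-out builds.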
Software changes for various reasons, such as ever-increasing customer demands, changes in the environment, or the detection of a bug. This results in multiple versions of the software as it evolves. Identifying the parts of the software that are more prone to change than others is an important activity. Identifying change-prone classes helps developers take focused and timely preventive action on classes with similar characteristics in future releases. In this paper, we study the relationship between various object-oriented (OO) metrics and change proneness. We collected a set of OO metrics and change data for each class that appeared in two versions of an open source dataset, 'Java TreeView', i.e., version 1.1.6 and version 1.0.3. In addition, we built various models that can be used to identify change-prone classes, using machine learning and statistical techniques, and compared their performance. The results are analyzed using the Area Under the Curve (AUC) obtained from Receiver Operating Characteristic (ROC) analysis. The results show that the models built using both machine learning and statistical methods perform well in predicting change-prone classes. Based on the results, it is reasonable to claim that quality models have a significant relationship with OO metrics and hence can be used by researchers for early prediction of change-prone classes.
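For readers unfamiliar with the AUC measure mentioned above, the following minimal sketch computes it from its pairwise definition: the probability that a randomly chosen change-prone class receives a higher model score than a randomly chosen class that did not change, with ties counted as one half. The function name and the input layout are assumptions for illustration, not the study's tooling.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `scores` are model outputs (higher means more likely change prone);
    `labels` are truthy for classes that actually changed.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to a model that orders classes no better than chance, and 1.0 to a model that ranks every changed class above every unchanged one.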
Software quality has many parameters that govern its value. Of these, reliability has usually received the most attention from researchers and practitioners. However, today's ever-demanding environment poses severe challenges to software creators as to whether to continue treating reliability as one of the most important attributes governing software quality when other important parameters, such as re-usability, security and resilience, to name a few, are also available. Evaluating, ranking and selecting the most appropriate attribute to govern software quality is a complex concern, which technically requires a multi-criteria decision-making environment. In this paper, we propose an Intuitionistic Fuzzy Set-based TOPSIS approach to show why reliability is one of the most preferable parameters for governing software quality. To collate the individual opinions of decision makers, software developers from various firms were surveyed to rate the importance of various criteria and alternatives.
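The ranking step of TOPSIS can be sketched as follows. Note the substitution: this is classical TOPSIS on crisp numeric ratings, not the intuitionistic fuzzy variant the paper proposes, and the function name, arguments and any ratings passed to it are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Classical (crisp) TOPSIS closeness coefficients.

    matrix[i][j] is the rating of alternative i on criterion j,
    weights[j] is the importance of criterion j, and benefit[j] is True
    when larger values of criterion j are better.
    Assumes every criterion column has a nonzero entry and that the
    alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    closeness = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness
```

Alternatives are then ranked by descending closeness coefficient; a value of 1.0 means the alternative coincides with the ideal solution on every criterion.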
This paper proposes a method for estimating software reliability before the implementation phase. The method assumes that a formal specification technique is used and that it is possible to develop a tool that performs dynamic analysis, i.e., locates semantic faults in the design. The analysis is performed both by applying a usage profile as input and by doing a full analysis, i.e., locating all faults that the tool can find. The tool must provide failure data in terms of the time since the last failure was detected. The mapping of the dynamic failures to the failures encountered during statistical usage testing and operation is discussed. The method can be applied either to the software specification or, as a step in the development process, to the software design. The proposed method allows for software reliability estimates that can be used both as a quality indicator and for planning and controlling resources, development times, etc. at an early stage in the development of software systems.
This paper introduces a statistical model for analyzing and assessing human-performance parameters such as software quality and productivity. We may refer to this model as a growth curve model because, for suitably chosen values of its constants, we can fit a learning curve to data with trend characteristics. Using data from a programming professional's experience with a range of exercises, we demonstrate the use of this model for assessing software quality and productivity. The model uses a nonhomogeneous autoregressive process, in which Bayesian statistical procedures have been developed. To obtain the desired capability, the model uses more complex statistical methods and requires more extensive initial condition estimates than methods based on exponential smoothing. As a consequence, this model should generally provide superior facilities for tracking and measuring overall trends. Since this is a trend-following model, its projections will tend to excel for highly trended data. On the other hand, since each new value is projected by applying the current trend information to the most recent data, its projections will not be very accurate in highly variable situations. The methodology of this paper is general and can be applied to situations other than those of software quality and productivity.
In this work, we present a genetic algorithm to optimize predictive models used to estimate software quality characteristics. Software quality assessment is crucial in the software development field since it helps reduce cost, time and effort. However, software quality characteristics cannot be measured directly; they can only be estimated from other measurable software attributes (such as coupling, size and complexity). Software quality estimation models establish a relationship between the unmeasurable characteristics and the measurable attributes. However, these models are hard to generalize and reuse on new, unseen software, as their accuracy deteriorates significantly. In this paper, we present a genetic algorithm that adapts such models to new data. We give empirical evidence that our approach outperforms both the machine learning algorithm C4.5 and random guessing.
Software inspection is a proven method that enables the detection and removal of defects in software artifacts as soon as these artifacts are created. It usually involves activities in which a team of qualified personnel determines whether the created artifact is of sufficient quality. Detected quality deficiencies are subsequently corrected. In this way, an inspection can not only contribute towards software quality improvement, but also lead to significant budget and time benefits. These advantages have already been demonstrated in many software development projects and organizations.
After Fagan's seminal paper presented in 1976, the body of work in software inspection has greatly increased and matured. This survey is to provide an overview of the large body of contributions in the form of incremental improvements and/or new methodologies that have been proposed to leverage and amplify the benefits of inspections within software development and even maintenance projects. To structure this large volume of work, it introduces, as a first step, the core concepts and relationships that together embody the field of software inspection. In a second step, the survey discusses the inspection-related work in the context of the presented taxonomy.
The survey is beneficial for researchers as well as practitioners. Researchers can use the presented survey taxonomy to evaluate existing work in this field and identify new research areas. Practitioners, on the other hand, get information on the reported benefits of inspections. Moreover, they find an explanation of the various methodological variations and get guidance on how to instantiate the various taxonomy dimensions for the purpose of tailoring and performing inspections in their software projects.
Improving field performance of telecommunication systems is a key objective of both telecom suppliers and operators, as an increasing number of business-critical systems worldwide rely on dependable telecommunication. Early defect detection improves field performance in terms of reduced field failure rates and reduced intrinsic downtime. Cost-effective software project management will focus resources towards intensive validation of those areas with highest criticality. This article outlines techniques for identifying such critical areas in software systems. It concentrates on the practical application of criticality-based predictions in industrial development projects, namely the selection of a classification technique and the use of the results in directing management decisions. The first part comprehensively compares and evaluates five common classification techniques (Pareto classification, classification trees, factor-based discriminant analysis, fuzzy classification, neural networks) for identifying critical components. Results from a large-scale industrial switching project are included to show the practical benefits. Once it is known which technique should be applied, the second area gains even more attention: what are the impacts for practical project management within given resource and time constraints? Several selection criteria based on the results of a combined criticality and history analysis are provided together with concrete implementation decisions.
“Knowledge” is one of the main results of software engineering, software projects and software process improvement. During software engineering projects, developers learn to apply certain technologies and how to solve particular development problems. During the process of software improvement developers and managers learn how effective and efficient their development processes are, and how to improve these processes. As “learning” is so important in software practice, it is logical to examine it more closely. What is learning? How does learning take place? Is it possible to improve the conditions of learning?
This chapter presents an overview of learning theories and the application of these theories in the software-engineering domain. It is not our intention to be complete; our objective is to show how established learning theories can help to facilitate learning in software development practice.
Because highly reliable software is becoming an essential ingredient in many systems, software developers apply various techniques to discover faults early in development, such as more rigorous reviews, more extensive testing, and strategic assignment of key personnel. Our goal is to target reliability enhancement activities to those modules that are most likely to have problems. This paper presents a methodology that incorporates genetic programming for predicting the order of software modules based on the expected number of faults. This is the first application of genetic programming to software engineering that we know of. We found that genetic programming can be used to generate software quality models whose inputs are software metrics collected earlier in development, and whose output is a prediction of the number of faults that will be discovered later in development or during operations. We established ordinal evaluation criteria for models, and conducted an industrial case study of software from a military communications system. Case study results were sufficiently good to be useful to a project for choosing modules for extra reliability enhancement treatment.
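One way to judge a predicted module ordering is to ask what share of the actual faults falls in the modules ranked highest by the model. The sketch below illustrates that idea only; the abstract does not spell out the paper's ordinal evaluation criteria, so the function name and the top-k cutoff are assumptions.

```python
def faults_captured(predicted, actual, top_k):
    """Fraction of all actual faults found in the top_k modules
    when modules are ranked by predicted fault count (highest first).

    `predicted` and `actual` are per-module sequences of equal length.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    total = sum(actual)
    if total == 0:
        raise ValueError("no actual faults to capture")
    return sum(actual[i] for i in order[:top_k]) / total
```

Passing the actual fault counts as the prediction, as in `faults_captured(actual, actual, top_k)`, gives the best achievable value for that cutoff, which is a natural yardstick for the model's ordering.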
We propose an approach to approximate reasoning by systems of intelligent agents based on the paradigm of rough mereology. In this approach, the knowledge of each agent is formalized as an information system (a data table) from which similarity measures on objects manipulated by this agent are inferred. These similarity measures are based on rough mereological inclusions, which formally express the degree to which one object is a part of another. In this way each agent constructs its own rough mereological logic in which it is possible to express approximate statements of the type: an object x satisfies a predicate Ψ in degree r. The agents communicate by means of mereological functors (connectives among distinct rough mereological logics) that propagate similarity measures from simpler to more complex agents; establishing these connectives is the main goal of negotiations among agents. The presented model of approximate reasoning encompasses models of approximate reasoning such as fuzzy controllers, neural networks, etc. Our approach may be termed analytic, in the sense that all basic constructs are inferred from data.