Significant effort is being put into developing industrial applications for artificial intelligence (AI), especially those using machine learning (ML) techniques. Despite the intensive support for building ML applications, there are still challenges when it comes to evaluating, assuring, and improving the quality or dependability. The difficulty stems from the unique nature of ML, namely, system behavior is derived from training data not from logical design by human engineers. This leads to black-box and intrinsically imperfect implementations that invalidate many principles and techniques in traditional software engineering. In light of this situation, the Japanese industry has jointly worked on a set of guidelines for the quality assurance of AI systems (in the Consortium of Quality Assurance for AI-based Products and Services) from the viewpoint of traditional quality-assurance engineers and test engineers. We report on the second version of these guidelines, which cover a list of quality evaluation aspects, catalogue of current state-of-the-art techniques, and domain-specific discussions in five representative domains. The guidelines provide significant insights for engineers in terms of methodologies and designs for tests driven by application-specific requirements.
In software architecture design, we explore design alternatives and make decisions about adoption or rejection of a design from a web of complex and often uncertain information. Different architectural design decisions may lead to systems that satisfy the same set of functional requirements but differ in certain quality attributes. In this paper, we propose a Bayesian Network based approach to rational architectural design. Our Bayesian Network helps software architects record and make design decisions. We can perform both qualitative and quantitative analysis over the Bayesian Network to understand how the design decisions influence system quality attributes, and to reason about rational design decisions. We use the KWIC (Key Word In Context) example to illustrate the principles of our approach.
The video game industry is becoming increasingly important due to its revenues and growing capabilities. User eXperience (UX) is an important factor which contributes to the acceptance of a video game. The UX is usually assessed at the end of the development process, and for this reason it is difficult to ensure an adequate level of interactive experience between computer game and players. Cancelation of projects or even bankruptcy of a company can be caused by bad management of UX. In this paper, we propose the game experience management (GEM), a method to evaluate, manage, measure and track the UX from early stages of computer game development. In order to compare the proposal against a method comprised by conventional approaches, teams of master degree students were formed for developing six tower defense games for teaching basic multiplication operations; surveys were conducted to compare the UX of games. In this setting, we find that games developed with GEM significantly improve UX by increasing the puppetry and consequently reducing player frustration.
In this work, we present a genetic algorithm to optimize predictive models used to estimate software quality characteristics. Software quality assessment is crucial in the software development field since it helps reduce cost, time and effort. However, software quality characteristics cannot be measured directly; they can only be estimated from other measurable software attributes (such as coupling, size and complexity). Software quality estimation models establish a relationship between the unmeasurable characteristics and the measurable attributes. However, these models are hard to generalize and reuse on new, unseen software, as their accuracy deteriorates significantly. In this paper, we present a genetic algorithm that adapts such models to new data. We give empirical evidence that our approach outperforms both the machine learning algorithm C4.5 and random guessing.
Many past studies have shown that the distribution of bugs in software systems follows the Pareto principle. Some studies have also proposed the Pareto distribution (PD) to model bugs in software systems. However, several other probability distributions, such as the Weibull, Bounded Generalized Pareto, Double Pareto (DP), Log Normal and Yule–Simon distributions, have also been proposed, and each has been evaluated in different studies for its fitness to model bugs. We investigate this problem further using information-theoretic (criterion-based) approaches to model selection, which elegantly handle issues such as overfitting that are prevalent in previous work. By strengthening the model selection procedure and studying a large collection of fault data, the results are made more accurate and stable. We conduct experiments on fault data from 74 releases of various open source and proprietary software systems and observe that the DP distribution outperforms all others with statistical significance in the case of proprietary projects. For open source software systems, the top three performing distributions are the DP, Bounded Generalized Pareto and Weibull models, which are significantly better than all others, although there is no significant difference among the three.
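The criterion-based selection described in this abstract can be illustrated with a minimal sketch. It assumes the Akaike information criterion (the abstract does not name the criterion used) and, for brevity, compares only two candidate distributions whose maximum-likelihood estimates have closed forms, the exponential and the log-normal. The function names and the sample fault counts are hypothetical and are not taken from the paper.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def exponential_loglik(data):
    """Maximised log-likelihood of an exponential model (MLE rate = n / sum)."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data)

def lognormal_loglik(data):
    """Maximised log-likelihood of a log-normal model (data must be > 0)."""
    n = len(data)
    logs = [math.log(x) for x in data]
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n  # MLE (biased) variance
    # At the MLE the quadratic term sums to n / 2.
    return -sum(logs) - n / 2 * math.log(2 * math.pi * var) - n / 2

def select_model(data):
    """Return the candidate with the lowest AIC, plus all scores."""
    scores = {
        "exponential": aic(exponential_loglik(data), 1),
        "lognormal": aic(lognormal_loglik(data), 2),
    }
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical per-module fault counts (illustrative only).
faults = [1, 1, 2, 2, 3, 5, 8, 40, 120]
best, scores = select_model(faults)
print(best, scores)
```

The penalty term `2 * n_params` is what discourages overfitting: a distribution with more parameters has to improve the likelihood enough to pay for them.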
Evaluating software modules for inclusion in a Drupal website is a crucial and complex task that currently requires manual assessment of a number of module facets. This study applied data-mining techniques to identify quality-related metrics associated with highly popular and unpopular Drupal modules. The data-mining approach produced a set of important metrics and thresholds that highlight a strong relationship between the overall perceived reliability of a module and its popularity. Areas for future research into open-source software quality are presented, including a proposed module evaluation tool to aid developers in selecting high-quality modules.
Model refactoring enhances the platform-independent design models of software aiming at smoother impact of requirement changes, thereby improving the design quality and assisting in their evolution and maintenance. This study reports a systematic literature review of refactoring techniques particularly in the domain of models of object-oriented software systems. The study used the standard systematic literature review method based on a comprehensive set of 58 articles from a total of 1200 articles published in leading journals, premier conferences, workshops and books. The primary studies were thoroughly analyzed on various aspects of model refactoring process. Identification of methodologies and classification on the basis of model transformation systems, refactoring operations and their application, model behavior specification and preservation, model quality expression, model consistency management and automation of process is reported. This study shows that inadequate model-based approaches for behavior preservation, synchronized model enhancement and empirical evaluation of the proposed refactoring techniques are major obstacles in fully automated model refactoring process.
End users' expectations of software services are increasing. With new technologies and trends such as cloud computing and software as a service, the quality of services across the full product life cycle, together with their cost, is a key factor in gaining competitive advantage in new markets. In most business settings today, a Service Level Agreement (SLA) must also be maintained. There is therefore a clear need to take into account the complexity of the programmed solution underlying a given service, the probability of error occurrence, and all costs associated with operating the service. This paper analyzes the important quality elements of software that is delivered as a service to a business function, and proposes a six-step method that can be used for continuous quality improvement. With this method, it is possible to better understand the root causes of a specific service's behavior, which is the key prerequisite for service improvement. The method takes a systematic approach to comparing the defined parameters of each service with the same parameters of other services of the same rank. These relative comparisons are performed in drill-in cycles until it is clearly understood what makes one service better than another, and vice versa, in the given business environment. Each conclusion at each level of the cycled comparative measurements triggers an improvement action at that level. The method also gives management better tools for human resources capacity planning for each software service over a given time frame. An example of the method applied to one selected use case is also given.
We present a study of 600 Java software networks with the aim of characterizing the relationship between their defectiveness and community metrics. We analyze the community structure of such networks, defined as their topological division into subnetworks of densely connected nodes. A high density of connections represents a higher level of cooperation between classes, so a well-defined division into communities could indicate that the software system has been designed in a modular fashion and that all its functionalities are well separated. We show how the community structure can be an indicator of well-written, high quality code by retrieving the communities of the analyzed systems and by ranking their division into communities through the built-in metric called modularity. We found that the software systems with the highest modularity possess the majority of bugs, and tested whether this result is related to some confounding effect. We found two power laws relating the maximum defect density to two different metrics: the number of detected communities inside a software network and the clustering coefficient. Finally, we found a linear correlation between the clustering coefficient and the number of communities. Our results can be used to make predictive hypotheses about the software defectiveness of future releases of the analyzed systems.
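The modularity metric mentioned in this abstract can be sketched as follows, assuming the standard Newman definition for an undirected, unweighted graph, Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The abstract does not state which community detection algorithm or modularity variant was used; the class names and the partition below are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity of a given partition of an undirected graph.

    edges        -- list of (u, v) pairs, each undirected edge listed once
    community_of -- dict mapping every node to its community label
    """
    m = len(edges)
    internal = {}  # community -> number of edges with both ends inside it
    degree = {}    # community -> sum of degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Hypothetical dependency graph: two tightly coupled groups of classes
# joined by a single dependency.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
partition = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
print(modularity(edges, partition))
```

Placing every node in a single community yields a modularity of zero, so positive values indicate more intra-community edges than a random graph with the same degrees would have.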
Software engineers are able to measure the quality of their code using a variety of metrics that can be derived directly from analyzing the source code. These internal quality metrics are valuable to engineers, but the organizations funding the software development effort find external quality metrics such as defect rates and time to develop features more valuable. Unfortunately, external quality metrics can only be calculated after costly software has been developed and deployed for end-users to utilize. Here, we present a method for mining data from freely available open source codebases written in Java to train a Random Forest classifier to predict which files are likely to be external quality hotspots based on their internal quality metrics with over 75% accuracy. We also used the trained model to predict hotspots for a Java project whose data was not used to train the classifier and achieved over 75% accuracy again, demonstrating the method’s general applicability to different projects.
The earliest design decisions often have a significant impact on software quality and are the most costly to revoke. One of the challenges in architecture design is to reduce the frequency of retrofit problems in software designs, that is, not being able to improve the quality of a system cost-effectively, a problem that frequently occurs during late stages. Software architecture assessment is essential in the design of a high quality system. However, assessing the effect of individual design decisions with respect to quality is often complicated by the fact that it is hard to identify exactly how particular qualities and quality factors are improved or impaired by design decisions. In this paper we present a framework that formalizes some of the relationships between software architecture and software quality; it compiles existing design knowledge (quality improving patterns) in a format suitable for architecture assessment. This framework may prevent the retrofit problem and can assist in reasoning about intra- and inter-quality tradeoffs. We illustrate our framework by creating an instance of it for the qualities usability, security and safety.
Software metric models predict the target software metric(s), e.g., the development work effort or defect rates, for any future software project based on the project's predictor software metric(s), e.g., the project team size. The construction of such a software metric model makes use of a data sample of such metrics from analogous past projects. However, incomplete data often appear in such data samples. Moreover, the decision on whether a particular predictor metric should be included is most likely based on an intuitive or experience-based assumption that the predictor metric has a statistically significant impact on the target metric. However, this assumption is usually not verifiable "retrospectively" after the model is constructed, leading to redundant predictor metric(s) and/or unnecessary predictor metric complexity. To solve all these problems, we derived a methodology consisting of the k-nearest neighbors (k-NN) imputation method, statistical hypothesis testing, and a "goodness-of-fit" criterion. This methodology was tested on software effort metric models and software quality metric models, the latter of which usually suffer from far more serious data incompleteness. This paper documents this methodology and the tests on these two types of software metric models.
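The k-NN imputation step named in this abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' procedure: neighbours are drawn only from complete rows, distance is Euclidean over the columns the incomplete row does have, and a missing value is replaced by the unweighted mean of the k nearest neighbours. The function name and the project data are hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries using the k nearest complete rows.

    Assumptions (not from the paper): Euclidean distance over the observed
    columns of the incomplete row, unweighted mean of the neighbours.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row")
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other, row=row, observed=observed):
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        filled = [
            v if v is not None
            else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ]
        result.append(filled)
    return result

# Hypothetical past projects: [team size, effort in person-months].
projects = [
    [1.0, 10.0],
    [2.0, 20.0],
    [10.0, 100.0],
    [1.5, None],   # effort was not recorded for this project
]
print(knn_impute(projects, k=2)[3])
```

In practice the columns would usually be normalised first so that a metric with a large numeric range does not dominate the distance.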
Increasing demand for security commercial products requires an improvement of methods for evaluating their software quality. Existing standards offer general frameworks but more specific models which reflect the perception of experts and customers as well as the particular characteristics of this type of products are needed. This article presents a method for generating domain-oriented software quality models for specific types of applications. It is applied to the generation of a model for security COTS products based on systematic review of standards, related literature and conclusions of evaluation experiences as well as the statistical analysis of information collected from 203 security experts and practitioners. Results reveal interesting conclusions on the importance given by users to the different quality characteristics of security commercial software products.
The primary goal of software quality engineering is to produce a high quality software product through the use of some specific techniques and processes. One strategy is applying data mining techniques to software metric and defect data collected during the software development process to identify potential low-quality program modules. In this paper, we investigate the use of feature selection in the context of software quality estimation (also referred to as software defect prediction), where a classification model is used to predict whether program modules (instances) are fault-prone or not-fault-prone. Seven filter-based feature ranking techniques are examined. Among them, six are commonly used, and the other one, named signal to noise ratio (SNR), is rarely employed. The objective of the paper is to compare these seven techniques for various software data sets and assess their effectiveness for software quality modeling. A case study is performed on 16 software data sets, and classification models are built with five different learners and evaluated with two performance metrics. Our experimental results are summarized based on statistical tests for significance. The main conclusion is that the SNR technique performs as well as the best performer of the six commonly used techniques.
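As a rough sketch of the signal-to-noise ratio ranking discussed in this abstract, the code below assumes the commonly used definition SNR = (μ_P − μ_N) / (σ_P + σ_N), where P and N are the fault-prone and not-fault-prone classes, and ranks features by the absolute value of the score. The abstract does not give the formula, so both the definition and the ranking rule are assumptions; the function names and the metric data are hypothetical.

```python
import statistics

def snr_scores(rows, labels):
    """Signal-to-noise ratio of each feature for a two-class problem.

    rows   -- list of feature vectors (one per module)
    labels -- 1 for fault-prone, 0 for not-fault-prone
    Each class needs at least two instances for the sample standard deviation.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        gap = statistics.mean(pos) - statistics.mean(neg)
        spread = statistics.stdev(pos) + statistics.stdev(neg)
        # Feature constant within both classes: the ratio is undefined,
        # scored 0.0 here by assumption.
        scores.append(gap / spread if spread > 0 else 0.0)
    return scores

def rank_features(scores):
    """Feature indices ordered from most to least discriminative."""
    return sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)

# Hypothetical modules: [lines changed, comment density score].
modules = [[1, 5], [2, 5], [9, 6], [10, 4]]
fault_prone = [0, 0, 1, 1]
scores = snr_scores(modules, fault_prone)
print(scores, rank_features(scores))
```

A filter-based ranker like this scores each feature independently of any learner, which is what allows the same ranking to be reused across the different classifiers in a study.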
User Experience (UX) is an important attribute for the success and quality of a software application. UX explores how an application is used and the emotional and behavioral consequences of such use. Although several UX evaluation methods allow understanding the reasons for a poor UX, some of them are tedious or too intrusive, making the evaluation unpleasant. This paper presents the Method for the Assessment of eXperience (MAX), which through cards and a board assists software engineers in gathering UX data while motivating users to report their experience. We conducted two pilot studies to verify the feasibility of MAX, which showed that the method is useful for evaluating the UX of finished/prototyped applications from the point of view of users and software engineers.
On the surface, one might think that revealing the factors that impact on software product usability and the success of an entire project would be relatively simple; however, reported evidence from practitioners and scholars frequently shows the opposite. The aim of this study was to determine factors with a positive (negative) impact on delivering usability in a software product and the success (failure) of an entire project. This paper presents the results of our study, where 11 factors were identified and described, along with an outline of 11 goal-oriented rules incorporating the expertise and knowledge of project managers. The elaborated body of knowledge, positively evaluated by IT professionals, would seem to be a valuable asset during the risk analysis performed before the kick-off of a project as well as in understanding the notion of usability.
The performance of a classification model is invariably affected by the characteristics of the measurement data it is built upon. If the quality of the data is generally poor, then the classification model will demonstrate poor performance. The detection and removal of noisy instances will improve the quality of the data and, consequently, the performance of the classification model. We investigate a noise handling technique that attempts to improve the quality of datasets for classification purposes by eliminating instances that are likely to be noise. Our approach uses twenty-five different classification techniques to create an ensemble filter for eliminating likely noise. The basic assumption is that if a given majority of classifiers in the ensemble misclassify an instance, then it is likely to be a noisy instance. Using a relatively large number of base-level classifiers in the ensemble filter makes it possible to achieve the desired level of conservativeness in noise removal, with several possible levels of filtering. It also provides a higher degree of confidence in the noise elimination procedure, since with twenty-five base-level classifiers the results are less likely to be influenced by the (possibly) inappropriate learning bias of a few algorithms than with a smaller number of base-level classifiers. Empirical case studies of two high-assurance software projects demonstrate the effectiveness of our noise elimination approach through the significant improvement achieved in classification accuracy at various levels of noise filtering.
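The voting rule at the core of the ensemble filter can be sketched in a few lines. The sketch assumes the per-classifier predictions have already been obtained (for example by cross-validation, so that no classifier is evaluated on instances it was trained on); training the base classifiers themselves is outside its scope. The function name, the `min_votes` parameter and the data are hypothetical.

```python
def ensemble_filter(predictions, labels, min_votes):
    """Return indices of instances flagged as likely noise.

    predictions -- predictions[c][i] is classifier c's prediction for instance i
    labels      -- recorded class label of each instance
    min_votes   -- number of misclassifying classifiers required to flag an
                   instance; higher values make the filter more conservative
    """
    noisy = []
    for i, label in enumerate(labels):
        wrong = sum(1 for preds in predictions if preds[i] != label)
        if wrong >= min_votes:
            noisy.append(i)
    return noisy

# Hypothetical example: three classifiers, four instances.
labels = ["fp", "nfp", "fp", "nfp"]
predictions = [
    ["fp", "nfp", "nfp", "nfp"],
    ["fp", "fp",  "nfp", "nfp"],
    ["fp", "nfp", "nfp", "fp"],
]
print(ensemble_filter(predictions, labels, min_votes=2))  # simple majority
print(ensemble_filter(predictions, labels, min_votes=3))  # consensus
```

Raising `min_votes` toward the number of classifiers removes fewer instances, which corresponds to the more conservative filtering levels the abstract refers to.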
The term "software entropy" refers to the tendency for software, over time, to become difficult and costly to maintain. A software system that undergoes continuous change, such as having new functionality added to its original design, will eventually become more complex and can become disorganized as it grows, losing its original design structure.
A recent study shows that software degradation may be measured using WMC expressed in terms of Shannon entropy. In this paper, we extend the empirical analysis to RFC and CBO, since these CK metrics have been shown to be correlated with the fault-proneness of OO classes.
We analyzed various releases of the publicly available Eclipse and Netbeans software systems, calculating the entropy of some CK metrics for every release analyzed. The validity is shown through a direct measure of software quality such as the number of detected defects. Our results display a very good correlation between the entropy of CBO and RFC and the number of bugs for Eclipse and Netbeans.
Complexity and quality metrics are in general computed on every system module, whereas the entropy is a single scalar that characterizes a whole system. This result suggests that the entropy of some CK metrics could be considered a global quality metric for large software systems. Our results need, however, to be confirmed for other large software systems.
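One way to reduce a per-class metric to a single entropy value per release is sketched below. It assumes the entropy is the Shannon entropy, in bits, of the empirical distribution of metric values across the classes of a release; the abstract does not spell out the exact formulation, and the CBO values shown are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical CBO values of the classes in two releases of one system.
release_1 = [1, 1, 2, 2]
release_2 = [1, 2, 3, 7]
print(shannon_entropy(release_1))
print(shannon_entropy(release_2))
```

Under this formulation the entropy grows as the metric values spread over more distinct levels, which is how a single scalar can summarise a whole release.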
"Knowledge discovery in data bases" (KDD) for software engineering is a process for finding useful information in the large volumes of data that are a byproduct of software development, such as data bases for configuration management and for problem reporting. This paper presents guidelines for extracting innovative process metrics from these commonly available data bases. This paper also adapts the Classification And Regression Trees algorithm, CART, to the KDD process for software engineering data. To our knowledge, this algorithm has not been used previously for empirical software quality modeling. In particular, we present an innovative way to control the balance between misclassification rates. A KDD case study of a very large legacy telecommunications software system found that variables derived from source code, configuration management transactions, and problem reporting transactions can be useful predictors of software quality. The KDD process discovered that for this software development environment, out of forty software attributes, only a few of the predictor variables were significant. This resulted in a model that predicts whether modules are likely to have faults discovered by customers. Software developers need such predictions early in development to target software enhancement techniques to the modules that need improvement the most.
Software inspection is a proven method that enables the detection and removal of defects in software artifacts as soon as these artifacts are created. It usually involves activities in which a team of qualified personnel determines whether the created artifact is of sufficient quality. Detected quality deficiencies are subsequently corrected. In this way, an inspection can not only contribute towards software quality improvement, but also lead to significant budget and time benefits. These advantages have already been demonstrated in many software development projects and organizations.
Since Fagan's seminal paper in 1976, the body of work on software inspection has greatly increased and matured. This survey provides an overview of the large body of contributions, in the form of incremental improvements and/or new methodologies, that have been proposed to leverage and amplify the benefits of inspections within software development and even maintenance projects. To structure this large volume of work, it introduces, as a first step, the core concepts and relationships that together embody the field of software inspection. In a second step, the survey discusses the inspection-related work in the context of the presented taxonomy.
The survey is beneficial for researchers as well as practitioners. Researchers can use the presented survey taxonomy to evaluate existing work in this field and identify new research areas. Practitioners, on the other hand, get information on the reported benefits of inspections. Moreover, they find an explanation of the various methodological variations and get guidance on how to instantiate the various taxonomy dimensions for the purpose of tailoring and performing inspections in their software projects.