
  Bestsellers

  • Article (No Access)

    PERFORMANCE EVALUATION OF IMPUTATION METHODS FOR INCOMPLETE DATASETS

    In this study, we compare the performance of four imputation strategies, ranging from the commonly used listwise deletion to model-based approaches such as maximum likelihood, in enhancing the completeness of incomplete software project data sets. We evaluate the impact of each method by applying it to six real-time software project data sets, which are classified into categories based on their inherent properties. The reliability of the data sets reconstructed with these techniques is then tested by building prediction models using stepwise regression. Finally, the experimental results are reported and the findings discussed.
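
    As a toy illustration of the kind of comparison the abstract describes, the sketch below applies listwise deletion and mean imputation to a synthetic project dataset and checks each rebuilt dataset with an ordinary regression fit. All column names and numbers are invented, and plain linear regression stands in for the paper's stepwise regression.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size_kloc": rng.uniform(10, 100, 200),
    "team_size": rng.integers(2, 15, 200).astype(float),
})
df["effort"] = 5 * df["size_kloc"] + 20 * df["team_size"] + rng.normal(0, 30, 200)
df.loc[rng.choice(200, 40, replace=False), "team_size"] = np.nan  # inject missingness

# Strategy 1: listwise deletion -- drop any row with a missing value.
complete = df.dropna()

# Strategy 2: mean imputation -- fill gaps with the column mean.
imputed = df.fillna(df.mean())

for name, data in [("listwise deletion", complete), ("mean imputation", imputed)]:
    model = LinearRegression().fit(data[["size_kloc", "team_size"]], data["effort"])
    print(name, "R^2:", model.score(data[["size_kloc", "team_size"]], data["effort"]))
```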

  • Article (No Access)

    An Integrated CRO and FLANN Based Classifier for a Non-Imputed and Inconsistent Dataset

    This paper presents an integrated approach that combines chemical reaction optimization (CRO) and functional link artificial neural networks (FLANNs) to build a classifier from a dataset with missing values, inconsistent records, and noisy instances. Imputation is carried out using the known values of the two nearest neighbors to address records plagued by missing values. A probabilistic approach is used to remove inconsistency from either the original or the imputed dataset. The resulting dataset is then given as input to a boosted instance selection approach that selects relevant instances, reducing the size of the dataset without loss of generality or compromise in classification accuracy. Finally, the transformed dataset (i.e., from a non-imputed and inconsistent dataset to an imputed and consistent one) is used to develop a classifier based on a CRO-trained FLANN. The method is evaluated extensively on several benchmark datasets from the University of California, Irvine (UCI) repository. The experimental results confirm that the preprocessing tasks, together with the integrated approach, can be a promising alternative tool for mitigating missing values, inconsistent records, and noisy instances.
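
    The imputation step described above can be sketched with scikit-learn's KNNImputer set to two neighbors; the CRO-trained FLANN classifier and the instance selection stage are beyond a short example.

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [1.1, 2.1, 3.0],
              [0.9, 1.9, 2.8],
              [5.0, 6.0, 7.0]])

imputer = KNNImputer(n_neighbors=2)      # impute from the two nearest rows
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```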

  • Article (No Access)

    USING COLLABORATIVE FILTERING FOR DEALING WITH MISSING VALUES IN NUCLEAR SAFEGUARDS EVALUATION

    Nuclear safeguards evaluation aims to verify that countries are not misusing nuclear programs for nuclear weapons purposes. Experts of the International Atomic Energy Agency (IAEA) carry out an evaluation process in which several hundred indicators are assessed according to information obtained from different sources, such as State declarations, on-site inspections, IAEA non-safeguards databases and other open sources. These assessments are synthesized in a hierarchical way to obtain a global assessment. Much of the information related to nuclear safeguards, and many of its sources, are vague, imprecise and ill-defined. The use of the fuzzy linguistic approach has provided good results in dealing with such uncertainties in this type of problem. However, a new challenge in nuclear safeguards evaluation has attracted the attention of researchers. Because of the complexity and vagueness of the sources of information obtained by IAEA experts and the huge number of indicators involved, it is common that not all indicators can be assessed, leaving missing values in the evaluation that can bias the nuclear safeguards results. This paper proposes a model based on collaborative filtering (CF) techniques to impute missing values, and provides a trust measure that indicates the reliability of the nuclear safeguards evaluation computed with the imputed values.
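
    A minimal sketch of collaborative-filtering imputation in this spirit: a missing indicator assessment is predicted as a similarity-weighted average over other evaluations, using cosine similarity on co-observed entries. The matrix and weighting below are illustrative; the paper's actual CF model and its trust measure are more elaborate.

```python
import numpy as np

R = np.array([[4.0, 3.0, np.nan, 5.0],
              [4.1, 2.9, 4.0,    5.0],
              [1.0, 5.0, 2.0,    1.5]])   # rows: evaluations, cols: indicators

def cosine(a, b, mask):
    a, b = a[mask], b[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

def impute_cf(R):
    R = R.copy()
    for i, j in zip(*np.where(np.isnan(R))):
        sims, vals = [], []
        for k in range(R.shape[0]):
            if k == i or np.isnan(R[k, j]):
                continue
            mask = ~np.isnan(R[i]) & ~np.isnan(R[k])  # co-observed indicators
            if mask.any():
                sims.append(cosine(R[i], R[k], mask))
                vals.append(R[k, j])
        if sims:
            w = np.array(sims)
            R[i, j] = np.average(vals, weights=w) if w.sum() else np.mean(vals)
    return R

print(impute_cf(R))
```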

  • Article (No Access)

    ON BARGAINING BASED POINT SOLUTION TO COOPERATIVE TU GAMES

    Consider cooperative coalition games with side payments. Bargaining sets are calculated for all possible coalition structures, yielding a collection of imputations rather than a single imputation. Our aim is to obtain a single payoff vector that is acceptable to all players of the game under the grand coalition. Although the Shapley value is a single imputation, it is based on fair division rather than bargaining considerations. We therefore present a method to obtain a single imputation based on bargaining considerations.
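
    For contrast with the bargaining-based solution (not reproduced here), the Shapley value the abstract mentions can be computed directly by averaging marginal contributions over all player orderings; the three-player game below is an invented example.

```python
from itertools import permutations

players = [1, 2, 3]
v = {frozenset(): 0, frozenset({1}): 10, frozenset({2}): 10, frozenset({3}): 10,
     frozenset({1, 2}): 40, frozenset({1, 3}): 40, frozenset({2, 3}): 40,
     frozenset({1, 2, 3}): 90}

shapley = {p: 0.0 for p in players}
orders = list(permutations(players))
for order in orders:                    # average marginal contributions
    coalition = set()
    for p in order:
        before = v[frozenset(coalition)]
        coalition.add(p)
        shapley[p] += (v[frozenset(coalition)] - before) / len(orders)

print(shapley)   # symmetric game -> each player receives 30
```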

  • Article (No Access)

    Adaptive Deep Incremental Learning-Assisted Missing Data Imputation for Streaming Data

    With the rapid explosion of data streams from applications, ensuring accurate data analysis is essential for effective real-time decision making. Data stream applications often confront missing values, which affect the performance of classification models. Several imputation models have adopted deep learning algorithms to estimate missing values; however, the lack of parameter and structure tuning in classification degrades imputation performance. This work presents a missing data imputation model that uses an adaptive deep incremental learning algorithm for streaming applications. The proposed approach incorporates two main processes: enhancing the deep incremental learning algorithm, and performing deep incremental learning-based imputation. First, the approach tunes the learning rate, using both the Adaptive Moment Estimation (Adam) and Stochastic Gradient Descent (SGD) optimizers, and tunes the number of hidden neurons. Second, it applies the enhanced deep incremental learning algorithm to estimate imputed values in two steps: (i) an imputation process that predicts missing values based on temporal proximity, and (ii) generation of a complete IoT dataset by filling the gaps with the predicted values. The experimental outcomes illustrate that the proposed imputation model effectively transforms an incomplete dataset into a complete one with minimal error.
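
    A rough sketch of the streaming setting, assuming a toy data generator: an incrementally trained regressor (scikit-learn's SGDRegressor with partial_fit) predicts a missing reading from the other features, falling back on the last observed value as a crude temporal-proximity estimate. The paper's Adam/SGD learning-rate tuning and hidden-neuron tuning are not reproduced.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(learning_rate="adaptive", eta0=0.01)
fitted, last_seen = False, 0.0

rng = np.random.default_rng(1)
for t in range(500):                             # simulated IoT stream
    x = rng.normal(size=2)
    y = 3 * x[0] - 2 * x[1] + rng.normal(scale=0.1)
    if rng.random() < 0.1:                       # this reading arrives missing
        y_hat = model.predict(x.reshape(1, -1))[0] if fitted else last_seen
        # ...downstream analysis proceeds with the imputed value y_hat...
    else:
        model.partial_fit(x.reshape(1, -1), [y])  # incremental model update
        fitted, last_seen = True, y
```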

  • Article (No Access)

    Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis

    Missing values in time series data are a well-known and important problem that researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is applying the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA, which is robust against outliers. The performance of the new imputation method is compared with many other established methods by applying them to various real and simulated time series. The obtained results confirm that the SSA-based methods, especially L1-SSA, provide better imputation than the other methods.
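
    The sketch below implements the standard (L2, SVD-based) iterative SSA imputation scheme on a synthetic series; the paper's contribution is an L1-norm variant of this same scheme, which is more robust to outliers but requires a dedicated solver not shown here.

```python
import numpy as np

def ssa_impute(x, window=12, rank=4, n_iter=25):
    x = x.astype(float).copy()
    miss = np.isnan(x)
    x[miss] = np.nanmean(x)                      # crude initial fill
    n, k = len(x), len(x) - window + 1
    for _ in range(n_iter):
        # trajectory (Hankel) matrix: column j holds x[j : j + window]
        H = np.column_stack([x[j:j + window] for j in range(k)])
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        H_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank-r approximation
        rec, cnt = np.zeros(n), np.zeros(n)
        for j in range(k):                       # average anti-diagonals back
            rec[j:j + window] += H_low[:, j]
            cnt[j:j + window] += 1
        x[miss] = (rec / cnt)[miss]              # update only the gaps
    return x

t = np.arange(120)
series = np.sin(2 * np.pi * t / 12) + 0.05 * t
series[[10, 11, 50, 90]] = np.nan
print(ssa_impute(series)[[10, 11, 50, 90]])
```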

  • Article (No Access)

    Knowledge Discovery in a Recommender System: The Matrix Factorization Approach

    Two well-known matrix factorization techniques, Singular Value Decomposition (SVD) and Nonnegative Matrix Factorization (NMF), are widely used in recommender system applications. Recommender system data matrices have many missing entries, and to make them suitable for factorization the missing entries must be filled. For matrix completion, we use the mean, median, and mode as three different imputation cases. The natural clusters produced after factorization are used to formulate simple out-of-sample extension algorithms and methods for generating recommendations for a new user. Two cluster evaluation measures, Normalized Mutual Information (NMI) and Purity, are used to evaluate the quality of the clusters.
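
    A minimal sketch of this pipeline for the mean-imputation case: fill missing ratings with column means, factorize with NMF, read cluster memberships off the user-factor matrix, and score them with NMI against (here, invented) reference labels.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
R = rng.uniform(1, 5, size=(20, 8))              # users x items rating matrix
R[rng.random(R.shape) < 0.3] = np.nan            # many missing entries

col_mean = np.nanmean(R, axis=0)
R_filled = np.where(np.isnan(R), col_mean, R)    # mean imputation case

model = NMF(n_components=3, max_iter=500, random_state=0)
W = model.fit_transform(R_filled)                # users x latent factors
user_clusters = W.argmax(axis=1)                 # natural clusters from factors

reference = rng.integers(0, 3, size=20)          # invented reference labels
print("NMI:", normalized_mutual_info_score(reference, user_clusters))
```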

  • Article (No Access)

    Missing Data Imputation Using Socio-hawk Optimization-based Deep Neural Network

    Missing data remain a common issue in real-world environments, leading to deviations in data analysis and mining. To lessen the consequences of missing data caused by human error, missing data imputation must be used in data processing. Traditional imputation models fail to satisfy evaluation requirements because of their poor stability and low accuracy, and their imputation accuracy degrades as the amount of missing information increases. Hence, this research proposes an optimized missing data imputation model using a Socio-hawk optimization-based Deep Neural Network (DNN). The DNN extracts the important features from the data, and the missing data are estimated under an arbitrary missing pattern. When the hyperparameters are tuned properly, the DNN's performance improves; the key is efficient training of the DNN using the proposed Socio-hawk optimization, which improves the imputation model's accuracy. To determine how well the proposed model imputes missing data, it is compared with other methods. The paper's primary contribution is thus the effective training of a DNN using the proposed Socio-hawk optimization, which reduces the error rate of the imputation model. The experimental evaluation shows that the proposed model attains a high performance of 90%, with MAE, MSE, and MAPE of 1.0595, 1.9919, and 0.9421, respectively.
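
    Since the Socio-hawk optimizer itself is specific to the paper, the sketch below substitutes a plain Adam-trained feed-forward network (scikit-learn's MLPRegressor) with hand-picked hyperparameters, just to show the basic predict-the-missing-column pattern on synthetic data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                    # fully observed features
y = X @ np.array([1.5, -2.0, 0.5, 1.0]) + rng.normal(scale=0.1, size=300)
missing = rng.random(300) < 0.2                  # treat 20% of the target as missing

net = MLPRegressor(hidden_layer_sizes=(32, 16), solver="adam",
                   max_iter=1000, random_state=0)
net.fit(X[~missing], y[~missing])                # train on the observed rows
y_imputed = y.copy()
y_imputed[missing] = net.predict(X[missing])     # fill the gaps
print("MAE vs held-out truth:", np.abs(y_imputed[missing] - y[missing]).mean())
```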

  • Article (No Access)

    A deep imputation and inference framework for estimating personalized and race-specific causal effects of genomic alterations on PSA

    Prostate Specific Antigen (PSA) level in the serum is one of the most widely used markers for monitoring prostate cancer (PCa) progression, treatment response, and disease relapse. Although significant efforts have been made to analyze the various socioeconomic and cultural factors that contribute to racial disparities in PCa, limited research has quantitatively examined how, and to what extent, molecular alterations may impact the differential PSA levels observed at varied tumor status between African–American and European–American men. Moreover, missing values among patients add another layer of difficulty to precisely inferring their outcomes. In light of these issues, we propose a data-driven, deep learning-based imputation and inference framework (DIIF). DIIF seamlessly encapsulates two modules: an imputation module driven by a regularized deep autoencoder that imputes critical missing information, and an inference module in which two deep variational autoencoders are coupled with a graphical inference model to quantify personalized and race-specific causal effects. Large-scale empirical studies on independent sub-cohorts of The Cancer Genome Atlas (TCGA) PCa patients demonstrate the effectiveness of DIIF. We further found that somatic mutations in TP53, ATM, PTEN, FOXA1, and PIK3CA are statistically significant genomic factors that may explain the racial disparities in different PCa features characterized by PSA.
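
    A conceptual sketch of the imputation module alone, assuming synthetic data: a small autoencoder is trained to reconstruct the observed entries, and its outputs fill the missing ones. DIIF's regularization details and the coupled variational/causal-inference module are not reproduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 10)                         # synthetic feature matrix
mask = torch.rand_like(X) < 0.15                 # 15% of entries "missing"
X_obs = X.clone()
X_obs[mask] = 0.0                                # zero-fill as network input

auto = nn.Sequential(nn.Linear(10, 6), nn.ReLU(), nn.Linear(6, 10))
opt = torch.optim.Adam(auto.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    recon = auto(X_obs)
    loss = ((recon - X_obs) ** 2)[~mask].mean()  # fit observed entries only
    loss.backward()
    opt.step()

X_imputed = torch.where(mask, auto(X_obs).detach(), X_obs)
```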

  • Article (No Access)

    Imputation for Single-cell RNA-seq Data with Non-negative Matrix Factorization and Transfer Learning

    Single-cell RNA sequencing (scRNA-seq) has proven to be an effective technology for investigating heterogeneity and transcriptome dynamics at single-cell resolution. However, one of the major problems with data obtained by scRNA-seq is the excess of zeros in the count matrix, which greatly hinders downstream analysis. Here, we present a method that integrates non-negative matrix factorization and transfer learning (NMFTL) to impute scRNA-seq data. It borrows gene expression information from an additional dataset and adds graph-regularized terms to the decomposed matrices. These strategies not only maintain the intrinsic geometric structure of the data itself but also improve the accuracy of the estimated expression values through the transfer term in the model. Analysis of real data demonstrates that the proposed method outperforms existing matrix-factorization-based imputation methods in recovering dropout entries and preserving gene-to-gene and cell-to-cell relationships, and it also performs well in downstream analyses such as cell clustering. For convenience, we have implemented the NMFTL method in R scripts, available at https://github.com/FocusPaka/NMFTL.
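
    A bare-bones illustration of matrix-factorization imputation for dropouts on simulated counts: factorize the log-normalized matrix and let the low-rank reconstruction fill the zero entries. NMFTL's graph-regularization and transfer terms are beyond this sketch; see the authors' R implementation at the URL above.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
true_expr = rng.gamma(2.0, 2.0, size=(100, 50))  # genes x cells, no dropout
dropout = rng.random(true_expr.shape) < 0.4
observed = np.where(dropout, 0.0, true_expr)     # count matrix with excess zeros

model = NMF(n_components=5, max_iter=500, random_state=0)
W = model.fit_transform(np.log1p(observed))      # factorize log-normalized data
recon = np.expm1(W @ model.components_)          # low-rank reconstruction
imputed = np.where(observed == 0, recon, observed)
```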

  • Article (No Access)

    Comparison methods of estimating missing data in real data time series

    Missing data are encountered in many studies, including well-conducted and controlled ones. Missing data can reduce the statistical power of a study and may produce biased estimates, leading to invalid conclusions. This study focuses on the problems and types of missing data, together with techniques for handling them. The mechanisms by which missing data arise and the methods for studying such data are illustrated. We treat multiple imputation as a very efficient method of imputing missing data and apply it in several simulated cases and in real data time series. We have also prepared and adapted scripts in the programming language R to conduct the simulations. The mice and Amelia packages for imputing missing values provide fairly good approximations even in the case of real data.
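
    The abstract's workflow uses the R packages mice and Amelia; as a rough Python analogue, scikit-learn's IterativeImputer can be run with posterior sampling over several seeds to mimic the multiple-imputation idea on synthetic data.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0, 0.0], np.eye(3) + 0.5, size=200)
X[rng.random(X.shape) < 0.1] = np.nan            # 10% missing at random

# m = 5 imputed datasets; pooling their values mimics multiple imputation
imputations = [IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X)
               for s in range(5)]
pooled = np.mean(imputations, axis=0)
```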

  • Chapter (Open Access)

    Assessment of Imputation Methods for Missing Gene Expression Data in Meta-Analysis of Distinct Cohorts of Tuberculosis Patients

    The growth of publicly available repositories, such as the Gene Expression Omnibus, has allowed researchers to conduct meta-analysis of gene expression data across distinct cohorts. In this work, we assess eight imputation methods for their ability to impute gene expression data when values are missing across an entire cohort of Tuberculosis (TB) patients. We investigate how varying proportions of missing data (across 10%, 20%, and 30% of patient samples) influence the imputation results, and test for significantly differentially expressed genes and enriched pathways in patients with active TB. Our results indicate that truncating to the common genes observed across cohorts, the method currently used by researchers, excludes important biology, and suggest that the LASSO and LLS imputation methodologies can reasonably impute genes across cohorts when total missingness rates are below 20%.
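
    A simplified version of the LASSO case on synthetic expression data: for a gene unmeasured in one cohort, a LASSO model trained on the cohorts that did measure it predicts its values from the shared genes.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
shared = rng.normal(size=(150, 30))              # genes measured in every cohort
# expression of the gene that one cohort failed to measure
target = shared[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=150)

train = slice(0, 100)                            # samples from cohorts with the gene
test = slice(100, 150)                           # cohort where the gene is missing
model = Lasso(alpha=0.05).fit(shared[train], target[train])
imputed = model.predict(shared[test])
print("correlation with truth:", np.corrcoef(imputed, target[test])[0, 1])
```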

  • Chapter (Open Access)

    Imputation of race and ethnicity categories using genetic ancestry from real-world genomic testing data

    The incompleteness of race and ethnicity information in real-world data (RWD) hampers its utility in promoting healthcare equity. This study introduces two methods, one heuristic and the other machine learning-based, to impute race and ethnicity from genetic ancestry using tumor profiling data. Analyzing de-identified data from over 100,000 cancer patients sequenced with the Tempus xT panel, we demonstrate that both methods outperform existing geolocation and surname-based methods, with the machine learning approach achieving high recall (range: 0.859-0.993) and precision (range: 0.932-0.981) across four mutually exclusive race and ethnicity categories. This work presents a novel pathway to enhance RWD utility in studying racial disparities in healthcare.
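
    A toy sketch of the machine-learning variant: a classifier maps genetic-ancestry fractions to a race/ethnicity category. The data, features, and model below are invented stand-ins, not the Tempus pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
ancestry = rng.dirichlet(np.ones(4), size=500)   # fractions over four ancestries
labels = ancestry.argmax(axis=1)                 # invented "true" categories

clf = LogisticRegression(max_iter=1000).fit(ancestry[:400], labels[:400])
print("held-out accuracy:", clf.score(ancestry[400:], labels[400:]))
```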

  • Chapter (No Access)

    THE ISSUE OF MISSING VALUES, THEIR PRESENCE AND MANAGEMENT: A RELEVANT DEMONSTRATION OF DATA ANALYSIS IN MARKETING USING CaRBS

    Missing values are often alleged to hamper the effectiveness of successful data analysis. Whether or not their presence can be explained may be the issue; at the very least it should be acknowledged. This study discusses the issues raised by the presence of missing values in data analysis, with particular attention to their management, including imputation. Following this discussion, the nascent Classification and Ranking Belief Simplex (CaRBS) system for data analysis (object classification) is presented, which has the distinction of not requiring a priori consideration (management) of any missing values present. Instead, they are treated as ignorant values and retained in the analysis, a facet of CaRBS associated with the notion of uncertain reasoning. A problem concerning the classification of standard and economy food products is considered, with knowledge of their inherent nutrient levels used to discern them. The visualisation of intermediate and final results offered by the CaRBS system clearly demonstrates the effects of the presence of missing values within an object classification context.
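
    The "ignorant value" idea can be made concrete with a small evidence-combination example: each attribute contributes a mass function over {standard}, {economy}, and the whole frame; a missing attribute contributes total ignorance (all mass on the frame); and Dempster's rule combines the evidence, so nothing need be imputed. The numbers below are invented; CaRBS derives its masses from trained simplex parameters.

```python
def combine(m1, m2):
    # Dempster's rule on the frame {s, e}; focal sets: {s}, {e}, {s, e}
    k = m1["s"] * m2["e"] + m1["e"] * m2["s"]      # conflict between sources
    norm = 1 - k
    return {
        "s": (m1["s"] * m2["s"] + m1["s"] * m2["se"] + m1["se"] * m2["s"]) / norm,
        "e": (m1["e"] * m2["e"] + m1["e"] * m2["se"] + m1["se"] * m2["e"]) / norm,
        "se": (m1["se"] * m2["se"]) / norm,
    }

salt = {"s": 0.6, "e": 0.1, "se": 0.3}   # evidence from an observed attribute
fibre = {"s": 0.0, "e": 0.0, "se": 1.0}  # missing attribute: total ignorance
print(combine(salt, fibre))              # ignorance leaves salt's verdict intact
```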

  • Chapter (No Access)

    COOPERATIVE GAME THEORY AND WATER RESOURCES — PRINCIPLES

    Our main focus in this and the next chapter is the application of Cooperative Game Theory (CGT) models to international water resource issues. In this chapter we justify the use of CGT in water resource problems, in particular in international conflict-cooperation cases. The chapter reviews several important CGT concepts and demonstrates their use and calculation. After reading this chapter you will have a good grasp of basic CGT concepts and be able to apply them at both conceptual and empirical levels to simple cases.