The rolling bearing is a key component with a high fault rate in rotary machines, and its fault diagnosis is important for the safe and healthy operation of the entire machine. In recent years, deep learning has been widely used for mechanical fault diagnosis. However, during equipment operation, the state data are usually imbalanced: the number of effective samples differs across states, often by a large margin, which makes it difficult to apply deep learning directly. This paper proposes a new data augmentation method combining resampling with the Conditional Wasserstein Generative Adversarial Network with Gradient Penalty (CWGAN-GP), and uses a grayscale-image-based Convolutional Neural Network (CNN) to realize intelligent fault diagnosis of rolling bearings. First, resampling is used to expand the minority classes to a comparable size. Second, the conditional label of the Conditional Generative Adversarial Network (CGAN) is combined with WGAN-GP to control the generated samples. Meanwhile, the Maximum Mean Discrepancy (MMD) is used to filter the generated samples and obtain a high-quality expanded dataset. Finally, a CNN is trained on the expanded dataset to carry out fault classification. In the experiments, single, compound, and mixed fault cases of rolling bearings are simulated in turn. For each case, datasets with different imbalance ratios are constructed. The results show that the proposed method significantly improves the fault diagnosis accuracy of rolling bearings, providing a feasible approach to the intelligent diagnosis of mechanical components with complex fault modes and small, imbalanced data.
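The MMD-based filtering step can be illustrated briefly. The following sketch is a minimal illustration with a Gaussian kernel and hypothetical array names (`real`, `batches`, `threshold`), not the authors' implementation: it computes a squared-MMD estimate between real samples and each batch of generated samples and keeps only the batches below a chosen threshold.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise squared Euclidean distances between rows of x and y
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2.0 * x @ y.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(real, gen, sigma=1.0):
    # Biased squared-MMD estimate with a Gaussian kernel
    k_rr = gaussian_kernel(real, real, sigma).mean()
    k_gg = gaussian_kernel(gen, gen, sigma).mean()
    k_rg = gaussian_kernel(real, gen, sigma).mean()
    return k_rr + k_gg - 2.0 * k_rg

rng = np.random.default_rng(0)
real = rng.normal(size=(200, 64))                         # stand-in for real fault samples
batches = [rng.normal(size=(50, 64)) for _ in range(10)]  # stand-in for GAN outputs
threshold = 0.05                                          # illustrative cut-off
selected = [b for b in batches if mmd2(real, b) < threshold]
print(len(selected), "of", len(batches), "batches kept")
```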
Inverse algorithms are used to assess EEG source parameters. This involves identifying unknown sources among voxels in hundreds of different regions, giving an incomplete picture of the brain. There is no unique solution, since the same sensor output may arise from many source configurations. To overcome this lack of uniqueness, one must incorporate prior information and parameters inherent in the sources. Our goal is to predict the location of brain sources from the recorded EEG signal without any prior knowledge of the sources. In this article, we apply a particle filter to localize the brain sources. The degeneracy of particle weights limits the particle filter's performance, and various resampling techniques have been suggested to address this problem. The performance of the branching resampling approach is compared with that of systematic resampling for brain source localization. Both simulated and real EEG data are used for assessment and comparison.
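For reference, the systematic resampling baseline used in the comparison above can be sketched as follows; this is a minimal NumPy illustration, with function and variable names of our own choosing rather than the authors' code.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Return particle indices drawn by systematic resampling."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n   # one uniform offset, n strata
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                            # guard against rounding error
    return np.searchsorted(cumulative, positions)

weights = np.array([0.1, 0.05, 0.6, 0.05, 0.2])
indices = systematic_resample(weights, np.random.default_rng(1))
print(indices)  # particles with larger weights are replicated more often
```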
Images play an important role in transmitting visual information in our life. It could lead to severe consequences if images are manipulated or tampered with maliciously. Digital forensics is an important research area for securing multimedia information, and many forensic technologies are applied to protect our community from the abuse of digital information. In many cases, after tampering, attackers apply operations such as resampling, JPEG compression, and blurring to cover the traces of tampering. Therefore, it is necessary to detect these manipulations in image forensics before exposing forgeries. In this paper, we propose to employ prediction error filters, ConvNeXt blocks, and convolution modules to classify images with different compression quality factors and resampling rates. By tracing inconsistencies in resampling rates and compression quality factors, the method can provide supplementary information for forensic researchers to expose possible forgeries. The proposed method achieves strong classification performance regardless of the interpolation algorithm and is highly robust against JPEG compression. In addition, it can be applied to estimate the quality factors of JPEG compression.
Highly reliable software systems rarely fail during tests because they are usually designed with fault-tolerant mechanisms and tested comprehensively. It is therefore usually difficult to obtain sufficient failure data for reliability measurement with traditional software reliability models. These models are typically based on probabilistic statistics, and their measurement accuracy cannot be guaranteed with insufficient failure data. We propose a nonparametric bootstrap (NBP) resampling method and six parametric bootstrap (PB) resampling methods to construct software reliability models for small-sample conditions based on commonly used models, i.e., the Jelinski–Moranda (J–M), Goel–Okumoto (G–O), Musa–Okumoto (M–O), Schneidewind, Duane and Littlewood–Verrall models. The bootstrap is a statistical procedure that resamples a single dataset to create many simulated samples. Our experimental results on fourteen failure datasets collected from industry and academia show that the proposed models improve failure time prediction accuracy by 10.2–18.0%, curve-fitting accuracy by 24.7–30.7%, and reliability measurement accuracy by 7.7–42.9% compared with the original models. Furthermore, our approaches achieve 58.3–91.1% better failure time prediction accuracy under small-sample conditions compared with state-of-the-art machine learning and neural network-based methods. Overall, our approaches can perform more accurate reliability measurements than the original models even in scenarios with limited failure data.
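As an illustration of the nonparametric bootstrap idea (not the authors' implementation), the following sketch resamples a hypothetical set of inter-failure times with replacement and derives a percentile interval for the mean time between failures; the same resampling loop could feed any reliability statistic.

```python
import numpy as np

rng = np.random.default_rng(42)
interfailure_times = np.array([12.0, 30.0, 45.0, 70.0, 110.0, 160.0])  # hypothetical data

def bootstrap_replicates(data, statistic, n_boot=2000, rng=rng):
    # Draw n_boot resamples with replacement and evaluate the statistic on each
    n = len(data)
    return np.array([statistic(data[rng.integers(0, n, n)]) for _ in range(n_boot)])

reps = bootstrap_replicates(interfailure_times, np.mean)
lo, hi = np.percentile(reps, [2.5, 97.5])     # percentile confidence interval
print(f"mean TBF: {interfailure_times.mean():.1f}, 95% CI: [{lo:.1f}, {hi:.1f}]")
```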
We investigate the structure of dynamical correlations on reconstructed attractors obtained by time-delay embedding of periodic, quasi-periodic and chaotic time series. Within the specific sampling of the invariant density by the finite number of vectors that results from embedding, we identify two separate levels of sampling, corresponding to two different types of dynamical correlations, each of which produces characteristic artifacts in correlation dimension estimation: the well-known trajectory bias and a characteristic oscillation due to periodic sampling. For the second artifact we propose random sampling as a new correction method, which is shown to provide improved sampling and to reduce dynamical correlations more efficiently than has been possible with the standard Theiler correction. For accurate numerical analysis of the correlation dimension in a bootstrap framework, both corrections should be combined. For tori and the Lorenz attractor we also show how to construct time-delay embeddings which are completely free of any dynamical correlations.
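For context, the correlation sum with a Theiler window (the standard correction discussed above, which discards pairs of temporally close vectors) can be written compactly; under the random-sampling correction proposed here, the exhaustive pair loop would instead draw vector pairs at random. The sketch below is illustrative only, not the authors' code.

```python
import numpy as np

def delay_embed(x, dim, tau):
    # Time-delay embedding of a scalar series x into dim-dimensional vectors
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])

def correlation_sum(vectors, r, theiler=0):
    # Fraction of pairs (i, j) with |i - j| > theiler that are closer than r (max norm)
    n = len(vectors)
    count, total = 0, 0
    for i in range(n):
        for j in range(i + 1 + theiler, n):
            total += 1
            if np.max(np.abs(vectors[i] - vectors[j])) < r:
                count += 1
    return count / total

x = np.sin(0.3 * np.arange(2000)) + 0.01 * np.random.default_rng(0).normal(size=2000)
v = delay_embed(x, dim=3, tau=8)
print(correlation_sum(v[:400], r=0.2, theiler=20))
```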
We present different tests for phase synchronization which improve on the procedures currently used in the literature. This is accomplished by using a two-sample test setup and by utilizing insights and methods from directional statistics and bootstrap theory. The tests differ in the generality of the situations in which they can be applied as well as in their complexity, including computational cost. A modification of the bootstrap resampling technique is introduced, making it possible to utilize the time-series data more fully.
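As background for the quantities being tested, a common phase-synchronization index is the mean resultant length of the instantaneous phase difference obtained via the Hilbert transform; the two-sample tests discussed above compare such directional statistics between recordings. The sketch below (illustrative only, with synthetic signals) computes this index.

```python
import numpy as np
from scipy.signal import hilbert

def phase_sync_index(x, y):
    # Mean resultant length of the instantaneous phase difference (0 = none, 1 = perfect)
    phase_x = np.angle(hilbert(x))
    phase_y = np.angle(hilbert(y))
    dphi = phase_x - phase_y
    return np.abs(np.mean(np.exp(1j * dphi)))

t = np.arange(0, 20, 0.01)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 1.0 * t) + 0.3 * rng.normal(size=t.size)
y = np.sin(2 * np.pi * 1.0 * t + 0.5) + 0.3 * rng.normal(size=t.size)
print(phase_sync_index(x, y))
```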
In this paper we discuss two modifications of the surrogate data method based on phase randomization; see [Theiler et al., 1992]. By construction, phase-randomized surrogates are circular stationary. In this respect they differ from the original time series, which can cause level inaccuracies in surrogate data tests; we will illustrate this. These inaccuracies are caused by end-to-end mismatches of the original time series. We discuss two approaches to remedy this problem: resampling from subsequences without end-to-end mismatches, and data tapering. Both methods can be understood as attempts to make non-circular data approximately circular. We will show that the first method works quite well for a large range of applications, whereas data tapering leads to improvements only in some examples and can be very unstable otherwise.
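The baseline phase-randomization procedure that both modifications build on can be sketched in a few lines. This is an illustration under standard assumptions (amplitudes preserved, Fourier phases randomized); the remedies discussed above, resampling from end-matched subsequences and data tapering, would be applied around this step.

```python
import numpy as np

def phase_randomized_surrogate(x, rng=None):
    """Surrogate with the same power spectrum as x but randomized Fourier phases."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=spectrum.size)
    phases[0] = 0.0                      # keep the mean (zero-frequency) term real
    if x.size % 2 == 0:
        phases[-1] = 0.0                 # keep the Nyquist term real for even length
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=x.size)

x = np.sin(0.2 * np.arange(1024)) + 0.1 * np.random.default_rng(3).normal(size=1024)
s = phase_randomized_surrogate(x, np.random.default_rng(4))
```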
A question has been raised in several publications as to whether or not the expression levels, or their logarithms, for different genes are normally distributed. Answering this question requires a large data set in which both biological variability and technological noise are present. An earlier attempt to test this assumption was limited to technical replicates and did not take the multiplicity of tests into account when assessing the net results of goodness-of-fit testing. The problem therefore calls for further exploration. We applied several statistical tests to a large set of high-density oligonucleotide microarray data in order to systematically test for log-normality of the expression levels of all reporter genes. The multiple testing aspect of the problem was addressed by designing a pertinent resampling procedure. The results of testing did not reject normality of log-intensities in the non-normalized data under study. However, the global log-normality hypothesis was rejected beyond all reasonable doubt when the data were normalized by the quantile normalization procedure. Our results are consistent with the hypothesis that non-normalized expression levels of different genes are approximately log-normally distributed. Quantile normalization causes dramatic changes in the shape of the marginal distributions of log-intensities, which may indicate that this procedure interferes not only with the technological noise but also with the true biological signal; this possibility invites a special investigation.
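A resampling-based adjustment for multiplicity in goodness-of-fit testing can be illustrated with a generic min-p sketch on synthetic data; this is not the authors' exact procedure. Per-gene normality p-values are computed for log-intensities, and the observed minimum p-value is referred to a null distribution obtained by re-testing data simulated under the global null.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
log_intensities = rng.normal(loc=8.0, scale=1.0, size=(100, 20))   # genes x arrays (synthetic)

# Per-gene p-values for normality of log-intensities
pvals = np.array([stats.shapiro(g)[1] for g in log_intensities])
observed_min_p = pvals.min()

# Resampling under the global null: simulate data of the same shape from a normal
# distribution and record the minimum p-value across genes in each replicate
n_rep = 200
null_min_p = np.array([
    min(stats.shapiro(g)[1] for g in rng.normal(size=log_intensities.shape))
    for _ in range(n_rep)
])

# Family-wise adjusted p-value for the global log-normality hypothesis
adjusted_p = np.mean(null_min_p <= observed_min_p)
print(f"min p = {observed_min_p:.4f}, adjusted p = {adjusted_p:.3f}")
```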
Resampling methods are among the best approaches for determining the number of clusters in prototype-based clustering. The core idea is that with the right choice for the number of clusters, essentially the same cluster structures should be obtained from subsamples of the given data set, whereas a wrong choice should produce considerably varying cluster structures. In this paper I give an overview of how such resampling approaches can be transferred to fuzzy and probabilistic clustering. I study several cluster comparison measures, which can be parameterized with t-norms, and report experiments that provide some guidance as to which of them may be the best choice.
In this paper, we employ the resampling method to reduce sampling errors and increase the robustness of the classic mean-variance model. By comparing the performance of the classic mean-variance portfolio with that of the resampled portfolio, we show that the resampling method can enhance investment efficiency. Through an empirical study of Chinese investors who invest in both the Chinese market and twelve other major financial markets, we show that the resampling method helps to improve the performance of the mean-variance model.
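The resampling idea can be illustrated with a minimal sketch: plug-in weights are averaged over bootstrap resamples of the return history. For brevity the sketch uses a global-minimum-variance target rather than a full mean-variance optimization, and synthetic returns; it is not the authors' implementation.

```python
import numpy as np

def min_var_weights(cov):
    # Global-minimum-variance weights (no long-only constraint)
    inv = np.linalg.inv(cov)
    ones = np.ones(cov.shape[0])
    w = inv @ ones
    return w / w.sum()

def resampled_weights(returns, n_boot=500, rng=None):
    """Average plug-in weights over bootstrap resamples of the return history."""
    rng = np.random.default_rng() if rng is None else rng
    t, n = returns.shape
    ws = np.zeros(n)
    for _ in range(n_boot):
        sample = returns[rng.integers(0, t, t)]     # resample the return history
        ws += min_var_weights(np.cov(sample, rowvar=False))
    return ws / n_boot

rng = np.random.default_rng(7)
returns = rng.normal(0.0005, 0.01, size=(250, 5))      # synthetic daily returns, 5 assets
print(min_var_weights(np.cov(returns, rowvar=False)))  # classic plug-in weights
print(resampled_weights(returns, rng=rng))             # resampled (averaged) weights
```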
This paper introduces a new functional optimization approach to portfolio optimization problems by treating the unknown weight vector as a function of past values rather than as fixed unknown coefficients, as is done in the majority of studies. We first show that the optimal solution is, in general, not a constant function. We give the optimality conditions for a vector function to be the solution, and hence the conditions under which a plug-in solution (replacing the unknown mean and variance by estimates based on past values) is optimal. After showing that plug-in solutions are in general sub-optimal, we propose gradient-ascent algorithms to solve the functional optimization for mean–variance portfolio management, with convergence theorems provided. Simulations and empirical studies show that our approach can perform significantly better than the plug-in approach.
The particle filter is known to be a powerful tool for estimating moving targets governed by nonlinear dynamics and sensors. The filter, however, is known to suffer from degeneracy, in which one particle gathers all the weight, causing the filter to diverge completely. Degeneracy becomes more evident when the sensors are accurate and/or the target maneuvers strongly. The resampling step in the particle filter is critical because it avoids degeneracy by eliminating the wasteful use of particles that do not contribute to the posterior probability density function. Conventional resampling methods, despite being unbiased in approximating the posterior density, involve exhaustive and sequential communication among the particles and are thus computationally expensive. Hence conventional resampling is a major bottleneck for fast implementation of particle filters in real-time tracking. In this paper, we propose a new filtering approach that resamples only a minimum number of the most important particles contributing to the posterior density. Restricting the resampling operation to a few important particles substantially accelerates the filtering process. We show the merits of the proposed method via simulations of a nonlinear example.
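To convey the flavour of restricting resampling to a small set of important particles, the sketch below triggers resampling when the effective sample size drops and resamples only within the k highest-weight particles, leaving the remainder untouched. This is a simplified illustration of the general idea, not the algorithm proposed in the paper.

```python
import numpy as np

def effective_sample_size(weights):
    return 1.0 / np.sum(weights**2)

def partial_resample(particles, weights, k, rng):
    """Illustration only: resample within the k highest-weight particles, keep the rest."""
    idx_sorted = np.argsort(weights)[::-1]
    top, rest = idx_sorted[:k], idx_sorted[k:]
    top_w = weights[top] / weights[top].sum()
    drawn = rng.choice(top, size=k, p=top_w)          # resample only the important subset
    new_particles = np.concatenate([particles[drawn], particles[rest]])
    new_weights = np.concatenate([np.full(k, weights[top].sum() / k), weights[rest]])
    return new_particles, new_weights / new_weights.sum()

rng = np.random.default_rng(0)
particles = rng.normal(size=100)
weights = rng.dirichlet(np.ones(100))
if effective_sample_size(weights) < 50:               # resample only when ESS is low
    particles, weights = partial_resample(particles, weights, k=20, rng=rng)
```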
The sequential Monte Carlo method, also called the Bayesian particle filter, approximates the posterior probability density function of a latent target state from noisy sensor measurements using a set of Monte Carlo samples. These samples are predicted using an importance density function and then updated using Bayes' rule. The updated samples and their corresponding weights provide an estimate of the latent state, and the filtering process is iterated over time to track dynamic target states. It is critical to have enough particles in the regions of the target state space that contribute to the posterior. The auxiliary and improved auxiliary particle filters accomplish this by mimicking a draw from an importance density that leverages the incoming observation in the sampling step. However, these filters are known to fail when the sensor measurements are highly informative and the diffusion of the state transition is large. This paper presents an improvement to the auxiliary particle filter that takes two support points acting as limits in a univariate state space within which particles are sampled. The choice of the limits is adaptive. The proposed method is successfully tested in simulations using a nonlinear model.
High-throughput correlated DNA methylation (DNAmeth) data are generated by the Illumina Infinium Human Methylation 27 (IIHM 27K) BeadChip assay. In DNAmeth data, there are several CpG sites for every gene, and these grouped CpG sites are highly correlated. Most current filtering-based ranking (FBR) methods do not consider the group correlation structure. Obtaining significant features with FBR methods and applying them to classifiers to attain the best classification accuracy in highly correlated DNAmeth data is a challenging task. In this research, we introduce a resampling of group least absolute shrinkage and selection operator (glasso) FBR method capable of ignoring unrelated features in the data while accounting for the group correlation among features. Classifiers such as random forests (RF), Naive Bayes (NB), and support vector machines (SVM), applied to the significant CpGs obtained from the proposed resampling of group lasso-based ranking (RGLR) method, helped to boost classification accuracy. Through simulated and experimental prostate DNAmeth data, we show that higher accuracy, sensitivity, specificity, and geometric mean are achieved by discarding unimportant CpG sites through the RGLR method.
Although class imbalance learning and online learning have been extensively studied in the literature separately, online class imbalance learning that considers the challenges of both fields has not drawn much attention. It deals with data streams having very skewed class distributions, such as fault diagnosis of real-time control monitoring systems and intrusion detection in computer networks. To fill in this research gap and contribute to a wide range of real-world applications, this paper first formulates online class imbalance learning problems. Based on the problem formulation, a new online learning algorithm, sampling-based online bagging (SOB), is proposed to tackle class imbalance adaptively. Then, we study how SOB and other state-of-the-art methods can benefit a class of fault detection data under various scenarios and analyze their performance in depth. Through extensive experiments, we find that SOB can balance the performance between classes very well across different data domains and produce stable G-mean when learning constantly imbalanced data streams, but it is sensitive to sudden changes in class imbalance, in which case SOB's predecessor undersampling-based online bagging (UOB) is more robust.
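A rough sketch of the sampling idea behind SOB is Oza-style online bagging in which each streaming example is presented to every base learner k ~ Poisson(λ) times, with λ scaled up for the currently under-represented class. This is illustrative only; the exact per-class rate used by SOB may differ from the inverse-frequency weighting assumed here, and the stream, learners, and names are ours.

```python
import numpy as np
from collections import defaultdict
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
models = [SGDClassifier(random_state=i) for i in range(10)]   # online base learners
class_counts = defaultdict(int)
classes = np.array([0, 1])

# Synthetic imbalanced stream: class 1 is the minority (about 10%)
stream_y = (rng.random(2000) < 0.1).astype(int)
stream_x = rng.normal(size=(2000, 5)) + stream_y[:, None] * 1.5

for x, y in zip(stream_x, stream_y):
    class_counts[y] += 1
    total = sum(class_counts.values())
    lam = total / (len(class_counts) * class_counts[y])       # larger for the minority class
    for model in models:
        for _ in range(rng.poisson(lam)):                     # present the example k ~ Poisson(lam) times
            model.partial_fit(x.reshape(1, -1), [y], classes=classes)

# Majority vote over ensemble members that have been trained at least once
fitted = [m for m in models if hasattr(m, "coef_")]
votes = np.array([m.predict(stream_x[:5]) for m in fitted])
print(np.round(votes.mean(axis=0)))
```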
Feature selection is a preprocessing step that plays a crucial role in machine learning and data mining. Feature selection methods have been shown to be effective in removing redundant and irrelevant features, improving the prediction performance of the learning algorithm. Among the various redundancy-based feature selection methods, the fast correlation-based filter (FCBF) is one of the most effective. In this paper, we develop a novel extension of FCBF, called resampling FCBF (RFCBF), which combines FCBF with a resampling technique to improve classification accuracy. We performed comprehensive experiments comparing RFCBF with other state-of-the-art feature selection methods using three competitive classifiers (K-nearest neighbor, support vector machine, and logistic regression) on 12 publicly available datasets. The experimental results show that RFCBF yields significantly better results than previous state-of-the-art methods in terms of classification accuracy and runtime.
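The core relevance measure in FCBF is the symmetric uncertainty between a (discretized) feature and the class; a resampling wrapper can average this score over bootstrap samples, as in the sketch below. This is illustrative only: FCBF's redundancy-removal stage and RFCBF's exact aggregation scheme are omitted, and the data are synthetic.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(x, y):
    # SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), for discrete inputs
    mi = mutual_info_score(x, y) / np.log(2)          # convert nats to bits
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2 * mi / denom

def resampled_su_ranking(X, y, n_boot=50, rng=None):
    """Average each feature's SU with the class over bootstrap resamples."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    scores = np.zeros(d)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        scores += [symmetric_uncertainty(X[idx, j], y[idx]) for j in range(d)]
    return scores / n_boot

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 300)
X = np.column_stack([y ^ (rng.random(300) < 0.1),      # informative feature (noisy copy of y)
                     rng.integers(0, 3, 300),          # noise
                     rng.integers(0, 3, 300)])         # noise
print(resampled_su_ranking(X, y, rng=rng))
```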
A linear time-invariant state space model is proposed for the production and decay of two epimers (R) and (S) of a hopane released from oil-bearing rock during laboratory pyrolysis. Concentrations of R and S are measured over time. The parameters to be estimated are: the initial amounts of the precursors X (for R) and Y (for S); the rate constants for the production of R and S; the rate constants for the decay of R and S; and the rate constants for the two-way epimerization between X and Y and between R and S. It is shown that the model is locally identifiable. The parameters are estimated by numerical integration of the rate equations, alternated with a derivative-free, constrained least squares optimisation routine. Asymptotic standard errors and covariances of the parameters are given and compared with those obtained from a resampling approach (parametric bootstrap). An alternative fitting procedure, based on estimating the derivatives of the concentrations of R and S by fitting splines, is implemented and compared with the procedure based on integration of the rate equations. The rate constants are important for elucidating the reaction pathways, and the estimates of the initial concentrations of X and Y have potential for inferring the yields of oil-bearing rock.
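The parametric bootstrap used for comparison can be illustrated generically: fit the model, simulate new datasets from the fitted model with the estimated noise level, refit each, and read off the spread of the refitted parameters. The sketch below uses a simpler single-curve kinetic model and synthetic data standing in for the coupled rate equations of the paper; it is an illustration of the procedure, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import curve_fit

# Placeholder kinetic model: a single production-and-decay curve. The paper's model
# integrates coupled rate equations for R and S; this stands in only to show the
# parametric-bootstrap loop.
def model(t, a, k_prod, k_dec):
    return a * (np.exp(-k_dec * t) - np.exp(-k_prod * t))

rng = np.random.default_rng(0)
t = np.linspace(0.5, 20, 25)
y = model(t, 5.0, 1.2, 0.15) + rng.normal(0, 0.1, t.size)   # synthetic observations

# Step 1: fit the model to the observed data
theta_hat, _ = curve_fit(model, t, y, p0=(1.0, 1.0, 0.1))
sigma_hat = np.std(y - model(t, *theta_hat), ddof=3)

# Step 2: parametric bootstrap, simulating from the fitted model and refitting
n_boot = 500
boot = np.empty((n_boot, 3))
for b in range(n_boot):
    y_sim = model(t, *theta_hat) + rng.normal(0, sigma_hat, t.size)
    boot[b], _ = curve_fit(model, t, y_sim, p0=theta_hat)

print("estimates:", theta_hat)
print("bootstrap standard errors:", boot.std(axis=0))
```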
A new approach to the analysis of statistical data collected in contingent valuation surveys is introduced. In data collection, randomly sampled respondents may freely self-select any interval that contains their willingness-to-pay (WTP) point. Here, the presence of self-selected intervals with rounded ends is an essential feature. We avoid placing restrictions on the dependencies between the self-selected intervals and their associated WTP points. Two-step sampling designs are proposed. Based on an estimated coverage probability, a practical rule for stopping the sampling in the first step is suggested. Division intervals are then generated and used in the second step of data collection. A recursion is applied to maximize the log likelihood corresponding to the data collected in the second step. Consistent maximum likelihood (ML) estimates of the WTP distribution, projected onto the division intervals, are found, and their accuracy may be consistently estimated by resampling methods.