This paper is mainly tutorial in nature and discusses architectures for oversampling converters, with particular emphasis on those well suited to high-frequency input signal bandwidths. The first part of the paper surveys architectures for discrete-time modulators and examines their performance at high-speed operation. The second part presents recent advancements in time-interleaved oversampling converters. The next section describes the design of, and challenges in, continuous-time modulators. Finally, conclusions are drawn and a brief summary of the recent state of the art in high-speed converters is presented.
Earthquake prediction based on extremely imbalanced precursor data is a challenging task for standard algorithms: even in an earthquake-prone zone, days with earthquakes make up only a small minority of the year. The usual remedy is to generate additional artificial data for the minority class, i.e., the earthquake-occurrence data. However, the most popular oversampling methods generate synthetic samples along the line segments joining minority-class instances, which is not suitable for earthquake precursor data. In this paper, we propose the Safe Zone Synthetic Minority Oversampling Technique (SZ-SMOTE), an enhancement of the SMOTE data-generation mechanism. SZ-SMOTE generates synthetic samples with a concentration mechanism in the hyper-sphere around each selected minority instance. The performance of SZ-SMOTE is compared against no oversampling, SMOTE, and its popular modifications adaptive synthetic sampling (ADASYN) and borderline SMOTE (B-SMOTE) on six different classifiers. The experimental results show that earthquake prediction using SZ-SMOTE as the oversampling algorithm significantly outperforms prediction using the other oversampling algorithms.
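The hyper-sphere generation idea above can be sketched in a few lines. This is only an illustrative reading of the abstract, not the authors' implementation: the function name is ours, and uniform sampling over the ball is an assumption standing in for the paper's concentration mechanism.

```python
import math
import random

def sample_in_hypersphere(center, radius, rng=random.Random(0)):
    """Draw a point uniformly from the hyper-sphere of the given radius
    around a selected minority instance (a stand-in for SZ-SMOTE's
    concentration mechanism)."""
    d = len(center)
    # Pick a direction by normalizing a standard-normal vector.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    direction = [x / norm for x in g]
    # Radius r * U^(1/d) makes the density uniform over the ball's volume.
    r = radius * rng.random() ** (1.0 / d)
    return [c + r * u for c, u in zip(center, direction)]

# Synthesize one sample near a single (hypothetical) minority instance.
minority_instance = [2.0, 3.0]
synthetic = sample_in_hypersphere(minority_instance, radius=0.5)
```

Unlike plain SMOTE, no line segment to another minority instance is involved, so the synthetic point stays inside the chosen safe radius.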
Traditional classifiers are trapped by the class-imbalance problem because they are biased toward the majority class. Oversampling methods can improve imbalanced classification by creating synthetic minority class samples. Noise generation has been a great challenge for oversampling methods, and filtering-based and direction-change methods have been proposed against it. Yet the noise filters adopted in filtering-based methods are biased toward the majority class. Besides, the k-nearest neighbor (KNN)-based interpolation in filtering-based and direction-change methods is susceptible to abnormal samples (e.g., outliers, noise, or unsafe borderline samples). To overcome noise generation while addressing these shortcomings, this work presents a new synthetic minority oversampling technique based on local mean-based KNN (SMOTE-LMKNN). In SMOTE-LMKNN, the local mean-based KNN (LMKNN) is first introduced to describe the local characteristics of imbalanced data. Second, a new LMKNN-based noise filter is proposed to remove noise and unsafe borderline samples. Third, interpolation between a base sample and its local mean is proposed to create synthetic minority class samples. Empirical results of extensive experiments on 18 data sets show that SMOTE-LMKNN is competitive with seven popular oversampling methods when training a KNN classifier and classification and regression trees (CART).
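The base-to-local-mean interpolation step can be sketched as follows. This is a minimal illustration under our own naming; the authors' LMKNN may differ in how neighbors are selected and how the noise filter interacts with this step.

```python
import math
import random

def local_mean(base, minority, k):
    """Mean of the k nearest minority-class neighbors of base (excluding base)."""
    neighbors = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    d = len(base)
    return [sum(p[i] for p in neighbors) / len(neighbors) for i in range(d)]

def smote_lmknn_sample(base, minority, k, rng=random.Random(0)):
    """Interpolate between a base sample and its local mean instead of a
    single neighbor, which damps the pull of any one abnormal sample."""
    lm = local_mean(base, minority, k)
    gap = rng.random()  # random position on the base-to-local-mean segment
    return [b + gap * (m - b) for b, m in zip(base, lm)]

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_sample = smote_lmknn_sample(minority[0], minority, k=2)
```

Averaging the neighbors first is the key difference from plain SMOTE, which interpolates toward one randomly chosen neighbor.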
Classification performance is usually degraded by an imbalanced class distribution. The synthetic minority oversampling technique (SMOTE) has been successful in improving imbalanced classification and has received great praise. Overgeneralization is one of the greatest challenges in SMOTE. Although multiple SMOTE-based variants have been proposed against overgeneralization, they still have the following shortcomings: (a) creating too many synthetic samples in high-density regions; (b) removing suspicious noise directly instead of modifying it; (c) relying on many parameters. This paper proposes a new SMOTE based on adaptive noise optimization and fast search for local sets (SMOTE-ANO-FLS) to overcome overgeneralization and the shortcomings of existing works. First, SMOTE-ANO-FLS uses a k-d tree to quickly search the local sets of each sample. Second, a new noise detection method based on local sets and the imbalance ratio is proposed to detect suspicious noise. Third, a new adaptive noise optimization method is proposed to modify detected suspicious noise instead of removing it. Finally, a new probability weight based on local sets is proposed to help create more synthetic minority class samples in borderline and sparse regions. The effectiveness of SMOTE-ANO-FLS is demonstrated against 7 oversampling methods with random forest on extensive synthetic and real data sets.
This paper presents a new design for the optimization of analog-to-digital sigma-delta modulators. The proposed technique increases the signal-to-quantization-noise ratio (SQNR) by improving the placement of the zeros of the noise transfer function, which in turn raises the signal-to-noise ratio. With the new design, the SQNR of the modulator improves by up to 8.5 dB. The proposed structure of a first-order dual-channel sigma-delta modulator is also presented.
Imbalanced class distributions are common in many real-world applications. Since many classifiers tend to degrade in performance on the minority class, several approaches have been proposed to deal with this problem. In this paper, we propose two new cluster-based oversampling methods, SOI-C and SOI-CJ. The proposed methods create clusters from the minority class instances and generate synthetic instances inside those clusters. In contrast with other oversampling methods, the proposed approaches avoid creating new instances in majority class regions, and they are more robust to noisy examples because the number of new instances generated per cluster is proportional to the cluster's size. The clusters are generated automatically, our methods need no tuning parameters, and they can handle both numerical and nominal attributes. The two methods were tested on twenty artificial datasets and twenty-three datasets from the UCI Machine Learning repository. For our experiments, we used six classifiers and evaluated the results with recall, precision, F-measure, and AUC, which are better suited to class-imbalanced datasets. ANOVA and paired t-tests show that the proposed methods are competitive and in many cases significantly better than the other oversampling methods used in the comparison.
In the context of machine learning, an imbalanced classification problem refers to a dataset in which the classes are not evenly distributed. This problem commonly occurs when attempting to classify data in which the distribution of labels or classes is not uniform. Resampling methods, which add samples to the minority class or drop samples from the majority class, are among the best solutions to this problem. The focus of this study is to propose a framework for handling imbalanced datasets in fraud detection. For this purpose, undersampling (Random and NearMiss) and oversampling (Random, SMOTE, and Borderline SMOTE) were used as the resampling techniques in our experiments on balancing the evaluated dataset. For the first time, a large-scale imbalanced dataset collected from the Kaggle website was used to test both approaches for detecting fraud in the Tunisian company for electricity and gas consumption. It was also evaluated with four machine learning classifiers: Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), and XGBoost. Standard evaluation metrics, such as precision, recall, F1-score, and accuracy, were used to assess the findings. The experimental results clearly revealed that the RF model provided the best performance and outperformed all other classifiers, attaining a classification accuracy of 89% with NearMiss undersampling and 99% with Random oversampling.
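The two resampling families compared in this study can be illustrated minimally. The function names and toy data below are ours; a real experiment would use a dedicated library such as imbalanced-learn rather than this sketch.

```python
import random

def random_oversample(majority, minority, rng=random.Random(0)):
    """Duplicate randomly chosen minority samples until both classes
    have the same size."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra

def random_undersample(majority, minority, rng=random.Random(0)):
    """Drop randomly chosen majority samples until both classes
    have the same size."""
    return rng.sample(majority, len(minority)), minority

# Toy fraud-detection data: far more normal records than fraud records.
majority = [[i, 0] for i in range(10)]  # hypothetical non-fraud records
minority = [[i, 1] for i in range(3)]   # hypothetical fraud records
_, balanced_minority = random_oversample(majority, minority)
```

Oversampling keeps all majority information at the cost of duplicated minority points; undersampling discards majority information but yields a smaller, faster-to-train dataset.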
For the problem of class imbalance in the operation monitoring data of wind turbine (WT) pitch connecting bolts, an improved Borderline-SMOTE oversampling method based on a "two-step decision" with adaptive selection of synthetic instances (TSDAS-SMOTE) is proposed. TSDAS-SMOTE is then combined with XGBoost to construct a WT pitch connection bolt fault detection model. TSDAS-SMOTE generates new samples by two-step decision making to avoid the blurring of class boundaries that Borderline-SMOTE tends to cause when oversampling. In the first decision step, the nearest-neighbor characteristics of each fault class sample are examined: if the characteristics of a fault class sample differ from those of all its nearest neighbors, the sample is identified as interference and filtered out. Next, the fault class samples in the boundary zone are used as synthetic instances to adaptively generate new samples. Finally, the normal class samples in the boundary zone are used to detect unqualified newly generated samples in the boundary zone based on minimum Euclidean distance, and these unqualified samples are eliminated. In the second decision step, since the first step removes some of the newly generated samples, the remaining fault class samples, excluding interference and boundary zone samples, are used as synthetic instances to continue adaptively generating new samples. The result is a balanced data set with a clear boundary zone between classes, which is then used to train a WT pitch connection bolt fault detection model based on the XGBoost algorithm.
The experimental results show that, compared with six popular oversampling methods (e.g., Borderline-SMOTE, Cluster-SMOTE, and K-means-SMOTE), the fault detection model built with the proposed oversampling method outperforms the compared models in terms of missed alarm rate (MAR) and false alarm rate (FAR). It therefore effectively achieves fault detection for large WT pitch connection bolts.
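The first filtering step described above, in which a fault sample whose entire neighborhood looks normal-class is treated as interference, can be sketched as follows. This is an illustrative simplification: "different characteristics" is reduced here to the class labels of the k nearest neighbors, and all names are ours.

```python
import math

def filter_interference(fault, normal, k=5):
    """Keep a fault-class sample only if at least one of its k nearest
    neighbors (searched over both classes) is also fault-class; samples
    surrounded entirely by normal-class neighbors are dropped."""
    labeled = [(p, "fault") for p in fault] + [(p, "normal") for p in normal]
    kept = []
    for x in fault:
        neighbors = sorted(
            (pt for pt in labeled if pt[0] is not x),
            key=lambda pt: math.dist(x, pt[0]),
        )[:k]
        if any(lab == "fault" for _, lab in neighbors):
            kept.append(x)
    return kept

# A fault sample deep inside the normal cluster is filtered as interference.
fault = [[0.0, 0.0], [0.5, 0.5], [10.0, 10.0]]
normal = [[9.0, 10.0], [10.0, 9.0], [11.0, 10.0], [10.0, 11.0]]
clean_fault = filter_interference(fault, normal, k=3)
```

Only samples that survive this filter would go on to seed the adaptive synthetic generation in the boundary zone.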
Machine learning (ML) is a branch of computer science that is rapidly gaining popularity within the healthcare arena due to its ability to explore large datasets and discover useful patterns that can be interpreted for decision-making and prediction. ML techniques are used to analyse clinical parameters and their combinations for prognosis, therapy planning and support, and patient management and wellbeing. In this research, we investigate a crucial problem associated with medical applications such as autism spectrum disorder (ASD): data imbalance, in which controls far outnumber cases in the dataset. In autism diagnosis data, the number of instances linked with the no-ASD class is larger than that of the ASD class, which may cause performance issues such as models favouring the majority class and undermining the minority class. This research experimentally measures the impact of the class imbalance issue on the performance of different classifiers on real autism datasets when various data imbalance approaches are utilised in the pre-processing phase. We employ oversampling techniques, such as the Synthetic Minority Oversampling Technique (SMOTE), and undersampling, with different classifiers including Naive Bayes, RIPPER, C4.5, and Random Forest, to measure their impact on the performance of the derived models in terms of area under the curve and other metrics. Results show that oversampling techniques are superior to undersampling techniques, at least for the toddlers' autism dataset considered, and suggest that further work should incorporate sampling techniques with feature selection to generate models that do not overfit the dataset.
An oversampling-based randomized algorithm is introduced for sorting n keys of any abstract data type on the p processors of a latency-tolerant parallel system such as a bulk-synchronous parallel computer. The algorithm is asymptotically, for large n, optimal in computation and communication compared to the best available sequential sorting algorithm, even when constant factors are taken into consideration. Its parallel time is within a (1 + o(1))/p multiplicative factor of the corresponding sequential method for sorting, improving upon other approaches for a wider range of values of p relative to n. It also improves upon other randomized sorting algorithms developed for latency-tolerant models of computation in the amount of parallel slack (the ratio of n to p) required to achieve optimal speedup, and in the associated failure probability. For values of p closer to n than other latency-tolerant randomized approaches allow, it can be turned into a PRAM-like algorithm, but in such cases a speedup of O(p) rather than p/(1 + o(1)) is achievable. Although the general framework of the proposed algorithm relies on the well-established prior idea of sample-based sorting, its design has some novel features, such as the choice of splitter size, the way keys are split, and the handling of the base case of the recursive sorting algorithm, that contribute to its performance.
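The sample-based sorting framework with oversampling that the algorithm builds on can be sketched sequentially. This is a toy single-threaded sketch under our own naming; the paper's contributions (splitter size, key splitting, base-case handling, and the parallel machinery itself) do not appear here.

```python
import bisect
import random

def sample_sort(keys, p, oversampling=8, rng=random.Random(0)):
    """Sample-sort sketch: draw p*oversampling random keys, keep every
    oversampling-th element of the sorted sample as a splitter, route each
    key to the bucket its splitter range defines, then sort each bucket.
    A larger oversampling factor makes bucket sizes more even, which is
    what bounds load imbalance across the p (here simulated) processors."""
    if len(keys) <= p * oversampling:
        return sorted(keys)
    sample = sorted(rng.choices(keys, k=p * oversampling))
    splitters = [sample[i * oversampling] for i in range(1, p)]  # p - 1 splitters
    buckets = [[] for _ in range(p)]
    for key in keys:
        buckets[bisect.bisect_right(splitters, key)].append(key)
    return [key for bucket in buckets for key in sorted(bucket)]

data = random.Random(1).sample(range(1000), 200)
result = sample_sort(data, p=4)
```

In the parallel setting each bucket would be sorted by its own processor, so the oversampling ratio directly trades sampling cost against the worst-case bucket size.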
In imbalanced learning, most supervised learning algorithms fail to account for the data distribution and produce learning that is biased towards the majority class, leading to unfavorable classification performance, particularly for minority class samples. To tackle this problem, the ADASYN algorithm adaptively allocates weights to the minority class examples: a larger weight increases the probability that a minority class sample serves as a seed in the synthetic sample generation process. However, ADASYN does not consider noisy examples. Thus, this paper presents a modified ADASYN (M-ADASYN) for learning from imbalanced datasets with noisy samples. M-ADASYN considers the distribution of the minority class and creates noise-free minority examples by eliminating noisy samples based on their proximity to the original minority and majority class samples. The experimental outcomes confirm that the predictive performance of M-ADASYN is better than that of the KernelADASYN, ADASYN, and SMOTE algorithms.
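ADASYN's adaptive weight allocation can be sketched as follows. This is a minimal reading of the standard ADASYN description with our own names and toy data: each minority sample is weighted by the majority-class fraction of its k nearest neighbors, so samples near the class boundary receive more synthetic offspring.

```python
import math

def adasyn_weights(minority, majority, k=5):
    """For each minority sample, compute the fraction of its k nearest
    neighbors (searched over both classes) that are majority-class, then
    normalize so the weights sum to one."""
    labeled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios = []
    for x in minority:
        neighbors = sorted(
            (pt for pt in labeled if pt[0] is not x),
            key=lambda pt: math.dist(x, pt[0]),
        )[:k]
        ratios.append(sum(lab for _, lab in neighbors) / k)
    total = sum(ratios)
    if total == 0:  # no minority sample borders the majority class
        return [1.0 / len(minority)] * len(minority)
    return [r / total for r in ratios]

# The minority point sitting inside the majority cluster gets the largest weight.
minority = [[0.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
majority = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [20.0, 0.0], [0.0, 20.0]]
weights = adasyn_weights(minority, majority, k=3)
```

The number of synthetic samples generated from each seed is then proportional to its weight; M-ADASYN's contribution, filtering noisy seeds before this step, is not shown here.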
Stock reviews are an important source of professional advice for stock traders. Given the massive number of stock reviews available online, an automatic approach that can identify the emotional tendency hidden in them would provide tremendous help to stock traders. A sentiment classification strategy is proposed for stock review analysis. Firstly, stock reviews are collected from the Internet and a labeled corpus is constructed. Secondly, a feature extraction approach is designed that effectively maps the corpus into a structured matrix. Finally, four classifiers are evaluated on multiple criteria. A synthetic oversampling approach is used to overcome the imbalanced distribution of positive and negative reviews. The experimental results demonstrate that the SVM algorithm coupled with SMOTE achieves the highest performance for sentiment classification on a real-world Chinese stock review corpus.