Data clustering has been widely used in many areas, such as data mining, statistics and machine learning. A variety of clustering approaches have been proposed, but most are ill-suited to quickly clustering large-scale, high-dimensional databases. This paper is devoted to a novel data clustering approach based on a generalized particle model (GPM). The GPM transforms the data clustering process into a stochastic process over the configuration space of a GPM array. The proposed approach is characterized by self-organizing clustering and offers several advantages: insensitivity to noise, robustness of clustering quality to the data being clustered, suitability for high-dimensional and massive data sets, learning ability, openness, and easier hardware implementation with VLSI systolic technology. Analysis and simulations show the effectiveness and good performance of the proposed GPM approach to data clustering.
Recently, kernel methods from support vector machines have been widely used in machine learning algorithms to obtain nonlinear models. Clustering is an unsupervised learning method that divides a whole data set into subgroups, and popular clustering algorithms such as c-means employ kernel methods. Other kernel-based clustering algorithms have been inspired by kernel c-means. However, the formulation of kernel c-means has high computational complexity. This paper gives an alternative formulation of kernel-based clustering algorithms derived from competitive learning clustering. The new formulation explicitly uses sequential updating, or on-line learning, to avoid the high computational complexity. We apply kernel methods to the related algorithms of learning vector quantization and the self-organizing map. We moreover derive kernel methods for sequential c-means and its fuzzy version from the proposed formulation.
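For reference, the following is a minimal sketch of batch kernel c-means, the baseline formulation whose per-iteration cost the sequential (on-line) reformulation is designed to avoid; the RBF kernel, function names and toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_c_means(K, c, n_iter=50, seed=0):
    """Batch kernel c-means using only kernel evaluations (no explicit feature map)."""
    n = K.shape[0]
    labels = np.random.default_rng(seed).integers(c, size=n)
    for _ in range(n_iter):
        dist = np.empty((n, c))
        for j in range(c):
            mask = labels == j
            nj = max(mask.sum(), 1)
            # squared distance in feature space to the mean of cluster j
            dist[:, j] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / nj
                          + K[np.ix_(mask, mask)].sum() / nj ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# toy usage: two well-separated blobs
X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 4.0])
print(kernel_c_means(rbf_kernel(X, gamma=0.5), c=2))
```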
This paper applies a ridge estimation approach to an existing partial logistic regression model with exact predictors, intuitionistic fuzzy responses, intuitionistic fuzzy coefficients and an intuitionistic fuzzy smooth function, in order to improve the existing intuitionistic fuzzy partial logistic regression model in the presence of multicollinearity. To this end, ridge methodology is employed to estimate the parametric intuitionistic fuzzy coefficients and the nonparametric intuitionistic fuzzy smooth function. Some common goodness-of-fit criteria are used to examine the performance of the proposed regression model. The potential applications of the proposed method are illustrated and compared with the intuitionistic fuzzy partial logistic regression model through two numerical examples. The results clearly indicate that the proposed ridge method is quite efficient in terms of model performance when there is multicollinearity among the predictors.
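To illustrate only the core idea, here is a minimal sketch of how a ridge (L2) penalty stabilizes an ordinary (crisp) logistic regression under multicollinearity; the intuitionistic fuzzy responses, coefficients and smooth function of the actual model are not represented, and the simulated data and penalty strengths are made-up assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)        # nearly collinear predictor
X = np.column_stack([x1, x2])
y = (x1 + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Smaller C = stronger ridge shrinkage; the penalty keeps the two collinear
# coefficients from blowing up in opposite directions.
ridge_logit = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
weak_logit = LogisticRegression(penalty="l2", C=1e6).fit(X, y)   # nearly unpenalized
print("ridge coefficients:      ", ridge_logit.coef_)
print("nearly unpenalized coef.:", weak_logit.coef_)
```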
This paper discusses the asymptotic properties of estimators for optimal portfolios when the returns are vector-valued locally stationary processes. First, we derive the asymptotic distribution of a nonparametric portfolio estimator based on the kernel method; the optimal bandwidth and kernel function are obtained by minimizing its mean squared error. Next, assuming parametric models for non-Gaussian locally stationary processes, we prove the LAN theorem and propose a parametric portfolio estimator ĝ based on a quasi-maximum likelihood estimator. It is then shown that ĝ is asymptotically efficient on the basis of the LAN result. Numerical studies are provided to investigate the accuracy of the parametric and nonparametric portfolio estimators and to illuminate some of their interesting features.
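As background, a minimal sketch of the classical plug-in optimal-portfolio estimator for i.i.d. (stationary) returns is shown below; the locally stationary, kernel-based and quasi-maximum-likelihood refinements studied in the paper are not reproduced, and the function name and toy return series are assumptions.

```python
import numpy as np

def plugin_portfolio_weights(returns):
    """Plug-in mean-variance optimal weights w ∝ Σ^{-1} μ, normalized to sum to one.
    μ and Σ are the sample mean and sample covariance of the return series."""
    mu = returns.mean(axis=0)
    sigma = np.cov(returns, rowvar=False)
    w = np.linalg.solve(sigma, mu)
    return w / w.sum()

rng = np.random.default_rng(0)
R = rng.multivariate_normal(
    mean=[0.010, 0.020, 0.015],
    cov=[[0.04, 0.01, 0.00],
         [0.01, 0.09, 0.02],
         [0.00, 0.02, 0.06]],
    size=500)
print(plugin_portfolio_weights(R))
```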
We propose a bias-corrected regularization kernel ranking (BCRKR) method and characterize the asymptotic bias and variance of the estimated ranking score function. The results show that BCRKR has smaller asymptotic bias than the traditional regularization kernel ranking (RKR) method, while its variance has the same order of decay as that of RKR as the sample size goes to infinity. Therefore, BCRKR is expected to be as effective as RKR, and its smaller bias favors its use in blockwise data analysis such as distributed learning for big data. The proofs make use of a concentration inequality for integral operator U-statistics.
An unsupervised feature selection method is proposed for the analysis of high-dimensional datasets. The least squares error (LSE) of approximating the complete dataset via a reduced feature subset is proposed as the quality measure for feature selection. Guided by the minimization of this LSE, a kernel least squares forward selection algorithm (KLS-FS) is developed that is capable of both linear and non-linear feature selection. An incremental LSE computation is designed to accelerate the selection process and thereby enhance the scalability of KLS-FS to high-dimensional datasets. The superiority of the proposed feature selection algorithm, in terms of preserving principal data structures, learning performance in classification and clustering applications, and robustness, is demonstrated on various real-life datasets of different sizes and dimensions.
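A minimal sketch of the linear version of LSE-guided forward selection is given below, where each candidate subset is scored by the least squares error of reconstructing all features from the selected ones; the kernel variant and the incremental LSE update described in the abstract are not shown, and the function names and toy data are assumptions.

```python
import numpy as np

def reconstruction_lse(X, subset):
    """Least squares error of approximating every column of X from the selected columns."""
    Xs = X[:, subset]
    coef, *_ = np.linalg.lstsq(Xs, X, rcond=None)
    return float(np.linalg.norm(X - Xs @ coef) ** 2)

def forward_select(X, k):
    """Greedy forward selection: add the feature that most reduces the LSE."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = min(remaining, key=lambda j: reconstruction_lse(X, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.random((100, 8))
X[:, 3] = X[:, 0] + X[:, 1]          # a redundant feature the selector can skip
print(forward_select(X, k=3))
```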
Nonnegative matrix factorization (NMF) is a linear approach for extracting localized features of facial images. However, NMF may fail to process data points that are nonlinearly separable. The kernel extension of NMF, named kernel NMF (KNMF), can model the nonlinear relationships among data points and extract nonlinear features of facial images. KNMF is an unsupervised method and thus does not utilize supervision information. Moreover, the features extracted by KNMF are not sparse enough. To overcome these limitations, this paper proposes a supervised KNMF called block kernel NMF (BKNMF). A novel objective function is established by incorporating intra-class information, and the algorithm is derived by making use of a block strategy and kernel theory. Our BKNMF has several merits for face recognition, such as highly sparse features and orthogonality between features from different classes. We theoretically analyze the convergence of the proposed BKNMF. Compared with some state-of-the-art methods, our BKNMF achieves superior performance in face recognition.
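For orientation, the sketch below implements plain unsupervised NMF with the standard multiplicative updates; the kernelized, block-structured, class-supervised extensions that define BKNMF are not included, and the factor sizes and iteration counts are arbitrary assumptions.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Factor V (m x n, nonnegative) ≈ W (m x r) @ H (r x n) with multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update coefficients
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis images
    return W, H

V = np.abs(np.random.default_rng(0).random((64, 40)))   # e.g. 40 vectorized 8x8 patches
W, H = nmf(V, r=10)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))     # relative reconstruction error
```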
Principal component analysis (PCA) is perhaps the most popular dimension reduction method. In this paper, the learning scheme of kernel PCA methods is established. Moreover, for the uncentered case, we introduce the error representation and prove a comparison theorem stating that the learning error can be bounded by the excess generalization error. Under the condition that the positive eigenvalues of the integral operator L_K are all simple, an error bound of order O(n^{-1/2}) is deduced.
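For concreteness, a minimal sketch of standard (centered) kernel PCA on a precomputed Gram matrix is shown below; the paper additionally analyzes the uncentered case, which simply omits the centering step, and the function name and toy data here are illustrative assumptions.

```python
import numpy as np

def kernel_pca_scores(K, n_components):
    """Projections of the training points onto the leading kernel principal components."""
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one     # center the kernel in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]    # largest eigenvalues first
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.maximum(vals, 0.0))   # score of point i on component k

rng = np.random.default_rng(0)
X = rng.random((50, 5))
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
scores = kernel_pca_scores(np.exp(-d2), n_components=2)
print(scores.shape)    # (50, 2)
```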
High accuracy is paramount when predicting biochemical characteristics using Quantitative Structure-Property Relationships (QSPRs). Although existing graph-theoretic kernel methods combined with machine learning techniques are efficient for QSPR model construction, they cannot distinguish topologically identical chiral compounds, which often exhibit different biological characteristics. In this paper, we propose a new method that extends the recently developed tree pattern graph kernel to accommodate stereoisomers. We show that Support Vector Regression (SVR) with a chiral graph kernel is useful for target property prediction by demonstrating its application to a set of human vitamin D receptor ligands currently under consideration for their potential anti-cancer effects.
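The sketch below shows how SVR is trained on a precomputed graph-kernel Gram matrix, which is the pattern the abstract describes; the kernel values here are random positive semidefinite placeholders rather than a real chiral tree pattern kernel, and the variable names are assumptions.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Stand-in for a chiral graph kernel: any symmetric positive semidefinite matrix.
G = rng.random((40, 6))
K_train = G @ G.T                      # (n_ligands x n_ligands) Gram matrix
y_train = rng.random(40)               # stand-in target property values

model = SVR(kernel="precomputed", C=1.0, epsilon=0.1)
model.fit(K_train, y_train)

# Prediction needs the kernel between new compounds and the training compounds.
K_new_vs_train = rng.random((5, 6)) @ G.T
print(model.predict(K_new_vs_train))
```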
In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data and to enhance the discriminatory information. In this paper, we present an extension of kernel Fisher discriminant analysis (KDA) based on a data-dependent kernel, called adaptive kernel discriminant analysis (AKDA), for feature extraction and pattern classification. AKDA is more adaptive to the input data than KDA because the projection from input space to feature space is optimized through the data-dependent kernel, which enhances the performance of KDA. Experimental results on the ORL, Yale and MNIST databases show that the proposed AKDA gives higher performance than KDA.
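As an illustration of the data-dependent kernel idea, the sketch below applies a conformal transformation k(x, y) = q(x) q(y) k0(x, y) to a base RBF kernel, with q expanded over a set of anchor points; the choice of base kernel, anchor points and expansion coefficients, and how AKDA would optimize them, are assumptions for illustration only.

```python
import numpy as np

def rbf(X, Z, gamma=1.0):
    d2 = (np.sum(X ** 2, axis=1)[:, None]
          + np.sum(Z ** 2, axis=1)[None, :]
          - 2.0 * X @ Z.T)
    return np.exp(-gamma * d2)

def data_dependent_kernel(X, anchors, alphas, alpha0=1.0, gamma=1.0):
    """Conformal transformation of a base kernel:
    k(x, y) = q(x) * q(y) * k0(x, y),  with  q(x) = alpha0 + sum_i alphas[i] * k1(x, a_i)."""
    q = alpha0 + rbf(X, anchors, gamma) @ alphas    # data-dependent scaling factor
    return (q[:, None] * q[None, :]) * rbf(X, X, gamma)

rng = np.random.default_rng(0)
X = rng.random((50, 3))
anchors = X[:5]                     # expansion vectors, e.g. a subset of the data
K = data_dependent_kernel(X, anchors, alphas=np.full(5, 0.1))
print(K.shape)                      # (50, 50), symmetric and positive semidefinite
```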
Suppose the random vector (X, Y) satisfies the heteroscedastic regression model Y = m(X) + σ(X)ε, where m(·) = E(Y∣·), σ²(·) = Var(Y∣·) and ε (with mean zero and variance one) is independent of X. The response Y is subject to random right censoring and the covariate X is completely observed. New goodness-of-fit testing procedures for m(·) and σ²(·) are proposed. They are based on a modified integrated regression function technique which uses the method of [Heuchenne and Van Keilegom, 2006b] to construct new versions of functions of the data points. Asymptotic representations of the processes are obtained and weak convergence to Gaussian processes is deduced.
A simple multiple kernel learning (SMKL) method is proposed in this paper. Unlike general multiple kernel learning (MKL), SMKL constructs a new function space spanned by multiple kernels with a group of parameters for the learning problem, and then defines a new inner product on this solution space with a redefined symmetric positive definite matrix. We also propose an SMKL algorithm in a general-purpose framework for classification. The SMKL algorithm has a closed-form solution for the optimization procedure, without the additional iterative operations that appear in MKL algorithms, and thus reduces the computational cost of general MKL algorithms. Experiments on two benchmark datasets demonstrate the benefits of the proposed method.
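For context, the sketch below shows the common multiple-kernel building block of combining several base Gram matrices into one kernel (here with fixed, hand-picked weights) and feeding the result to an SVM with a precomputed kernel; SMKL's parameterized function space, redefined inner product and closed-form solution are not reproduced, and the weights, base kernels and toy data are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def combined_kernel(kernels, weights):
    """Weighted sum of base Gram matrices; nonnegative weights keep it positive semidefinite."""
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
X = rng.random((60, 4))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
base_kernels = [np.exp(-g * d2) for g in (0.5, 1.0, 2.0)]   # three RBF base kernels
K = combined_kernel(base_kernels, weights=[0.2, 0.3, 0.5])  # fixed weights (MKL would learn them)

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))
```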