Abstract
Distance metric learning and nonlinear dimensionality reduction are intrinsically related, since both are different perspectives on the same fundamental problem: learning compact and meaningful data representations for classification and visualization. In this paper, we propose a graph-based generalization of the Semi-Supervised Dimensionality Reduction (SSDR) algorithm that uses stochastic distances (Kullback-Leibler, Bhattacharyya and Cauchy-Schwarz divergences) to compute the similarity between local multivariate Gaussian distributions along the K Nearest Neighbors (KNN) graph built from the samples in the input high-dimensional space. Two variants of the proposed method are presented: one that uses only a small fraction (10%) of the labeled samples, and another that additionally uses a clustering method (Gaussian Mixture Models) to estimate labels for the samples along the minimum spanning tree of the KNN graph, incorporating more information into the process. Experimental results on several real datasets show that the proposed method improves the classification accuracy of several supervised classifiers as well as the quality of the obtained clusters (measured by Silhouette Coefficients) in comparison to the regular SSDR algorithm, making it a viable alternative for pattern classification problems.
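For illustration, the sketch below shows how a stochastic distance between local Gaussian distributions could be used as an edge weight on the KNN graph, using the closed-form Kullback-Leibler divergence between two multivariate Gaussians. This is a minimal, hedged example, not the authors' implementation: the function names, the symmetrization of the divergence, the ridge regularization of the covariance, and the patch size are all assumptions made for clarity.

```python
import numpy as np
from numpy.linalg import inv, slogdet

def kl_gaussians(mu0, cov0, mu1, cov1):
    """Closed-form KL divergence D_KL(N(mu0, cov0) || N(mu1, cov1))."""
    d = mu0.shape[0]
    cov1_inv = inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = slogdet(cov0)
    _, logdet1 = slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0)
                  + diff @ cov1_inv @ diff
                  - d
                  + logdet1 - logdet0)

def symmetrized_kl(mu0, cov0, mu1, cov1):
    """Symmetrized KL, a natural choice for an (undirected) graph edge weight."""
    return 0.5 * (kl_gaussians(mu0, cov0, mu1, cov1)
                  + kl_gaussians(mu1, cov1, mu0, cov0))

# Illustrative usage: each sample's local Gaussian is estimated from its KNN patch
# (the sample plus its k nearest neighbors); a small ridge keeps the covariance
# well-conditioned. The patch data here is synthetic.
rng = np.random.default_rng(0)
patch_i = rng.standard_normal((10, 5))   # sample i and its 9 nearest neighbors
patch_j = rng.standard_normal((10, 5))   # sample j and its 9 nearest neighbors

mu_i, cov_i = patch_i.mean(axis=0), np.cov(patch_i, rowvar=False) + 1e-6 * np.eye(5)
mu_j, cov_j = patch_j.mean(axis=0), np.cov(patch_j, rowvar=False) + 1e-6 * np.eye(5)

edge_weight = symmetrized_kl(mu_i, cov_i, mu_j, cov_j)
print(edge_weight)
```

The Bhattacharyya and Cauchy-Schwarz divergences mentioned in the abstract also have closed forms for Gaussians and could be substituted for the KL term above without changing the overall structure of the computation.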