
  • Article (No Access)

    Approximation theory, computing, and deep learning on the Wasserstein space

    The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. In this study, we address the challenging problem of the numerical approximation of Sobolev-smooth functions defined on probability spaces. Our particular focus is the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body of literature, which focuses on efficiently approximating pointwise evaluations, we chart a new course and define functional approximants by adopting three machine-learning-based approaches:

    • (1) Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials.
    • (2) Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces.
    • (3) Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional’s Euler–Lagrange equation.

    As a theoretical contribution, we furnish explicit and quantitative bounds on the generalization error of each of these solutions. In the proofs, we leverage the theory of metric Sobolev spaces and combine it with techniques from optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness appropriately designed neural networks to serve as basis functions; these networks undergo training using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. Consequently, at equal accuracy, our constructive solutions significantly enhance evaluation speed, surpassing state-of-the-art methods by several orders of magnitude. This allows evaluations over large datasets, including training time, to run several times faster than traditional optimal transport algorithms. Moreover, our analytically designed deep learning architecture achieves slightly lower test error than state-of-the-art CNN architectures on image datasets.
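
    Approach (2) above, empirical risk minimization with a Tikhonov penalty, can be sketched on a toy problem where the target is the 2-Wasserstein distance to a fixed reference measure for one-dimensional empirical measures, which has a closed form via sorted samples. The network size, penalty weight, and sorted-sample embedding below are illustrative assumptions and do not reproduce the architecture or setup of the paper.

    ```python
    # Hedged sketch: learn x -> W2(mu_x, nu) for 1-D empirical measures, where nu
    # is a fixed reference measure. For 1-D measures with equally many atoms, the
    # 2-Wasserstein distance is the L2 distance between sorted samples.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    n_atoms, n_measures = 32, 500

    def sample_measure():
        # random Gaussian empirical measure with random location and scale
        return rng.normal(rng.uniform(-2, 2), rng.uniform(0.5, 2.0), n_atoms)

    reference = np.sort(sample_measure())

    def w2_to_reference(atoms):
        # quantile (monotone) coupling gives the exact W2 between 1-D empirical measures
        return np.sqrt(np.mean((np.sort(atoms) - reference) ** 2))

    X = np.stack([sample_measure() for _ in range(n_measures)])
    y = np.array([w2_to_reference(atoms) for atoms in X])
    features = np.sort(X, axis=1)        # sorted samples as a simple embedding of each measure

    # alpha is sklearn's L2 (ridge) penalty on the network weights, i.e. the Tikhonov term
    model = MLPRegressor(hidden_layer_sizes=(64, 64), alpha=1e-3, max_iter=2000, random_state=0)
    model.fit(features[:400], y[:400])
    rmse = np.sqrt(np.mean((model.predict(features[400:]) - y[400:]) ** 2))
    print("held-out RMSE:", rmse)
    ```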

  • Article (No Access)

    A Novel Diversity-Based Semi-Supervised Learning Framework with Related Theoretical Analysis

    We present a new co-training-style framework and combine it with ensemble learning to further improve generalization ability. By employing different strategies to combine co-training with ensemble learning, two learning algorithms, Sequential Ensemble Co-Learning (SECL) and Parallel Ensemble Co-Learning (PECL), are developed. Furthermore, we propose a weighted bagging method in PECL to generate an ensemble of diverse classifiers at the end of co-training. Finally, based on the voting margin, an upper bound on the generalization error of multi-classifier voting systems is given in the presence of both classification noise and distribution noise. Experimental results on six datasets show that our method performs better than the other compared algorithms.
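
    For context, the sketch below runs a plain two-view co-training loop with a shared pool of pseudo-labeled points and a simple probability-averaging ensemble at the end; the view split, confidence threshold, and base learners are illustrative assumptions and do not reproduce the SECL/PECL algorithms or the weighted bagging scheme.

    ```python
    # Hedged sketch of plain two-view co-training; not the SECL/PECL procedures.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=600, n_features=20, n_informative=10, random_state=0)
    view1, view2 = X[:, :10], X[:, 10:]

    labeled = np.zeros(len(y), dtype=bool)
    labeled[:60] = True                  # small labeled pool; the rest acts as unlabeled data
    pseudo = y.copy()                    # pseudo-labels; only entries with labeled=True are used

    clf1 = LogisticRegression(max_iter=1000)
    clf2 = LogisticRegression(max_iter=1000)

    for _ in range(5):                   # a few co-training rounds
        clf1.fit(view1[labeled], pseudo[labeled])
        clf2.fit(view2[labeled], pseudo[labeled])
        # each view's classifier promotes its most confident unlabeled points to the shared pool
        for clf, view in ((clf1, view1), (clf2, view2)):
            idx = np.where(~labeled)[0]
            if len(idx) == 0:
                break
            proba = clf.predict_proba(view[idx])
            mask = proba.max(axis=1) > 0.95
            pseudo[idx[mask]] = clf.classes_[proba[mask].argmax(axis=1)]
            labeled[idx[mask]] = True

    # simple ensemble prediction: average the two views' class probabilities
    votes = (clf1.predict_proba(view1) + clf2.predict_proba(view2)).argmax(axis=1)
    print("agreement with true labels:", (votes == y).mean())
    ```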

  • Article (No Access)

    Full error analysis for the training of deep neural networks

    Deep learning algorithms have been applied very successfully in recent years to a range of problems out of reach for classical solution paradigms. Nevertheless, there is no completely rigorous mathematical error and convergence analysis which explains the success of deep learning algorithms. The error of a deep learning algorithm can in many situations be decomposed into three parts: the approximation error, the generalization error, and the optimization error. In this work we estimate each of these three errors for a certain deep learning algorithm and combine the three estimates to obtain an overall error analysis for the algorithm under consideration. In particular, we thereby establish convergence with a suitable convergence speed for the overall error of the considered algorithm. Our convergence speed analysis is far from optimal: the speed we establish is rather slow, increases exponentially in the dimension, and, in particular, suffers from the curse of dimensionality. The main contribution of this work is, instead, to provide a full error analysis (i) which covers each of the three different sources of error usually emerging in deep learning algorithms and (ii) which merges these three sources of error into one overall error estimate for the considered deep learning algorithm.
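
    The three-way split referred to above is usually written as follows (a standard decomposition stated here for orientation; the paper's precise setting may differ), with $\mathcal{E}$ the true risk, $\widehat{\mathcal{E}}$ the empirical risk, $\widehat{f}$ the network returned by the optimizer, $f^{*}$ the target function, and $f_{\mathcal{H}}$ a risk minimizer over the hypothesis class $\mathcal{H}$:

    ```latex
    \begin{equation*}
    \mathcal{E}(\widehat{f}) - \mathcal{E}(f^{*})
    \;\le\;
    \underbrace{\mathcal{E}(f_{\mathcal{H}}) - \mathcal{E}(f^{*})}_{\text{approximation error}}
    \;+\;
    \underbrace{2\,\sup_{f\in\mathcal{H}}\bigl|\mathcal{E}(f)-\widehat{\mathcal{E}}(f)\bigr|}_{\text{generalization error}}
    \;+\;
    \underbrace{\widehat{\mathcal{E}}(\widehat{f}) - \inf_{f\in\mathcal{H}}\widehat{\mathcal{E}}(f)}_{\text{optimization error}}.
    \end{equation*}
    ```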

  • Article (No Access)

    Regularized modal regression with data-dependent hypothesis spaces

    Modal regression aims at learning the conditional mode function, in contrast to traditional least squares, which approximates the conditional mean function. Due to its robustness to complex noise and outliers, modal regression has recently attracted increasing attention in the statistics and machine learning communities. However, most previous modal regression models are limited to learning frameworks with data-independent hypothesis spaces. Data-dependent hypothesis spaces can usually provide greater flexibility and adaptivity for many learning problems. By employing data-dependent hypothesis spaces, we propose a new regularized modal regression and establish its generalization error analysis. Data experiments demonstrate the competitive performance of the proposed model over related least-squares regression.
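
    As an illustration of the data-dependent hypothesis space idea, the sketch below fits a kernel expansion over the training sample by gradient ascent on a smoothed modal (correntropy-type) objective with an l2 coefficient penalty; the bandwidths, penalty weight, and optimizer are illustrative assumptions rather than the paper's exact model.

    ```python
    # Hedged sketch of regularized modal regression in a data-dependent hypothesis
    # space (a kernel expansion over the sample); all constants are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    x = rng.uniform(-3, 3, n)
    y = np.sin(x) + 0.3 * rng.standard_t(df=1.5, size=n)   # heavy-tailed noise with outliers

    def gauss_kernel(a, b, width):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * width ** 2))

    K = gauss_kernel(x, x, width=0.5)    # data-dependent hypothesis space: f = K @ alpha
    sigma, lam, lr = 0.3, 1e-3, 0.05     # modal kernel bandwidth, Tikhonov weight, step size
    alpha = np.zeros(n)

    for _ in range(500):
        r = y - K @ alpha                          # residuals
        w = np.exp(-r ** 2 / (2 * sigma ** 2))     # modal (correntropy-type) weights
        # ascend the smoothed modal objective  (1/n) sum_i w_i  -  lam * ||alpha||^2
        grad = K.T @ (w * r) / (n * sigma ** 2) - 2 * lam * alpha
        alpha += lr * grad

    pred = K @ alpha
    print("median absolute error vs. the noiseless mode:", np.median(np.abs(pred - np.sin(x))))
    ```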

  • Article (No Access)

    Comparison theorems on large-margin learning

    This paper studies the binary classification problem associated with a family of Lipschitz convex loss functions called large-margin unified machines (LUMs), which offer a natural bridge between distribution-based likelihood approaches and margin-based approaches. LUMs can overcome the so-called data piling issue of support vector machines in the high-dimension, low-sample-size setting, yet their theoretical analysis from the perspective of learning theory is still lacking. In this paper, we establish some new comparison theorems for all LUM loss functions, which play a key role in the error analysis of large-margin learning algorithms. Based on the obtained comparison theorems, we further derive learning rates for regularized LUM schemes associated with varying Gaussian kernels, which may be of independent interest.
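
    The abstract does not spell out the loss itself, so the sketch below uses the parameterization commonly found in the LUM literature (an index a > 0 and c >= 0, linear for small margins with a smooth polynomial tail for large margins); the exact form should be treated as an assumption.

    ```python
    # Hedged sketch of the LUM loss family as commonly parameterized; the formula
    # below is an assumption taken from the standard LUM literature, not from this paper.
    import numpy as np
    import matplotlib.pyplot as plt

    def lum_loss(u, a=1.0, c=1.0):
        """LUM loss evaluated at the functional margin u = y * f(x)."""
        u = np.asarray(u, dtype=float)
        threshold = c / (1.0 + c)
        denom = np.maximum((1.0 + c) * u - c + a, 1e-12)   # only used where u >= threshold
        smooth_tail = (1.0 / (1.0 + c)) * (a / denom) ** a
        return np.where(u < threshold, 1.0 - u, smooth_tail)

    margins = np.linspace(-2, 2, 400)
    for c in (0.0, 1.0, 10.0):
        plt.plot(margins, lum_loss(margins, a=1.0, c=c), label=f"c = {c}")
    plt.plot(margins, np.maximum(0.0, 1.0 - margins), "k--", label="hinge (reference)")
    plt.xlabel("margin y * f(x)")
    plt.ylabel("loss")
    plt.legend()
    plt.show()
    ```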

  • Article (No Access)

    Optimality of the rescaled pure greedy learning algorithms

    We propose the Rescaled Pure Greedy Learning Algorithm (RPGLA) for solving the kernel-based regression problem. The computational complexity of the RPGLA is lower than that of the Orthogonal Greedy Learning Algorithm (OGLA) and the Relaxed Greedy Learning Algorithm (RGLA). We obtain convergence rates of the RPGLA for continuous kernels. When the kernel is infinitely smooth, we derive a convergence rate that can be arbitrarily close to the best rate O(m^{-1}) under a mild assumption on the regression function.
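
    A minimal sketch of the rescaled pure greedy iteration for kernel regression is given below, using empirical inner products and the dictionary of kernel sections centered at the sample points; the kernel, data, and number of iterations are illustrative assumptions, not the exact RPGLA setup analyzed in the paper.

    ```python
    # Hedged sketch of a rescaled pure greedy iteration for kernel-based regression.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 150
    x = np.sort(rng.uniform(-1, 1, n))
    y = np.cos(np.pi * x) + 0.1 * rng.standard_normal(n)

    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2 ** 2))   # kernel sections on the sample

    def edot(u, v):
        return np.mean(u * v)            # empirical inner product

    norms = np.sqrt(np.mean(K ** 2, axis=0))     # empirical norms of the dictionary elements
    G = np.zeros(n)                              # current approximant evaluated on the sample
    for _ in range(30):                          # greedy iterations
        r = y - G                                # residual
        scores = np.abs(K.T @ r) / n / np.maximum(norms, 1e-12)   # |<r, g_j>| / ||g_j||
        g = K[:, int(np.argmax(scores))]         # most correlated dictionary element
        G_half = G + edot(r, g) / edot(g, g) * g                  # pure greedy update
        s = edot(y, G_half) / max(edot(G_half, G_half), 1e-12)
        G = s * G_half                           # rescaling step

    print("training RMSE:", np.sqrt(np.mean((G - y) ** 2)))
    ```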

  • Chapter (No Access)

    Analysis and Estimation of the Variance of Cross-Validation Estimators of the Generalization Error: A Short Review

    We briefly review research on estimating the variance of cross-validation estimators of the generalization error of computer algorithms. A general methodology for analyzing and estimating this variance is also discussed in some detail.
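
    For orientation, the sketch below computes the naive variance estimate of a K-fold cross-validation error, namely the sample variance of the fold errors divided by K. This naive estimator ignores the correlation between folds induced by overlapping training sets, which is exactly the difficulty that motivates the refined estimators reviewed here; the dataset and classifier are illustrative assumptions.

    ```python
    # Hedged sketch: naive variance estimate of a K-fold cross-validation error.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    X, y = make_classification(n_samples=500, n_features=15, random_state=0)

    fold_errors = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        fold_errors.append(np.mean(clf.predict(X[test_idx]) != y[test_idx]))

    fold_errors = np.array(fold_errors)
    cv_error = fold_errors.mean()
    naive_variance = fold_errors.var(ddof=1) / len(fold_errors)   # ignores between-fold correlation
    print(f"CV error estimate: {cv_error:.3f}, naive variance estimate: {naive_variance:.5f}")
    ```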

  • Chapter (No Access)

    A Comparison of Estimators for the Variance of Cross-Validation Estimators of the Generalization Error of Computer Algorithms

    This paper studies variance estimators of cross-validation estimators of the generalization error. Three estimators are discussed, and their performance is evaluated across a variety of data models and data sizes. It is shown that the standard error associated with the moment approximation estimator is smaller than that associated with the other two. The effect of training and test set size on these estimators is also discussed.