Using second-order optimization methods to train deep neural networks (DNNs) has attracted many researchers. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC), interprets the natural gradient update as a diagonal method and corrects the inaccurate re-scaling factors in the KFAC eigenbasis. In addition, another recent method, Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), approximates the Fisher information matrix (FIM) as a constant multiple of the Kronecker product of two matrices, so that the trace is preserved by the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factors under the Kronecker-factored eigenbasis, but also adopts the trace-restricted approximation and the effective damping technique of TKFAC. We also discuss the differences and relationships among the related Kronecker-factored approximations. Empirically, our method outperforms SGD with momentum, Adam, EKFAC and TKFAC on several DNNs.
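To make the re-scaling correction concrete, the following is a minimal NumPy sketch of an EKFAC-style preconditioning step for a single fully-connected layer. The factor matrices A and G, the per-example gradients, and the damping value are placeholders supplied by the reader; this illustrates the general eigenbasis re-scaling idea rather than the authors' implementation, and the trace restriction of TKFAC is not shown.

```python
import numpy as np

def ekfac_precondition(dW, A, G, grads_per_example, damping=1e-3):
    """Sketch of an EKFAC-style preconditioned gradient for one layer.

    dW                : (d_out, d_in) mini-batch weight gradient
    A                 : (d_in,  d_in) covariance of the layer inputs
    G                 : (d_out, d_out) covariance of back-propagated gradients
    grads_per_example : list of (d_out, d_in) per-example weight gradients
    """
    # Eigenbases of the two Kronecker factors (the KFAC eigenbasis).
    _, Ua = np.linalg.eigh(A)
    _, Ug = np.linalg.eigh(G)

    # Corrected re-scaling factors: second moments of the per-example
    # gradients expressed in the Kronecker-factored eigenbasis, instead
    # of the products of the factors' eigenvalues used by plain KFAC.
    proj = np.stack([Ug.T @ g @ Ua for g in grads_per_example])
    scaling = (proj ** 2).mean(axis=0)

    # Precondition the mini-batch gradient in the eigenbasis and map back.
    dW_eig = Ug.T @ dW @ Ua
    return Ug @ (dW_eig / (scaling + damping)) @ Ua.T
```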
Under the Bayesian Ying–Yang (BYY) harmony learning theory, a harmony function has been developed on a bi-directional architecture of the BYY system for Gaussian mixtures, with the important feature that maximizing it through a general gradient rule allows model selection to be made automatically during parameter learning on a set of samples drawn from a Gaussian mixture. This paper further proposes conjugate and natural gradient rules to implement the maximization of the harmony function, i.e. BYY harmony learning, on Gaussian mixtures more efficiently. Simulation experiments demonstrate that these two new gradient rules not only work well, but also converge more quickly than the general gradient rule.
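As an illustration of a conjugate gradient rule of the kind discussed, here is a minimal sketch of nonlinear (Polak-Ribière) conjugate-gradient ascent on a generic objective. The function grad_fn stands in for the gradient of the harmony function with respect to the flattened mixture parameters and is assumed to be supplied by the reader; the harmony function itself is not reproduced here, and this is not the paper's specific update rule.

```python
import numpy as np

def conjugate_gradient_ascent(grad_fn, theta0, n_iters=200, step_size=1e-2):
    """Generic Polak-Ribiere conjugate-gradient ascent (illustrative only).

    grad_fn(theta) is assumed to return the gradient of the objective
    (e.g. the harmony function) at the parameter vector theta.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    g_prev = grad_fn(theta)
    direction = g_prev.copy()                 # first step follows the plain gradient
    for _ in range(n_iters):
        theta = theta + step_size * direction
        g = grad_fn(theta)
        # Polak-Ribiere coefficient, clipped at zero (PR+).
        beta = max(0.0, g @ (g - g_prev) / (g_prev @ g_prev + 1e-12))
        direction = g + beta * direction      # new conjugate ascent direction
        g_prev = g
    return theta

# Toy usage with a concave objective f(x) = -||x - 1||^2 / 2, whose gradient is 1 - x.
theta_star = conjugate_gradient_ascent(lambda t: 1.0 - t, np.zeros(3))
```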
Although several highly accurate blind source separation algorithms have been proposed in the literature, they must store and process the whole data set, which may be enormous in some situations. This makes blind source separation infeasible and unrealizable at the VLSI level, owing to the large memory requirement and costly computation. This paper proposes algorithms that address the problems of enormous data sets and high computational complexity, so that they can run on-line and be implemented at the VLSI level with acceptable accuracy. Our approach is to partition the observed signals into several parts and to extract the partitioned observations with a simple activation function that performs only "shift-and-add" micro-operations; no division, multiplication, or exponentiation is needed. A method for obtaining an optimal initial de-mixing weight matrix to speed up separation is also presented. The proposed algorithm is tested on benchmarks available online. The experimental results show that our solution achieves efficiency comparable to other approaches, but with lower space and time complexity.
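To illustrate the multiplier-free principle, the sketch below emulates a de-mixing matrix-vector product in which every weight is assumed to have been pre-quantised to a signed power of two, so that each partial product reduces to a binary shift followed by an addition. The names, the power-of-two quantisation, and the use of ldexp to emulate the shift are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def shift_and_add_matvec(exponents, signs, x):
    """Multiplier-free product y = W x, assuming each de-mixing weight is a
    signed power of two: W[i, j] ~= signs[i, j] * 2**exponents[i, j].

    Every partial product is a binary shift (emulated here with ldexp)
    followed by an addition; no general multiplication or division is used.
    """
    n_out, n_in = exponents.shape
    y = np.zeros(n_out)
    for i in range(n_out):
        acc = 0.0
        for j in range(n_in):
            shifted = np.ldexp(x[j], int(exponents[i, j]))   # x[j] * 2**e, i.e. a shift
            acc += shifted if signs[i, j] > 0 else -shifted  # "add" micro-operation
        y[i] = acc
    return y
```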