Using second-order optimization methods to train deep neural networks (DNNs) has attracted many researchers. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC), interprets the natural gradient update as a diagonal method and corrects the inaccurate re-scaling factors in the KFAC eigenbasis. In addition, another recent method, Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), approximates the Fisher information matrix (FIM) as a constant multiple of the Kronecker product of two matrices, so that the trace is preserved by the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factors under the Kronecker-factored eigenbasis, but also adopts the trace-restricted approximation and the effective damping technique of TKFAC. We also discuss the differences and relationships among the related Kronecker-factored approximations. Empirically, our method outperforms SGD with momentum, Adam, EKFAC and TKFAC on several DNNs.
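To make the re-scaling correction concrete, the following is a minimal NumPy sketch of an EKFAC-style preconditioning step for a single fully-connected layer. The factor matrices A and G, the per-example gradients, and the damping value are placeholders supplied by the reader; this illustrates the general eigenbasis re-scaling idea rather than the authors' implementation, and the trace restriction of TKFAC is not shown.

```python
import numpy as np

def ekfac_precondition(dW, A, G, grads_per_example, damping=1e-3):
    """Sketch of an EKFAC-style preconditioned gradient for one layer.

    dW                : (d_out, d_in) mini-batch weight gradient
    A                 : (d_in,  d_in) covariance of the layer inputs
    G                 : (d_out, d_out) covariance of back-propagated gradients
    grads_per_example : list of (d_out, d_in) per-example weight gradients
    """
    # Eigenbases of the two Kronecker factors (the KFAC eigenbasis).
    _, Ua = np.linalg.eigh(A)
    _, Ug = np.linalg.eigh(G)

    # Corrected re-scaling factors: second moments of the per-example
    # gradients expressed in the Kronecker-factored eigenbasis, instead
    # of the products of the factors' eigenvalues used by plain KFAC.
    proj = np.stack([Ug.T @ g @ Ua for g in grads_per_example])
    scaling = (proj ** 2).mean(axis=0)

    # Precondition the mini-batch gradient in the eigenbasis and map back.
    dW_eig = Ug.T @ dW @ Ua
    return Ug @ (dW_eig / (scaling + damping)) @ Ua.T
```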
Under the Bayesian Ying–Yang (BYY) harmony learning theory, a harmony function has been developed on a bi-directional architecture of the BYY system for Gaussian mixtures, with the important feature that maximizing it through a general gradient rule allows model selection to be made automatically during parameter learning on a set of samples drawn from a Gaussian mixture. This paper further proposes conjugate and natural gradient rules to implement the maximization of the harmony function, i.e. BYY harmony learning, on Gaussian mixtures more efficiently. Simulation experiments demonstrate that these two new gradient rules not only work well, but also converge more quickly than the general gradient rule.
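As an illustration of a conjugate gradient rule of the kind discussed, here is a minimal sketch of nonlinear (Polak-Ribière) conjugate-gradient ascent on a generic objective. The function grad_fn stands in for the gradient of the harmony function with respect to the flattened mixture parameters and is assumed to be supplied by the reader; the harmony function itself is not reproduced here, and this is not the paper's specific update rule.

```python
import numpy as np

def conjugate_gradient_ascent(grad_fn, theta0, n_iters=200, step_size=1e-2):
    """Generic Polak-Ribiere conjugate-gradient ascent (illustrative only).

    grad_fn(theta) is assumed to return the gradient of the objective
    (e.g. the harmony function) at the parameter vector theta.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    g_prev = grad_fn(theta)
    direction = g_prev.copy()                 # first step follows the plain gradient
    for _ in range(n_iters):
        theta = theta + step_size * direction
        g = grad_fn(theta)
        # Polak-Ribiere coefficient, clipped at zero (PR+).
        beta = max(0.0, g @ (g - g_prev) / (g_prev @ g_prev + 1e-12))
        direction = g + beta * direction      # new conjugate ascent direction
        g_prev = g
    return theta

# Toy usage with a concave objective f(x) = -||x - 1||^2 / 2, whose gradient is 1 - x.
theta_star = conjugate_gradient_ascent(lambda t: 1.0 - t, np.zeros(3))
```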
Although several highly accurate blind source separation algorithms have been proposed in the literature, they must store and process the whole data set, which may be enormous in some situations. This makes blind source separation infeasible and unrealizable at the VLSI level, owing to the large memory requirement and costly computation. This paper proposes algorithms that address the problems of enormous data sets and high computational complexity, so that they can run on-line and be implemented at the VLSI level with acceptable accuracy. Our approach is to partition the observed signals into several parts and to extract the partitioned observations with a simple activation function that performs only "shift-and-add" micro-operations; no division, multiplication, or exponentiation is needed. A method for obtaining an optimal initial de-mixing weight matrix to speed up separation is also presented. The proposed algorithm is tested on benchmarks available online. The experimental results show that our solution achieves efficiency comparable to other approaches, but with lower space and time complexity.
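To illustrate the multiplier-free principle, the sketch below emulates a de-mixing matrix-vector product in which every weight is assumed to have been pre-quantised to a signed power of two, so that each partial product reduces to a binary shift followed by an addition. The names, the power-of-two quantisation, and the use of ldexp to emulate the shift are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def shift_and_add_matvec(exponents, signs, x):
    """Multiplier-free product y = W x, assuming each de-mixing weight is a
    signed power of two: W[i, j] ~= signs[i, j] * 2**exponents[i, j].

    Every partial product is a binary shift (emulated here with ldexp)
    followed by an addition; no general multiplication or division is used.
    """
    n_out, n_in = exponents.shape
    y = np.zeros(n_out)
    for i in range(n_out):
        acc = 0.0
        for j in range(n_in):
            shifted = np.ldexp(x[j], int(exponents[i, j]))   # x[j] * 2**e, i.e. a shift
            acc += shifted if signs[i, j] > 0 else -shifted  # "add" micro-operation
        y[i] = acc
    return y
```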