Convergence analysis of deep residual networks

Wentao Huang

https://orcid.org/0000-0002-3181-0546

School of Mathematics (Zhuhai), Sun Yat-sen University, Zhuhai, Guangdong 519082, P. R. China

E-mail Address: huangwt55@mail2.sysu.edu.cn

and

Haizhang Zhang

https://orcid.org/0000-0002-8241-3145

School of Mathematics (Zhuhai), Sun Yat-sen University, Zhuhai, Guangdong 519082, P. R. China

E-mail Address: zhhaizh2@sysu.edu.cn

Corresponding author.

https://doi.org/10.1142/S021953052350029X

Abstract

Various powerful deep neural network architectures have contributed greatly to the successes of deep learning over the past two decades. Among them, deep Residual Networks (ResNets) are of particular importance: they proved highly effective in computer vision, winning first place in many deep learning competitions. ResNets were also the first class of neural networks in the history of deep learning to be truly deep. Understanding the convergence of deep ResNets is therefore of both mathematical interest and practical significance. We study the convergence of deep ResNets, in terms of the network parameters, as the depth tends to infinity. To this end, we first give a matrix–vector description of general deep neural networks with shortcut connections and derive an explicit expression for such networks using the notion of activation matrices. The convergence question then reduces to the convergence of two series involving infinite products of non-square matrices. By studying these two series, we establish a sufficient condition for the pointwise convergence of ResNets. We also conduct experiments on benchmark machine learning data to illustrate the potential usefulness of the results.
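The reduction described above can be illustrated on a toy model. The sketch below assumes a ReLU residual block of the form x_n = x_{n-1} + V_n ReLU(W_n x_{n-1} + b_n), with non-square weights W_n (m × d) and V_n (d × m). This block form, the helper names `activation_matrix`, `resnet_forward`, `resnet_unrolled` and `make_blocks`, and the 1/n² scaling of the residual branches are illustrative assumptions, not the paper's exact construction or its convergence condition. Writing ReLU(z) = D(z) z with a diagonal 0/1 activation matrix D(z) turns each block into an affine map, so the depth-N output becomes a matrix product applied to x_0 plus a sum of partial products; letting N grow gives an infinite product and a series, loosely mirroring the objects whose convergence the abstract refers to.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def activation_matrix(z):
    """Diagonal 0/1 matrix D(z) with relu(z) == D(z) @ z."""
    return np.diag((z > 0).astype(float))


def resnet_forward(x0, blocks):
    """Plain forward pass (assumed block form): x_n = x_{n-1} + V_n relu(W_n x_{n-1} + b_n)."""
    x = x0
    for W, b, V in blocks:
        x = x + V @ relu(W @ x + b)
    return x


def resnet_unrolled(x0, blocks):
    """Same output written with activation matrices:
    x_N = A_N ... A_1 x0 + sum_k (A_N ... A_{k+1}) c_k,
    where A_k = I + V_k D_k W_k and c_k = V_k D_k b_k.
    D_k depends on x0, so it is read off along the trajectory of x0."""
    d = x0.shape[0]
    x = x0
    factors, offsets = [], []
    for W, b, V in blocks:
        D = activation_matrix(W @ x + b)
        A = np.eye(d) + V @ D @ W  # square d x d, built from non-square V and W
        c = V @ D @ b
        factors.append(A)
        offsets.append(c)
        x = A @ x + c  # equals the forward step, since V relu(Wx+b) = V D (Wx+b)
    P = np.eye(d)  # running product A_N ... A_{k+1}, empty (identity) at the start
    total = np.zeros(d)
    for A, c in zip(reversed(factors), reversed(offsets)):
        total = total + P @ c
        P = P @ A
    return P @ x0 + total


def make_blocks(depth, d, m, seed=0):
    """Hypothetical random blocks; the n-th residual branch is scaled by 1/n**2
    so that the branch norms are summable (an illustrative assumption)."""
    rng = np.random.default_rng(seed)
    blocks = []
    for n in range(1, depth + 1):
        W = rng.standard_normal((m, d)) / np.sqrt(d)
        b = rng.standard_normal(m)
        V = rng.standard_normal((d, m)) / (np.sqrt(m) * n**2)
        blocks.append((W, b, V))
    return blocks


if __name__ == "__main__":
    blocks = make_blocks(depth=800, d=3, m=5)
    x0 = np.array([1.0, -0.5, 2.0])
    for N in (50, 100, 200, 400, 800):
        xN = resnet_forward(x0, blocks[:N])
        assert np.allclose(xN, resnet_unrolled(x0, blocks[:N]))
        print(N, xN)
```

Running the script checks that the unrolled product-and-sum expression reproduces the direct forward pass and prints x_N for doubling depths N; under the assumed 1/n² scaling the branch norms are summable, so successive rows should differ less and less. Because the activation matrices depend on the input, the sketch reads them off along the trajectory of x_0, which is why the resulting convergence statement is pointwise in the input.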

Keywords:

AMSC: 68T07, 40A30