MgNet[1] is an abstract and unified mathematical framework that simultaneously recovers some residual neural network (ResNet)[2][3] type convolutional neural networks (CNNs) and multigrid methods[4][5] for solving discretized partial differential equations (PDEs). As a CNN model, MgNet can be obtained by making only minor modifications to a classical geometric multigrid method. The original ResNet paper[2] had already acknowledged connections between ResNet and classical multigrid methods, from the viewpoint of how residuals are used in both methods. MgNet[1] makes this connection more direct and explicit: a class of efficient CNN models can be obtained by making minor modifications to a typical multigrid cycle while keeping exactly the same algorithmic structure.

Main structure and connections with ResNet

One core concept in MgNet, motivated by research in algebraic multigrid methods,[5] is the distinction between the so-called data and feature spaces, which are dual to each other. Based on this concept, MgNet and a subsequent study (He, Juncai; Chen, Yuyan; Xu, Jinchao (2019). "Constrained Linear Data-feature Mapping for Image Classification". arXiv:1911.10428v1 [eess.IV].) propose the following constrained data-feature mapping model on every grid $\ell$:

$$A^{\ell} \ast u^{\ell} = f^{\ell},$$

where $\ast$ denotes convolution, $f^{\ell}$ belongs to the data space, and $u^{\ell}$ belongs to the feature space and satisfies

$$u^{\ell} \ge 0.$$
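The data-feature mapping model can be illustrated with a minimal Python sketch. It rests on simplifying assumptions that are not part of MgNet itself: signals are 1D and single-channel, and $\ast$ is a zero-padded "same" convolution with an odd-length kernel. The helper names `conv`, `data_from_feature`, and `residual` are hypothetical.

```python
# Minimal 1D, single-channel sketch of the data-feature mapping A * u = f.
# Assumption (not from the source): "*" is a zero-padded "same" convolution
# with an odd-length kernel, as in a CNN layer; MgNet itself uses 2D
# multichannel convolutions.


def conv(kernel, x):
    """Zero-padded 'same' convolution (cross-correlation, as in CNN layers)."""
    k = len(kernel) // 2
    n = len(x)
    return [
        sum(kernel[j] * x[i + j - k] for j in range(len(kernel)) if 0 <= i + j - k < n)
        for i in range(n)
    ]


def data_from_feature(A, u):
    """Map a feature u (which must satisfy u >= 0) to data f = A * u."""
    if any(v < 0 for v in u):
        raise ValueError("feature u must satisfy the constraint u >= 0")
    return conv(A, u)


def residual(A, u, f):
    """Residual f - A * u, which lives in the data space."""
    return [fi - ai for fi, ai in zip(f, conv(A, u))]


# Usage: a 1D discrete Laplacian stencil as a hypothetical operator A.
A = [-1, 2, -1]
u = [0, 1, 2, 1, 0]
f = data_from_feature(A, u)
print(f)                  # [-1, 0, 2, 0, -1]
print(residual(A, u, f))  # [0, 0, 0, 0, 0]
```

The stencil `[-1, 2, -1]` is only a convenient example operator. In MgNet the operator $A^{\ell}$ is a trainable convolution.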

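The feature extraction process can then be obtained through an iterative procedure for solving the above system on each grid. For example, if a single-step residual correction scheme is applied to the above system, it becomes

$$u^{\ell,i} = u^{\ell,i-1} + \sigma \circ B^{\ell,i} \ast \sigma\left(f^{\ell} - A^{\ell} \ast u^{\ell,i-1}\right), \quad i = 1:\nu_{\ell},$$

with $u^{\ell,i} \ge 0$, where $\sigma$ is a nonlinear activation function such as ReLU and $B^{\ell,i}$ is a convolution that plays the role of a smoother.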
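The residual correction iteration can be sketched in a simplified 1D setting. This is a hypothetical illustration that assumes $\sigma$ is ReLU and each $B^{\ell,i}$ is a small convolution kernel. The function name `residual_correction` and the kernel values are made up for the example.

```python
# Hypothetical 1D, single-channel sketch of the residual-correction iteration
#   u^{l,i} = u^{l,i-1} + sigma(B^{l,i} * sigma(f^l - A^l * u^{l,i-1})),
# assuming sigma = ReLU and "*" = zero-padded "same" convolution.


def conv(kernel, x):
    """Zero-padded 'same' convolution (cross-correlation, as in CNN layers)."""
    k = len(kernel) // 2
    n = len(x)
    return [
        sum(kernel[j] * x[i + j - k] for j in range(len(kernel)) if 0 <= i + j - k < n)
        for i in range(n)
    ]


def relu(x):
    """Elementwise ReLU, the assumed choice of sigma."""
    return [max(0.0, v) for v in x]


def residual_correction(A, B_list, f, u0):
    """Apply one correction step per smoother kernel B (nu_l = len(B_list))."""
    u = list(u0)
    for B in B_list:
        r = [fi - ai for fi, ai in zip(f, conv(A, u))]  # data-space residual
        correction = relu(conv(B, relu(r)))  # sigma o B * sigma(r), always >= 0
        u = [ui + ci for ui, ci in zip(u, correction)]
    return u


A = [-1, 2, -1]          # hypothetical operator A^l
B_list = [[0.5], [0.5]]  # two hypothetical pointwise smoothers B^{l,1}, B^{l,2}
print(residual_correction(A, B_list, f=[1, 1, 1], u0=[0, 0, 0]))  # [0.75, 1.0, 0.75]
```

Because both applications of ReLU make every correction nonnegative, starting from $u^{\ell,0} \ge 0$ keeps every iterate $u^{\ell,i} \ge 0$ in this sketch.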

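If the iteration is instead written in terms of its residual, $r^{\ell,i} = f^{\ell} - A^{\ell} \ast u^{\ell,i}$, the scheme becomes

$$r^{\ell,i} = r^{\ell,i-1} - A^{\ell} \ast \sigma \circ B^{\ell,i} \ast \sigma\left(r^{\ell,i-1}\right), \quad i = 1:\nu_{\ell}.$$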
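Passing from the feature iteration to the residual iteration needs only the definition of the residual and the linearity of convolution with $A^{\ell}$. Writing $r^{\ell,i} = f^{\ell} - A^{\ell} \ast u^{\ell,i}$ and substituting the residual correction update for $u^{\ell,i}$ gives:

$$
\begin{aligned}
r^{\ell,i} &= f^{\ell} - A^{\ell} \ast u^{\ell,i} \\
&= f^{\ell} - A^{\ell} \ast \left[ u^{\ell,i-1} + \sigma \circ B^{\ell,i} \ast \sigma\left(f^{\ell} - A^{\ell} \ast u^{\ell,i-1}\right) \right] \\
&= \left(f^{\ell} - A^{\ell} \ast u^{\ell,i-1}\right) - A^{\ell} \ast \sigma \circ B^{\ell,i} \ast \sigma\left(r^{\ell,i-1}\right) \\
&= r^{\ell,i-1} - A^{\ell} \ast \sigma \circ B^{\ell,i} \ast \sigma\left(r^{\ell,i-1}\right).
\end{aligned}
$$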

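This is almost exactly the basic block scheme of the pre-activation ResNet (Pre-act ResNet),[3] which has the form

$$r^{\ell,i} = r^{\ell,i-1} - A^{\ell,i} \ast \sigma \circ B^{\ell,i} \ast \sigma\left(r^{\ell,i-1}\right), \quad i = 1:\nu_{\ell}.$$

The only difference is that Pre-act ResNet uses a separate operator $A^{\ell,i}$ in each block, whereas the MgNet iteration shares a single operator $A^{\ell}$ on each grid.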
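To make the comparison with Pre-act ResNet concrete, the following hypothetical sketch runs the MgNet feature iteration alongside a stack of Pre-act-ResNet-style residual blocks. It assumes 1D single-channel signals, $\sigma$ = ReLU, and zero-padded convolutions, and all names are illustrative. When every block operator is tied to the same $A^{\ell}$, the residual blocks reproduce the residual $f^{\ell} - A^{\ell} \ast u^{\ell,\nu_\ell}$ of the feature iteration.

```python
# Hypothetical 1D sketch comparing the MgNet feature iteration (shared A^l)
# with Pre-act-ResNet-style residual blocks (one operator A^{l,i} per block).
# Assumptions: sigma = ReLU, "*" = zero-padded "same" convolution.


def conv(kernel, x):
    """Zero-padded 'same' convolution (cross-correlation, as in CNN layers)."""
    k = len(kernel) // 2
    n = len(x)
    return [
        sum(kernel[j] * x[i + j - k] for j in range(len(kernel)) if 0 <= i + j - k < n)
        for i in range(n)
    ]


def relu(x):
    return [max(0.0, v) for v in x]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def feature_iteration(A, B_list, f, u0):
    """MgNet form: u <- u + sigma(B_i * sigma(f - A * u)), same A every step."""
    u = list(u0)
    for B in B_list:
        correction = relu(conv(B, relu(sub(f, conv(A, u)))))
        u = [ui + ci for ui, ci in zip(u, correction)]
    return u


def residual_blocks(A_list, B_list, r0):
    """Pre-act-ResNet form: r <- r - A_i * sigma(B_i * sigma(r)), one A_i per block."""
    r = list(r0)
    for A, B in zip(A_list, B_list):
        r = sub(r, conv(A, relu(conv(B, relu(r)))))
    return r


A, B_list = [-1, 2, -1], [[0.5], [0.5]]
f, u0 = [1, 1, 1], [0, 0, 0]
u = feature_iteration(A, B_list, f, u0)
# Tying every A_i to the shared A matches the residual f - A * u of the iteration.
print(residual_blocks([A, A], B_list, sub(f, conv(A, u0))))  # [0.5, 0.5, 0.5]
print(sub(f, conv(A, u)))                                    # [0.5, 0.5, 0.5]
```

Untying the operators, i.e. passing different kernels in `A_list`, gives the Pre-act ResNet form, whose blocks are in general no longer residuals of a single system $A^{\ell} \ast u^{\ell} = f^{\ell}$.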

The next figure shows the pseudocode of MgNet:

Notably, MgNet (Algorithm 1) is identical to a multigrid cycle[4][5] once the boxed nonlinear operations are removed from the algorithm.
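The remark about removing the nonlinearities can be illustrated with a hypothetical sketch of a multigrid-style downward sweep in the spirit of MgNet. On each grid it performs a few smoothing steps of the form $u \leftarrow u + \sigma \circ B \ast \sigma(f - A \ast u)$. It then forms a coarse-grid feature $u^{\ell+1,0} = \Pi \ast_2 u$ and coarse-grid data $f^{\ell+1} = R \ast_2 (f - A \ast u) + A^{\ell+1} \ast u^{\ell+1,0}$. This coarse-grid update, the stride-2 operators `R` and `P`, and all function names are assumptions made for illustration, not a transcription of Algorithm 1. Passing the identity as `sigma` removes the nonlinearities, and the sweep reduces to a purely linear smoothing-and-restriction pass. With ReLU it does not.

```python
# Hypothetical 1D sketch of a multigrid-style downward sweep in the spirit of
# MgNet. The coarse-grid update, the stride-2 operators R and P, and all names
# are illustrative assumptions, not a transcription of MgNet's Algorithm 1.


def conv(kernel, x):
    """Zero-padded 'same' convolution (cross-correlation, as in CNN layers)."""
    k = len(kernel) // 2
    n = len(x)
    return [
        sum(kernel[j] * x[i + j - k] for j in range(len(kernel)) if 0 <= i + j - k < n)
        for i in range(n)
    ]


def stride2(kernel, x):
    """Stride-2 convolution: keep every other output of the 'same' convolution."""
    return conv(kernel, x)[::2]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def mg_sweep(A_levels, B_levels, R, P, f, sigma):
    """Smooth on each grid, then pass features and data to the next coarser grid.

    sigma is the (boxed) nonlinearity; pass the identity to remove it.
    Returns the final feature computed on every grid.
    """
    u = [0] * len(f)
    features = []
    for level, (A, B_list) in enumerate(zip(A_levels, B_levels)):
        for B in B_list:  # u <- u + sigma(B * sigma(f - A * u))
            u = add(u, sigma(conv(B, sigma(sub(f, conv(A, u))))))
        features.append(u)
        if level + 1 < len(A_levels):
            u_coarse = stride2(P, u)  # u^{l+1,0} = P *_2 u
            # f^{l+1} = R *_2 (f - A * u) + A^{l+1} * u^{l+1,0}
            f = add(stride2(R, sub(f, conv(A, u))), conv(A_levels[level + 1], u_coarse))
            u = u_coarse
    return features


def identity(x):
    return list(x)


def relu(x):
    return [max(0, v) for v in x]


# Hypothetical two-grid setup with integer kernels, so all arithmetic is exact.
A_levels = [[-1, 2, -1], [-1, 2, -1]]
B_levels = [[[1]], [[1]]]
R, P = [0, 1, 1], [0, 1, 0]  # R sums neighbouring pairs, P injects even entries

f1, f2 = [1, 2, 3, 4], [4, -3, 2, -1]
lin = [mg_sweep(A_levels, B_levels, R, P, g, identity) for g in (f1, f2, add(f1, f2))]
# Without sigma the sweep is linear: sweep(f1 + f2) == sweep(f1) + sweep(f2).
print(all(add(a, b) == c for a, b, c in zip(*lin)))  # True

g = [1, 1, 1, 1]
neg_g = [-v for v in g]
nl = [mg_sweep(A_levels, B_levels, R, P, h, relu) for h in (g, neg_g, add(g, neg_g))]
# With sigma = ReLU, additivity fails.
print(all(add(a, b) == c for a, b, c in zip(*nl)))  # False
```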

Summary

By revealing this direct connection between CNNs and multigrid methods, MgNet opens up a new way to design and study deep learning models from a more mathematical viewpoint. In particular, the rich mathematical techniques developed for multigrid methods can be applied to the study of deep learning.

References

  1. He, Juncai; Xu, Jinchao (July 2019). "MgNet: A unified framework of multigrid and convolutional neural network". Science China Mathematics. 62 (7): 1331–1354. arXiv:1901.10415. Bibcode:2019arXiv190110415H. doi:10.1007/s11425-019-9547-2. ISSN 1674-7283.
  2. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015-12-10). "Deep Residual Learning for Image Recognition". arXiv:1512.03385v1. Bibcode:2015arXiv151203385H.
  3. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016-03-16). "Identity Mappings in Deep Residual Networks". arXiv:1603.05027v3.
  4. Xu, Jinchao (1992-12-01). "Iterative Methods by Space Decomposition and Subspace Correction". SIAM Review. 34 (4): 581–613. doi:10.1137/1034116. ISSN 0036-1445.
  5. Xu, Jinchao; Zikatanov, Ludmil (May 2017). "Algebraic multigrid methods". Acta Numerica. 26: 591–721. arXiv:1611.01917. doi:10.1017/S0962492917000083. ISSN 0962-4929.