A Graph Neural Network (GNN) is a class of artificial neural networks for processing data that can be represented as graphs[1][2][3].

Basic building blocks of a Graph Neural Network (GNN). Permutation equivariant layer. Local pooling layer. Global pooling (or readout) layer. Colors indicate features.

In the more general subject of "Geometric Deep Learning", existing neural network architectures can be interpreted as GNNs operating on suitably defined graphs[4]. Convolutional Neural Networks, in the context of computer vision, can be seen as GNNs applied to graphs structured as grids of pixels. Transformers, in the context of natural language processing, can be seen as GNNs applied to complete graphs whose nodes are words in a sentence.

The key design element of GNNs is the use of pairwise message passing, such that graph nodes iteratively update their representations by exchanging information with their neighbors. Since their inception, several different GNN architectures have been proposed[1][5][6][7], which implement different flavors of message passing[4]. As of 2022, whether it is possible to define GNN architectures "going beyond" message passing, or if every GNN can be built on message passing over suitably defined graphs, is an open research question[8].

Relevant application domains for GNNs include social networks[9], citation networks[10], molecular biology[11], physics[12] and NP-hard combinatorial optimization problems[13].

Several open source libraries implementing Graph Neural Networks are available, such as PyTorch Geometric[14] (PyTorch), TensorFlow GNN[15] (TensorFlow), and Jraph[16] (Google JAX).

Architecture


The architecture of a generic GNN implements the following fundamental layers[4]:

  • Permutation Equivariant: a permutation equivariant layer maps a representation of a graph into an updated representation of the same graph. In the literature, permutation equivariant layers are implemented via pairwise message passing between graph nodes[4][8]. Intuitively, in a message passing layer, nodes update their representations by aggregating the messages received from their immediate neighbours. As such, each message passing layer increases the receptive field of the GNN by one hop.
  • Local Pooling: a local pooling layer coarsens the graph via downsampling, in a similar fashion to pooling layers in convolutional neural networks. Examples include top-k pooling[17] and self-attention pooling[18].
  • Global Pooling: a global pooling layer, also known as readout layer, provides a fixed-size representation of the whole graph. The global pooling layer must be permutation invariant, such that permutations in the ordering of graph nodes and edges do not alter the final output[19]. Examples include element-wise sum, mean or maximum (a minimal readout sketch follows this list).
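
As an illustration of the permutation-invariance requirement on readouts, the following sketch (plain NumPy, with hypothetical node representations; it is not taken from any of the cited libraries) computes element-wise sum, mean, and max readouts over a node-representation matrix; reordering the rows leaves all three outputs unchanged.

```python
# Minimal sketch (hypothetical data): permutation-invariant readouts over a
# matrix of node representations H, one row per node.
import numpy as np

H = np.array([[1.0, 2.0],
              [3.0, 0.0],
              [0.5, 1.5]])   # node representations for a 3-node graph

graph_sum = H.sum(axis=0)    # element-wise sum readout
graph_mean = H.mean(axis=0)  # element-wise mean readout
graph_max = H.max(axis=0)    # element-wise max readout

# Permuting the rows of H (i.e., relabelling the nodes) leaves all three
# readouts unchanged, so the graph-level representation is order-independent.
perm = np.random.permutation(H.shape[0])
assert np.allclose(H[perm].sum(axis=0), graph_sum)
```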

It has been demonstrated that GNNs cannot be more expressive than the Weisfeiler–Lehman Graph Isomorphism Test[20][21]. In practice, this means that there exist different graph structures (e.g., molecules with the same atoms but different bonds) that cannot be distinguished by GNNs. More powerful GNNs operating on higher-dimensional geometries such as simplicial complexes can be designed[22]. As of 2022, whether or not future architectures will overcome the message passing primitive is an open research question[8].

 
Non-isomorphic graphs that cannot be distinguished by a GNN due to the limitations of the Weisfeiler-Lehman Graph Isomorphism Test. Colors indicate node features.

Message Passing Layers

 
Node representation update in a Message Passing Neural Network (MPNN) layer. A node receives messages sent by all of its immediate neighbours. Messages are computed via the message function $\psi$, which accounts for the features of both sender and receiver.

Message Passing layers are permutation-equivariant layers mapping a graph into an updated representation of the same graph. Formally, they can be expressed as Message Passing Neural Networks (MPNNs)[4].

Let $G = (V, E)$ be a graph, where $V$ is the node set and $E$ is the edge set. Let $N_u$ be the neighbourhood of node $u \in V$. Additionally, let $\mathbf{x}_u$ be the features of node $u \in V$, and $\mathbf{e}_{uv}$ be the features of edge $(u, v) \in E$. An MPNN layer can be expressed as follows[4]:

$$\mathbf{h}_u = \phi\left(\mathbf{x}_u, \bigoplus_{v \in N_u} \psi(\mathbf{x}_u, \mathbf{x}_v, \mathbf{e}_{uv})\right)$$

Where $\phi$ and $\psi$ are differentiable functions (e.g., artificial neural networks), and $\bigoplus$ is a permutation invariant aggregation operator that can accept an arbitrary number of inputs (e.g., element-wise sum, mean, or max). In particular, $\phi$ and $\psi$ are referred to as the update and message functions, respectively. Intuitively, in an MPNN computational block, graph nodes update their representations by aggregating the messages received from their neighbours.

The outputs of one or more MPNN layers are node representations $\mathbf{h}_u$ for each node $u \in V$ in the graph. Node representations can be employed for any downstream task, such as node/graph classification or edge prediction.

Graph nodes in an MPNN update their representations by aggregating information from their immediate neighbours. As such, stacking $n$ MPNN layers means that a node can communicate with nodes that are at most $n$ "hops" away. In principle, to ensure that every node receives information from every other node, one would need to stack a number of MPNN layers equal to the graph diameter. However, stacking many MPNN layers may cause issues such as oversmoothing[23] and oversquashing[24]. Oversmoothing refers to the issue of node representations becoming indistinguishable. Oversquashing refers to the bottleneck created by squeezing long-range dependencies into fixed-size representations.
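
The following is a minimal sketch of a single MPNN layer with element-wise sum aggregation, written in plain NumPy. The graph, feature dimensions, and the linear maps standing in for the message function $\psi$ and update function $\phi$ are illustrative assumptions, not the API of any particular GNN library.

```python
# Minimal sketch of one MPNN layer (sum aggregation) in plain NumPy.
# psi and phi are stand-ins for small neural networks; here they are simple
# nonlinear maps with random weights, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical graph: 4 nodes, an undirected edge list, 3-dimensional features.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
X = rng.normal(size=(4, 3))

d_in, d_msg, d_out = 3, 8, 8
W_msg = rng.normal(size=(2 * d_in, d_msg))      # psi acts on [x_u, x_v]
W_upd = rng.normal(size=(d_in + d_msg, d_out))  # phi acts on [x_u, aggregate]

def psi(x_u, x_v):
    """Message function: depends on both receiver and sender features."""
    return np.tanh(np.concatenate([x_u, x_v]) @ W_msg)

def phi(x_u, m_u):
    """Update function: combines a node's own features with the aggregate."""
    return np.tanh(np.concatenate([x_u, m_u]) @ W_upd)

# Neighbourhoods N_u derived from the undirected edge list.
neighbours = {u: set() for u in range(len(X))}
for u, v in edges:
    neighbours[u].add(v)
    neighbours[v].add(u)

H = np.zeros((len(X), d_out))
for u in range(len(X)):
    # Permutation-invariant aggregation (element-wise sum) over messages.
    m_u = sum(psi(X[u], X[v]) for v in neighbours[u])
    H[u] = phi(X[u], m_u)
```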

Other "flavours" of MPNN have been developed in the literature[4], such as Graph Convolutional Networks[5] and Graph Attention Networks[7], whose definitions can be expressed in terms of the MPNN formalism.

Graph Convolutional Network


The Graph Convolutional Network (GCN) was first introduced by Thomas Kipf and Max Welling in 2017[5].

A GCN layer defines a first-order approximation of a localized spectral filter on graphs. GCNs can be understood as a generalization of Convolutional Neural Networks to graph-structured data.

The formal expression of a GCN layer reads as follows:

$$\mathbf{H} = \sigma\left(\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{X} \mathbf{\Theta}\right)$$

Where $\mathbf{H}$ is the matrix of node representations $\mathbf{h}_u$, $\mathbf{X}$ is the matrix of node features $\mathbf{x}_u$, $\sigma(\cdot)$ is an activation function (e.g., ReLU), $\tilde{\mathbf{A}}$ is the graph adjacency matrix with the addition of self-loops, $\tilde{\mathbf{D}}$ is the degree matrix of $\tilde{\mathbf{A}}$, and $\mathbf{\Theta}$ is a matrix of trainable parameters.

In particular, let $\mathbf{A}$ be the graph adjacency matrix: then, one can define $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ and $\tilde{D}_{ii} = \sum_{j \in V} \tilde{A}_{ij}$, where $\mathbf{I}$ denotes the identity matrix. This normalization ensures that the eigenvalues of $\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}}$ are bounded in the range $[-1, 1]$, avoiding numerical instabilities and exploding/vanishing gradients.

A limitation of GCNs is that they do not allow multidimensional edge features $\mathbf{e}_{uv}$[5]. It is however possible to associate scalar weights $w_{uv}$ to each edge by imposing $A_{uv} = w_{uv}$, i.e., by setting each nonzero entry in the adjacency matrix equal to the weight of the corresponding edge.
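
The GCN propagation rule above can be sketched in a few lines of NumPy. The adjacency matrix, feature matrix, and parameter matrix $\mathbf{\Theta}$ below are illustrative placeholders; a real implementation (e.g., in PyTorch Geometric) would use sparse operations and learned parameters.

```python
# Minimal sketch of a GCN layer: H = sigma(D^-1/2 (A + I) D^-1/2 X Theta).
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)   # hypothetical adjacency matrix
X = rng.normal(size=(4, 3))                 # node features
Theta = rng.normal(size=(3, 8))             # trainable parameter matrix

A_tilde = A + np.eye(A.shape[0])            # add self-loops
deg = A_tilde.sum(axis=1)                   # degrees of A_tilde
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))    # D^{-1/2}

def relu(x):
    return np.maximum(x, 0.0)

# Symmetrically normalized propagation, then linear map and activation.
H = relu(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ Theta)
```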

Graph Attention Network


The Graph Attention Network (GAT) was introduced by Petar Veličković et al. in 2018[7].

A multi-head GAT layer can be expressed as follows:

$$\mathbf{h}_u = \Big\Vert_{k=1}^{K} \sigma\left(\sum_{v \in N_u} \alpha^k_{uv} \mathbf{W}^k \mathbf{x}_v\right)$$

Where $K$ is the number of attention heads, $\Big\Vert$ denotes vector concatenation, $\sigma(\cdot)$ is an activation function (e.g., ReLU), $\alpha^k_{uv}$ are attention coefficients, and $\mathbf{W}^k$ is a matrix of trainable parameters for the $k$-th attention head.

For the final GAT layer, the outputs from each attention head are averaged before the application of the activation function. Formally, the final GAT layer can be written as:

$$\mathbf{h}_u = \sigma\left(\frac{1}{K}\sum_{k=1}^{K}\sum_{v \in N_u} \alpha^k_{uv} \mathbf{W}^k \mathbf{x}_v\right)$$

Attention in machine learning is a technique that mimics cognitive attention. In the context of learning on graphs, the attention coefficient $\alpha_{uv}$ measures how important node $u \in V$ is to node $v \in V$.

Normalized attention coefficients are computed as follows:

$$\alpha_{uv} = \frac{\exp\left(\text{LeakyReLU}\left(\mathbf{a}^T \left[\mathbf{W}\mathbf{x}_u \Vert \mathbf{W}\mathbf{x}_v\right]\right)\right)}{\sum_{z \in N_u} \exp\left(\text{LeakyReLU}\left(\mathbf{a}^T \left[\mathbf{W}\mathbf{x}_u \Vert \mathbf{W}\mathbf{x}_z\right]\right)\right)}$$

Where $\mathbf{a}$ is a vector of learnable weights, $\cdot^T$ indicates transposition, $\Vert$ denotes vector concatenation, and $\text{LeakyReLU}$ is a modified ReLU activation function.

A GCN can be seen as a special case of a GAT where attention coefficients are not learnable, but fixed and equal to the edge weights $w_{uv}$.
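
A minimal single-head sketch of the GAT update, in plain NumPy, may help make the role of the attention coefficients concrete. The weight matrix, attention vector, and the small example graph are illustrative assumptions; multi-head attention would repeat this computation $K$ times and concatenate (or average) the results.

```python
# Minimal single-head sketch of GAT-style attention in plain NumPy.
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))     # node features
W = rng.normal(size=(3, 8))     # shared linear transformation
a = rng.normal(size=(16,))      # attention vector acting on [Wx_u || Wx_v]
neighbours = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}  # hypothetical graph

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def score(u, v):
    """Unnormalized attention score between nodes u and v."""
    return leaky_relu(a @ np.concatenate([X[u] @ W, X[v] @ W]))

H = np.zeros((4, 8))
for u, nbrs in neighbours.items():
    scores = np.array([score(u, v) for v in nbrs])
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over the neighbourhood
    # Attention-weighted aggregation of transformed neighbour features.
    H[u] = np.maximum(0.0, sum(a_uv * (X[v] @ W) for a_uv, v in zip(alpha, nbrs)))
```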

Local Pooling Layers


Local pooling layers coarsen the graph via downsampling. In the literature, several learnable local pooling strategies have been proposed[19].

Top-k Pooling


The Top-k Pooling layer[17] can be formalised as follows:

$$\mathbf{y} = \frac{\mathbf{X}\mathbf{p}}{\Vert\mathbf{p}\Vert}$$
$$\mathbf{i} = \text{top}_k(\mathbf{y})$$
$$\mathbf{X}' = (\mathbf{X} \odot \tanh(\mathbf{y}))_{\mathbf{i}}$$
$$\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}$$

Where $\mathbf{X}$ is the matrix of node features, $\mathbf{A}$ is the graph adjacency matrix, $\odot$ denotes element-wise matrix multiplication, and $\mathbf{p}$ is a learnable projection vector. The projection vector $\mathbf{p}$ computes a scalar projection value for each graph node. The nodes with the top-$k$ highest projection scores are retained in the new adjacency matrix $\mathbf{A}'$. The $\tanh(\cdot)$ operation makes the projection vector $\mathbf{p}$ trainable by backpropagation, which otherwise would produce discrete outputs[17].
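
A minimal NumPy sketch of the Top-k Pooling equations above, under the assumption of a small random graph and an arbitrary (untrained) projection vector $\mathbf{p}$:

```python
# Minimal sketch of Top-k Pooling: score, select, gate, and induce the subgraph.
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(5, 3))                     # node features
A = (rng.random((5, 5)) < 0.4).astype(float)    # hypothetical adjacency matrix
A = np.triu(A, 1); A = A + A.T                  # symmetric, no self-loops
p = rng.normal(size=(3,))                       # learnable projection vector
k = 3                                           # number of nodes to keep

y = X @ p / np.linalg.norm(p)                   # scalar projection score per node
idx = np.argsort(y)[-k:]                        # indices of the top-k scores
X_pooled = X[idx] * np.tanh(y[idx])[:, None]    # gate kept features with tanh(y)
A_pooled = A[np.ix_(idx, idx)]                  # induced adjacency of kept nodes
```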

Self-Attention Pooling


The Self-Attention Pooling layer[18] can be formalised as follows:

$$\mathbf{y} = \text{GNN}(\mathbf{X}, \mathbf{A})$$
$$\mathbf{i} = \text{top}_k(\mathbf{y})$$
$$\mathbf{X}' = (\mathbf{X} \odot \mathbf{y})_{\mathbf{i}}$$
$$\mathbf{A}' = \mathbf{A}_{\mathbf{i}, \mathbf{i}}$$

Where $\mathbf{X}$ is the matrix of node features, $\mathbf{A}$ is the graph adjacency matrix, $\odot$ denotes element-wise matrix multiplication, and $\text{GNN}$ is a generic permutation equivariant GNN layer (e.g., GCN, GAT, MPNN).

The Self-Attention Pooling layer can be seen as an extension of the Top-k Pooling layer. Unlike Top-k Pooling, the self-attention scores computed in Self-Attention Pooling account for both the graph features and the graph topology.
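
A minimal NumPy sketch of Self-Attention Pooling, assuming a one-layer GCN (as defined earlier) as the generic GNN that produces the scores $\mathbf{y}$; the graph and all weights are illustrative placeholders:

```python
# Minimal sketch of Self-Attention Pooling with a one-layer GCN as the scorer.
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(5, 3))                     # node features
A = (rng.random((5, 5)) < 0.5).astype(float)    # hypothetical adjacency matrix
A = np.triu(A, 1); A = A + A.T                  # symmetric, no self-loops
theta = rng.normal(size=(3, 1))                 # GCN parameters: one score per node
k = 3                                           # number of nodes to keep

A_tilde = A + np.eye(A.shape[0])                # self-loops for the GCN scorer
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
# Scores depend on both node features and graph topology (via the GCN layer).
y = np.tanh(D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X @ theta).ravel()

idx = np.argsort(y)[-k:]                        # keep the top-k scoring nodes
X_pooled = X[idx] * y[idx][:, None]             # gate kept features with their scores
A_pooled = A[np.ix_(idx, idx)]                  # induced adjacency of kept nodes
```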

Applications


Protein Folding


Graph Neural Networks are one of the main building blocks of AlphaFold, an artificial intelligence program developed by Google's DeepMind for solving the protein folding problem in biology. AlphaFold achieved first place in several CASP competitions[25][26].

Social Networks


Social networks are a major application domain for GNNs due to their natural representation as social graphs. GNNs are used to develop recommender systems based on both social relations and item relations[27][9].

Combinatorial Optimization


GNNs are used as fundamental building blocks for several combinatorial optimization algorithms[28]. Examples include computing chip placements that are superior or competitive to handcrafted human solutions[29], and improving expert-designed branching rules in branch and bound[30].

References

  1. ^ a b Scarselli, Franco; Gori, Marco; Tsoi, Ah Chung; Hagenbuchner, Markus; Monfardini, Gabriele (2009). "The Graph Neural Network Model". IEEE Transactions on Neural Networks. 20 (1): 61–80. doi:10.1109/TNN.2008.2005605. ISSN 1941-0093. PMID 19068426. S2CID 206756462.
  2. ^ Sanchez-Lengeling, Benjamin; Reif, Emily; Pearce, Adam; Wiltschko, Alex (2021-09-02). "A Gentle Introduction to Graph Neural Networks". Distill. 6 (9): e33. doi:10.23915/distill.00033. ISSN 2476-0757.
  3. ^ Daigavane, Ameya; Ravindran, Balaraman; Aggarwal, Gaurav (2021-09-02). "Understanding Convolutions on Graphs". Distill. 6 (9): e32. doi:10.23915/distill.00032. ISSN 2476-0757. S2CID 239678898.
  4. ^ a b c d e f g Bronstein, Michael M.; Bruna, Joan; Cohen, Taco; Veličković, Petar (May 4, 2021). "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges". arXiv. arXiv:2104.13478.
  5. ^ a b c d Kipf, Thomas N.; Welling, Max (2016). "Semi-supervised classification with graph convolutional networks". International Conference on Learning Representations. 5 (1). arXiv:1609.02907.
  6. ^ Hamilton, William; Ying, Rex; Leskovec, Jure (2017). "Inductive Representation Learning on Large Graphs" (PDF). Neural Information Processing Systems. 31. arXiv:1706.02216 – via Stanford.
  7. ^ a b c Veličković, Petar; Cucurull, Guillem; Casanova, Arantxa; Romero, Adriana; Liò, Pietro; Bengio, Yoshua (2018-02-04). "Graph Attention Networks". International Conference on Learning Representations. 6. arXiv:1710.10903.
  8. ^ a b c Veličković, Petar (2022). "Message passing all the way up". arXiv. arXiv:2202.11097.
  9. ^ a b Ying, Rex; He, Ruining; Chen, Kaifeng; Eksombatchai, Pong; Hamilton, William L.; Leskovec, Jure (2018). "Graph Convolutional Neural Networks for Web-Scale Recommender Systems". ACM SIGKDD Conference on Knowledge Discovery and Data Mining: 974–983. arXiv:1806.01973. doi:10.1145/3219819.3219890. ISBN 9781450355520. S2CID 46949657.
  10. ^ "Stanford Large Network Dataset Collection". snap.stanford.edu. Retrieved 2021-07-05.
  11. ^ Gilmer, Justin; Schoenholz, Samuel S.; Riley, Patrick F.; Vinyals, Oriol; Dahl, George E. (2017-07-17). "Neural Message Passing for Quantum Chemistry". International Conference on Machine Learning. PMLR: 1263–1272. arXiv:1704.01212.
  12. ^ Qasim, Shah Rukh; Kieseler, Jan; Iiyama, Yutaro; Pierini, Maurizio (2019). "Learning representations of irregular particle-detector geometry with distance-weighted graph networks". The European Physical Journal C. 79 (7). doi:10.1140/epjc/s10052-019-7113-9. S2CID 254109845.
  13. ^ Li, Zhuwen; Chen, Qifeng; Koltun, Vladlen (2018). "Combinatorial optimization with graph convolutional networks and guided tree search". Neural Information Processing Systems. 31: 537–546. arXiv:1810.10659.
  14. ^ Matthias, Fey; Lenssen, Jan E. (2019). "Fast Graph Representation Learning with PyTorch Geometric". ICLR Workshop on Representation Learning on Graphs and Manifolds. arXiv:1903.02428.
  15. ^ "Tensorflow GNN". GitHub. Retrieved 30 June 2022.
  16. ^ "jraph". GitHub. Retrieved 30 June 2022.
  17. ^ a b c Gao, Hongyang; Ji, Shuiwang (2019). "Graph U-Nets". International Conference on Machine Learning. arXiv:1905.05178.
  18. ^ a b Lee, Junhyun; Lee, Inyeop; Kang, Jaewoo (2019). "Self-Attention Graph Pooling". International Conference on Machine Learning. arXiv:1904.08082.
  19. ^ a b Liu, Chuang; Zhan, Yibing; Li, Chang; Du, Bo; Wu, Jia; Hu, Wenbin; Liu, Tongliang; Tao, Dacheng (2022). "Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities". arXiv. arXiv:2204.07321.
  20. ^ Douglas, B. L. (2011-01-27). "The Weisfeiler–Lehman Method and Graph Isomorphism Testing". arXiv:1101.5211 [math.CO].
  21. ^ Xu, Keyulu; Hu, Weihua; Leskovec, Jure; Jegelka, Stefanie (2019-02-22). "How Powerful are Graph Neural Networks?". International Conference on Learning Representations. 7. arXiv:1810.00826.
  22. ^ Bodnar, Christian; Frasca, Fabrizio; Guang Wang, Yu; Otter, Nina; Montúfar, Guido; Liò, Pietro; Bronstein, Michael (2021). "Weisfeiler and Lehman Go Topological: Message Passing Simplicial Networks". arXiv. arXiv:2103.03212.
  23. ^ Chen, Deli; Lin, Yankai; Li, Wei; Li, Peng; Zhou, Jie; Sun, Xu (2020). "Measuring and Relieving the Over-smoothing Problem for Graph Neural Networks from the Topological View". AAAI Conference on Artificial Intelligence. 34 (4): 3438–3445. arXiv:1909.03211. doi:10.1609/aaai.v34i04.5747. S2CID 202539008.
  24. ^ Alon, Uri; Yahav, Eran (2021). "On the Bottleneck of Graph Neural Networks and its Practical Implications". International Conference on Learning Representations. arXiv:2006.05205.
  25. ^ Sample, Ian (2 December 2018). "Google's DeepMind predicts 3D shapes of proteins". The Guardian. Retrieved 30 November 2020.
  26. ^ "DeepMind's protein-folding AI has solved a 50-year-old grand challenge of biology". MIT Technology Review. Retrieved 30 November 2020.
  27. ^ Fan, Wenqi; Ma, Yao; Li, Qing; He, Yuan; Zhao, Eric; Tang, Jiliang; Yin, Dawei (2019). "Graph Neural Networks for Social Recommendation". The Web Conference: 417–426. arXiv:1902.07243. doi:10.1145/3308558.3313488. hdl:10397/81232. ISBN 9781450366748. S2CID 67769538.
  28. ^ Cappart, Quentin; Chételat, Didier; Khalil, Elias; Lodi, Andrea; Morris, Christopher; Veličković, Petar (2021). "Combinatorial optimization and reasoning with graph neural networks". arXiv. arXiv:2102.09544.
  29. ^ Mirhoseini, Azalia; Goldie, Anna; Yazgan, Mustafa; Jiang, Joe Wenjie; Songhori, Ebrahim; Wang, Shen; Lee, Young-Joon; Johnson, Eric; Pathak, Omkar; Nazi, Azade; Pak, Jiwoo; Tong, Andy; Srinivasa, Kavya; Hang, William; Tuncer, Emre; Le, Quoc V.; Laudon, James; Ho, Richard; Carpenter, Roger; Dean, Jeff (2021). "A graph placement methodology for fast chip design". Nature. 594 (7862): 207–212. doi:10.1038/s41586-021-03544-w. PMID 34108699. S2CID 235395490.
  30. ^ Gasse, Maxime; Chételat, Didier; Ferroni, Nicola; Charlin, Laurent; Lodi, Andrea (2019). "Exact Combinatorial Optimization with Graph Convolutional Neural Networks". Conference and Workshop on Neural Information Processing Systems. arXiv:1906.01629.

Category:Machine learning Category:Semisupervised learning Category:Supervised learning Category:Unsupervised learning Category:Artificial neural networks Category:Graph algorithms