# Clustering coefficient

In graph theory, a clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Evidence suggests that in most real-world networks, and in particular social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties; this likelihood tends to be greater than the average probability of a tie randomly established between two nodes (Holland and Leinhardt, 1971;[1] Watts and Strogatz, 1998[2]).

Two versions of this measure exist: the global and the local. The global version was designed to give an overall indication of the clustering in the network, whereas the local gives an indication of the embeddedness of single nodes.

## Global clustering coefficient

The global clustering coefficient is based on triplets of nodes. A triplet is three nodes that are connected by either two (open triplet) or three (closed triplet) undirected ties. A triangle graph therefore includes three closed triplets, one centered on each of the nodes (n.b. this means the three triplets in a triangle come from overlapping selections of nodes). The global clustering coefficient is the number of closed triplets (or 3 x triangles) over the total number of triplets (both open and closed). The first attempt to measure it was made by Luce and Perry (1949).[3] This measure gives an indication of the clustering in the whole network (global), and can be applied to both undirected and directed networks (often called transitivity, see Wasserman and Faust, 1994, page 243[4]).

The global clustering coefficient is defined as:

${\displaystyle C={\frac {\mbox{number of closed triplets}}{\mbox{number of all triplets (open and closed)}}}}$ .

The number of closed triplets has also been referred to as 3 × triangles in the literature, so:

${\displaystyle C={\frac {3\times {\mbox{number of triangles}}}{\mbox{number of all triplets}}}}$ .

A generalisation to weighted networks was proposed by Opsahl and Panzarasa (2009),[5] and a redefinition to two-mode networks (both binary and weighted) by Opsahl (2009).[6]

## Local clustering coefficient

Example local clustering coefficient on an undirected graph. The local clustering coefficient of the blue node is computed as the proportion of connections among its neighbours which are actually realised compared with the number of all possible connections. In the figure, the blue node has three neighbours, which can have a maximum of 3 connections among them. In the top part of the figure all three possible connections are realised (thick black segments), giving a local clustering coefficient of 1. In the middle part of the figure only one connection is realised (thick black line) and 2 connections are missing (dotted red lines), giving a local cluster coefficient of 1/3. Finally, none of the possible connections among the neighbours of the blue node are realised, producing a local clustering coefficient value of 0.

The local clustering coefficient of a vertex (node) in a graph quantifies how close its neighbours are to being a clique (complete graph). Duncan J. Watts and Steven Strogatz introduced the measure in 1998 to determine whether a graph is a small-world network.

A graph ${\displaystyle G=(V,E)}$  formally consists of a set of vertices ${\displaystyle V}$  and a set of edges ${\displaystyle E}$  between them. An edge ${\displaystyle e_{ij}}$  connects vertex ${\displaystyle v_{i}}$  with vertex ${\displaystyle v_{j}}$ .

The neighbourhood ${\displaystyle N_{i}}$  for a vertex ${\displaystyle v_{i}}$  is defined as its immediately connected neighbours as follows:

${\displaystyle N_{i}=\{v_{j}:e_{ij}\in E\lor e_{ji}\in E\}.}$

We define ${\displaystyle k_{i}}$  as the number of vertices, ${\displaystyle |N_{i}|}$ , in the neighbourhood, ${\displaystyle N_{i}}$ , of a vertex.

The local clustering coefficient ${\displaystyle C_{i}}$  for a vertex ${\displaystyle v_{i}}$  is then given by the proportion of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them. For a directed graph, ${\displaystyle e_{ij}}$  is distinct from ${\displaystyle e_{ji}}$ , and therefore for each neighbourhood ${\displaystyle N_{i}}$  there are ${\displaystyle k_{i}(k_{i}-1)}$  links that could exist among the vertices within the neighbourhood (${\displaystyle k_{i}}$  is the number of neighbours of a vertex). Thus, the local clustering coefficient for directed graphs is given as [2]

${\displaystyle C_{i}={\frac {|\{e_{jk}:v_{j},v_{k}\in N_{i},e_{jk}\in E\}|}{k_{i}(k_{i}-1)}}.}$

An undirected graph has the property that ${\displaystyle e_{ij}}$  and ${\displaystyle e_{ji}}$  are considered identical. Therefore, if a vertex ${\displaystyle v_{i}}$  has ${\displaystyle k_{i}}$  neighbours, ${\displaystyle {\frac {k_{i}(k_{i}-1)}{2}}}$  edges could exist among the vertices within the neighbourhood. Thus, the local clustering coefficient for undirected graphs can be defined as

${\displaystyle C_{i}={\frac {2|\{e_{jk}:v_{j},v_{k}\in N_{i},e_{jk}\in E\}|}{k_{i}(k_{i}-1)}}.}$

Let ${\displaystyle \lambda _{G}(v)}$  be the number of triangles on ${\displaystyle v\in V(G)}$  for undirected graph ${\displaystyle G}$ . That is, ${\displaystyle \lambda _{G}(v)}$  is the number of subgraphs of ${\displaystyle G}$  with 3 edges and 3 vertices, one of which is ${\displaystyle v}$ . Let ${\displaystyle \tau _{G}(v)}$  be the number of triples on ${\displaystyle v\in G}$ . That is, ${\displaystyle \tau _{G}(v)}$  is the number of subgraphs (not necessarily induced) with 2 edges and 3 vertices, one of which is ${\displaystyle v}$  and such that ${\displaystyle v}$  is incident to both edges. Then we can also define the clustering coefficient as

${\displaystyle C_{i}={\frac {\lambda _{G}(v)}{\tau _{G}(v)}}.}$

It is simple to show that the two preceding definitions are the same, since

${\displaystyle \tau _{G}(v)=C({k_{i}},2)={\frac {1}{2}}k_{i}(k_{i}-1).}$

These measures are 1 if every neighbour connected to ${\displaystyle v_{i}}$  is also connected to every other vertex within the neighbourhood, and 0 if no vertex that is connected to ${\displaystyle v_{i}}$  connects to any other vertex that is connected to ${\displaystyle v_{i}}$ .

### Network average clustering coefficient

As an alternative to the local clustering coefficient, the overall level of clustering in a network is measured by Watts and Strogatz[2] as the average of the local clustering coefficients of all the vertices ${\displaystyle n}$  :[7]

${\displaystyle {\bar {C}}={\frac {1}{n}}\sum _{i=1}^{n}C_{i}.}$

It is worth noting that this metric places more weight on the low degree nodes, while the transitivity ratio places more weight on the high degree nodes. In fact, a weighted average where each local clustering score is weighted by ${\displaystyle k_{i}(k_{i}-1)}$  is identical to the global clustering coefficient[original research?].

A graph is considered small-world, if the graph has a small mean-shortest path length that scales with the natural log of the number of nodes in the network, ${\displaystyle \ln {N}}$  [8]. For example, a random graph is small-world, while a lattice is not, and scale-free graphs are often considered ultra-small world, as their mean-shortest path length scales with ${\displaystyle \ln {\ln {N}}}$ .

A generalisation to weighted networks was proposed by Barrat et al. (2004),[9] and a redefinition to bipartite graphs (also called two-mode networks) by Latapy et al. (2008)[10] and Opsahl (2009).[6]

Alternative generalisations to weighted and directed graphs have been provided by Fagiolo (2007)[11] and Clemente and Grassi (2018)[12].

This formula is not, by default, defined for graphs with isolated vertices; see Kaiser (2008)[13] and Barmpoutis et al.[14] The networks with the largest possible average clustering coefficient are found to have a modular structure, and at the same time, they have the smallest possible average distance among the different nodes.[14]

## References

1. ^ P. W. Holland and S. Leinhardt (1971). "Transitivity in structural models of small groups". Comparative Group Studies. 2 (2): 107–124. doi:10.1177/104649647100200201.
2. ^ a b c D. J. Watts and Steven Strogatz (June 1998). "Collective dynamics of 'small-world' networks". Nature. 393 (6684): 440–442. Bibcode:1998Natur.393..440W. doi:10.1038/30918. PMID 9623998.
3. ^ R. D. Luce and A. D. Perry (1949). "A method of matrix analysis of group structure". Psychometrika. 14 (1): 95–116. doi:10.1007/BF02289146. PMID 18152948.
4. ^ Stanley Wasserman, Katherine Faust, 1994. Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press.
5. ^ Tore Opsahl and Pietro Panzarasa (2009). "Clustering in Weighted Networks". Social Networks. 31 (2): 155–163. doi:10.1016/j.socnet.2009.02.002.
6. ^ a b Tore Opsahl (2009). "Clustering in Two-mode Networks". Conference and Workshop on Two-Mode Social Analysis (Sept 30-Oct 2, 2009).
7. ^ Kemper, Andreas (2009). Valuation of Network Effects in Software Markets: A Complex Networks Approach. Springer. p. 142. ISBN 9783790823660.
8. ^ http://networksciencebook.com/4#ultra-small
9. ^ Barrat, A.; Barthelemy, M.; Pastor-Satorras, R.; Vespignani, A. (2004). "The architecture of complex weighted networks". Proceedings of the National Academy of Sciences. 101 (11): 3747–3752. arXiv:cond-mat/0311416. Bibcode:2004PNAS..101.3747B. doi:10.1073/pnas.0400087101. PMC 374315. PMID 15007165.
10. ^ Latapy, M.; Magnien, C.; Del Vecchio, N. (2008). "Basic Notions for the Analysis of Large Two-mode Networks". Social Networks. 30 (1): 31–48. doi:10.1016/j.socnet.2007.04.006.
11. ^ Fagiolo, G. (2007). "Clustering in complex directed networks". Physical Review E. 76 (2 Pt 2): 026107. CiteSeerX 10.1.1.262.1006. doi:10.1103/PhysRevE.76.026107. PMID 17930104.
12. ^ Clemente, G.P.; Grassi, R. (2018). "Directed clustering in weighted networks: A new perspective". Chaos, Solitons & Fractals. 107: 26–38. arXiv:1706.07322. Bibcode:2018CSF...107...26C. doi:10.1016/j.chaos.2017.12.007.
13. ^ Kaiser, Marcus (2008). "Mean clustering coefficients: the role of isolated nodes and leafs on clustering measures for small-world networks". New Journal of Physics. 10 (8): 083042. arXiv:0802.2512. Bibcode:2008NJPh...10h3042K. doi:10.1088/1367-2630/10/8/083042.
14. ^ a b Barmpoutis, D.; Murray, R. M. (2010). "Networks with the Smallest Average Distance and the Largest Average Clustering". arXiv:1007.4031 [q-bio.MN].