User:Manudouz/sandbox/single-linkage clustering

Working example

edit

First step

edit
  • First clustering

Let us assume that we have five elements   and the following matrix   of pairwise distances between them:

a b c d e
a 0 17 21 31 23
b 17 0 30 34 21
c 21 30 0 28 39
d 31 34 28 0 43
e 23 21 39 43 0

In this example,   is the lowest value of  , so we join elements   and  .

  • First branch length estimation

Let   denote the node to which   and   are now connected. Setting   ensures that elements   and   are equidistant from  . This corresponds to the expectation of the ultrametricity hypothesis. The branches joining   and   to   then have lengths   (see the final dendrogram)

  • First distance matrix update

We then proceed to update the initial proximity matrix   into a new proximity matrix   (see below), reduced in size by one row and one column because of the clustering of   with  . Bold values in   correspond to the new distances, calculated by retaining the minimum distance between each element of the first cluster   and each of the remaining elements:

 

 

 

Italicized values in   are not affected by the matrix update as they correspond to distances between elements not involved in the first cluster.

Second step

edit
  • Second clustering

We now reiterate the three previous steps, starting from the new distance matrix   :

(a,b) c d e
(a,b) 0 21 31 21
c 21 0 28 39
d 31 28 0 43
e 21 39 43 0

Here,   and   are the lowest values of  , so we join cluster   with element   and with element  .

  • Second branch length estimation

Let   denote the node to which  ,   and   are now connected. Because of the ultrametricity constraint, the branches joining   or   to  , and   to  , and also   to   are equal and have the following total length:  

We deduce the missing branch length:   (see the final dendrogram)

  • Second distance matrix update

We then proceed to update the   matrix into a new distance matrix   (see below), reduced in size by two rows and two columns because of the clustering of   with   and with   :  

Final step

edit

The final   matrix is:

((a,b),c,e) d
((a,b),c,e) 0 28
d 28 0

So we join clusters   and  .

Let   denote the (root) node to which   and   are now connected. The branches joining   and   to   then have lengths:

 

We deduce the remaining branch length:

 

The single-linkage dendrogram

edit

 
Single Linkage Dendrogram 5S data

The dendrogram is now complete. It is ultrametric because all tips ( ,  ,  ,  , and  ) are equidistant from   :

 

The dendrogram is therefore rooted by  , its deepest node.