Working example

Neighbor joining with 5 taxa. In this case 2 neighbor joining steps give a tree with fully resolved topology. The branches of the resulting tree are labeled with their lengths.

The working example is based on a JC69 genetic distance matrix computed from the 5S ribosomal RNA sequence alignment of five bacteria: Bacillus subtilis ($a$), Bacillus stearothermophilus ($b$), Lactobacillus viridescens ($c$), Acholeplasma modicum ($d$), and Micrococcus luteus ($e$).[1][2]

First step

  • First clustering

Let us assume that we have five elements $(a, b, c, d, e)$ and the following matrix $D$ of pairwise distances between them. The last column lists the value $r_i$ computed for each row, as defined below:

         a     b     c     d     e     r_i
    a    0    17    21    31    23    30.7
    b   17     0    30    34    21    34.0
    c   21    30     0    28    39    39.3
    d   31    34    28     0    43    45.3
    e   23    21    39    43     0    42.0

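For each element $i$, we calculate $r_i$, the sum of its distances to all other elements divided by $n - 2$:

$$r_i = \frac{1}{n-2} \sum_{k \neq i} d(i,k)$$

(where $n = 5$ is the number of elements and $k$ ranges over the elements other than $i$)

For example:

$$r_a = \frac{17 + 21 + 31 + 23}{3} = \frac{92}{3} \approx 30.7$$
$$r_b = \frac{17 + 30 + 34 + 21}{3} = \frac{102}{3} = 34.0$$

and so on for $c$, $d$, and $e$.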
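The calculation of $r_i$ can be sketched in Python as follows. This is a minimal illustration, assuming the distance matrix is stored as a nested list in the order a, b, c, d, e; the names `D`, `labels`, and `row_averages` are hypothetical and not part of the original example.

```python
# Distance matrix from the example; rows and columns are ordered a, b, c, d, e.
labels = ["a", "b", "c", "d", "e"]
D = [
    [0, 17, 21, 31, 23],
    [17, 0, 30, 34, 21],
    [21, 30, 0, 28, 39],
    [31, 34, 28, 0, 43],
    [23, 21, 39, 43, 0],
]


def row_averages(dist):
    """Return r_i = (sum of row i) / (n - 2) for every row of a square matrix."""
    n = len(dist)
    return [sum(row) / (n - 2) for row in dist]


r = row_averages(D)
for name, value in zip(labels, r):
    print(f"r_{name} = {value:.1f}")
```

Rounded to one decimal place, the values agree with those listed alongside the distance matrix.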

  • First joining

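We calculate the values of the $Q_1$ matrix:

$$Q_1(i,j) = d(i,j) - r_i - r_j$$

For example, for element $a$:

$$Q_1(a,b) = 17 - 30.7 - 34.0 = -47.7$$
$$Q_1(a,c) = 21 - 30.7 - 39.3 = -49.0$$
$$Q_1(a,d) = 31 - 30.7 - 45.3 = -45.0$$
$$Q_1(a,e) = 23 - 30.7 - 42.0 = -49.7$$

We obtain the following values for the $Q_1$ matrix (the diagonal elements of the matrix are not used and are omitted here):

           a       b       c       d       e
    a          −47.7   −49.0   −45.0   −49.7
    b   −47.7          −43.3   −45.3   −55.0
    c   −49.0   −43.3          −56.7   −42.3
    d   −45.0   −45.3   −56.7          −44.3
    e   −49.7   −55.0   −42.3   −44.3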
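As a rough sketch, the table above can be reproduced in Python by subtracting the two row averages from each pairwise distance. The function name `q_matrix_normalized` and the variable names are hypothetical, and the formula used is the one implied by the tabulated values (each distance minus the two corresponding row sums divided by $n - 2$).

```python
# Distance matrix from the example; rows and columns are ordered a, b, c, d, e.
labels = ["a", "b", "c", "d", "e"]
D = [
    [0, 17, 21, 31, 23],
    [17, 0, 30, 34, 21],
    [21, 30, 0, 28, 39],
    [31, 34, 28, 0, 43],
    [23, 21, 39, 43, 0],
]


def q_matrix_normalized(dist):
    """Return Q(i, j) = d(i, j) - r_i - r_j, with None on the unused diagonal."""
    n = len(dist)
    r = [sum(row) / (n - 2) for row in dist]
    return [
        [None if i == j else dist[i][j] - r[i] - r[j] for j in range(n)]
        for i in range(n)
    ]


Q = q_matrix_normalized(D)
for name, row in zip(labels, Q):
    cells = ["      " if q is None else f"{q:6.1f}" for q in row]
    print(name, " ".join(cells))
```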

In the example above,  . This is the smallest value of  , so we join elements   and  .

  • First branch length estimation

Let   denote the new node. By equation (2), above, the branches joining   and   to   then have lengths:

 
 
  • First distance matrix update

We then proceed to update the initial distance matrix   into a new distance matrix   (see below), reduced in size by one row and one column because of the joining of   with   into their neighbor  . Using equation (3) above, we compute the distance from   to each of the other nodes besides   and  . In this case, we obtain:

 
 
 

The resulting distance matrix $D_1$ is:

        u   c   d   e
    u   0   7   7   6
    c   7   0   8   7
    d   7   8   0   3
    e   6   7   3   0

The values in the first row and column of $D_1$ (the distances involving $u$) are newly calculated, whereas the remaining values are not affected by the matrix update, as they correspond to distances between elements not involved in the first joining of taxa.

Second step

  • Second joining

The corresponding $Q_2$ matrix, computed as $Q_2(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$ with $n = 4$, is:

          u     c     d     e
    u         −28   −24   −24
    c   −28         −24   −24
    d   −24   −24         −28
    e   −24   −24   −28

We may choose either to join $u$ and $c$, or to join $d$ and $e$; both pairs have the minimal $Q_2$ value of $-28$, and either choice leads to the same result. For concreteness, let us join $u$ and $c$ and call the new node $v$.
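The tie between the two candidate pairs can be checked with a short Python sketch. It assumes the four-node distance matrix of this step is stored in the order u, c, d, e and uses the usual neighbor-joining criterion, $(n-2)\,d(i,j)$ minus the two row sums, which reproduces the tabulated values; `q_value` and the variable names are hypothetical.

```python
from itertools import combinations

# Four-node distance matrix of the second step; order is u, c, d, e.
labels = ["u", "c", "d", "e"]
D1 = [
    [0, 7, 7, 6],
    [7, 0, 8, 7],
    [7, 8, 0, 3],
    [6, 7, 3, 0],
]


def q_value(dist, i, j):
    """Return (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(dist)
    return (n - 2) * dist[i][j] - sum(dist[i]) - sum(dist[j])


# Evaluate the criterion for every unordered pair of nodes.
q = {
    (labels[i], labels[j]): q_value(D1, i, j)
    for i, j in combinations(range(len(labels)), 2)
}
lowest = min(q.values())
best_pairs = [pair for pair, value in q.items() if value == lowest]
print(lowest, best_pairs)
```

Both minimal pairs are returned, so the choice between them has to be made by convention, as in the text.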

  • Second branch length estimation

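The lengths of the branches joining $u$ and $c$ to $v$ can be calculated:

$$\delta(u,v) = \frac{1}{2} d(u,c) + \frac{1}{2(4-2)} \left[ \sum_{k} d(u,k) - \sum_{k} d(c,k) \right] = \frac{7}{2} + \frac{20 - 22}{4} = 3$$
$$\delta(c,v) = d(u,c) - \delta(u,v) = 7 - 3 = 4$$

The joining of the elements and the branch length calculation help in drawing the neighbor joining tree, as shown in the figure.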
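A minimal Python sketch of the branch length estimate for a joined pair follows. It assumes the standard neighbor-joining formula (half the pair distance plus a correction from the two row sums) and the four-node matrix in the order u, c, d, e; the helper name `branch_lengths` is hypothetical.

```python
# Four-node distance matrix of the second step; order is u, c, d, e.
D1 = [
    [0, 7, 7, 6],
    [7, 0, 8, 7],
    [7, 8, 0, 3],
    [6, 7, 3, 0],
]


def branch_lengths(dist, f, g):
    """Return the branch lengths from nodes f and g to their new parent node."""
    n = len(dist)
    to_f = 0.5 * dist[f][g] + (sum(dist[f]) - sum(dist[g])) / (2 * (n - 2))
    to_g = dist[f][g] - to_f  # the two branches together span d(f, g)
    return to_f, to_g


# Join u (index 0) and c (index 1).
delta_u, delta_c = branch_lengths(D1, 0, 1)
print(delta_u, delta_c)
```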

  • Second distance matrix update

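The updated distance matrix $D_2$ for the remaining 3 nodes, $v$, $d$, and $e$, is now computed:

$$d(v,d) = \frac{1}{2} [d(u,d) + d(c,d) - d(u,c)] = \frac{1}{2} (7 + 8 - 7) = 4$$
$$d(v,e) = \frac{1}{2} [d(u,e) + d(c,e) - d(u,c)] = \frac{1}{2} (6 + 7 - 7) = 3$$

        v   d   e
    v   0   4   3
    d   4   0   3
    e   3   3   0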
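The matrix update can be sketched in Python as a function that merges two nodes and returns the reduced matrix. This is only an illustration under the assumption that the new node is placed first in the output; `join` and its parameter names are hypothetical.

```python
def join(dist, labels, f, g, new_label):
    """Merge nodes f and g into new_label and return (new_labels, new_dist)."""
    keep = [k for k in range(len(dist)) if k not in (f, g)]
    # Distance from the new node to each remaining node k.
    to_new = [0.5 * (dist[f][k] + dist[g][k] - dist[f][g]) for k in keep]
    new_labels = [new_label] + [labels[k] for k in keep]
    new_dist = [[0.0] + to_new]
    for pos, k in enumerate(keep):
        # Distances among the remaining nodes are copied unchanged.
        new_dist.append([to_new[pos]] + [float(dist[k][m]) for m in keep])
    return new_labels, new_dist


# Four-node distance matrix of the second step; order is u, c, d, e.
labels = ["u", "c", "d", "e"]
D1 = [
    [0, 7, 7, 6],
    [7, 0, 8, 7],
    [7, 8, 0, 3],
    [6, 7, 3, 0],
]

labels2, D2 = join(D1, labels, 0, 1, "v")
for name, row in zip(labels2, D2):
    print(name, row)
```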

Final step

The tree topology is fully resolved at this point. However, for clarity, we can calculate the $Q_3$ matrix. For example:

$$Q_3(v,e) = (3-2)\,d(v,e) - \sum_{k} d(v,k) - \sum_{k} d(e,k) = 3 - 7 - 6 = -10$$

          v     d     e
    v         −10   −10
    d   −10         −10
    e   −10   −10

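For concreteness, let us join $v$ and $d$ and call the last node $w$. The lengths of the three remaining branches can be calculated:

$$\delta(v,w) = \frac{1}{2} d(v,d) + \frac{1}{2(3-2)} \left[ \sum_{k} d(v,k) - \sum_{k} d(d,k) \right] = 2 + \frac{7 - 7}{2} = 2$$
$$\delta(w,d) = d(v,d) - \delta(v,w) = 4 - 2 = 2$$
$$\delta(w,e) = d(v,e) - \delta(v,w) = 3 - 2 = 1$$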

The neighbor joining tree is now complete, as shown in the figure.
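As a cross-check, the three final branch lengths can also be obtained without choosing a pair to join. The sketch below uses the three-point formula, which for exactly three nodes gives the same lengths as the neighbor-joining branch length estimate; it is a swapped-in shortcut rather than the procedure used in the text, and `star_branch_lengths` is a hypothetical name.

```python
# Three-node distance matrix of the final step; order is v, d, e.
labels = ["v", "d", "e"]
D2 = [
    [0, 4, 3],
    [4, 0, 3],
    [3, 3, 0],
]


def star_branch_lengths(dist):
    """Return each node's distance to the central node of a three-leaf star tree."""
    lengths = []
    for x in range(3):
        y, z = [k for k in range(3) if k != x]
        # Three-point formula: half of (d_xy + d_xz - d_yz).
        lengths.append((dist[x][y] + dist[x][z] - dist[y][z]) / 2)
    return lengths


final = dict(zip(labels, star_branch_lengths(D2)))
print(final)
```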

Conclusion: additive distances

This example represents an idealized case: note that if we move from any taxon to any other along the branches of the tree, and sum the lengths of the branches traversed, the result is equal to the distance between those taxa in the input distance matrix. For example, going from   to   we have  . A distance matrix whose distances agree in this way with some tree is said to be 'additive', a property which is rare in practice. Nonetheless it is important to note that, given an additive distance matrix as input, neighbor joining is guaranteed to find the tree whose distances between taxa agree with it.
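The path-sum property can be illustrated with a small Python sketch restricted to the four nodes $u$, $c$, $d$, $e$ of the second-step distance matrix. The branch lengths below are assumptions derived from the second-step and final-step matrices of this example (with $v$ and $w$ as labels for the two internal nodes), and `path_length` is a hypothetical helper.

```python
# Assumed branch lengths; v and w are the internal nodes of the tree.
edges = {("u", "v"): 3, ("c", "v"): 4, ("v", "w"): 2, ("d", "w"): 2, ("e", "w"): 1}

# Build an undirected adjacency map from the edge list.
graph = {}
for (x, y), length in edges.items():
    graph.setdefault(x, {})[y] = length
    graph.setdefault(y, {})[x] = length


def path_length(graph, start, goal, seen=None):
    """Return the sum of branch lengths along the unique tree path, or None."""
    if start == goal:
        return 0
    seen = (seen or set()) | {start}
    for nxt, length in graph[start].items():
        if nxt not in seen:
            rest = path_length(graph, nxt, goal, seen)
            if rest is not None:
                return length + rest
    return None


# Four-node distance matrix of the second step; order is u, c, d, e.
labels = ["u", "c", "d", "e"]
D1 = [
    [0, 7, 7, 6],
    [7, 0, 8, 7],
    [7, 8, 0, 3],
    [6, 7, 3, 0],
]

additive = all(
    path_length(graph, labels[i], labels[j]) == D1[i][j]
    for i in range(4)
    for j in range(4)
)
print(additive)
```

Under these assumptions every path sum matches the corresponding matrix entry, which is what additivity requires.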

  1. ^ Erdmann VA, Wolters J (1986). "Collection of published 5S, 5.8S and 4.5S ribosomal RNA sequences". Nucleic Acids Research. 14 Suppl (Suppl): r1-59. PMC 341310. PMID 2422630.
  2. ^ Olsen GJ (1988). "Phylogenetic analysis using ribosomal RNA". Methods in Enzymology. 164: 793–812. PMID 3241556.