Kendall tau distance

The Kendall tau rank distance is a metric that counts the number of pairwise disagreements between two ranking lists. The larger the distance, the more dissimilar the two lists are. Kendall tau distance is also called bubble-sort distance since it is equivalent to the number of swaps that the bubble sort algorithm would take to place one list in the same order as the other list. The Kendall tau distance was created by Maurice Kendall.

DefinitionEdit

The Kendall tau ranking distance between two lists   and   is

 

where

  •   and   are the rankings of the element   in   and   respectively.

  will be equal to 0 if the two lists are identical and   (where   is the list size) if one list is the reverse of the other. Often Kendall tau distance is normalized by dividing by   so a value of 1 indicates maximum disagreement. The normalized Kendall tau distance therefore lies in the interval [0,1].

Kendall tau distance may also be defined as

 

where

  • P is the set of unordered pairs of distinct elements in   and  
  •   = 0 if i and j are in the same order in   and  
  •   = 1 if i and j are in the opposite order in   and  

Kendall tau distance can also be defined as the total number of discordant pairs.

Kendall tau distance in Rankings: A permutation (or ranking) is an array of N integers where each of the integers between 0 and N-1 appears exactly once. The Kendall tau distance between two rankings is the number of pairs that are in different order in the two rankings. For example, the Kendall tau distance between 0 3 1 6 2 5 4 and 1 0 3 6 4 2 5 is four because the pairs 0-1, 3-1, 2-4, 5-4 are in different order in the two rankings, but all other pairs are in the same order.[1]

If Kendall tau function is performed as   instead of   (where   and   are the rankings of   and   elements respectively), then triangular inequality is not guaranteed. The triangular inequality fails in cases where there are repetitions in the lists. So then we are not dealing with a metric anymore.

ExampleEdit

Suppose one ranks a group of five people by height and by weight:

Person A B C D E
Rank by height 1 2 3 4 5
Rank by weight 3 4 1 2 5

Here person A is tallest and third-heaviest, and so on.

In order to calculate the Kendall tau distance, pair each person with every other person and count the number of times the values in list 1 are in the opposite order of the values in list 2.

Pair Height Weight Count
(A,B) 1 < 2 3 < 4
(A,C) 1 < 3 3 > 1 X
(A,D) 1 < 4 3 > 2 X
(A,E) 1 < 5 3 < 5
(B,C) 2 < 3 4 > 1 X
(B,D) 2 < 4 4 > 2 X
(B,E) 2 < 5 4 < 5
(C,D) 3 < 4 1 < 2
(C,E) 3 < 5 1 < 5
(D,E) 4 < 5 2 < 5

Since there are four pairs whose values are in opposite order, the Kendall tau distance is 4. The normalized Kendall tau distance is

 

A value of 0.4 indicates that 40% of pairs differ in ordering between the two lists.

Computing the Kendall tau distanceEdit

Given two rankings  , it is possible to rename the items such that  . Then, the problem of computing the Kendall tau distance reduces to computing the number of inversions in   --- the number of index pairs   such that   while  . There are several algorithms for calculating this number.

  • A simple algorithm based on merge sort requires time  .[2]
  • A more advanced algorithm requires time  .[3]
  • The Python scipy.stats library contains a function for calculating Kendall's tau which also handles ties in rankings.[4]

See alsoEdit

ReferencesEdit

  1. ^ http://algs4.cs.princeton.edu/25applications/
  2. ^ Ionescu, Vlad. "calculating the number of "inversions" in a permutation". Stack overflow. Retrieved 24 February 2017.
  3. ^ Chan, Timothy M.; Pătraşcu, Mihai (2010). "Counting Inversions, Offline Orthogonal Range Counting, and Related Problems". Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms. p. 161. CiteSeerX 10.1.1.208.2715. doi:10.1137/1.9781611973075.15. ISBN 978-0-89871-701-3.
  4. ^ https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.stats.kendalltau.html. Retrieved 24 February 2017. Missing or empty |title= (help)

External linksEdit