Somers' D

In statistics, Somers’ D, sometimes incorrectly referred to as Somer’s D, is a measure of ordinal association between two possibly dependent random variables X and Y. Somers’ D takes values between $-1$, when all pairs of the variables disagree, and $1$, when all pairs of the variables agree. Somers’ D is named after Robert H. Somers, who proposed it in 1962.[1]

Somers’ D plays a central role in rank statistics and is the parameter behind many nonparametric methods.[2] It is also used as a quality measure of binary choice or ordinal regression (e.g., logistic regressions) and credit scoring models.

Somers’ D for sample

Two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if the ranks of both elements agree, that is, if $x_i > x_j$ and $y_i > y_j$, or if $x_i < x_j$ and $y_i < y_j$. They are discordant if the ranks of both elements disagree, that is, if $x_i > x_j$ and $y_i < y_j$, or if $x_i < x_j$ and $y_i > y_j$. If $x_i = x_j$ or $y_i = y_j$, the pair is neither concordant nor discordant.

Let $(x_1, y_1), \ldots, (x_n, y_n)$ be a set of observations of two possibly dependent random variables X and Y. Define the Kendall tau rank correlation coefficient $\tau$ as

$$\tau = \frac{N_C - N_D}{n(n-1)/2},$$

where $N_C$ is the number of concordant pairs and $N_D$ is the number of discordant pairs. Somers’ D of Y with respect to X is defined as $D_{YX} = \tau(X,Y)/\tau(X,X)$.[2] Note that Kendall's tau is symmetric in X and Y, whereas Somers’ D is asymmetric in X and Y.

As $\tau(X,X)$ is the proportion of pairs with unequal X values, the common factor $n(n-1)/2$ cancels, and Somers’ D is the difference between the number of concordant and discordant pairs, divided by the number of pairs whose X values are unequal.
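The pair-counting definition above can be sketched directly in code. The following is a minimal illustration, not a reference implementation: the function name `somers_d` is hypothetical, the inputs are assumed to be equal-length sequences of comparable numbers, and the quadratic loop over all pairs is only practical for small samples.

```python
from itertools import combinations


def somers_d(x, y):
    """Somers' D of y with respect to x, by direct pair counting.

    Hypothetical helper: returns (N_C - N_D) divided by the number
    of pairs whose x values are unequal.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_c = n_d = n_x_unequal = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj:
            continue  # pairs tied on x do not enter the denominator
        n_x_unequal += 1
        s = (xi - xj) * (yi - yj)
        if s > 0:
            n_c += 1  # concordant pair
        elif s < 0:
            n_d += 1  # discordant pair
        # s == 0 here means the pair is tied on y only: counted in the
        # denominator, but neither concordant nor discordant
    if n_x_unequal == 0:
        raise ValueError("all x values are tied; Somers' D is undefined")
    return (n_c - n_d) / n_x_unequal


# Asymmetry: swapping the roles of the two variables changes the value.
x = [1, 1, 2]
y = [1, 2, 3]
d_yx = somers_d(x, y)  # denominator: 2 pairs with unequal x
d_xy = somers_d(y, x)  # denominator: 3 pairs with unequal y
```

In this small made-up sample the two concordant pairs give `d_yx = 2/2` but `d_xy = 2/3`, which illustrates why the direction of the statistic has to be stated.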

Somers’ D for distribution

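Let two independent bivariate random variables $(X_1, Y_1)$ and $(X_2, Y_2)$ have the same probability distribution $P_{XY}$. Again, Somers’ D, which measures ordinal association of random variables X and Y in $P_{XY}$, can be defined through Kendall's tau

$$\tau(X,Y) = \operatorname{E}\bigl[\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2)\bigr] = P\bigl(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = 1\bigr) - P\bigl(\operatorname{sgn}(X_1 - X_2)\operatorname{sgn}(Y_1 - Y_2) = -1\bigr),$$

or the difference between the probabilities of concordance and discordance. Somers’ D of Y with respect to X is defined as $D_{YX} = \tau(X,Y)/\tau(X,X)$. Thus, $D_{YX}$ is the difference between the two corresponding probabilities, conditional on the X values not being equal. If X has a continuous probability distribution, then $\tau(X,X) = 1$ and Kendall's tau and Somers’ D coincide. Somers’ D normalizes Kendall's tau for possible mass points of variable X.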

If X and Y are both binary with values 0 and 1, then Somers’ D is the difference between two probabilities:

$$D_{YX} = P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0).$$
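The binary identity can be checked numerically from the population definition. The sketch below assumes a discrete joint distribution given as a dictionary mapping `(x, y)` to a probability; the names `tau` and `somers_d_yx` and the example probabilities are made up for illustration. Exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau(pmf, a, b):
    """E[sgn(A1 - A2) * sgn(B1 - B2)] for two independent draws from pmf.

    pmf maps (x, y) to a probability; a and b select the coordinates
    being compared (0 for X, 1 for Y).
    """
    total = Fraction(0)
    for p1, w1 in pmf.items():
        for p2, w2 in pmf.items():
            total += w1 * w2 * sign(p1[a] - p2[a]) * sign(p1[b] - p2[b])
    return total


def somers_d_yx(pmf):
    """Somers' D of Y with respect to X: tau(X, Y) / tau(X, X)."""
    return tau(pmf, 0, 1) / tau(pmf, 0, 0)


# Hypothetical joint distribution of two binary variables.
pmf = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(4, 10),
}

# Conditional probabilities P(Y = 1 | X = 1) and P(Y = 1 | X = 0).
p_y1_given_x1 = pmf[(1, 1)] / (pmf[(1, 0)] + pmf[(1, 1)])
p_y1_given_x0 = pmf[(0, 1)] / (pmf[(0, 0)] + pmf[(0, 1)])

d = somers_d_yx(pmf)
difference = p_y1_given_x1 - p_y1_given_x0
```

For this distribution both `d` and `difference` equal 5/12, while Kendall's tau itself is 1/5. The gap comes from the mass points of X, which make $\tau(X,X) = 12/25$ rather than 1.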

Somers' D for binary dependent variables

In practice, Somers' D is most often used when the dependent variable Y is a binary variable,[2] i.e. for binary classification or prediction of binary outcomes including binary choice models in econometrics. Methods for fitting such models include logistic and probit regression.

Several statistics can be used to quantify the quality of such models: area under the receiver operating characteristic (ROC) curve, Goodman and Kruskal's gamma, Kendall's tau (Tau-a), Somers’ D, etc. Somers’ D is probably the most widely used of the available ordinal association statistics.[3] For a binary outcome, Somers’ D is identical to the Gini coefficient and is related to the area under the receiver operating characteristic curve (AUC) by[2]

$$\mathrm{AUC} = \frac{D_{XY} + 1}{2}.$$
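The relation between AUC and Somers’ D can be illustrated by counting pairs that consist of one observation with Y = 1 and one with Y = 0. This is a sketch under stated assumptions: the function name `auc_and_somers_d` and the sample scores are hypothetical, labels are assumed to be coded 0 and 1, and AUC is taken in its usual pairwise form in which a pair tied on the score counts one half.

```python
def auc_and_somers_d(scores, labels):
    """Return (AUC, Somers' D) from pairs with unequal binary labels.

    Hypothetical helper: scores play the role of the predictor X and
    labels (0 or 1) the role of the outcome Y.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Concordant: the observation with Y = 1 has the higher score.
    n_c = sum(1 for p in pos for q in neg if p > q)
    # Discordant: the observation with Y = 1 has the lower score.
    n_d = sum(1 for p in pos for q in neg if p < q)
    # Remaining pairs are tied on the score but not on the label.
    n_t = len(pos) * len(neg) - n_c - n_d
    total = n_c + n_d + n_t
    auc = (n_c + 0.5 * n_t) / total
    d = (n_c - n_d) / total
    return auc, d


# Made-up scores for two negatives followed by two positives.
auc, d = auc_and_somers_d([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Here three of the four positive-negative pairs are concordant and one is discordant, so `auc` is 0.75 and `d` is 0.5, consistent with AUC = (D + 1)/2.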

In the case where the independent (predictor) variable X is discrete and the dependent (outcome) variable Y is binary, Somers’ D equals

$$D_{XY} = \frac{N_C - N_D}{N_C + N_D + N_T},$$

where $N_T$ is the number of neither concordant nor discordant pairs that are tied on variable X and not on variable Y.

Example

Suppose that the independent (predictor) variable X takes three values, 0.25, 0.5, or 0.75, and the dependent (outcome) variable Y takes two values, 0 or 1. The table below contains the observed frequency of each combination of X and Y:

Frequencies of (Y, X) pairs

         X = 0.25   X = 0.5   X = 0.75
Y = 0       3          5         2
Y = 1       1          7         6

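The number of concordant pairs equals

$$N_C = 3 \times 7 + 3 \times 6 + 5 \times 6 = 69.$$

The number of discordant pairs equals

$$N_D = 1 \times 5 + 1 \times 2 + 7 \times 2 = 21.$$

The number of tied pairs is equal to the total number of pairs with unequal Y values minus the concordant and discordant pairs

$$N_T = (3 + 5 + 2) \times (1 + 7 + 6) - 69 - 21 = 50.$$

Thus, Somers’ D equals

$$D_{XY} = \frac{69 - 21}{69 + 21 + 50} \approx 0.34.$$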
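The arithmetic of this example can be reproduced from the frequency table. The sketch below is only an illustration: the helper name `pair_counts` is hypothetical, and the two lists are assumed to hold the table rows for Y = 0 and Y = 1 with columns ordered by increasing X.

```python
def pair_counts(freq_y0, freq_y1):
    """Count concordant, discordant and X-tied pairs with unequal Y.

    freq_y0[i] and freq_y1[i] are the frequencies of Y = 0 and Y = 1
    at the i-th smallest value of X.
    """
    k = len(freq_y0)
    # Concordant: the Y = 1 observation has the larger X value.
    n_c = sum(freq_y0[i] * freq_y1[j] for i in range(k) for j in range(i + 1, k))
    # Discordant: the Y = 1 observation has the smaller X value.
    n_d = sum(freq_y0[i] * freq_y1[j] for i in range(k) for j in range(i))
    # Tied on X but not on Y: both observations share the same column.
    n_t = sum(a * b for a, b in zip(freq_y0, freq_y1))
    return n_c, n_d, n_t


# Rows of the frequency table for X = 0.25, 0.5, 0.75.
freq_y0 = [3, 5, 2]
freq_y1 = [1, 7, 6]

n_c, n_d, n_t = pair_counts(freq_y0, freq_y1)
somers_d_xy = (n_c - n_d) / (n_c + n_d + n_t)
print(n_c, n_d, n_t, round(somers_d_xy, 2))  # 69 21 50 0.34
```

Counting the tied pairs column by column gives the same 50 as subtracting the concordant and discordant pairs from the 140 pairs with unequal Y values.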

References

  1. Somers, R. H. (1962). "A new asymmetric measure of association for ordinal variables". American Sociological Review. 27 (6). doi:10.2307/2090408. JSTOR 2090408.
  2. Newson, Roger (2002). "Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences". Stata Journal. 2 (1): 45–64.
  3. O'Connell, A. A. (2006). Logistic Regression Models for Ordinal Response Variables. SAGE Publications.