Talk:Theil–Sen estimator
Theil–Sen estimator has been listed as one of the Mathematics good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it. Review: October 20, 2018.
A fact from Theil–Sen estimator appeared on Wikipedia's Main Page in the Did you know column on 8 July 2011.
This article is rated GA-class on Wikipedia's content assessment scale.
tau
Quote: "As Sen observed, this estimator is the value that makes the Kendall tau rank correlation coefficient comparing the sample data values yi with their estimated values mxi + b become approximately zero."
Really? Then the method gives an estimation (mxi + b) completely uncorrelated with the estimated variable (yi)? Olaf (talk) 00:49, 27 April 2014 (UTC)
- No, it means that roughly half the yi are greater than the corresponding mxi+b, and roughly half are less. Deltahedron (talk) 19:49, 27 April 2014 (UTC)
- No, the claim is not that the median error should be zero, as your interpretation would have it; it is about Kendall's tau rank correlation. Counterexample: if yi = xi, then the estimator mxi + b = 1xi + 0 = xi = yi and thus the tau correlation between the estimator mxi + b and the original value yi is equal to one, instead of zero. Olaf (talk) 20:07, 27 April 2014 (UTC)
- That's not a particularly good counterexample, since the number of concordant and the number of discordant pairs are both zero, and hence tau=0. Deltahedron (talk) 20:11, 27 April 2014 (UTC)
- Let's check: y1=1, y2=2, y3=3.
- Estimations: Y1=1, Y2=2, Y3=3
- Concordant pairs:
- 1<2 and y1 < Y2
- 1<3 and y1 < Y3
- 2<3 and y2 < Y3
- Tied pairs: none
- Discordant pairs: none.
- Tau = 1
- In the absence of tied ranks, the tau correlation has the same property as Pearson's correlation: tau(A,A) = 1. There are no tied ranks here, since ai ≠ aj whenever i ≠ j.
- Olaf (talk) 20:23, 27 April 2014 (UTC)
- No, it's the residuals that are all equal and hence uncorrelated. Deltahedron (talk) 20:37, 27 April 2014 (UTC)
- Yes, and the article stated that it was the estimated values, not their residuals. Now it's fixed ([1]). Thank you for the references. Olaf (talk) 20:43, 27 April 2014 (UTC)
- However, what's important is what independent reliable sources say. Searching "Theil Sen" "Kendall tau" in Google Books gave me: [2], [3], [4] which support the assertion of the text (unlike the reference to Rousseeuw & Leroy (2003), pp. 67, 164 which did not). Deltahedron (talk) 20:19, 27 April 2014 (UTC)
- OK, so it is the tau correlation between the estimation error and the x value that equals zero, not the one between the estimator and the estimated value! (See the second reference.) Olaf (talk) 20:26, 27 April 2014 (UTC)
- Thanks for clearing this up. —David Eppstein (talk) 22:36, 27 April 2014 (UTC)
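The distinction settled in this thread can be checked numerically. The following is a minimal sketch, not taken from the article or its sources: `kendall_tau_a` is a hypothetical helper that counts tied pairs as neither concordant nor discordant (the tau-a convention, assumed here), and the five data points are made up for illustration. It reproduces the three-point example above and then shows that, for a Theil–Sen fit, tau between x and the residuals is zero while tau between yi and the fitted values is not.

```python
from itertools import combinations
from statistics import median


def kendall_tau_a(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant (assumed convention).
    """
    concordant = discordant = 0
    for (a1, b1), (a2, b2) in combinations(zip(a, b), 2):
        sign = (a2 - a1) * (b2 - b1)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# The example from the thread: yi = xi, so the fit is exact.
x3 = [1, 2, 3]
y3 = [1, 2, 3]
tau_fit_exact = kendall_tau_a(y3, x3)                # y against fitted values
tau_resid_exact = kendall_tau_a(x3, [0.0, 0.0, 0.0]) # x against (all-zero) residuals

# Made-up noisy data, fitted with the median of pairwise slopes.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 6, 5]
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)]
m = median(slopes)                                   # Theil-Sen slope
b = median(yi - m * xi for xi, yi in zip(x, y))      # median-residual intercept
fitted = [m * xi + b for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

tau_y_vs_fitted = kendall_tau_a(y, fitted)
tau_x_vs_residuals = kendall_tau_a(x, residuals)
```

In this sketch `tau_fit_exact` is 1 (Olaf's calculation), `tau_resid_exact` is 0 because every residual pair is tied (Deltahedron's point), and for the noisy data tau is 0 between x and the residuals but 0.6 between yi and the fitted values.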
Bias
The statement on unbiasedness,
The Theil–Sen estimator is an unbiased estimator of the true slope in simple linear regression
is unfounded. The corresponding source explicitly states that Sen's claim to that effect is incorrect. It should be removed. Muhali (talk) 08:38, 14 February 2017 (UTC)
- Just dug a little deeper. Their counterexample is built on asymmetric noise, which is somewhat rare, so maybe we just keep it the way it is stated now. Muhali (talk) 09:04, 14 February 2017 (UTC)
Accuracy of the estimated slope
The description seems to be of a kind of percentile bootstrap, but as far as I can see, this is incorrect. The procedure described here would yield a 95% interval for the sampled slopes, not (as it should) for their median. A reference for the described procedure is missing. Maybe someone has a reference to a good way of doing this? (I don't have one handy now.) --Han691 (talk) 17:23, 19 August 2019 (UTC)
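The difference between the two kinds of interval can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the article's procedure and not a sourced method: the data are synthetic (true slope 2, Gaussian noise), the bootstrap resamples data points with replacement, and `pairwise_slopes` and `percentile_interval` are hypothetical helper names. It compares the central 95% range of the pairwise slopes with a percentile-bootstrap interval for their median.

```python
import random
from itertools import combinations
from statistics import median


def pairwise_slopes(points):
    """Slopes of all point pairs with distinct x (duplicates occur in resamples)."""
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in combinations(points, 2)
            if x2 != x1]


def percentile_interval(values, lower=0.025, upper=0.975):
    """Crude central interval read off the sorted values (nearest-rank style)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower * last)], ordered[round(upper * last)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
points = [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in range(30)]

# (a) Central 95% range of the pairwise slopes themselves.
slopes = pairwise_slopes(points)
theil_sen_slope = median(slopes)
spread_of_slopes = percentile_interval(slopes)

# (b) Percentile bootstrap for the median slope: resample points, refit each time.
boot_medians = []
for _ in range(500):
    resample = [rng.choice(points) for _ in points]
    boot_medians.append(median(pairwise_slopes(resample)))
interval_for_median = percentile_interval(boot_medians)

width_slopes = spread_of_slopes[1] - spread_of_slopes[0]
width_median = interval_for_median[1] - interval_for_median[0]
print(f"Theil-Sen slope: {theil_sen_slope:.3f}")
print(f"range of pairwise slopes: width {width_slopes:.3f}")
print(f"bootstrap interval for the median: width {width_median:.3f}")
```

Interval (a) describes how much individual pairwise slopes scatter, which is much wider than interval (b), the one that speaks to the uncertainty of the estimate itself. Whether a point bootstrap is the best choice here is a separate question that still needs a reference.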
Broken Link
The link for reference 24 is broken. 172.56.200.238 (talk) 23:18, 15 July 2024 (UTC)
- Updated. —David Eppstein (talk) 10:10, 16 July 2024 (UTC)