Wikipedia:Reference desk/Archives/Mathematics/2023 July 7

Mathematics desk
Welcome to the Wikipedia Mathematics Reference Desk Archives


July 7


Regression to the mean


Regression to the mean is a pretty old concept, but I couldn't see in the article something I thought would be mentioned there. Am I missing something? The article gives an estimate that the regression coefficient for height is 2/3. Assuming the next generation has the same distribution as their parents, and the distribution for each child is the same, I work out the standard deviation of the heights of the children within each family to be sqrt(1 - (2/3)^2), or about 3/4 of the overall standard deviation. Hope I've got that right! Does this come under something else perhaps? NadVolum (talk) 22:39, 7 July 2023 (UTC)
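The arithmetic in the question can be checked with a few lines of Python. This is a minimal sketch, assuming (as the question does) a single parent whose z-height has the same standard normal distribution as the child's, so that the residual variance is 1 - r^2. The function name `residual_sd_single_parent` is hypothetical.

```python
import math

def residual_sd_single_parent(r: float) -> float:
    """Residual standard deviation, in units of the population SD, when
    child = r * parent + residual and parent and child both have unit
    variance (the single-parent model assumed in the question)."""
    return math.sqrt(1.0 - r * r)

r = 2 / 3
print(round(residual_sd_single_parent(r), 3))  # 0.745, i.e. sqrt(5/9)
```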

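Let's assume, to keep this tractable, that the heights of any two parents are iid random variables with the same distribution as the population. Also, for simplicity, introduce the concept of z-height, which is a linear transformation of height $h$ to z-height $z = (h - \mu)/\sigma$, where $\mu$ and $\sigma$ are the population mean and standard deviation, so that the z-height can be treated as having the standard normal distribution. Also, let's simply define the mid-parent height as the arithmetic mean $M = \tfrac{1}{2}(Z_1 + Z_2)$ of the z-heights $Z_1$ and $Z_2$ of the two parents of a child. Let $r$ be the linear regression coefficient of child height (the dependent variable) with respect to mid-parent height, so we can write the child's z-height as $C = rM + E$, in which the random variable $E$ is the residual. By definition, the expected value of $E$ is $0$; moreover, it may be assumed to be independent of the mid-parent height. Now $\operatorname{Var}(C) = r^2 \operatorname{Var}(M) + \operatorname{Var}(E)$. We have defined z-height such that $\operatorname{Var}(Z_1) = \operatorname{Var}(Z_2) = 1$, so, with independent parents, $\operatorname{Var}(M) = \tfrac{1}{4}(1 + 1) = \tfrac{1}{2}$. If the height-reproducing process is stationary (the z-height $C$ of offspring also has the standard normal distribution), also $\operatorname{Var}(C) = 1$. This then implies that $\operatorname{Var}(E) = 1 - \tfrac{1}{2} r^2$. Specifically, when $r = \tfrac{2}{3}$ this comes out as $\sqrt{\operatorname{Var}(E)} = \sqrt{7/9} \approx 0.88$, not as $\sqrt{1 - (2/3)^2} = \sqrt{5/9} \approx 0.75$. --Lambiam 12:03, 8 July 2023 (UTC)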
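A quick Monte Carlo check of the two-parent argument, as a sketch under the same assumptions: the parents' z-heights are iid standard normal, and the residual is independent of the mid-parent height. For the simulation only, the residual is additionally assumed to be normally distributed. The function `simulate` and its parameters are hypothetical. If the residual SD is set to sqrt(1 - r^2/2), the simulated children should again have a z-height SD close to 1, which is what stationarity requires.

```python
import random
import statistics

def simulate(r: float, n_families: int, seed: int = 0):
    """Two-parent model: parents' z-heights iid N(0, 1), child's z-height
    C = r * M + E, with M the mid-parent z-height and E an independent
    N(0, sd_e**2) residual (normality of E is an assumption)."""
    rng = random.Random(seed)
    sd_e = (1.0 - r * r / 2.0) ** 0.5  # residual SD implied by stationarity
    children = []
    for _ in range(n_families):
        m = (rng.gauss(0.0, 1.0) + rng.gauss(0.0, 1.0)) / 2.0  # mid-parent
        children.append(r * m + rng.gauss(0.0, sd_e))
    return sd_e, statistics.pstdev(children)

sd_e, sd_children = simulate(2 / 3, 200_000)
# sd_e is sqrt(7/9); sd_children should come out close to 1
print(f"residual SD {sd_e:.3f}, child population SD {sd_children:.3f}")
```

Setting the residual SD to the single-parent figure sqrt(1 - r^2) instead would give a child SD of about sqrt(7/9), noticeably below 1, so that population would shrink from one generation to the next.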
Yes, I'd in effect assumed a single parent with the original distribution, rather than two parents picked at random from it; two parents would make the standard deviation of height for the children nearly 90% of that of the population as a whole, which I think is a bit surprising. But don't you think this is interesting enough that it is surprising people don't seem to have made the calculation, never mind shown how it works out in a case with two parents like this? And for height, I must admit it does seem to me that the pairing is fairly random, rather than tall people marrying tall ones and short people marrying short ones! In other cases, like batters' hitting in a second season compared to the first, there would not be two parents, but there may be other factors, and they may be interesting. NadVolum (talk) 12:41, 8 July 2023 (UTC)
It is interesting, but see WP:NOR. Trying to orient implication in the direction of cause → effect, it may be better to interpret the relation as $\sigma^2 = \tfrac{1}{2} r^2 \sigma^2 + \sigma_E^2$, that is, $\sigma^2 = \sigma_E^2 / (1 - \tfrac{1}{2} r^2)$, giving the steady-state population variance $\sigma^2$, given the regression coefficient $r$ and the residual variance $\sigma_E^2$. BTW, many studies have found that preferences for similar height in mating selection are reflected in a correlation between the heights of actual couples.[1] --Lambiam 14:02, 8 July 2023 (UTC)
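Under the relation sigma^2 = (r^2 / 2) sigma^2 + sigma_E^2 (population variance sigma^2, residual variance sigma_E^2), the steady state can be illustrated by iterating the variance generation by generation. This is a sketch, assuming random mating (parents uncorrelated, so the mid-parent variance is half the population variance) and the same r and residual variance in every generation; the helper names are hypothetical. Because the factor r^2 / 2 is below 1, the variance converges to sigma_E^2 / (1 - r^2 / 2) from any starting value.

```python
def steady_state_variance(r: float, var_e: float) -> float:
    """Fixed point of v = (r**2 / 2) * v + var_e (two-parent, random mating)."""
    return var_e / (1.0 - r * r / 2.0)

def iterate_generations(r: float, var_e: float, v0: float, n: int) -> list[float]:
    """Population variance over n generations, starting from v0, assuming
    uncorrelated parents so that Var(mid-parent) = v / 2."""
    history = [v0]
    for _ in range(n):
        history.append(r * r * history[-1] / 2.0 + var_e)
    return history

r, var_e = 2 / 3, 7 / 9
vs = iterate_generations(r, var_e, v0=4.0, n=30)
print(vs[-1], steady_state_variance(r, var_e))  # both approximately 1.0
```

If the parents' heights were correlated with coefficient rho, as the studies of couples mentioned above suggest, the mid-parent variance would be (1 + rho) sigma^2 / 2, and the factor r^2 / 2 in this sketch would become r^2 (1 + rho) / 2.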