Wikipedia:Reference desk/Archives/Mathematics/2020 August 25

Mathematics desk
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


August 25

Tricky quadruple integral

Hi folks. I'm trying to compute:

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\left((z^2+t^2+1)(x^2+y^2)+2xz+2yt\right)}\,dx\,dy\,dz\,dt.$$

Actually, I'm not even sure how to start: integrating in any order produces annoying radicals already at the first integration. Maple is able to do the first two integrations, in x and y, and somehow gets

$$\pi\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{e^{(z^2+t^2)/(z^2+t^2+1)}}{z^2+t^2+1}\,dz\,dt,$$

and then it's stuck there. However, I'm able to go on integrating in polar coordinates: with

$$z = r\cos\theta, \qquad t = r\sin\theta,$$

$$dz\,dt = r\,dr\,d\theta,$$

the integral becomes

$$\pi\int_0^{2\pi}\!\!\int_0^{\infty}\frac{e^{r^2/(r^2+1)}}{r^2+1}\,r\,dr\,d\theta = 2\pi^2\int_0^{\infty}\frac{r\,e^{r^2/(r^2+1)}}{r^2+1}\,dr.$$

But how to justify the first step? Thanks! 82.61.195.183 (talk) 11:09, 25 August 2020 (UTC)[reply]

I haven't checked the details, but it looks like Maple is completing the square in the exponent for the x and y variables. Specifically, the exponent can be written $Ax^2+2xz+Ay^2+2yt$, where $A=z^2+t^2+1$. Completing the squares leads to $A(x+z/A)^2+A(y+t/A)^2-(z^2+t^2)/A = A(x+z/A)^2+A(y+t/A)^2-(A-1)/A$. A natural substitution is then $X=x+z/A$, $Y=y+t/A$. I'm only using 8.5"×11" paper though, and carrying out the actual substitution takes a bit more than that. (Perhaps I just need to write smaller.) But I'm fairly sure that when all the dust has settled you'd get a double integral very similar in form to what Maple produced. I can't help thinking, though, that Maple isn't doing it the best way, or at least not the best way for a human. --RDBury (talk) 12:38, 26 August 2020 (UTC)[reply]
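
For what it's worth, the dust settles quickly once the substitution is written out. A sketch, using the exponent as given above and the standard Gaussian integral $\int_{-\infty}^{\infty} e^{-Au^2}\,du = \sqrt{\pi/A}$:

$$\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(Ax^2+2xz+Ay^2+2yt)}\,dx\,dy = e^{(A-1)/A}\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-A(X^2+Y^2)}\,dX\,dY = \frac{\pi}{A}\,e^{(A-1)/A},$$

which, with $A = z^2+t^2+1$, is exactly the integrand of the double integral Maple reports.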

Descriptive measure for decrease in standard error with increasing sample size

I am trying to figure out how many replicates I need to run for a (long, costly) simulation to get a reasonably accurate estimate of a certain output value. My current approach is to run batches with increasing numbers of replicates, then plot the pooled standard error for each batch against the number of replicates. Since increasing the sample size drives down the SE, the series flattens out as replicates are added. Now I'd like to determine the point where I can say "flat enough; more reps are not improving the SE substantially".

My question: is there a standard metric for that kind of evaluation? I suppose the slope of the curve, or even just eyeballing it, would do in a pinch, but if there is an accepted metric, that would be better. I guess this comes down to a general test for some threshold of asymptotic behaviour... --Elmidae (talk · contribs) 14:14, 25 August 2020 (UTC)[reply]

By standard error, do you mean the (corrected) sample standard deviation? And does "pooled" mean you include the earlier batches in the sample? If you are using the standard deviation (s.d.) as an estimate for the standard error of the population, then, since the latter presumably remains constant, the estimates should not show a trend; so by "flattening out", do you mean the curve gets smoother (less wobbly)? If so, what do you mean by "slope"? For a given sample size, over repeated (independent) samples, the s.d. is itself a random variable with a normal distribution whose mean is the standard error of the population and whose own standard error is also inversely proportional to the square root of the sample size. If you take the absolute values of the differences and smooth that out, you should get something that goes down roughly in inverse proportion to the square root of the sample sizes. If you fit this with $c/\sqrt{n}$ for the best fit, you can estimate how far you need to go to get your wobbles below some desired $\varepsilon$, namely $n \ge (c/\varepsilon)^2$. If "pooling" means the inclusion of earlier batches, the subsequent samples are not independent, so the theoretical analysis does not quite apply, but I expect that this will work in practice.  --Lambiam 15:40, 25 August 2020 (UTC)[reply]
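
A minimal R sketch of that recipe (the sd.est values are hypothetical stand-ins for real per-batch estimates; eps is whatever tolerance you pick):

# Hypothetical per-batch s.d. estimates, one per sample size;
# stand-ins for real simulation output.
n.batch <- 2:30
sd.est  <- 10 + rnorm(length(n.batch), sd = 8 / sqrt(n.batch))

# "Wobbles": absolute differences between successive estimates.
wobble <- abs(diff(sd.est))
n.mid  <- n.batch[-1]

# Least-squares fit of wobble ~ c / sqrt(n), using the closed form
# for a single no-intercept coefficient.
x     <- 1 / sqrt(n.mid)
c.hat <- sum(x * wobble) / sum(x * x)

# Sample size needed to push the expected wobble below eps:
eps      <- 0.5
n.needed <- ceiling((c.hat / eps)^2)
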
Ooh, little did I know I was being imprecise in fifteen different ways there! :) I'm attaching a figure of one of my outcomes:

[Figure: standard error of the mean plotted against the number of replicates per batch]

(That's ducks in an agent-based model, BTW.) What I'm plotting here is the plain standard error of the mean; i.e., I'm not after the population variance, but after how well my sample mean describes the population mean. The SE is computed over a given batch of replicates (which is what I meant by "pooled" - probably not a good word choice), where the size of the batch (the number of replicates pooled together) increases along the x-axis. So at x = 30 the sample size is 30 times as large as at x = 1.
Just eyeballing this graph, I'd say I'm hitting diminishing returns between 25 and 30, so using 30 reps should be fine.
Regarding independence, I fear I'm muddying the waters a little by selecting the reps in each batch randomly (but without replacement) from the total pool of 30. So the two reps at x=2 are probably different from the three reps at x=3, but the difference between x=29 and x=30 is just one additional rep. Now I'm unsure whether that is better or worse than plain adding on of reps in sequence... --Elmidae (talk · contribs) 16:12, 25 August 2020 (UTC)[reply]
I am sorry, but I'm still a bit unclear about the procedure you are following to produce an "outcome". By "rep", do you mean "representative" (a member of a sample drawn from a population)? Or does it stand for "replicate" as in the original question – but if so, replicates of what precisely? Are all values drawn from the same population, or are there parameters you vary? I thought you were plotting the "pooled standard error for each batch" along the y-axis, but then why does the label say "mean"? Where does the "total pool of 30" come from? Is it a sample, and are your batches samples drawn from this (fixed) sample?  --Lambiam 17:01, 25 August 2020 (UTC)[reply]
My procedure is this: a simulation is run, which yields (for the present purpose) a single response value. Because the simulation has stochastic components, another run of the same model (a replicate) will yield a different response value. I make 30 replicate runs in total, which constitute the replicate pool, yielding 30 response values. I then create batches of 2, 3, ..., 30 values by sampling without replacement from this pool; batch 30 contains all available values. I compute the standard error of the mean for each batch (across 2 values in batch 2, across 30 values in batch 30). That SE is on the y-axis, and the number of replicates in the corresponding batch (= number of values in the batch = batch ID) is on the x-axis. (Ignore the y-axis label - it's a "mean" at quite another level.)
SE decreases with increasing sample size, and I'm trying to find a way to decide at which point the expense in additional replicates is no longer worth the gain in SE minimization. In the most abstract way I can formulate it, my simulation produces results across a certain outcome space, and I want to find out how many samples I need to take from that space before I hit diminishing returns in terms of an accurate sample mean. It looks to me as if that happens around 20-25 samples (replicates), so with 30 I should be fine; but having a computed metric rather than a visual inspection would be better. --Elmidae (talk · contribs) 18:32, 25 August 2020 (UTC)[reply]
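
For concreteness, a small R sketch of the procedure as described (rnorm standing in for the 30 simulated response values):

# Stand-in for the pool of 30 replicate response values.
set.seed(1)
pool <- rnorm(30, mean = 100, sd = 10)

# For each batch size k, draw k values without replacement
# and compute the standard error of the mean, s / sqrt(k).
k   <- 2:30
sem <- sapply(k, function(n) sd(sample(pool, n)) / sqrt(n))

plot(k, sem, type = "b",
     xlab = "number of replicates in batch",
     ylab = "standard error of the mean")
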
Urgh... okay, I think I'm only now perceiving the circularity in having the max sample size equal to the total population. I really need a population larger than max sample size to talk about finding a point of diminishing returns. Guess I'd better spin up a few more simulations. Sheesh. --Elmidae (talk · contribs) 19:13, 25 August 2020 (UTC)[reply]
Something else. How do you compute "the standard error of the mean" of a given batch? Our article gives the formula $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, but this requires you to know $\sigma$, the standard deviation of the population. Do you estimate the latter using the sample standard deviation s, as suggested in our article? Estimating from the graph, the y-value at 15 is about 3.18, while that at 16 equals 2.08. So that gives $s_{15} = 3.18 \times \sqrt{15} \approx 12.32$, while for the next one we have $s_{16} = 2.08 \times \sqrt{16} \approx 8.32$ or so. That is a huge jump for two estimates of the same population parameter, given that the second batch contains only one extra value. The largest reduction would occur if that extra value were precisely equal to the previous sample mean. But even then, you could not have such a large decrease. The sum of squares of the deviations from the sample mean would be the same for the 15- and 16-batches if the new value is the old (and thus also the new) sample mean. Assuming you have applied Bessel's correction, the following relationship should then hold: $14s_{15}^2 = 15s_{16}^2$, which means that $s_{16}$ could go down by at most 3.4% from $s_{15}$, not the 32% we are seeing. So what is going on?  --Lambiam 20:11, 25 August 2020 (UTC)[reply]
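
The arithmetic above is quick to check in R (using the two y-values read off the graph):

# Extreme case: the 16th value equals the old sample mean, so the sum of
# squared deviations is unchanged and 14 * s15^2 = 15 * s16^2.
1 - sqrt(14 / 15)   # maximum relative drop in s: ~0.034, i.e. 3.4%

# The drop actually implied by the graph:
1 - 8.32 / 12.32    # ~0.32, i.e. 32%
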
It's the standard error of the mean as calculated by Rmisc::summarySE [1], which IIUC is indeed the sample SD / sqrt(sample size). The jumps in the series are presumably caused by me randomly resampling the pool of 30 values for each batch, so the size-15 and size-16 batches may, in the extreme case, have only a single value in common. This now strikes me as a rather stupid approach when going up to a sample size equal to the population size; but if I increase the population size (max number of replicates) to 60 (that's about as high as I can go before I run out of time and memory :p) while sticking with a max sample size of 30, I think it would make sense? --Elmidae (talk · contribs) 20:29, 25 August 2020 (UTC) - But really only if I then repeated that a large number of times and averaged the results; otherwise it's just replacing one fixed sequence with another. Boy, having to explain this to someone really shows up the chasms in the concept... lemme get back to the drawing board for a bit... --Elmidae (talk · contribs) 20:59, 25 August 2020 (UTC)[reply]
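
For reference, a minimal sketch of that computation with Rmisc::summarySE, assuming long-format data with one row per replicate run; the column names here are hypothetical:

library(Rmisc)   # provides summarySE()

# Hypothetical long-format data: one row per replicate, grouped by batch.
dat <- data.frame(batch    = rep(2:30, times = 2:30),
                  response = rnorm(sum(2:30), mean = 100, sd = 10))

# summarySE() returns N, the mean, sd, se (= sd / sqrt(N)) and ci per group.
res <- summarySE(dat, measurevar = "response", groupvars = "batch")

plot(res$batch, res$se, type = "b",
     xlab = "number of replicates in batch",
     ylab = "standard error of the mean")
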
Even so, with the explanation of randomly resampling, the jump in s-value is remarkable. As the sample sizes go up, s should become a better and better estimate of $\sigma$, so as you are cranking up the size, the curve should increasingly look like that of $c/\sqrt{n}$ (where actually $c = \sigma$), without a clear point of diminishing returns setting in – it is diminishing returns all the way through. Doing this multi-batch thing does not really give you more useful information, so I think my very first suggestion stands: Pack Up Your Wobbles in a Single Batch, and 😊, 😊, 😊.  --Lambiam 21:05, 25 August 2020 (UTC)[reply]
What, and throw out two screens of R code? Never! - Well, all right... but only tomorrow... Thanks a lot for your help! :) --Elmidae (talk · contribs) 22:03, 25 August 2020 (UTC)[reply]
Plan to throw your precious code away; you will, anyhow.  --Lambiam 10:14, 26 August 2020 (UTC)[reply]
  • Personal story: at ~3 months into my PhD, I had the same statistics problem: I needed a good estimate of the mean of a certain physical value that could be determined experimentally, with some random noise on each run. One run was cheap (but not free) to make. I made 100 runs, and my advisor demanded that I prove the value was sound (i.e., that pushing the number of experiments to 200 or 500 would not change the average; also, conversely, maybe 10 was enough and I had just wasted time?). At some point in the (lengthy) discussion he argued that I should go ask the opinion of a numerical-simulation guy about what convergence criteria to use.
Fortunately for both our reputations, before I asked for an outside opinion, I produced a graph similar to Elmidae's above (running average of the first N trials vs. N, plotted for multiple random orderings of the run results), and it dawned on me that yeah, Gaussian noise, inverse-square-root law, the estimate of the mean is going to be off by about σ/sqrt(N), end of story. My opinion of my advisor's scientific competency took a big drop that day (to his credit, he quickly accepted the 1/sqrt(N) argument, but I still think he ought to have found it himself after 10+ years of experimental work). Tigraan (Click here to contact me) 13:17, 26 August 2020 (UTC)[reply]
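
That picture is easy to reproduce; a sketch in R, with standard normal noise standing in for the experimental runs:

# Running average of the first N trials vs. the sigma / sqrt(N) error scale.
set.seed(2)
sigma   <- 1
trials  <- rnorm(500, mean = 0, sd = sigma)
running <- cumsum(trials) / seq_along(trials)

plot(abs(running), type = "l", log = "xy",
     xlab = "N", ylab = "|running mean - true mean|")
lines(sigma / sqrt(seq_along(trials)), lty = 2)  # expected error scale
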
Yeah, it seems to have taken me well into my postdoc to (fractionally) realize that. Or the interpretation I prefer: I knew but forgot again at some point :p --Elmidae (talk · contribs) 19:19, 26 August 2020 (UTC)[reply]