Wikipedia:Reference desk/Archives/Mathematics/2015 December 28

Mathematics desk
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is an archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


December 28


Probability of lucky streak in Poisson process


Given positive integers n, m and real <math>T>0</math>, suppose I have iid exponential variables (rate 1) <math>X_1, \dots, X_n</math>. What is the probability p that there exists an integer index <math>1 \le i \le n-m</math> such that <math>\textstyle\sum_{j=i}^{i+m} X_j < T</math>?

Intuition: The variables represent the time between successive events in a Poisson process. An index i as described represents the start of a lucky streak, where m+1 events happen within a span of time T (assumed to be lower than the average time this should take). I want to find the probability of finding such a streak.

If it matters, in my application  .

The probability v that a specific index starts a streak is easy to calculate using the Poisson or Erlang distribution. This gives the bound <math>p \le (n-m)v</math>. But because of the correlations between the different indices, I can't figure out a way to calculate the exact probability.
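
For concreteness, under the m+1-term reading above (an assumption spelled out here, not in the original post): the window sum is below T exactly when a rate-1 Poisson process has at least m+1 arrivals within a span of length T, so
<math>v = P\Bigl(\sum_{j=i}^{i+m} X_j < T\Bigr) = 1 - e^{-T}\sum_{k=0}^{m}\frac{T^k}{k!}, \qquad p \le (n-m)\,v.</math>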

If a symbolic expression can't be found, I would also appreciate methods to calculate it numerically, or to simulate it, that improve upon the most naive simulation (my T is such that the event is fairly rare, so a naive simulation will take a lot of tries to reach sufficient precision). -- Meni Rosenfeld (talk) 00:34, 28 December 2015 (UTC)
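
As a baseline, here is a minimal naive Monte Carlo sketch in J (the language used later in this thread), assuming the m+1-term window above and, purely as example values, T=24, m=48, n=98; this is the slow approach that better methods would need to beat:

   T =: 24                NB. example values only
   m =: 48
   n =: 98
   trial =: 3 : 'T > <./ (m+1) +/\ - ^. ? n # 0'  NB. 1 if some (m+1)-term window of n Exp(1) variates sums to less than T
   (+/ % #) trial"0 i. 100000                     NB. fraction of 100000 trials containing a streak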

For every i the variable
<math>\textstyle\sum_{j=i}^{m+i} X_j</math>
has a gamma distribution. These distributions are identical, but only approximately independent. The condition that at least one of the sums is below the threshold means that not all the sums are above the threshold, and the latter probability is approximately equal to the product of the probabilities that each of the sums is above the threshold:
<math>P\Bigl(\exists i:\sum_{j=i}^{m+i} X_j < T\Bigr) = 1 - P\Bigl(\forall i:\sum_{j=i}^{m+i} X_j \ge T\Bigr) \approx 1 - \prod_{i=1}^{n-m} P\Bigl(\sum_{j=i}^{m+i} X_j \ge T\Bigr).</math>
Bo Jacoby (talk) 23:07, 29 December 2015 (UTC).
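Since the window sums all have the same Erlang marginal distribution, the product in this approximation collapses to a single power; writing v for the one-window probability,
<math>p \approx 1 - (1 - v)^{\,n-m},</math>
which is the closed form used in the Answer section below.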
Thanks. Since we're talking about events with low probability, multiplying the complementary probabilities will give more or less the same result as adding the individual probabilities. This gives rise to the <math>(n-m)v</math> bound. In my application the ratio between the approximate and real values is about 2, which is not good enough - I need a better approximation. -- Meni Rosenfeld (talk) 23:23, 29 December 2015 (UTC)
(I corrected my bounds above to <math>\textstyle\prod_{i=1}^{n-m}</math>.) What is the use of better precision than what is computed by naive simulation? What are your values for m and T? Why not use the formula <math>1-\prod_i(1-P_i)</math> rather than the approximate formula <math>\sum_i P_i</math>? Bo Jacoby (talk) 05:27, 30 December 2015 (UTC).
My T is 24 and m ranges between 48 and 53.
The formulas <math>1-\prod_i(1-P_i)</math> and <math>\sum_i P_i</math> are almost identical because the P's are very low (  for  ). They're both approximations. One assumes the events are independent and the other assumes they're mutually exclusive, which for low-probability events is almost the same thing. And with my numbers they're both off by a factor of ~2.
Because the probabilities are low, a naive simulation will require many particles to obtain even one hit, and many more to get low variance on the estimate. The estimator's relative variance is <math>(1-p)/(np) \approx 1/(np)</math>, with n here the number of trials, so for low p the variance is huge unless n is huge.
And I need better precision because... I need better precision. I'm trying to find the correct answer to the question.
I've tried importance sampling but that also didn't give satisfactory results. There's another thing I'm trying out now. -- Meni Rosenfeld (talk) 11:30, 30 December 2015 (UTC)
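To quantify the cost (a gloss on the variance remark above, not from the thread): with N independent trials and estimator <math>\hat p = k/N</math>,
<math>\frac{\sigma(\hat p)}{p} = \sqrt{\frac{1-p}{Np}} \approx \frac{1}{\sqrt{Np}},</math>
so reaching a target relative error <math>\varepsilon</math> needs roughly <math>N \approx 1/(p\varepsilon^2)</math> trials; a probability of order <math>10^{-4}</math> at 10% relative error already calls for on the order of <math>10^{6}</math> trials.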
When evaluating the gamma distribution you may benefit from the nice formula
<math>P\Bigl(\sum_{j=1}^{m} X_j \le T\Bigr) = 1 - e^{-T}\sum_{k=0}^{m-1}\frac{T^k}{k!}.</math>
Bo Jacoby (talk) 18:38, 30 December 2015 (UTC).
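As a small convenience (an addition here, mirroring the J session further down), the formula packages naturally as a dyad taking the number of summands on the left and the threshold on the right:

   erlangcdf =: 4 : '-. (^ - y) * +/ (y ^ k) % ! k =. i. x'  NB. P(sum of x Exp(1) variables <= y)
   48 erlangcdf 24                                           NB. should reproduce p2 in the session below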
If you make n experiments having success probability p, then the unknown number k of successes has a binomial distribution, <math>k \simeq \mu \pm \sigma</math> where <math>\mu = np</math> and <math>\sigma^2\mu^{-1} = 1-p</math>, but note that if you make n experiments giving k successes, then the unknown success probability p has a beta distribution, <math>p \simeq \mu \pm \sigma</math> where <math>\mu = (k+1)(n+2)^{-1}</math> and <math>\sigma^2\mu^{-1} = (n-k+1)(n+2)^{-1}(n+3)^{-1}</math>.
Bo Jacoby (talk) 07:50, 31 December 2015 (UTC).
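
A quick worked instance of that last point (illustrative numbers only): a naive run of <math>n = 10^6</math> trials yielding <math>k = 10</math> hits gives
<math>\mu = \frac{k+1}{n+2} \approx 1.1\times 10^{-5}, \qquad \frac{\sigma}{\mu} = \sqrt{\frac{n-k+1}{(k+1)(n+3)}} \approx \frac{1}{\sqrt{k+1}} \approx 0.30,</math>
i.e. roughly 30% relative uncertainty, which is the sense in which rare events make naive simulation expensive.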

Answer


The probability that <math>\textstyle\sum_{j=i}^{i+m-1} X_j < T</math> is <math>v = 1 - e^{-T}\sum_{k=0}^{m-1}\frac{T^k}{k!}</math>.

The requested probability is <math>1 - (1-v)^{\,n-m}</math>.

For T=24, m=48, and n=2(m+1)=98 we have <math>v \approx 1.04\times 10^{-5}</math> and <math>1 - (1-v)^{50} \approx 0.00052</math>.

Do you agree?

Bo Jacoby (talk) 15:24, 1 January 2016 (UTC).
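
For what it's worth, a short J sketch (an addition, not part of the original answer) evaluating this closed form across the range of m mentioned above, keeping T = 24 and, as in this answer, n = 2(m+1) so that n - m = m + 2:

   T =: 24
   erlang =: 3 : '-. (^ - T) * +/ (T ^ k) % ! k =. i. y'  NB. P(sum of y Exp(1) variables < T)
   ms =: 48 + i. 6                                        NB. m = 48 .. 53
   ms ,. -. (-. erlang"0 ms) ^ 2 + ms                     NB. columns: m and 1-(1-v)^(n-m)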

The stochastic variable <math>\textstyle\sum_{j=1}^{m} X_j</math> has mean value m and standard deviation <math>\sqrt{m}</math>. For T=24 and m=48 we have <math>Z = (T-m)/\sqrt{m} = -2\sqrt{3} \approx -3.46</math>.

Assuming that Z has approximately a standard normal distribution:

<math>P\Bigl(\sum_{j=1}^{m} X_j < T\Bigr) \approx \Phi(-2\sqrt{3}) \approx 0.000266, \qquad 1 - (1-0.000266)^{50} \approx 0.0132.</math>

The gamma distribution gave

<math>v \approx 0.0000104, \qquad 1 - (1-v)^{50} \approx 0.00052.</math>

The normal distribution approximation must be rejected.

   erf   =: (1 H. 1.5)@*:*2p_0.5&*%^@:*: NB. error function
   n01cdf=: -:@>:@erf@%&(%:2)            NB. cumulative normal dist
   0j15":p1=.n01cdf -2*%:3               NB. normal approximation: Phi(-2*sqrt 3)
0.000266002752570
   -.50^~-.p1                            NB. 1-(1-p1)^50
0.0132138
   50*p1                                 NB. sum approximation (union bound): 50*p1
0.0133001
   0j15":p2=.-.(+/(T^k)%!k=.i.48)*^-T=.24 NB. gamma: 1 - e^-T * sum of T^k/k! for k<48
0.000010428432773
   -.50^~-.p2                            NB. 1-(1-p2)^50
0.000521288

Bo Jacoby (talk) 14:41, 2 January 2016 (UTC).

Mathematical algorithm


(Removed duplicate question. See WP:RDS#Which algorithm to use. -- ToE 13:39, 28 December 2015 (UTC))