User:Mct mht/Wide-sense stationary time series

Definition

Let {ξ_t} be a family of complex-valued random variables of mean zero indexed by t ∈ ℝ or ℤ. Such a family is said to be a wide-sense stationary stochastic process (or wide-sense stationary time series in the case of discrete time) when the covariance between any two members ξ_t and ξ_s, i.e.

R(t,s)=E(\xi _{t}\cdot {\bar {\xi }}_{s})

is finite and depends only on t − s. In particular, each ξ_t lies in the Hilbert space L²(Ω, P).

The function

R(t)=E(\xi _{t}\cdot {\bar {\xi }}_{0})

is called the autocovariance function of the process.

Spectral measure

Existence

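The autocovariance function is by construction a positive-definite function on the group ℝ (in the continuous-time case, where R is also assumed continuous) or ℤ (in the discrete-time case). By Bochner's theorem, there exists a finite positive measure μ on ℝ, or on the unit circle T identified with [0,1), such that R(t) is the Fourier transform of μ:

R(t)=\int e^{-2\pi it\lambda }d\mu (\lambda ).

The measure μ is called the spectral measure of the process.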

Examples

Some examples in the discrete time case:

An orthonormal sequence {ε_t} of mean-zero random variables is called a white-noise time series. Its autocovariance function is the Kronecker delta function on ℤ: R(t) = δ_{0t}. The spectral measure is the Lebesgue measure dm on [0,1].

Let {a_k} be an l¹-sequence of complex numbers. A moving-average time series {ξ_t} is formed by formally convolving {a_k} with the white noise {ε_t}:

\xi _{t}=\sum _{k\in \mathbb {Z} }a_{k}\epsilon _{t-k}.

The autocovariance function is given by the convolution (denoted by *) of the sequence {a_k} with the entry-wise conjugate of {a_{−k}}:

R(t)=E(\xi _{t}\cdot {\bar {\xi }}_{0})=\sum _{k\in \mathbb {Z} }{\overline {a}}_{k}a_{k+t}=((a_{k})*({{\overline {a}}_{-k}}))_{t}.

If a_k is nonzero only for 0 ≤ k ≤ p, the process is said to be a one-sided moving average of order p, and P(z) = ∑_{k=0}^p a_k z^k is a polynomial. Summing R(t)e^{2π i t λ} over t and using the convolution formula above gives the squared modulus of P on the unit circle:

\sum _{t\in \mathbb {Z} }R(t)e^{2\pi it\lambda }=|\sum _{k=0}^{p}a_{k}e^{2\pi ik\lambda }|^{2}=|P(e^{2\pi i\lambda })|^{2}.

The spectral measure is then absolutely continuous with respect to Lebesgue measure, with Radon-Nikodym derivative f(λ) = |P(e^{2π i λ})|², so that R(t) = ∫ e^{-2π i t λ} f(λ) dλ. This function f is called the spectral density of the process. (When the a_k are real, |P(e^{2π i λ})| = |P(e^{-2π i λ})|, so the sign in the exponent matters only for complex coefficients.)
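As a numerical sanity check (a sketch, not part of the original argument), the Python code below computes R(t) for a hypothetical complex coefficient sequence in two ways: from the convolution formula R(t) = ∑_k \bar a_k a_{k+t}, and by integrating e^{−2πitλ}|P(e^{2πiλ})|² over [0,1) with an equispaced Riemann sum. The Riemann sum is exact here (up to rounding) because the integrand is a trigonometric polynomial of low degree. The two agree when P is evaluated at e^{2πiλ}; evaluating it at e^{−2πiλ} instead would produce the complex conjugate of R(t). Function names and the grid size are illustrative choices.

```python
import cmath


def autocov_direct(a, t):
    """R(t) = sum_k conj(a_k) * a_{k+t} for a finite sequence a_0..a_p (zero elsewhere)."""
    p = len(a) - 1
    return sum(a[k].conjugate() * a[k + t] for k in range(p + 1) if 0 <= k + t <= p)


def spectral_density(a, lam):
    """|P(e^{2 pi i lam})|^2 with P(z) = sum_k a_k z^k."""
    z = cmath.exp(2j * cmath.pi * lam)
    return abs(sum(ak * z**k for k, ak in enumerate(a))) ** 2


def autocov_from_density(a, t, grid=64):
    """Riemann sum for the integral over [0, 1) of exp(-2 pi i t lam) * f(lam).

    The integrand is a trigonometric polynomial whose degree is far below `grid`,
    so the equispaced sum equals the integral up to rounding.
    """
    total = 0j
    for j in range(grid):
        lam = j / grid
        total += cmath.exp(-2j * cmath.pi * t * lam) * spectral_density(a, lam)
    return total / grid


a = [1.0 + 0j, 0.5j, -0.25 + 0j]  # hypothetical complex MA(2) coefficients a_0, a_1, a_2
for t in range(-3, 4):
    print(t, autocov_direct(a, t), autocov_from_density(a, t))
```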

An autoregressive process is a process of the form

\xi _{t}=\epsilon _{t}+\sum _{k=1}^{q}b_{k}\xi _{t-k},

where {ε_t} is a white-noise process. When all zeros of the complex polynomial 1 − ∑_{k=1}^q b_k z^k lie outside the closed unit disk, the stochastic difference equation defining an AR process has a wide-sense stationary solution. To see this, consider the Banach space of two-sided sequences of L² random variables equipped with the supremum norm. Denote by L the shift operator (Lξ)_t = ξ_{t−1} on this space and by Id the identity operator. Then the AR equation has the operator form

(Id-\sum _{k=1}^{q}b_{k}L^{k})(\xi _{t})=\epsilon _{t}.

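Since L is an invertible isometry, its spectrum lies in the unit circle, where the polynomial 1 − ∑_{k=1}^q b_k z^k has no zeros; by the spectral mapping theorem, the bounded operator Id − ∑_{k=1}^q b_k L^k is invertible. Because the zeros lie outside the closed unit disk, the power series 1/(1 − ∑_{k=1}^q b_k z^k) = ∑_{j≥0} c_j z^j converges on a disk of radius greater than 1, so ∑_j |c_j| < ∞. As ||L^j|| = 1, the inverse is the norm-convergent series

\sum _{j=0}^{\infty }c_{j}L^{j},

so that ξ_t = ∑_{j≥0} c_j ε_{t−j} is a one-sided moving average. When ∑_{k=1}^q |b_k| < 1, the same inverse is also given by the Neumann series ∑_{j=0}^∞ (∑_{k=1}^q b_k L^k)^j.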
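For a concrete check, the power-series coefficients c_j of 1/(1 − ∑_{k=1}^q b_k z^k) satisfy the recursion c_0 = 1, c_j = ∑_{k=1}^{min(j,q)} b_k c_{j−k}, obtained by requiring every coefficient of z^j, j ≥ 1, in the product with 1 − ∑ b_k z^k to vanish. The Python sketch below (function name, coefficients, and Gaussian noise are illustrative assumptions) computes them for 1 − 1.8z + 0.81z² = (1 − 0.9z)², whose double zero z = 1/0.9 lies outside the unit disk, and checks on one simulated noise path that the truncated one-sided sum ∑_j c_j ε_{t−j} satisfies the AR equation.

```python
import random


def psi_weights(b, n_terms):
    """First n_terms power-series coefficients of 1 / (1 - sum_k b_k z^k); b = [b_1, ..., b_q]."""
    c = [1.0]
    for j in range(1, n_terms):
        # The coefficient of z^j in (1 - sum_k b_k z^k) * sum_i c_i z^i must vanish for j >= 1.
        c.append(sum(b[k - 1] * c[j - k] for k in range(1, min(j, len(b)) + 1)))
    return c


b = [1.8, -0.81]         # 1 - 1.8 z + 0.81 z^2 = (1 - 0.9 z)^2
c = psi_weights(b, 400)  # here c_j = (j + 1) * 0.9**j, so the neglected tail is tiny

random.seed(0)
eps = [random.gauss(0.0, 1.0) for _ in range(600)]  # one simulated white-noise path


def xi(t):
    """Truncated one-sided moving average sum_j c_j eps_{t-j} (terms with t - j < 0 dropped)."""
    return sum(c[j] * eps[t - j] for j in range(len(c)) if t - j >= 0)


# At t = 550 no terms are dropped at the start of the path, and the neglected
# coefficients c_j, j >= 400, are negligible, so the AR equation holds up to rounding.
t = 550
residual = xi(t) - sum(b[k - 1] * xi(t - k) for k in range(1, len(b) + 1)) - eps[t]
print(abs(residual))
```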

ARMA

Spectral analysis

First example: almost-periodic time series

The existence of a spectral measure is the starting point of Fourier analysis for stationary time series. The goal is to understand the series in terms of its frequency content.

It turns out that ξ_t can be viewed as a superposition of pure harmonics e^{-2π i λ t} with "random amplitudes" dZ(λ). This is made precise by the notion of integration with respect to an orthogonal stochastic measure.

Consider the following special case. Let ξ_t = ∑_{k = 1}^N z_k e^{-2π i λ_k t}, where λ_1, …, λ_N are distinct points of [0,1) and z_k, k = 1, …, N, are orthogonal L²-random variables with mean 0 and standard deviation ||z_k||₂ = σ_k. Such a stationary time series is said to be almost periodic. By definition, each ξ_t is a sum of "pure frequencies" e^{-2π i λ_k t} with "random amplitude" z_k of "intensity" σ_k. If one defines a discrete L²-valued measure Z on [0,1) by letting Z(Δ) be the sum of those z_k with λ_k ∈ Δ (so Z(Δ) = z_k when Δ contains λ_k and no other λ_j), then each ξ_t is the stochastic integral of the pure harmonic e^{-2π i λ t} with respect to Z.

For an almost-periodic {ξ_t}, the autocovariance function is R(t) = ∑_{k = 1}^N σ_k² e^{-2π i λ_k t}, and the spectral measure is the sum of Dirac measures dμ = ∑_{k = 1}^N σ_k² δ_{λ_k}. The map e^{-2π i λ t} ↦ ξ_t extends to a Hilbert space isomorphism from L²([0,1), μ) to the Hilbert subspace generated by {ξ_t}. Under this isomorphism, the image of the indicator function I_Δ, where Δ is a Borel set containing λ_k and no other λ_j, is precisely z_k. This is the stochastic measure for an almost-periodic process.

This discussion can be extended to an arbitrary stationary time series, and thus allows one to view the t-th element as the integral of the t-th harmonic with respect to a suitable stochastic measure. This is an analogue of Bochner's theorem for stationary time series: every stationary time series is the sequence of "Fourier coefficients" of a stochastic measure on the unit circle.

Orthogonal stochastic measures

Let (E, Ɛ) be a measurable space and Ɛ₀ ⊂ Ɛ an algebra of subsets. A map Z: Ɛ₀ → L²(Ω, P) is an orthogonal stochastic measure if it satisfies:

(Finite additivity) For any two disjoint Δ₁ and Δ₂ in Ɛ₀, Z(Δ₁ ∪ Δ₂) = Z(Δ₁) + Z(Δ₂).
(Orthogonality) For any two disjoint Δ₁ and Δ₂ in Ɛ₀, Z(Δ₁)⊥Z(Δ₂).

Such a measure is a special case of a vector-valued measure.

Given such a Z, the function m(Δ) = E(|Z(Δ)|²) = ||Z(Δ)||₂² is, by orthogonality and the Pythagorean theorem, a finitely additive positive measure on Ɛ₀. If m is moreover countably additive on Ɛ₀ and Ɛ is the σ-algebra generated by Ɛ₀, Carathéodory's extension theorem extends it to a finite positive measure on Ɛ. This measure, still denoted by m, is called the structure function of Z.

The stochastic integral of f in L²(E, Ɛ, m) with respect to an orthogonal stochastic measure Z is defined in a natural way, through a linear isometry from L²(E, Ɛ, m) into L²(Ω, P). For any simple function f = ∑_k a_k I_{Δ_k} in L²(E, Ɛ, m), with disjoint Δ_k ∈ Ɛ₀, define

\int fdZ(\Delta )=\sum _{k}a_{k}Z(\Delta _{k})\in L^{2}(\Omega ,P).

This defines a linear operator on the dense subspace of simple functions, and it preserves the inner product:

\langle f,g\rangle _{L^{2}(E,{\mathcal {E}},m)}=\langle \int fdZ(\Delta ),\int gdZ(\Delta )\rangle _{L^{2}(\Omega ,P)}.

Extending by continuity allows one to define the integral ∫ f dZ(Δ) for any f in L²(E, Ɛ, m).
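A finite toy model can make this isometry concrete. In the Python sketch below, Ω is assumed to be a uniform probability space with M points, so a random variable is a list of M complex values and E(X\bar Y) is an average. The values Z(Δ_k) on three disjoint cells are taken to be orthogonal with E|Z(Δ_k)|² = m(Δ_k); the sample space, masses, and functions are all hypothetical. The check confirms that integrating simple functions preserves inner products.

```python
import cmath

M = 8                     # number of equally likely outcomes (hypothetical sample space)
masses = [0.5, 1.0, 2.0]  # structure function m on three disjoint cells Delta_0, Delta_1, Delta_2


def expect_inner(x, y):
    """E(X * conj(Y)) for random variables given as value lists on the M outcomes."""
    return sum(a * b.conjugate() for a, b in zip(x, y)) / M


# Z(Delta_k) = sqrt(m_k) * u_k with u_k(w) = exp(2 pi i k w / M); the u_k are orthonormal.
Z = [[cmath.sqrt(m) * cmath.exp(2j * cmath.pi * k * w / M) for w in range(M)]
     for k, m in enumerate(masses)]


def stochastic_integral(f):
    """Integral of the simple function sum_k f[k] * I_{Delta_k} with respect to Z."""
    return [sum(fk * Z[k][w] for k, fk in enumerate(f)) for w in range(M)]


def inner_m(f, g):
    """<f, g> in L^2(E, m) for simple functions given by their values on the cells."""
    return sum(fk * gk.conjugate() * mk for fk, gk, mk in zip(f, g, masses))


f = [1.0 + 2j, -0.5 + 0j, 3.0 + 0j]
g = [1j, 2.0 + 0j, -1.0 + 1j]
print(inner_m(f, g))
print(expect_inner(stochastic_integral(f), stochastic_integral(g)))
```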

Spectral resolution

As stated above, the spectral resolution is an analogue of Bochner's theorem for stationary time series.

Theorem For every stationary time series {ξ_t} with mean 0 and spectral measure μ, there exists an orthogonal stochastic measure Z = Z(Δ) defined on Borel subsets Δ of [0,1] such that

The variance of Z(Δ) is ||Z(Δ)||₂² = E |Z(Δ)|² = μ(Δ).
For all t ∈ ℤ, ξ_t = ∫ e^{-2π i λ t} dZ(λ) P-almost everywhere.

The proof of the theorem follows the same outline as in the almost-periodic case. Let L²(ξ) denote the Hilbert subspace generated by {ξ_t}. By definition of μ (and the Stone-Weierstrass theorem), the map ξ_t ↦ e^{-2π i λ t} extends to a unitary operator U : L²(ξ) → L²([0,1], μ). Define an orthogonal stochastic measure by Z(Δ) = U⁻¹(I_Δ). Then, by unitarity, ||Z(Δ)||₂² = ||I_Δ||² = μ(Δ). Therefore, crucially, the structure function of Z is the spectral measure μ.

On the set of simple functions, the isomorphism U⁻¹ defined above agrees with integration with respect to Z. Therefore, for any f ∈ L²([0,1], μ), U⁻¹(f) = ∫ f dZ(Δ) P-almost everywhere. In particular, this holds for f = e^{-2π i λ t}.

The distribution function associated with μ is sometimes called the spectral function of the time series {ξ_t}. Its stochastic analogue is a stochastic process with orthogonal increments, indexed by λ and defined using Z(Δ): Z_λ = Z([0, λ]).

L²-ergodic theorem

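For each λ ∈ [0,1), the averages (1/n)∑_{t=0}^{n−1} e^{-2π i t λ} are bounded by 1 in absolute value and converge to I_{\{0\}}(λ): they equal 1 at λ = 0, and for λ ≠ 0 they are geometric sums of absolute value at most 1/(n|sin πλ|). The dominated convergence theorem therefore yields that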

{\frac {1}{n}}\sum _{t=0}^{n-1}\int e^{-2\pi it\lambda }d\mu \rightarrow \mu ({\{0\}}).

In terms of the autocovariance function R(t),

{\frac {1}{n}}\sum _{t=0}^{n-1}R(t)\rightarrow \mu ({\{0\}}).

Similarly, in L²([0,1], μ),

{\frac {1}{n}}\sum _{t=0}^{n-1}e^{-2\pi it\lambda }\rightarrow I_{\{0\}}.

Via the unitary operator ∫(⋅)dZ(Δ), we have the L²-ergodic theorem for stationary time series:

Theorem For any stationary time series {ξ_t} with mean m, with Z the stochastic measure of the centered series {ξ_t − m},

{\frac {1}{n}}\sum _{t=0}^{n-1}(\xi _{t}-m)=({\frac {1}{n}}\sum _{t=0}^{n-1}\xi _{t})-m\;{\stackrel {L^{2}}{\longrightarrow }}\;Z({\{0\}}).

In particular, when μ({0}) = 0, the arithmetic mean (sample average) (1/n)∑_{t = 0}^{n−1} ξ_t of the time series converges to its true mean m in L². Conversely, if (1/n)∑_{t = 0}^{n−1} ξ_t converges to m in L², then Z({0}) = 0 by uniqueness of L²-limits, and hence μ({0}) = ||Z({0})||₂² = 0. In other words, an L²-law of large numbers holds for a stationary time series if and only if μ({0}) = 0.

When m = 0 and μ({0}) ≠ 0 (and consequently Z({0}) = α ≠ 0 in L²), one can apply the same calculation to the modified series η_t = ξ_t − α, whose spectral measure has no atom at 0, and obtain that (1/n)∑_{t = 0}^{n−1} ξ_t converges in L² to the "random constant" α.
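The theorem can be watched at work on an almost-periodic series as in the first example. The Python sketch below fixes one hypothetical realization of the amplitudes z_k, one of them at frequency λ = 0, and computes the sample averages (1/n)∑_{t=0}^{n−1} ξ_t. The term at λ = 0 contributes its amplitude unchanged, and that amplitude is the value of Z({0}) for this realization; each other term is a geometric sum of size at most |z_k|/(n|sin πλ_k|), so the averages approach Z({0}) at rate O(1/n).

```python
import cmath

# One hypothetical realization: frequencies lambda_k in [0, 1) and amplitudes z_k.
freqs = [0.0, 0.1, 0.37, 0.5]
amps = [0.8 - 0.3j, 1.5 + 0j, -0.7 + 2j, 0.4 + 0j]


def xi(t):
    """Almost-periodic series xi_t = sum_k z_k exp(-2 pi i lambda_k t)."""
    return sum(z * cmath.exp(-2j * cmath.pi * lam * t) for z, lam in zip(amps, freqs))


def sample_mean(n):
    """(1/n) * sum_{t=0}^{n-1} xi_t."""
    return sum(xi(t) for t in range(n)) / n


for n in (10, 100, 1000, 10000):
    # amps[0] (the lambda = 0 amplitude) plays the role of Z({0}); the gap shrinks like 1/n.
    print(n, sample_mean(n), abs(sample_mean(n) - amps[0]))
```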

Filtering

The proof of the spectral resolution theorem explicitly constructs a unitary operator from L²([0,1], μ) to L²(ξ), namely integration with respect to Z. Thus the theorem can be rephrased as follows:

Corollary For any η in L²(ξ), there exists a unique φ in L²([0,1], μ) such that η = ∫ φ dZ(λ). The image of η under U is φ.

In other words, any linear combination of {ξ_t} (and their L²-limits) can be obtained by integrating some φ in L²([0,1], μ) with respect to Z(Δ).

Of particular interest among such linear transformations are linear filters. Formally, a filter is represented by convolution with an l¹- or l²-sequence {h(s)}_{s∈ℤ}. Given the time series {ξ_t} as input, the filter produces the output

\eta _{t}=\sum _{s=-\infty }^{\infty }h(t-s)\xi _{s}.

The implementing sequence is called the impulse response of the filter. A filter is said to be physically realizable if h(s) = 0 for all s < 0, i.e. the output of the system depends only on present and past values of the input. A moving-average process is obtained by filtering a white-noise process, and this filter is physically realizable exactly when the moving average is one-sided.

Assuming the series defining η_t converges in L², each η_t lies in L²(ξ) and therefore must be of the form η_t = ∫ φ_t dZ(λ) for some φ_t. In fact,

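\eta _{t}=\int e^{-2\pi it\lambda }\phi (\lambda )dZ(\lambda ),

where φ(λ) = ∑_{s∈ℤ} h(s) e^{2π i λ s}, the function on the circle whose Fourier coefficients are the h(s), is called the spectral characteristic of the filter. (The sign in the exponent comes from substituting ξ_s = ∫ e^{-2π i λ s} dZ(λ) into the filter and writing e^{-2π i λ s} = e^{-2π i λ t} e^{2π i λ (t−s)}.) In other words, in the λ-domain the frequency content of the input {ξ_t} is filtered by φ(λ).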
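A quick way to check the spectral characteristic, including the sign in its exponent, is to feed a single harmonic ξ_s = e^{−2πiλs} through a finitely supported filter: under the convention ξ_t = ∫ e^{−2πiλt} dZ(λ), the output η_t = ∑_s h(t − s)ξ_s should equal φ(λ)e^{−2πiλt} with φ(λ) = ∑_s h(s)e^{2πiλs}. The impulse response below is hypothetical; since h(−1) ≠ 0, this particular filter is not physically realizable.

```python
import cmath

# Hypothetical finitely supported impulse response h(s), stored as {s: h(s)}.
h = {-1: 0.25j, 0: 0.5, 1: 0.3 - 0.2j, 2: -0.1}


def phi(lam):
    """Spectral characteristic phi(lambda) = sum_s h(s) exp(2 pi i lambda s)."""
    return sum(hs * cmath.exp(2j * cmath.pi * lam * s) for s, hs in h.items())


def filtered_harmonic(lam, t):
    """eta_t = sum_s h(t - s) xi_s for the input xi_s = exp(-2 pi i lambda s).

    Substituting u = t - s, only the finitely many u with h(u) != 0 contribute.
    """
    return sum(hu * cmath.exp(-2j * cmath.pi * lam * (t - u)) for u, hu in h.items())


lam, t = 0.23, 7
print(filtered_harmonic(lam, t))
print(phi(lam) * cmath.exp(-2j * cmath.pi * lam * t))
```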

By the above calculation, a moving average process necessarily has a spectral density. In fact, the converse holds also: any stationary sequence with spectral density can be represented as a moving-average process (on a possibly "larger" probability space).

Characterization of process with "squared" spectral density. (One-sided MA)

Characterization of process with rational spectral density. (ARMA)

Statistical estimation

Consider a stationary time series {ξ_t} of mean m, autocovariance function R(t), and spectral density f.

For mean

Given an observation x = (x₀, …, x_{N−1}) of size N from ξ₀, …, ξ_{N−1}, the sample mean is

m_{N}(x)={\frac {1}{N}}\sum _{t=0}^{N-1}x_{t}.

By linearity of expectation, m_N is an unbiased estimator of the true mean m. By the ergodic theorem above, m_N is also a consistent estimator in the L²-sense whenever the spectral density exists, since then μ({0}) = 0.

For autocovariance function

For the autocovariance function R(n), it is natural to define the following estimator based on N observations x = (x₀, …, x_{N−1}), where 0 ≤ n < N:

{\hat {R}}_{N}(n,x)={\frac {1}{N-n}}\sum _{k=0}^{N-1-n}x_{k}x_{n+k}.

For a mean-zero, real-valued series, this is an unbiased estimator of R(n) for every lag n it computes:

{\mbox{E}}{\hat {R}}_{N}(n,x)=R(n),\;0\leq n<N.

Next we consider L²-consistency. Fix n and consider the series {η_t} = {ξ_t ξ_{t + n}}. Each η_t has the same mean R(n). If {η_t} is again a stationary time series and satisfies the hypothesis of the L²-law of large numbers, then consistency holds:

{\frac {1}{N}}\sum _{t=0}^{N-1}\eta _{t}={\frac {1}{N}}\sum _{t=0}^{N-1}\xi _{t}\xi _{t+n}\rightarrow R(n),

i.e.

{\hat {R}}_{N}(n,x){\stackrel {L^{2}}{\longrightarrow }}R(n)\;{\mbox{as}}\;N\rightarrow \infty .

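A special case in which these conditions can be easily characterized is when {ξ_t} is a real-valued Gaussian stationary series with mean 0. For jointly normal random variables, the means and the variance-covariance matrix specify the joint distribution, so the Gaussian assumption implies that {η_t} is wide-sense stationary. By the Isserlis (Wick) formula for fourth moments of jointly normal variables, its autocovariance function is

Q(k)={\mbox{E}}[(\xi _{k}\xi _{k+n}-R(n))(\xi _{0}\xi _{n}-R(n))]=R^{2}(k)+R(n+k)R(n-k).

In particular, if R(k) → 0 as k → ∞, then Q(k) → 0, so (1/N)∑_{k=0}^{N−1} Q(k) → 0. By the ergodic theorem above, applied to the averaged autocovariances of {η_t}, the spectral measure of {η_t} then has no atom at 0, and R̂_N(n, x) is L²-consistent.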
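A short simulation can illustrate unbiasedness and L²-consistency. The Python sketch below, with illustrative function names, implements m_N and R̂_N(n, x) as defined above for real-valued data and applies them to one simulated path of a hypothetical Gaussian MA(1) series ξ_t = ε_t + 0.5ε_{t−1}, for which R(0) = 1.25, R(1) = 0.5 and R(n) = 0 for n ≥ 2. Because R(k) = 0 for k ≥ 2, the formula for Q(k) above shows that Q(k) also vanishes for large k, so with N = 100 000 the estimates should land within a few hundredths of the true values; the exact numbers depend on the seed.

```python
import random


def sample_mean(x):
    """m_N(x) = (1/N) * sum_t x_t."""
    return sum(x) / len(x)


def autocov_hat(x, n):
    """R_hat_N(n, x) = (1/(N-n)) * sum_{k=0}^{N-1-n} x_k x_{k+n}, for 0 <= n < N.

    Assumes real-valued observations of a mean-zero series, as in the formula above.
    """
    N = len(x)
    if not 0 <= n < N:
        raise ValueError("need 0 <= n < N")
    return sum(x[k] * x[k + n] for k in range(N - n)) / (N - n)


# Hypothetical test series: Gaussian MA(1), xi_t = eps_t + 0.5 * eps_{t-1}.
random.seed(1)
N = 100_000
eps = [random.gauss(0.0, 1.0) for _ in range(N + 1)]
x = [eps[t + 1] + 0.5 * eps[t] for t in range(N)]

print(sample_mean(x))
print([autocov_hat(x, n) for n in range(4)])  # compare with [1.25, 0.5, 0.0, 0.0]
```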

For spectral density

Assume the spectral density f(λ) exists. Then the autocovariance function R(t) is the Fourier transform of f:

R(t)=\int f(\lambda )e^{-2\pi it\lambda }d\lambda ={\hat {f}}(t).

Recovering the L¹ function f on the circle from its Fourier coefficients R(t) is a classical problem in Fourier analysis. The difficulty is due to the fact that the Fourier inversion theorem only applies to f in L¹(T) whose Fourier coefficients form an l¹-sequence. Even for a continuous f, the symmetric partial sum

S_{m}(f)(\lambda )=\sum _{t=-m}^{m}{\hat {f}}(t)e^{2\pi it\lambda }

diverges in general. (In fact, there is a residual set of functions in C(T) whose partial sums S_m(f) diverge on a dense subset of T. See the article Convergence of Fourier series.)

The classical remedy is to introduce a summability kernel Φ_s(t) with the following properties:

(Φ_s(t))_{t∈ℤ} forms an approximate unit, as s → 0, in the Banach algebra c₀ of sequences vanishing at infinity.
For each s, (Φ_s(t)) lies in the domain of the Fourier inversion theorem.

Then by the inversion theorem,

\sum _{t\in \mathbb {Z} }\Phi _{s}(t){\hat {f}}(t)e^{2\pi it\lambda }

converges to f in L¹ and, if f is continuous, uniformly as s → 0. This works because the sum above is the convolution of f with Φ̂_s(λ) = ∑_t Φ_s(t) e^{2π i t λ}, and the functions Φ̂_s form an approximate unit in the convolution algebra L¹(T).

One example of a summability kernel is the Fejér kernel (with s = 1/N):

\Phi _{N}(t)=1-{\frac {|t|}{N}}\;{\mbox{if}}\;|t|\leq N,\;\;0\;{\mbox{otherwise}}.

It has Fourier transform

{\hat {\Phi }}_{N}(\lambda )={\frac {1}{N}}|\sum _{t=0}^{N-1}e^{2\pi it\lambda }|^{2}.
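The two expressions for the Fejér kernel, and its use as a summability method, can be checked numerically. The Python sketch below evaluates Φ̂_N(λ) both as ∑_{|t|≤N}(1 − |t|/N)e^{2πitλ} and in the closed form above, then applies the Fejér weights to the Fourier coefficients R(t) of a known density. As a hypothetical target it uses f(λ) = |1 + 0.5e^{2πiλ}|² = 1.25 + cos 2πλ, whose only nonzero coefficients are R(0) = 1.25 and R(±1) = 0.5; in that case the Fejér mean is exactly 1.25 + (1 − 1/N)cos 2πλ, which tends to f(λ).

```python
import cmath
import math


def fejer_weight(t, N):
    """Phi_N(t) = 1 - |t|/N for |t| <= N, and 0 otherwise."""
    return 1 - abs(t) / N if abs(t) <= N else 0.0


def fejer_kernel_sum(N, lam):
    """hat Phi_N(lambda) computed as sum_t Phi_N(t) exp(2 pi i t lambda)."""
    return sum(fejer_weight(t, N) * cmath.exp(2j * cmath.pi * t * lam) for t in range(-N, N + 1))


def fejer_kernel_closed(N, lam):
    """hat Phi_N(lambda) computed as (1/N) |sum_{t=0}^{N-1} exp(2 pi i t lambda)|^2."""
    return abs(sum(cmath.exp(2j * cmath.pi * t * lam) for t in range(N))) ** 2 / N


def fejer_mean(R, N, lam):
    """sum_t Phi_N(t) R(t) exp(2 pi i t lambda); .real drops rounding noise (R is Hermitian)."""
    return sum(fejer_weight(t, N) * R(t) * cmath.exp(2j * cmath.pi * t * lam)
               for t in range(-N, N + 1)).real


def R_example(t):
    """Fourier coefficients of f(lambda) = 1.25 + cos(2 pi lambda)."""
    return {0: 1.25, 1: 0.5, -1: 0.5}.get(t, 0.0)


lam = 0.3
print(fejer_kernel_sum(20, lam).real, fejer_kernel_closed(20, lam))
for N in (5, 50, 500):
    print(N, fejer_mean(R_example, N, lam), 1.25 + math.cos(2 * math.pi * lam))
```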

In the context of estimating the spectral density of a stationary time series, the same techniques apply, but one needs to replace R(t) by an appropriate estimator.
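One possible sketch plugs the unbiased estimator R̂_N(n, x) from the autocovariance subsection into the Fejér sum, taking the Fejér parameter equal to the sample size N and setting R̂_N(−n, x) := R̂_N(n, x) (real-valued, mean-zero data assumed). A short calculation, which the Python code below checks numerically, shows that this particular combination collapses to (1/N)|∑_{t=0}^{N−1} x_t e^{2πitλ}|², usually called the periodogram; that name, the function names, and the simulated data are not from the text.

```python
import cmath
import random


def autocov_hat(x, n):
    """(1/(N-|n|)) * sum_k x_k x_{k+|n|} for real, mean-zero data (extended to n < 0 by symmetry)."""
    n = abs(n)
    N = len(x)
    return sum(x[k] * x[k + n] for k in range(N - n)) / (N - n)


def fejer_estimate(x, lam):
    """sum_{|n|<N} (1 - |n|/N) R_hat_N(n) exp(2 pi i n lambda), with N = len(x)."""
    N = len(x)
    return sum((1 - abs(n) / N) * autocov_hat(x, n) * cmath.exp(2j * cmath.pi * n * lam)
               for n in range(-(N - 1), N)).real


def periodogram(x, lam):
    """(1/N) |sum_t x_t exp(2 pi i t lambda)|^2."""
    return abs(sum(xt * cmath.exp(2j * cmath.pi * t * lam) for t, xt in enumerate(x))) ** 2 / len(x)


random.seed(2)
x = [random.gauss(0.0, 1.0) for _ in range(64)]  # hypothetical data
for lam in (0.0, 0.1, 0.25, 0.4):
    print(lam, fejer_estimate(x, lam), periodogram(x, lam))
```

The identity holds because (1 − |n|/N)R̂_N(n, x) = (1/N)∑_k x_k x_{k+|n|}, so the Fejér weights exactly cancel the 1/(N − |n|) normalization of the unbiased estimator.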

Wold decomposition

The spectral representation gives an integral decomposition of a stationary time series in the frequency domain; it provides a Fourier-type analysis for stationary time series. In contrast, Wold's decomposition expresses a stationary time series as the sum of "deterministic" and "completely nondeterministic" parts in the time domain, using the geometry of Hilbert space.

For a stationary time series {ξ_t}, denote by L²(ξ) the Hilbert subspace generated by {ξ_t}_{t∈ℤ} and by L²_t(ξ) the Hilbert subspace generated by {ξ_t, ξ_{t−1}, ξ_{t−2}, …}. Define

S(\xi )=\cap _{t}L_{t}^{2}(\xi ).

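Then, writing R(ξ) for the orthogonal complement of S(ξ) in L²(ξ) (not to be confused with the autocovariance function R), L²(ξ) is the orthogonal sum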

L^{2}(\xi )=R(\xi )\oplus S(\xi ).

Each ξ_t is then a corresponding orthogonal sum ξ_t = ξ_t^r + ξ_t^s, where ξ_t^r ∈ R(ξ) and ξ_t^s ∈ S(ξ). Informally, the sequence {ξ_t^s} is the part of {ξ_t} that lives in the infinite past ("at the beginning of time"), and it is the deterministic part of {ξ_t}.

More precisely, a time series {η_t} is called deterministic if S(η) = L²(η), and completely nondeterministic if R(η) = L²(η), i.e. S(η) = {0}. For {ξ_t^r}, S(ξ^r) ⊥ S(ξ) because every ξ_t^r is orthogonal to S(ξ) by definition. But also S(ξ^r) ⊂ S(ξ): for s ≤ t, ξ_s^r = ξ_s − ξ_s^s with ξ_s^s ∈ S(ξ) ⊂ L²_t(ξ), so L²_t(ξ^r) ⊂ L²_t(ξ) for every t. Hence S(ξ^r) = {0}, and {ξ_t^r} is completely nondeterministic. For {ξ_t^s}, S(ξ^s) ⊂ L²(ξ^s) ⊂ S(ξ). Conversely, for every t, S(ξ) ⊂ L²_t(ξ) ⊂ L²_t(ξ^s) ⊕ L²_t(ξ^r) and S(ξ) ⊥ L²_t(ξ^r), so S(ξ) ⊂ L²_t(ξ^s) for all t, i.e. S(ξ) ⊂ S(ξ^s). This shows S(ξ^s) = L²(ξ^s), i.e. {ξ_t^s} is deterministic. One can also show that this decomposition is unique. In summary, we have the following theorem.

Theorem For any stationary time series {ξ_t}, there exists a unique pair of time series {ξ_t^r} and {ξ_t^s} such that

ξ_t = ξ_t^r + ξ_t^s for all t.
{ξ_t^r} and {ξ_t^s} are orthogonal.
{ξ_t^r} is completely nondeterministic and {ξ_t^s} is deterministic.

Remark Wold's decomposition has a counterpart in operator theory, which bears the same name. The operator version says that any isometry on a Hilbert space decomposes as the direct sum of a unitary operator and a completely nonunitary isometry, namely a unilateral shift. These correspond to the deterministic and completely nondeterministic parts of a time series, respectively.

Characterization of completely nondeterministic time series as one-sided moving averages

Let {ε_t} be a white-noise process. A one-sided moving average is an immediate example of a completely nondeterministic time series:

\xi _{t}=\sum _{k\geq 0}a_{k}\epsilon _{t-k}

for some l¹-sequence {a_k}. This in fact characterizes completely non-deterministic processes, i.e. they can all be viewed as the output signal of a physically realizable filter whose input is white noise.

A white-noise process {ε_t} is said to be an innovation process for {ξ_t} if L²_t(ε) = L²_t(ξ) for all t. "Innovation" means that ε_{t+1} provides the "new information" that, together with the past, is needed to form ξ_{t+1}.

Theorem A stationary time series {ξ_t} is completely nondeterministic if and only if it is a one-sided moving average, i.e.

\xi _{t}=\sum _{k\geq 0}a_{k}\epsilon _{t-k}

for some (a_k) ∈ l² and some innovation process {ε_t} for {ξ_t}. The series converges in the L²-sense.

As stated above, sufficiency is immediate. Necessity follows from the Gram-Schmidt procedure as follows. Fix t and let ε_t be a unit vector in

L_{t}^{2}(\xi )\ominus L_{t-1}^{2}(\xi )

and let a₀ε_t be the projection of ξ_t onto ε_t. For each s ≥ 0, the subspace

L_{t-s}^{2}(\xi )\ominus L_{t-s-1}^{2}(\xi )

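is at most one-dimensional, since it is spanned by the component of ξ_{t−s} orthogonal to L²_{t−s−1}(ξ). (If any one of these subspaces is {0}, then by stationarity all of them are, in which case L²_t(ξ) = S(ξ) for every t and {ξ_t} is deterministic.) So, for a completely nondeterministic series, choosing a unit vector ε_{t−s} in each of them produces an orthonormal system whose closed span is L²_t(ξ), because S(ξ) = {0}, and we have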

\xi _{t}=\sum _{k\geq 0}a_{k}\epsilon _{t-k}

where {ε_t} is an innovation process for {ξ_t} by construction. The coefficients a_k produced are independent of t by covariance stationarity. This proves the theorem.
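The Gram-Schmidt step can be imitated on finitely many variables. The Python sketch below uses the innovations algorithm, a standard recursive form of Gram-Schmidt for a stationary sequence (named here as a substitute technique, not taken from the text). Working from a real autocovariance R alone, it writes ξ_n = (ξ_n − ξ̂_n) + ∑_{j=1}^{n} θ_{n,j}(ξ_{n−j} − ξ̂_{n−j}), where ξ̂_k is the projection of ξ_k onto the span of ξ_0, …, ξ_{k−1} and v_k = E(ξ_k − ξ̂_k)². For the hypothetical series ξ_t = ε_t + 0.5ε_{t−1} with unit-variance white noise, θ_{n,1} tends to 0.5 = a_1/a_0 and v_n to 1 = a_0², recovering the one-sided moving-average coefficients as n grows.

```python
def innovations(R, n_max):
    """Innovations algorithm for a real stationary series with autocovariance R.

    Returns (theta, v) with theta[n][j] = theta_{n,j} for 1 <= j <= n and
    v[n] = E(xi_n - hat xi_n)^2, where hat xi_n is the projection of xi_n
    onto span(xi_0, ..., xi_{n-1}).
    """
    v = [R(0)]
    theta = [[]]  # theta[0] is unused
    for n in range(1, n_max + 1):
        th = [0.0] * (n + 1)  # th[j] = theta_{n,j}; th[0] is unused
        for k in range(n):
            # Orthogonality of xi_n - hat xi_n to the k-th innovation determines theta_{n,n-k}.
            s = sum(theta[k][k - j] * th[n - j] * v[j] for j in range(k))
            th[n - k] = (R(n - k) - s) / v[k]
        v.append(R(0) - sum(th[n - j] ** 2 * v[j] for j in range(n)))
        theta.append(th)
    return theta, v


def R_ma1(t):
    """Autocovariance of xi_t = eps_t + 0.5 * eps_{t-1} with unit-variance white noise."""
    return {0: 1.25, 1: 0.5}.get(abs(t), 0.0)


theta, v = innovations(R_ma1, 30)
for n in (1, 2, 5, 30):
    print(n, theta[n][1], v[n])
```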
