
3.4: Forecasting


Suppose that the variables \(X_1,\ldots,X_n\) of a weakly stationary time series \((X_t\colon t\in\mathbb{Z})\) have been observed with the goal to predict or forecast the future values \(X_{n+1},X_{n+2},\ldots\). The focus is here on so-called one-step best linear predictors (BLP). These are, by definition, linear combinations

\[\hat{X}_{n+1}=\phi_{n0}+\phi_{n1}X_n+\ldots+\phi_{nn}X_1 \tag{3.4.1}\]

of the observed variables \(X_1,\ldots,X_n\) that minimize the mean-squared error

\[E\big[\{X_{n+1}-g(X_1,\ldots,X_n)\}^2\big]\]

over all linear functions \(g\) of \(X_1,\ldots,X_n\). Straightforward generalizations yield definitions for the \(m\)-step best linear predictors \(\hat{X}_{n+m}\) of \(X_{n+m}\) for arbitrary \(m\in\mathbb{N}\) in the same fashion. Using Hilbert space theory, one can prove the following theorem, which will be the starting point for our considerations.

Theorem 3.4.1: Best linear prediction (BLP)

Let \((X_t\colon t\in\mathbb{Z})\) be a weakly stationary stochastic process of which \(X_1,\ldots,X_n\) are observed. Then, the one-step BLP \(\hat{X}_{n+1}\) of \(X_{n+1}\) is determined by the equations

\[E\big[(X_{n+1}-\hat{X}_{n+1})X_{n+1-j}\big]=0\]

for all \(j=1,\ldots,n+1\), where \(X_0=1\).

The equations specified in Theorem 3.4.1 can be used to calculate the coefficients \(\phi_{n0},\ldots,\phi_{nn}\) in Equation \((3.4.1)\). It suffices to focus on mean zero processes \((X_t\colon t\in\mathbb{Z})\) and thus to set \(\phi_{n0}=0\), as the following calculations show. Assume that \(E[X_t]=\mu\) for all \(t\in\mathbb{Z}\). Then, Theorem 3.4.1 gives that \(E[\hat{X}_{n+1}]=E[X_{n+1}]=\mu\) (using the equation with \(j=n+1\)). Consequently, it holds that

\[\mu=E[\hat{X}_{n+1}]=E\left[\phi_{n0}+\sum_{\ell=1}^n\phi_{n\ell}X_{n+1-\ell}\right]=\phi_{n0}+\sum_{\ell=1}^n\phi_{n\ell}\,\mu.\]

Using now that \(\phi_{n0}=\mu(1-\phi_{n1}-\ldots-\phi_{nn})\), Equation \((3.4.1)\) can be rewritten as

\[\hat{Y}_{n+1}=\phi_{n1}Y_n+\ldots+\phi_{nn}Y_1,\]

where \(\hat{Y}_{n+1}=\hat{X}_{n+1}-\mu\) and \(Y_t=X_t-\mu\) have mean zero.

With the ACVF \(\gamma\) of \((X_t\colon t\in\mathbb{Z})\), the equations in Theorem 3.4.1 can be expressed as

\[\sum_{\ell=1}^n\phi_{n\ell}\gamma(\ell-j)=\gamma(j),\qquad j=1,\ldots,n. \tag{3.4.2}\]

Note that due to the convention \(\phi_{n0}=0\), the last equation in Theorem 3.4.1 (for which \(j=n+1\)) is omitted. More conveniently, this is restated in matrix notation. To this end, let \(\Gamma_n=(\gamma(\ell-j))_{\ell,j=1,\ldots,n}\), \(\phi_n=(\phi_{n1},\ldots,\phi_{nn})^T\) and \(\gamma_n=(\gamma(1),\ldots,\gamma(n))^T\), where \({}^T\) denotes the transpose. With these notations, Equation \((3.4.2)\) becomes

\[\Gamma_n\phi_n=\gamma_n\qquad\Longleftrightarrow\qquad\phi_n=\Gamma_n^{-1}\gamma_n, \tag{3.4.3}\]

provided that \(\Gamma_n\) is nonsingular.

The determination of the coefficients \(\phi_n\) has thus been reduced to solving a linear system of equations and depends only on second-order properties of \((X_t\colon t\in\mathbb{Z})\), which are given by the ACVF \(\gamma\).

Let \(X_n=(X_n,X_{n-1},\ldots,X_1)^T\). Then, \(\hat{X}_{n+1}=\phi_n^TX_n\). To assess the quality of the prediction, one computes the mean-squared error with the help of Equation \((3.4.3)\) as follows:

\[\begin{aligned}
P_{n+1}&=E\big[(X_{n+1}-\hat{X}_{n+1})^2\big]
=E\big[(X_{n+1}-\phi_n^TX_n)^2\big]
=E\big[(X_{n+1}-\gamma_n^T\Gamma_n^{-1}X_n)^2\big]\\[.2cm]
&=E\big[X_{n+1}^2-2\gamma_n^T\Gamma_n^{-1}X_nX_{n+1}+\gamma_n^T\Gamma_n^{-1}X_nX_n^T\Gamma_n^{-1}\gamma_n\big]\\[.2cm]
&=\gamma(0)-2\gamma_n^T\Gamma_n^{-1}\gamma_n+\gamma_n^T\Gamma_n^{-1}\Gamma_n\Gamma_n^{-1}\gamma_n\\[.2cm]
&=\gamma(0)-\gamma_n^T\Gamma_n^{-1}\gamma_n.
\end{aligned}\tag{3.4.4}\]
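In computational terms, Equations \((3.4.3)\) and \((3.4.4)\) amount to a single linear solve. The following Python sketch illustrates this; the function name and the use of MA(1) autocovariances (as in Example 3.4.3 below, with illustrative parameter values) are assumptions made for this example, not part of the text.

```python
import numpy as np

def blp_coefficients(acvf, n):
    """One-step BLP: solve Gamma_n phi_n = gamma_n and return (phi_n, P_{n+1}).

    acvf : function returning gamma(h) for integer lags h >= 0
    n    : number of observed variables X_1, ..., X_n
    """
    # Gamma_n has entries gamma(l - j), l, j = 1, ..., n; gamma is symmetric in h
    Gamma_n = np.array([[acvf(abs(l - j)) for j in range(n)] for l in range(n)])
    gamma_n = np.array([acvf(h) for h in range(1, n + 1)])
    phi_n = np.linalg.solve(Gamma_n, gamma_n)   # phi_n = Gamma_n^{-1} gamma_n
    P_next = acvf(0) - gamma_n @ phi_n          # gamma(0) - gamma_n^T Gamma_n^{-1} gamma_n
    return phi_n, P_next

# Illustrative input: MA(1) autocovariances with theta = 0.5 and sigma^2 = 1
theta, sigma2 = 0.5, 1.0
acvf_ma1 = lambda h: (1 + theta ** 2) * sigma2 if h == 0 else (theta * sigma2 if h == 1 else 0.0)
phi, P = blp_coefficients(acvf_ma1, n=5)
# The predictor is X_hat_{n+1} = phi[0] * X_n + phi[1] * X_{n-1} + ... + phi[n-1] * X_1.
```

Note that the inverse of \(\Gamma_n\) is never formed explicitly; the linear system is solved directly.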

As an initial example, we explain the prediction procedure for an autoregressive process of order 2.

Example 3.4.1: Prediction of an AR(2) Process

Let \((X_t\colon t\in\mathbb{Z})\) be the causal AR(2) process \(X_t=\phi_1X_{t-1}+\phi_2X_{t-2}+Z_t\). Suppose that only an observation of \(X_1\) is available to forecast the value of \(X_2\). In this simplified case, the single prediction equation \((3.4.2)\) is

\[\phi_{11}\gamma(0)=\gamma(1),\]

so that \(\phi_{11}=\rho(1)\) and \(\hat{X}_{1+1}=\rho(1)X_1\).

In the next step, assume that observed values of \(X_1\) and \(X_2\) are at hand to forecast the value of \(X_3\). Then, one similarly obtains from \((3.4.2)\) that the predictor can be computed from

\[\begin{aligned}
\hat{X}_{2+1}&=\phi_{21}X_2+\phi_{22}X_1=\phi_2^TX_2=(\Gamma_2^{-1}\gamma_2)^TX_2\\[.2cm]
&=(\gamma(1),\gamma(2))\begin{pmatrix}\gamma(0)&\gamma(1)\\ \gamma(1)&\gamma(0)\end{pmatrix}^{-1}\begin{pmatrix}X_2\\ X_1\end{pmatrix}.
\end{aligned}\]

However, applying the arguments leading to the definition of the PACF in Section 3.3.3, one finds that

\[E\big[\{X_3-(\phi_1X_2+\phi_2X_1)\}X_1\big]=E[Z_3X_1]=0,\]

\[E\big[\{X_3-(\phi_1X_2+\phi_2X_1)\}X_2\big]=E[Z_3X_2]=0.\]

Hence, \(\hat{X}_{2+1}=\phi_1X_2+\phi_2X_1\) and even \(\hat{X}_{n+1}=\phi_1X_n+\phi_2X_{n-1}\) for all \(n\geq2\), exploiting the particular autoregressive structure.

Since similar results can be proved for general causal AR(\(p\)) processes, the one-step predictors have the form

\[\hat{X}_{n+1}=\phi_1X_n+\ldots+\phi_pX_{n-p+1}\]

whenever the number of observed variables \(n\) is at least \(p\).

The major drawback of this approach is immediately apparent from the previous example: for larger sample sizes \(n\), the prediction procedure requires the calculation of the inverse matrix \(\Gamma_n^{-1}\), which is computationally expensive. In the remainder of this section, two recursive prediction methods are introduced that bypass the inversion altogether. They are known as the Durbin-Levinson algorithm and the innovations algorithm. Finally, predictors based on the infinite past are introduced, which are often easily applicable for the class of causal and invertible ARMA processes.

Method 1: The Durbin-Levinson algorithm

If \((X_t\colon t\in\mathbb{Z})\) is a zero mean weakly stationary process with ACVF \(\gamma\) such that \(\gamma(0)>0\) and \(\gamma(h)\to0\) as \(h\to\infty\), then the coefficients \(\phi_{n\ell}\) in \((3.4.2)\) and the mean squared errors \(P_n\) in \((3.4.4)\) satisfy the recursions

\[\phi_{11}=\frac{\gamma(1)}{\gamma(0)},\qquad P_0=\gamma(0),\]

and, for \(n\geq1\),

\[\phi_{nn}=\frac{1}{P_{n-1}}\left(\gamma(n)-\sum_{\ell=1}^{n-1}\phi_{n-1,\ell}\,\gamma(n-\ell)\right),\]

\[\begin{pmatrix}\phi_{n1}\\ \vdots\\ \phi_{n,n-1}\end{pmatrix}
=\begin{pmatrix}\phi_{n-1,1}\\ \vdots\\ \phi_{n-1,n-1}\end{pmatrix}
-\phi_{nn}\begin{pmatrix}\phi_{n-1,n-1}\\ \vdots\\ \phi_{n-1,1}\end{pmatrix}\]

and

\[P_n=P_{n-1}(1-\phi_{nn}^2).\]
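Translated into code, the recursions read as follows. This is a minimal Python sketch; the function name and the convention of passing the autocovariances \(\gamma(0),\ldots,\gamma(n)\) as an array are assumptions made for illustration.

```python
import numpy as np

def durbin_levinson(gamma):
    """Durbin-Levinson recursions.

    gamma : array of autocovariances gamma(0), gamma(1), ..., gamma(n), n >= 1
    Returns phi with phi[n] = (phi_{n1}, ..., phi_{nn}) and the errors P_0, ..., P_n.
    """
    n_max = len(gamma) - 1
    phi = {1: np.array([gamma[1] / gamma[0]])}        # phi_{11} = gamma(1) / gamma(0)
    P = np.zeros(n_max + 1)
    P[0] = gamma[0]                                   # P_0 = gamma(0)
    P[1] = P[0] * (1 - phi[1][0] ** 2)
    for n in range(2, n_max + 1):
        prev = phi[n - 1]
        # phi_{nn} = (gamma(n) - sum_l phi_{n-1,l} gamma(n-l)) / P_{n-1}
        phi_nn = (gamma[n] - prev @ gamma[n - 1:0:-1]) / P[n - 1]
        # phi_{nl} = phi_{n-1,l} - phi_{nn} phi_{n-1,n-l},  l = 1, ..., n-1
        phi[n] = np.append(prev - phi_nn * prev[::-1], phi_nn)
        P[n] = P[n - 1] * (1 - phi_nn ** 2)
    return phi, P
```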

It can be shown that under the assumptions made on the process \((X_t\colon t\in\mathbb{Z})\), it holds indeed that \(\phi_{nn}\) is equal to the value of the PACF of \((X_t\colon t\in\mathbb{Z})\) at lag \(n\). The result is formulated as Corollary 5.2.1 in Brockwell and Davis (1991). This fact is highlighted in an example.

Example 3.4.2: The PACF of an AR(2) process

Let \((X_t\colon t\in\mathbb{Z})\) be a causal AR(2) process. Then, \(\rho(1)=\phi_1/(1-\phi_2)\) and all other values can be computed recursively from

\[\rho(h)-\phi_1\rho(h-1)-\phi_2\rho(h-2)=0,\qquad h\geq2.\]

Note that the ACVF \(\gamma\) satisfies a difference equation with the same coefficients, which is seen by multiplying the latter equation with \(\gamma(0)\). Applying the Durbin-Levinson algorithm gives first that

\[\phi_{11}=\frac{\gamma(1)}{\gamma(0)}=\rho(1)\qquad\text{and}\qquad P_1=P_0(1-\phi_{11}^2)=\gamma(0)(1-\rho(1)^2).\]

Ignoring the recursion for the error terms \(P_n\) in the following, the next \(\phi_{n\ell}\) values are obtained as

\[\phi_{22}=\frac{1}{P_1}\big[\gamma(2)-\phi_{11}\gamma(1)\big]=\frac{1}{1-\rho(1)^2}\big[\rho(2)-\rho(1)^2\big]\]

\[=\frac{\phi_1^2(1-\phi_2)^{-1}+\phi_2-[\phi_1(1-\phi_2)^{-1}]^2}{1-[\phi_1(1-\phi_2)^{-1}]^2}=\phi_2,\]

\[\phi_{21}=\phi_{11}-\phi_{22}\phi_{11}=\rho(1)(1-\phi_2)=\phi_1,\]

\[\phi_{33}=\frac{1}{P_2}\big[\gamma(3)-\phi_{21}\gamma(2)-\phi_{22}\gamma(1)\big]=\frac{1}{P_2}\big[\gamma(3)-\phi_1\gamma(2)-\phi_2\gamma(1)\big]=0.\]

Now, referring to the remarks after Example 3.3.7, no further computations are necessary to determine the PACF because \(\phi_{nn}=0\) for all \(n>p=2\).
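This example can be checked numerically with the Durbin-Levinson sketch above. The snippet below assumes the standard closed-form expression for \(\gamma(0)\) of a causal AR(2) process (not derived in this section) and uses illustrative parameter values.

```python
import numpy as np

phi1, phi2, sigma2 = 0.5, 0.3, 1.0     # illustrative causal AR(2) parameters

# gamma(0) via a standard AR(2) variance formula (assumed here), gamma(1) from
# rho(1) = phi1 / (1 - phi2), and gamma(h) from the difference equation for h >= 2.
gamma = np.zeros(6)
gamma[0] = sigma2 * (1 - phi2) / ((1 + phi2) * ((1 - phi2) ** 2 - phi1 ** 2))
gamma[1] = phi1 * gamma[0] / (1 - phi2)
for h in range(2, 6):
    gamma[h] = phi1 * gamma[h - 1] + phi2 * gamma[h - 2]

phi, P = durbin_levinson(gamma)        # sketch from Method 1 above
# phi[1][0] equals rho(1), phi[2] equals (phi1, phi2), and phi[n][-1] is
# numerically zero for n = 3, 4, 5, in line with phi_nn = 0 for n > p = 2.
```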

Method 2: The innovations algorithm

In contrast to the Durbin-Levinson algorithm, this method can also be applied to nonstationary processes. It should thus, in general, be preferred over Method 1. The innovations algorithm gets its name from the fact that one directly uses the form of the prediction equations in Theorem 3.4.1, which are stated in terms of the innovations \((X_{t+1}-\hat{X}_{t+1})_{t\in\mathbb{Z}}\). Observe that the sequence consists of uncorrelated random variables.

The one-step predictors \(\hat{X}_{n+1}\) can be calculated from the recursions

\[\hat{X}_{0+1}=0,\qquad P_1=\gamma(0)\]

and, for \(n\geq1\),

\[\hat{X}_{n+1}=\sum_{\ell=1}^n\theta_{n\ell}\big(X_{n+1-\ell}-\hat{X}_{n+1-\ell}\big),\]

\[P_{n+1}=\gamma(0)-\sum_{\ell=0}^{n-1}\theta_{n,n-\ell}^2P_{\ell+1},\]

where the coefficients are obtained from the equations

\[\theta_{n,n-\ell}=\frac{1}{P_{\ell+1}}\left[\gamma(n-\ell)-\sum_{i=0}^{\ell-1}\theta_{\ell,\ell-i}\,\theta_{n,n-i}\,P_{i+1}\right],\qquad\ell=0,1,\ldots,n-1.\]
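A direct Python transcription of these recursions might look as follows; this is a sketch whose function name and indexing conventions are illustrative. Here theta[n][l-1] stores \(\theta_{n\ell}\) and P[n] stores \(P_{n+1}\).

```python
import numpy as np

def innovations(gamma, n_max):
    """Innovations algorithm: coefficients theta_{nl} and one-step MSEs P_{n+1}.

    gamma : function returning the autocovariance gamma(h) for h >= 0
    n_max : largest n for which theta_{n,.} and P_{n+1} are computed
    """
    P = {0: gamma(0)}                       # P[0] = P_1 = gamma(0)
    theta = {}
    for n in range(1, n_max + 1):
        th = np.zeros(n)                    # th[l-1] will hold theta_{nl}
        for ell in range(n):                # ell = 0, 1, ..., n-1
            acc = sum(theta[ell][ell - 1 - i] * th[n - 1 - i] * P[i]
                      for i in range(ell))
            th[n - 1 - ell] = (gamma(n - ell) - acc) / P[ell]
        P[n] = gamma(0) - sum(th[n - 1 - ell] ** 2 * P[ell] for ell in range(n))
        theta[n] = th
    return theta, P
```

Once data are available, the one-step predictors then follow from \(\hat{X}_{n+1}=\sum_{\ell=1}^n\theta_{n\ell}(X_{n+1-\ell}-\hat{X}_{n+1-\ell})\).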

As an example, we show how the innovations algorithm is applied to a moving average time series of order 1.

Example 3.4.3: Prediction of an MA(1) Process

Let \((X_t\colon t\in\mathbb{Z})\) be the MA(1) process \(X_t=Z_t+\theta Z_{t-1}\), where \((Z_t\colon t\in\mathbb{Z})\) is white noise with variance \(\sigma^2\). Note that

\[\gamma(0)=(1+\theta^2)\sigma^2,\qquad\gamma(1)=\theta\sigma^2\qquad\text{and}\qquad\gamma(h)=0\quad(h\geq2).\]

Using the innovations algorithm, one can compute the one-step predictor from the values

\[\theta_{n1}=\frac{\theta\sigma^2}{P_n},\qquad\theta_{n\ell}=0\quad(\ell=2,\ldots,n-1),\]

and

\[P_1=(1+\theta^2)\sigma^2,\qquad P_{n+1}=(1+\theta^2-\theta\theta_{n1})\sigma^2\]

as

\[\hat{X}_{n+1}=\frac{\theta\sigma^2}{P_n}\big(X_n-\hat{X}_n\big).\]
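With the innovations sketch given after the algorithm above, these closed-form expressions can be checked numerically; the parameter values are again illustrative.

```python
theta_ma, sigma2 = 0.5, 1.0
gamma_ma1 = lambda h: (1 + theta_ma ** 2) * sigma2 if h == 0 else (theta_ma * sigma2 if h == 1 else 0.0)

coeffs, P = innovations(gamma_ma1, n_max=10)
# For each n: coeffs[n][0] agrees with theta * sigma^2 / P_n (recall P[n - 1] = P_n),
# all remaining entries of coeffs[n] are zero, and
# P[n] agrees with (1 + theta^2 - theta * coeffs[n][0]) * sigma^2.
```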

Method 3: Prediction based on the infinite past

Suppose that a causal and invertible ARMA(\(p,q\)) process is analyzed. Assume further that (unrealistically) the complete history of the process can be stored and that thus all past variables \((X_t\colon t\leq n)\) can be accessed. Define then

\[\tilde{X}_{n+m}=E[X_{n+m}\,|\,X_n,X_{n-1},\ldots],\]

as the \(m\)-step ahead predictor based on the infinite past. It can be shown that, for large sample sizes \(n\), the difference between the values of \(\hat{X}_{n+m}\) and \(\tilde{X}_{n+m}\) vanishes at an exponential rate. Exploiting causality and invertibility of the ARMA process, one can transform the predictor \(\tilde{X}_{n+m}\) so that it is in a computationally more feasible form. To do so, note that by causality

\[\tilde{X}_{n+m}=E[X_{n+m}\,|\,X_n,X_{n-1},\ldots]
=E\left[\sum_{j=0}^\infty\psi_jZ_{n+m-j}\,\Big|\,X_n,X_{n-1},\ldots\right]
=\sum_{j=m}^\infty\psi_jZ_{n+m-j} \tag{3.4.5}\]

because \(E[Z_t\,|\,X_n,X_{n-1},\ldots]\) equals zero if \(t>n\) and equals \(Z_t\) if \(t\leq n\) (due to invertibility!). The representation in \((3.4.5)\) can be used to compute the mean squared prediction error \(\tilde{P}_{n+m}\). It follows from causality that

\[\tilde{P}_{n+m}=E\big[(X_{n+m}-\tilde{X}_{n+m})^2\big]
=E\left[\bigg(\sum_{j=0}^{m-1}\psi_jZ_{n+m-j}\bigg)^2\right]
=\sigma^2\sum_{j=0}^{m-1}\psi_j^2.\]
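The prediction error \(\tilde{P}_{n+m}\) only requires the causal weights \(\psi_0,\ldots,\psi_{m-1}\). The sketch below computes them from the relation \(\phi(z)\psi(z)=\theta(z)\), which is not derived in this section; the function names and the ARMA(1,1) parameters in the comment are illustrative assumptions.

```python
import numpy as np

def psi_weights(phi, theta, m):
    """First m causal weights psi_0, ..., psi_{m-1} of a causal ARMA process
    phi(B) X_t = theta(B) Z_t, by matching coefficients in phi(z) psi(z) = theta(z)."""
    psi = np.zeros(m)
    for j in range(m):
        th = 1.0 if j == 0 else (theta[j - 1] if j <= len(theta) else 0.0)
        psi[j] = th + sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, len(phi)) + 1))
    return psi

def mstep_mse(phi, theta, sigma2, m):
    """Mean squared error sigma^2 * sum_{j=0}^{m-1} psi_j^2 of the m-step
    predictor based on the infinite past."""
    psi = psi_weights(phi, theta, m)
    return sigma2 * float(np.sum(psi ** 2))

# Illustrative call: an ARMA(1,1) process with phi_1 = 0.5, theta_1 = 0.4, sigma^2 = 1
# mstep_mse([0.5], [0.4], 1.0, m=3)
```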

On the other hand, Equation \((3.4.5)\) does not allow one to directly calculate the forecasts because \(\tilde{X}_{n+m}\) is given in terms of the noise variables \(Z_{n+m-j}\). Instead, invertibility will be utilized. Observe first that

\[E[X_{n+m-j}\,|\,X_n,X_{n-1},\ldots]=
\begin{cases}
\tilde{X}_{n+m-j}, & j<m,\\[.1cm]
X_{n+m-j}, & j\geq m.
\end{cases}\]

By invertibility (the "\(0=\)" part follows again from causality),

\[0=E[Z_{n+m}\,|\,X_n,X_{n-1},\ldots]
=E\left[\sum_{j=0}^\infty\pi_jX_{n+m-j}\,\Big|\,X_n,X_{n-1},\ldots\right]
=\sum_{j=0}^\infty\pi_jE[X_{n+m-j}\,|\,X_n,X_{n-1},\ldots].\]

Combining the previous two statements yields

\[\tilde{X}_{n+m}=-\sum_{j=1}^{m-1}\pi_j\tilde{X}_{n+m-j}-\sum_{j=m}^\infty\pi_jX_{n+m-j}. \tag{3.4.6}\]

The equations can now be solved recursively for \(m=1,2,\ldots\) Note, however, that for any \(m\geq1\) the sequence \((X_{n+m+t}-\tilde{X}_{n+m+t}\colon t\in\mathbb{Z})\) does not consist of uncorrelated random variables. In fact, if \(h\in\mathbb{N}_0\), it holds that

\[E\big[(X_{n+m}-\tilde{X}_{n+m})(X_{n+m+h}-\tilde{X}_{n+m+h})\big]
=E\left[\sum_{j=0}^{m-1}\psi_jZ_{n+m-j}\sum_{i=0}^{m+h-1}\psi_iZ_{n+m+h-i}\right]
=\sigma^2\sum_{j=0}^{m-1}\psi_j\psi_{j+h}.\]

Finally, for practical purposes the given forecast needs to be truncated. This is accomplished by setting

\[\sum_{j=n+m}^\infty\pi_jX_{n+m-j}=0.\]

The resulting equations (see Equation \((3.4.6)\) for comparison) recursively yield the truncated \(m\)-step predictors \(X_{n+m}^*\):

\[X_{n+m}^*=-\sum_{j=1}^{m-1}\pi_jX_{n+m-j}^*-\sum_{j=m}^{n+m-1}\pi_jX_{n+m-j}.\]
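A minimal Python sketch of this truncated recursion is given below. All names, the MA(1) choice of \(\pi\) weights (\(\pi_j=(-\theta)^j\) for an invertible MA(1) process), and the observation values are illustrative assumptions, not part of the text.

```python
import numpy as np

def truncated_forecasts(pi, x, m_max):
    """Truncated m-step predictors X*_{n+m}, m = 1, ..., m_max.

    pi : pi weights pi_0 = 1, pi_1, pi_2, ... (length at least n + m_max)
    x  : observed values x[0] = X_1, ..., x[n-1] = X_n
    """
    n = len(x)
    xstar = np.zeros(m_max)
    for m in range(1, m_max + 1):
        # previously computed truncated predictors enter for j = 1, ..., m-1 ...
        pred_part = sum(pi[j] * xstar[m - j - 1] for j in range(1, m))
        # ... and the observed X_n, ..., X_1 for j = m, ..., n + m - 1
        obs_part = sum(pi[j] * x[n - 1 - (j - m)] for j in range(m, n + m))
        xstar[m - 1] = -pred_part - obs_part
    return xstar

# Illustrative use: invertible MA(1) with theta = 0.5 and hypothetical observations X_1, ..., X_4
theta = 0.5
pi = np.array([(-theta) ** j for j in range(50)])
x_obs = np.array([0.3, -1.1, 0.7, 0.2])
forecasts = truncated_forecasts(pi, x_obs, m_max=3)
```

For this MA(1) choice of \(\pi\) weights, the forecasts for \(m\geq2\) come out as zero, as they should for a moving average of order 1.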


This page titled 3.4: Forecasting is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.
