# 11.3: Ergodic Markov Chains**

- Page ID
- 3174

A second important kind of Markov chain we shall study in detail is an ergodic Markov chain, defined as follows.

Definition: ergodic chain

A Markov chain is called an *ergodic chain** *if it is possible to go from every state to every state (not necessarily in one move).

In many books, ergodic Markov chains are called .

Definition: Markov chain

A Markov chain is called a chain if some power of the transition matrix has only positive elements.

In other words, for some \(n\), it is possible to go from any state to any state in exactly \(n\) steps. It is clear from this definition that every regular chain is ergodic. On the other hand, an ergodic chain is not necessarily regular, as the following examples show.

Let the transition matrix of a Markov chain be defined by

\[\mat{P} = \begingroup \m@th \@tempdima 8.75\p@ \setbox\z@\vbox{% \def\cr{\crcr\noalign{\kern 2\p@\global\let\cr\endline}}% \ialign{$##$\hfil\kern 2\p@\kern\@tempdima&\thinspace\hfil$##$\hfil &&\quad\hfil$##$\hfil\crcr \omit\strut\hfil\crcr\noalign{\kern-\snellbaselineskip}% & 1 & 2 \cr 1 & 0 & 1 \cr 2 & 1 & 0\crcr\omit\strut\cr}}% \setbox\tw@\vbox{\unvcopy\z@\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{\unhbox\@ne\unskip\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{$\kern\wd\@ne\kern-\@tempdima\left(\kern-\wd\@ne \global\setbox\@ne\vbox{\box\@ne\kern 2\p@}% \vcenter{\kern-\ht\@ne\unvbox\z@\kern-\snellbaselineskip}\,\right)$}% \null\;\vbox{\kern\ht\@ne\box\tw@}\endgroup\ .\]

Then is clear that it is possible to move from any state to any state, so the chain is ergodic. However, if \(n\) is odd, then it is not possible to move from state 0 to state 0 in \(n\) steps, and if \(n\) is even, then it is not possible to move from state 0 to state 1 in \(n\) steps, so the chain is not regular.

A more interesting example of an ergodic, non-regular Markov chain is provided by the Ehrenfest urn model.

Recall the Ehrenfest urn model (Example [exam 11.1.6]). The transition matrix for this example is

\[\mat{P} = \begingroup \m@th \@tempdima 8.75\p@ \setbox\z@\vbox{% \def\cr{\crcr\noalign{\kern 2\p@\global\let\cr\endline}}% \ialign{$##$\hfil\kern 2\p@\kern\@tempdima&\thinspace\hfil$##$\hfil &&\quad\hfil$##$\hfil\crcr \omit\strut\hfil\crcr\noalign{\kern-\snellbaselineskip}% & 0 & 1 & 2 & 3 & 4 \cr 0 & 0 & 1 & 0 & 0 & 0 \cr 1 & 1/4 & 0 & 3/4 & 0 & 0 \cr 2 & 0 & 1/2 & 0 & 1/2 & 0 \cr 3 & 0 & 0 & 3/4 & 0 & 1/4 \cr 4 & 0 & 0 & 0 & 1 & 0\crcr\omit\strut\cr}}% \setbox\tw@\vbox{\unvcopy\z@\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{\unhbox\@ne\unskip\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{$\kern\wd\@ne\kern-\@tempdima\left(\kern-\wd\@ne \global\setbox\@ne\vbox{\box\@ne\kern 2\p@}% \vcenter{\kern-\ht\@ne\unvbox\z@\kern-\snellbaselineskip}\,\right)$}% \null\;\vbox{\kern\ht\@ne\box\tw@}\endgroup\ .\]

In this example, if we start in state 0 we will, after any even number of steps, be in either state 0, 2 or 4, and after any odd number of steps, be in states 1 or 3. Thus this chain is ergodic but not regular.

## Regular Markov Chains

Any transition matrix that has no zeros determines a regular Markov chain. However, it is possible for a regular Markov chain to have a transition matrix that has zeros. The transition matrix of the Land of Oz example of Section 1.1 has \(p_{NN} = 0\) but the second power \(\mat{P}^2\) has no zeros, so this is a regular Markov chain.

An example of a nonregular Markov chain is an absorbing chain. For example, let

\[\mat{P} = \pmatrix{ 1 & 0 \cr 1/2 & 1/2 \cr}\]

be the transition matrix of a Markov chain. Then all powers of \(\mat{P}\) will have a 0 in the upper right-hand corner.

We shall now discuss two important theorems relating to regular chains.

[thm 11.3.6] Let \(\mat{P}\) be the transition matrix for a regular chain. Then, as \(n \to \infty\), the powers \(\mat{P}^n\) approach a limiting matrix \(\mat{W}\) with all rows the same vector \(\mat{w}\). The vector \(\mat{w}\) is a strictly positive probability vector (i.e., the components are all positive and they sum to one).

In the next section we give two proofs of this fundamental theorem. We give here the basic idea of the first proof.

We want to show that the powers \(\mat{P}^n\) of a regular transition matrix tend to a matrix with all rows the same. This is the same as showing that \(\mat{P}^n\) converges to a matrix with constant columns. Now the \(j\)th column of \(\mat{P}^n\) is \(\mat{P}^{n} \mat{y}\) where \(\mat{y}\) is a column vector with \(1\) in the \(j\)th entry and 0 in the other entries. Thus we need only prove that for any column vector \(\mat{y},\mat{P}^{n} \mat{y}\) approaches a constant vector as \(n\) tend to infinity.

Since each row of \(\mat{P}\) is a probability vector, \(\mat{Py}\) replaces \(\mat{y}\) by averages of its components. Here is an example: \[\pmatrix{1/2 & 1/4 & 1/4 \cr 1/3 & 1/3 & 1/3 \cr 1/2 & 1/2 & 0\cr} \pmatrix {1 \cr 2 \cr 3 \cr} = \pmatrix{1/2 \cdot 1+ 1/4 \cdot 2+ 1/4 \cdot 3\cr 1/3 \cdot 1+ 1/3 \cdot 2+ 1/3 \cdot 3\cr 1/2 \cdot 1+ 1/2 \cdot 2+ 0 \cdot 3\cr} =\pmatrix {7/4 \cr 2 \cr 3/2 \cr}\ .\] The result of the averaging process is to make the components of \(\mat{Py}\) more similar than those of \(\mat{y}\). In particular, the maximum component decreases (from 3 to 2) and the minimum component increases (from 1 to 3/2). Our proof will show that as we do more and more of this averaging to get \(\mat{P}^{n} \mat{y}\), the difference between the maximum and minimum component will tend to 0 as \(n \rightarrow \infty\). This means \(\mat{P}^{n} \mat{y}\) tends to a constant vector. The \(ij\)th entry of \(\mat{P}^n\), \(p_{ij}^{(n)}\), is the probability that the process will be in state \(s_j\) after \(n\) steps if it starts in state \(s_i\). If we denote the common row of \(\mat{W}\) by \(\mat{w}\), then Theorem [thm 11.3.6] states that the probability of being in \(s_j\) in the long run is approximately \(w_j\), the \(j\)th entry of \(\mat{w}\), and is independent of the starting state.

[exam 11.3.1] Recall that for the Land of Oz example of Section 1.1, the sixth power of the transition matrix \(\mat{P}\) is, to three decimal places, \[\mat{P}^6 = \begingroup \m@th \@tempdima 8.75\p@ \setbox\z@\vbox{% \def\cr{\crcr\noalign{\kern 2\p@\global\let\cr\endline}}% \ialign{$##$\hfil\kern 2\p@\kern\@tempdima&\thinspace\hfil$##$\hfil &&\quad\hfil$##$\hfil\crcr \omit\strut\hfil\crcr\noalign{\kern-\snellbaselineskip}% & \mbox{R} & \mbox{N} & \mbox{S} \cr \mbox{R} & .4 & .2 & .4 \cr \mbox{N} & .4 & .2 & .4 \cr \mbox{S} & .4 & .2 & .4\crcr\omit\strut\cr}}% \setbox\tw@\vbox{\unvcopy\z@\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{\unhbox\@ne\unskip\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{$\kern\wd\@ne\kern-\@tempdima\left(\kern-\wd\@ne \global\setbox\@ne\vbox{\box\@ne\kern 2\p@}% \vcenter{\kern-\ht\@ne\unvbox\z@\kern-\snellbaselineskip}\,\right)$}% \null\;\vbox{\kern\ht\@ne\box\tw@}\endgroup\ .\] Thus, to this degree of accuracy, the probability of rain six days after a rainy day is the same as the probability of rain six days after a nice day, or six days after a snowy day. Theorem [thm 11.3.6] predicts that, for large \(n\), the rows of \(\mat{P}\) approach a common vector. It is interesting that this occurs so soon in our example.

Theorem \(\PageIndex{8}\)

et \(\mat {P}\) be a regular transition matrix, let \[\mat{W} = \lim_{n \rightarrow \infty} \mat{P}^n\ ,\] let \(\mat{w}\) be the common row of \(\mat{W}\), and let \(\mat{c}\) be the column vector all of whose components are 1. Then

- \(\mat{w}\mat{P} = \mat{w}\), and any row vector \(\mat{v}\) such that \(\mat{v}\mat{P} = \mat{v}\) is a constant multiple of \(\mat{w}\).
- \(\mat{P}\mat{c} = \mat{c}\), and any column vector such that \(\mat{P}\mat{x} = \mat{x}\) is a multiple of \(\mat{c}\).

Proof

To prove part (a), we note that from Theorem [thm 11.3.6], \[\mat{P}^n \to \mat{W}\ .\] Thus, \[\mat{P}^{n + 1} = \mat{P}^n \cdot \mat{P} \to \mat{W}\mat{P}\ .\] But \(\mat{P}^{n + 1} \to \mat{W}\), and so \(\mat{W} = \mat{W}\mat{P}\), and \(\mat{w} = \mat{w}\mat{P}\).

Let \(\mat{v}\) be any vector with \(\mat{v} \mat{P} = \mat{v}\). Then \(\mat{v} = \mat{v} \mat{P}^n\), and passing to the limit, \(\mat{v} = \mat{v} \mat{W}\). Let \(r\) be the sum of the components of \(\mat{v}\). Then it is easily checked that \(\mat{v}\mat{W} = r\mat{w}\). So, \(\mat{v} = r\mat{w}\).

To prove part (b), assume that \(\mat{x} = \mat{P} \mat{x}\). Then \(\mat{x} = \mat{P}^n \mat{x}\), and again passing to the limit, \(\mat{x} = \mat{W}\mat{x}\). Since all rows of \(\mat{W}\) are the same, the components of \(\mat{W}\mat{x}\) are all equal, so \(\mat{x}\) is a multiple of \(\mat{c}\).

\(\square\)

Note that an immediate consequence of Theorem [thm 11.3.8] is the fact that there is only one probability vector \(\mat{v}\) such that \(\mat{v}\mat{P} = \mat{v}\).

## Fixed Vectors

Definition: 11.3.1: Row and Column Vectors

A *row vector* \(\mat{w}\) with the property \(\mat{w}\mat{P} = \mat{w}\) is called a for \(\mat{P}\). Similarly, a *column vector *\(\mat{x}\) such that \(\mat{P}\mat{x} = \mat{x}\) is called a for \(\mat{P}\).

Thus, the common row of \(\mat{W}\) is the unique vector \(\mat{w}\) which is both a fixed row vector for \(\mat{P}\) and a probability vector. Theorem [thm 11.3.8] shows that any fixed row vector for \(\mat{P}\) is a multiple of \(\mat{w}\) and any fixed column vector for \(\mat{P}\) is a constant vector.

One can also state Definition [def 11.3.1] in terms of eigenvalues and eigenvectors. A fixed row vector is a left eigenvector of the matrix \(\mat{P}\) corresponding to the eigenvalue 1. A similar statement can be made about fixed column vectors.

We will now give several different methods for calculating the fixed row vector for a regular Markov chain.

[exam 11.3.2] By Theorem [thm 11.3.6] we can find the limiting vector \(\mat{w}\) for the Land of Oz from the fact that \[w_1 + w_2 + w_3 = 1\] and \[\pmatrix{ w_1 & w_2 & w_3 } \pmatrix{ 1/2 & 1/4 & 1/4 \cr 1/2 & 0 & 1/2 \cr 1/4 & 1/4 & 1/2 \cr} = \pmatrix{ w_1 & w_2 & w_3 }\ .\]

These relations lead to the following four equations in three unknowns: \[\begin{aligned} w_1 + w_2 + w_3 &=& 1\ , \\ (1/2)w_1 + (1/2)w_2 + (1/4)w_3 &=& w_1\ , \\ (1/4)w_1 + (1/4)w_3 &=& w_2\ , \\ (1/4)w_1 + (1/2)w_2 + (1/2)w_3 &=& w_3\ .\end{aligned}\]

Our theorem guarantees that these equations have a unique solution. If the equations are solved, we obtain the solution \[\mat{w} = \pmatrix{ .4 & .2 & .4 }\ ,\] in agreement with that predicted from \(\mat{P}^6\), given in Example [exam 11.1.1.5].

To calculate the fixed vector, we can assume that the value at a particular state, say state one, is 1, and then use all but one of the linear equations from \(\mat{w}\mat{P} = \mat{w}\). This set of equations will have a unique solution and we can obtain \(\mat{w}\) from this solution by dividing each of its entries by their sum to give the probability vector \(\mat{w}\). We will now illustrate this idea for the above example.

[exam 11.3.2.5](Example [exam 11.3.2] continued) We set \(w_1 = 1\), and then solve the first and second linear equations from \(\mat{w}\mat{P} = \mat{w}\). We have \[\begin{aligned} (1/2) + (1/2)w_2 + (1/4)w_3 &=& 1\ , \\ (1/4) + (1/4)w_3 &=& w_2\ . \\\end{aligned}\] If we solve these, we obtain \[\pmatrix{w_1&w_2&w_3} = \pmatrix{1&1/2&1}\ .\] Now we divide this vector by the sum of the components, to obtain the final answer: \[\mat{w} = \pmatrix{.4&.2&.4}\ .\] This method can be easily programmed to run on a computer.

As mentioned above, we can also think of the fixed row vector \(\mat{w}\) as a left eigenvector of the transition matrix \(\mat{P}\). Thus, if we write \(\mat{I}\) to denote the identity matrix, then \(\mat{w}\) satisfies the matrix equation \[\mat{w}\mat{P} = \mat{w}\mat{I}\ ,\] or equivalently, \[\mat{w}(\mat{P} - \mat{I}) = \mat{0}\ .\] Thus, \(\mat{w}\) is in the left nullspace of the matrix \(\mat{P} - \mat{I}\). Furthermore, Theorem [thm 11.3.8] states that this left nullspace has dimension 1. Certain computer programming languages can find nullspaces of matrices. In such languages, one can find the fixed row probability vector for a matrix \(\mat{P}\) by computing the left nullspace and then normalizing a vector in the nullspace so the sum of its components is 1.

The program **FixedVector** uses one of the above methods (depending upon the language in which it is written) to calculate the fixed row probability vector for regular Markov chains.

So far we have always assumed that we started in a specific state. The following theorem generalizes Theorem [thm 11.3.6] to the case where the starting state is itself determined by a probability vector.

thm 11.3.9

Let \(\mat{P}\) be the transition matrix for a regular chain and \(\mat{v}\) an arbitrary probability vector. Then \[\lim_{n \to \infty} \mat{v} \mat{P}^n = \mat{w}\ ,\] where \(\mat{w}\) is the unique fixed probability vector for \(\mat{P}\). By Theorem [thm 11.3.6], \[\lim_{n \to \infty} \mat{P}^n = \mat {W}\ .\] Hence, \[\lim_{n \to \infty} \mat {v} \mat {P}^n = \mat {v} \mat {W}\ .\] But the entries in \(\mat {v}\) sum to 1, and each row of \(\mat {W}\) equals \(\mat {w}\). From these statements, it is easy to check that \[\mat {v} \mat {W} = \mat {w}\ .\]

If we start a Markov chain with initial probabilities given by \(\mat{v}\), then the probability vector \(\mat{v} \mat{P}^n\) gives the probabilities of being in the various states after \(n\) steps. Theorem [thm 11.3.9] then establishes the fact that, even in this more general class of processes, the probability of being in \(s_j\) approaches \(w_j\).

## Equilibrium

We also obtain a new interpretation for \(\mat{w}\). Suppose that our starting vector picks state \(s_i\) as a starting state with probability \(w_i\), for all \(i\). Then the probability of being in the various states after \(n\) steps is given by \(\mat {w} \mat {P}^n = \mat {w}\), and is the same on all steps. This method of starting provides us with a process that is called “stationary." The fact that \(\mat{w}\) is the only probability vector for which \(\mat {w} \mat {P} = \mat {w}\) shows that we must have a starting probability vector of exactly the kind described to obtain a stationary process.

Many interesting results concerning regular Markov chains depend only on the fact that the chain has a unique fixed probability vector which is positive. This property holds for all ergodic Markov chains.

[thm 11.3.10] For an ergodic Markov chain, there is a unique probability vector \(\mat{w}\) such that \(\mat {w} \mat {P} = \mat {w}\) and \(\mat{w}\) is strictly positive. Any row vector such that \(\mat {v} \mat {P} = \mat {v}\) is a multiple of \(\mat{w}\). Any column vector \(\mat{x}\) such that \(\mat {P} \mat {x} = \mat {x}\) is a constant vector. This theorem states that Theorem [thm 11.3.8] is true for ergodic chains. The result follows easily from the fact that, if \(\mat{P}\) is an ergodic transition matrix, then \(\bar{\mat{P}} = (1/2)\mat {I} + (1/2)\mat {P}\) is a regular transition matrix with the same fixed vectors (see Exercises [exer 11.3.24]–[exer 11.3.27]).

For ergodic chains, the fixed probability vector has a slightly different interpretation. The following two theorems, which we will not prove here, furnish an interpretation for this fixed vector.

[thm 11.3.11] Let \(\mat{P}\) be the transition matrix for an ergodic chain. Let \(\mat {A}_n\) be the matrix defined by \[\mat {A}_n = \frac{\mat {I} + \mat {P} + \mat {P}^2 +\cdots + \mat {P}^n}{n + 1}\ .\] Then \(\mat {A}_n \to \mat {W}\), where \(\mat{W}\) is a matrix all of whose rows are equal to the unique fixed probability vector \(\mat{w}\) for \(\mat{P}\).

If \(\mat{P}\) is the transition matrix of an ergodic chain, then Theorem [thm 11.3.8] states that there is only one fixed row probability vector for \(\mat{P}\). Thus, we can use the same techniques that were used for regular chains to solve for this fixed vector. In particular, the program **FixedVector** works for ergodic chains.

To interpret Theorem [thm 11.3.11], let us assume that we have an ergodic chain that starts in state \(s_i\). Let \(X^{(m)} = 1\) if the \(m\)th step is to state \(s_j\) and 0 otherwise. Then the average number of times in state \(s_j\) in the first \(n\) steps is given by \[H^{(n)} = \frac{X^{(0)} + X^{(1)} + X^{(2)} +\cdots+ X^{(n)}}{n + 1}\ .\] But \(X^{(m)}\) takes on the value 1 with probability \(p_{ij}^{(m)}\) and 0 otherwise. Thus \(E(X^{(m)}) = p_{ij}^{(m)}\), and the \(ij\)th entry of \(\mat {A}_n\) gives the expected value of \(H^{(n)}\), that is, the expected proportion of times in state \(s_j\) in the first \(n\) steps if the chain starts in state \(s_i\).

If we call being in state \(s_j\) and any other state we could ask if a theorem analogous to the law of large numbers for independent trials holds. The answer is yes and is given by the following theorem.

**(Law of Large Numbers for Ergodic Markov Chains)**[thm 11.3.12] Let \(H_j^{(n)}\) be the proportion of times in \(n\) steps that an ergodic chain is in state \(s_j\). Then for any \(\epsilon > 0\), \[P\Bigl(|H_j^{(n)} - w_j| > \epsilon\Bigr) \to 0\ ,\] independent of the starting state \(s_i\).

We have observed that every regular Markov chain is also an ergodic chain. Hence, Theorems [thm 11.3.11] and [thm 11.3.12] apply also for regular chains. For example, this gives us a new interpretation for the fixed vector \(\mat {w} = (.4,.2,.4)\) in the Land of Oz example. Theorem [thm 11.3.11] predicts that, in the long run, it will rain 40 percent of the time in the Land of Oz, be nice 20 percent of the time, and snow 40 percent of the time.

## Simulation

We illustrate Theorem [thm 11.3.12] by writing a program to simulate the behavior of a Markov chain. **SimulateChain** is such a program.

[exam 11.3.2.6] In the Land of Oz, there are 525 days in a year. We have simulated the weather for one year in the Land of Oz, using the program **SimulateChain**. The results are shown in Table [table 11.2].

```
SSRNRNSSSSSSNRSNSSRNSRNSSSNSRRRNSSSNRRSSSSNRSSNSRRRRRRNSSS
SSRRRSNSNRRRRSRSRNSNSRRNRRNRSSNSRNRNSSRRSRNSSSNRSRRSSNRSNR
RNSSSSNSSNSRSRRNSSNSSRNSSRRNRRRSRNRRRNSSSNRNSRNSNRNRSSSRSS
NRSSSNSSSSSSNSSSNSNSRRNRNRRRRSRRRSSSSNRRSSSSRSRRRNRRRSSSSR
RNRRRSRSSRRRRSSRNRRRRRRNSSRNRSSSNRNSNRRRRNRRRNRSNRRNSRRSNR
RRRSSSRNRRRNSNSSSSSRRRRSRNRSSRRRRSSSRRRNRNRRRSRSRNSNSSRRRR
RNSNRNSNRRNRRRRRRSSSNRSSRSNRSSSNSNRNSNSSSNRRSRRRNRRRRNRNRS
SSNSRSNRNRRSNRRNSRSSSRNSRRSSNSRRRNRRSNRRNSSSSSNRNSSSSSSSNR
NSRRRNSSRRRNSSSNRRSRNSSRRNRRNRSNRRRRRRRRRNSNRRRRRNSRRSSSSN
SNS
```

State | Times | Fraction |

R | 217 | .413 |

N | 109 | .208 |

S | 199 | .379 |

We note that the simulation gives a proportion of times in each of the states not too different from the long run predictions of .4, .2, and .4 assured by Theorem [thm 11.3.6]. To get better results we have to simulate our chain for a longer time. We do this for 10,000 days without printing out each day’s weather. The results are shown in Table [table 11.3]. We see that the results are now quite close to the theoretical values of .4, .2, and .4.

State | Times | Fraction |

R | 4010 | .401 |

N | 1902 | .19 |

S | 4088 | .409 |

## Examples of Ergodic Chains

The computation of the fixed vector \(\mat{w}\) may be difficult if the transition matrix is very large. It is sometimes useful to guess the fixed vector on purely intuitive grounds. Here is a simple example to illustrate this kind of situation.

[exam 11.3.3] A white rat is put into the maze of Figure [fig 11.4]. There are nine compartments with connections between the compartments as indicated. The rat moves through the compartments at random. That is, if there are \(k\) ways to leave a compartment, it chooses each of these with equal probability. We can represent the travels of the rat by a Markov chain process with transition matrix given by \[\mat {P} = \begingroup \m@th \@tempdima 8.75\p@ \setbox\z@\vbox{% \def\cr{\crcr\noalign{\kern 2\p@\global\let\cr\endline}}% \ialign{$##$\hfil\kern 2\p@\kern\@tempdima&\thinspace\hfil$##$\hfil &&\quad\hfil$##$\hfil\crcr \omit\strut\hfil\crcr\noalign{\kern-\snellbaselineskip}% & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \cr 1 & 0 & 1/2 & 0 & 0 & 0 & 1/2 & 0 & 0 & 0 \cr 2 & 1/3 & 0 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 \cr 3 & 0 & 1/2 & 0 & 1/2 & 0 & 0 & 0 & 0 & 0 \cr 4 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 1/3 \cr 5 & 0 & 1/4 & 0 & 1/4 & 0 & 1/4 & 0 & 1/4 & 0 \cr 6 & 1/3 & 0 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 0 \cr 7 & 0 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/2 & 0 \cr 8 & 0 & 0 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 1/3 \cr 9 & 0 & 0 & 0 & 1/2 & 0 & 0 & 0 & 1/2 & 0\crcr\omit\strut\cr}}% \setbox\tw@\vbox{\unvcopy\z@\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{\unhbox\@ne\unskip\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{$\kern\wd\@ne\kern-\@tempdima\left(\kern-\wd\@ne \global\setbox\@ne\vbox{\box\@ne\kern 2\p@}% \vcenter{\kern-\ht\@ne\unvbox\z@\kern-\snellbaselineskip}\,\right)$}% \null\;\vbox{\kern\ht\@ne\box\tw@}\endgroup\ .\]

That this chain is not regular can be seen as follows: From an odd-numbered state the process can go only to an even-numbered state, and from an even-numbered state it can go only to an odd number. Hence, starting in state \(i\) the process will be alternately in even-numbered and odd-numbered states. Therefore, odd powers of \(\mat{P}\) will have 0’s for the odd-numbered entries in row 1. On the other hand, a glance at the maze shows that it is possible to go from every state to every other state, so that the chain is ergodic.

To find the fixed probability vector for this matrix, we would have to solve ten equations in nine unknowns. However, it would seem reasonable that the times spent in each compartment should, in the long run, be proportional to the number of entries to each compartment. Thus, we try the vector whose \(j\)th component is the number of entries to the \(j\)th compartment: \[\mat {x} = \pmatrix{ 2 & 3 & 2 & 3 & 4 & 3 & 2 & 3 & 2}\ .\] It is easy to check that this vector is indeed a fixed vector so that the unique probability vector is this vector normalized to have sum 1: \[\mat {w} = \pmatrix{ \frac1{12} & \frac18 & \frac1{12} & \frac18 & \frac16 & \frac18 & \frac 1{12} & \frac18 & \frac1{12} }\ .\]

[exam 11.3.4](Example [exam 11.1.6] continued) We recall the Ehrenfest urn model of Example [exam 11.1.6]. The transition matrix for this chain is as follows:

\[\mat {P} = \begingroup \m@th \@tempdima 8.75\p@ \setbox\z@\vbox{% \def\cr{\crcr\noalign{\kern 2\p@\global\let\cr\endline}}% \ialign{$##$\hfil\kern 2\p@\kern\@tempdima&\thinspace\hfil$##$\hfil &&\quad\hfil$##$\hfil\crcr \omit\strut\hfil\crcr\noalign{\kern-\snellbaselineskip}% & 0 & 1 & 2 & 3 & 4 \cr 0 & .000 & 1.000 & .000 & .000 & .000\cr 1 & .250 & .000 & .750 & .000 & .000\cr 2 & .000 & .500 & .000 & .500 & .000\cr 3 & .000 & .000 & .750 & .000 & .250\cr 4 & .000 & .000 & .000 &1.000 & .000\crcr\omit\strut\cr}}% \setbox\tw@\vbox{\unvcopy\z@\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{\unhbox\@ne\unskip\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{$\kern\wd\@ne\kern-\@tempdima\left(\kern-\wd\@ne \global\setbox\@ne\vbox{\box\@ne\kern 2\p@}% \vcenter{\kern-\ht\@ne\unvbox\z@\kern-\snellbaselineskip}\,\right)$}% \null\;\vbox{\kern\ht\@ne\box\tw@}\endgroup\ .\] If we run the program **FixedVector** for this chain, we obtain the vector \[\mat {w} = \begingroup \m@th \@tempdima 8.75\p@ \setbox\z@\vbox{% \def\cr{\crcr\noalign{\kern 2\p@\global\let\cr\endline}}% \ialign{$##$\hfil\kern 2\p@\kern\@tempdima&\thinspace\hfil$##$\hfil &&\quad\hfil$##$\hfil\crcr \omit\strut\hfil\crcr\noalign{\kern-\snellbaselineskip}% & 0 & 1 & 2 & 3 & 4\cr & .0625 &.2500 &.3750 &.2500 &.0625\crcr\omit\strut\cr}}% \setbox\tw@\vbox{\unvcopy\z@\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{\unhbox\@ne\unskip\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{$\kern\wd\@ne\kern-\@tempdima\left(\kern-\wd\@ne \global\setbox\@ne\vbox{\box\@ne\kern 2\p@}% \vcenter{\kern-\ht\@ne\unvbox\z@\kern-\snellbaselineskip}\,\right)$}% \null\;\vbox{\kern\ht\@ne\box\tw@}\endgroup\ .\]

By Theorem [thm 11.3.12], we can interpret these values for \(w_i\) as the proportion of times the process is in each of the states in the long run. For example, the proportion of times in state 0 is .0625 and the proportion of times in state 1 is .375. The astute reader will note that these numbers are the binomial distribution 1/16, 4/16, 6/16, 4/16, 1/16. We could have guessed this answer as follows: If we consider a particular ball, it simply moves randomly back and forth between the two urns. This suggests that the equilibrium state should be just as if we randomly distributed the four balls in the two urns. If we did this, the probability that there would be exactly \(j\) balls in one urn would be given by the binomial distribution \(b(n,p,j)\) with \(n = 4\) and \(p = 1/2\).

i[exer 11.3.1] Which of the following matrices are transition matrices for regular Markov chains?

- \(\mat {P} = \pmatrix{ .5 & .5 \cr .5 & .5 }\).
- \(\mat {P} = \pmatrix{ .5 & .5 \cr 1 & 0 }\).
- \(\mat {P} = \pmatrix{ 1/3 & 0 & 2/3 \cr 0 & 1 & 0 \cr 0 & 1/5 & 4/5}\).
- \(\mat {P} = \pmatrix{ 0 & 1 \cr 1 & 0}\).
- \(\mat {P} = \pmatrix{ 1/2 & 1/2 & 0 \cr 0 & 1/2 & 1/2 \cr 1/3 & 1/3 & 1/3}\).

i[exer 11.3.2] Consider the Markov chain with transition matrix \[\mat {P} = \pmatrix{ 1/2 & 1/3 & 1/6 \cr3/4 & 0 & 1/4 \cr 0 & 1 & 0}\ .\]

- Show that this is a regular Markov chain.
- The process is started in state 1; find the probability that it is in state 3 after two steps.
- Find the limiting probability vector \(\mat{w}\).

i[exer 11.3.3] Consider the Markov chain with general \(2 \times 2\) transition matrix \[\mat {P} = \pmatrix{ 1 - a & a \cr b & 1 - b}\ .\]

- Under what conditions is \(\mat{P}\) absorbing?
- Under what conditions is \(\mat{P}\) ergodic but not regular?
- Under what conditions is \(\mat{P}\) regular?

i[exer 11.3.4] Find the fixed probability vector \(\mat{w}\) for the matrices in Exercise [exer 11.3.3] that are ergodic.

i[exer 11.3.5] Find the fixed probability vector \(\mat{w}\) for each of the following regular matrices.

- \(\mat {P} = \pmatrix{ .75 & .25 \cr .5 & .5}\).
- \(\mat {P} = \pmatrix{ .9 & .1 \cr .1 & .9}\).
- \(\mat {P} = \pmatrix{ 3/4 & 1/4 & 0 \cr 0 & 2/3 & 1/3 \cr 1/4 & 1/4 & 1/2}\).

i[exer 11.3.6] Consider the Markov chain with transition matrix in Exercise [exer 11.3.3], with \(a = b = 1\). Show that this chain is ergodic but not regular. Find the fixed probability vector and interpret it. Show that \(\mat {P}^n\) does not tend to a limit, but that \[\mat {A}_n = \frac{\mat {I} + \mat {P} + \mat {P}^2 +\cdots + \mat {P}^n}{n + 1}\] does.

i[exer 11.3.7] Consider the Markov chain with transition matrix of Exercise [exer 11.3.3], with \(a = 0\) and \(b = 1/2\). Compute directly the unique fixed probability vector, and use your result to prove that the chain is not ergodic.

i[exer 11.3.8] Show that the matrix \[\mat {P} = \pmatrix{ 1 & 0 & 0 \cr 1/4 & 1/2 & 1/4 \cr 0 & 0 & 1}\] has more than one fixed probability vector. Find the matrix that \(\mat {P}^n\) approaches as \(n \to \infty\), and verify that it is not a matrix all of whose rows are the same.

i[exer 11.3.9] Prove that, if a 3-by-3 transition matrix has the property that its sums are 1, then \((1/3, 1/3, 1/3)\) is a fixed probability vector. State a similar result for \(n\)-by-\(n\) transition matrices. Interpret these results for ergodic chains.

i[exer 11.3.10] Is the Markov chain in Example [exam 11.1.8] ergodic?

i[exer 11.3.10.5] Is the Markov chain in Example [exam 11.1.9] ergodic?

i[exer 11.3.11] Consider Example [exam 11.2.1] (Drunkard’s Walk). Assume that if the walker reaches state 0, he turns around and returns to state 1 on the next step and, similarly, if he reaches 4 he returns on the next step to state 3. Is this new chain ergodic? Is it regular?

i[exer 11.3.12] For Example [exam 11.1.2] when \(\mat{P}\) is ergodic, what is the proportion of people who are told that the President will run? Interpret the fact that this proportion is independent of the starting state.

i[exer 11.3.13] Consider an independent trials process to be a Markov chain whose states are the possible outcomes of the individual trials. What is its fixed probability vector? Is the chain always regular? Illustrate this for Example [exam 11.1.3].

i[exer 11.3.14] Show that Example [exam 11.1.6] is an ergodic chain, but not a regular chain. Show that its fixed probability vector \(\mat{w}\) is a binomial distribution.

i[exer 11.3.15] Show that Example [exam 11.1.7] is regular and find the limiting vector.

i[exer 11.3.16] Toss a fair die repeatedly. Let \(S_n\) denote the total of the outcomes through the \(n\)th toss. Show that there is a limiting value for the proportion of the first \(n\) values of \(S_n\) that are divisible by 7, and compute the value for this limit. : The desired limit is an equilibrium probability vector for an appropriate seven state Markov chain.

i[exer 11.3.17] Let \(\mat{P}\) be the transition matrix of a regular Markov chain. Assume that there are \(r\) states and let \(N(r)\) be the smallest integer \(n\) such that \(\mat{P}\) is regular if and only if \(\mat {P}^{N(r)}\) has no zero entries. Find a finite upper bound for \(N(r)\). See if you can determine \(N(3)\) exactly.

[exer 11.3.18] Define \(f(r)\) to be the smallest integer \(n\) such that for all regular Markov chains with \(r\) states, the \(n\)th power of the transition matrix has all entries positive. It has been shown,^{14} that \(f(r) = r^2 - 2r + 2\).

- Define the transition matrix of an \(r\)-state Markov chain as follows: For states \(s_i\), with \(i = 1\), 2, …, \(r - 2\), \(\mat {P}(i,i + 1) = 1\), \(\mat {P}(r - 1,r) = \mat {P}(r - 1, 1) = 1/2\), and \(\mat {P}(r,1) = 1\). Show that this is a regular Markov chain.
- For \(r = 3\), verify that the fifth power is the first power that has no zeros.
- Show that, for general \(r\), the smallest \(n\) such that \(\mat {P}^n\) has all entries positive is \(n = f(r)\).

i[exer 11.3.19] A discrete time queueing system of capacity \(n\) consists of the person being served and those waiting to be served. The queue length \(x\) is observed each second. If \(0 < x < n\), then with probability \(p\), the queue size is increased by one by an arrival and, inependently, with probability \(r\), it is decreased by one because the person being served finishes service. If \(x = 0\), only an arrival (with probability \(p\)) is possible. If \(x = n\), an arrival will depart without waiting for service, and so only the departure (with probability \(r\)) of the person being served is possible. Form a Markov chain with states given by the number of customers in the queue. Modify the program **FixedVector** so that you can input \(n\), \(p\), and \(r\), and the program will construct the transition matrix and compute the fixed vector. The quantity \(s = p/r\) is called the Describe the differences in the fixed vectors according as \(s < 1\), \(s = 1\), or \(s > 1\).

i[exer 11.3.20] Write a computer program to simulate the queue in Exercise [exer 11.3.19]. Have your program keep track of the proportion of the time that the queue length is \(j\) for \(j = 0\), 1, …, \(n\) and the average queue length. Show that the behavior of the queue length is very different depending upon whether the traffic intensity \(s\) has the property \(s < 1\), \(s = 1\), or \(s > 1\).

i[exer 11.3.21] In the queueing problem of Exercise [exer 11.3.19], let \(S\) be the total service time required by a customer and \(T\) the time between arrivals of the customers.

- Show that \(P(S = j) = (1 - r)^{j - 1}r\) and \(P(T = j) = (1 - p)^{j - 1}p\), for \(j > 0\).
- Show that \(E(S) = 1/r\) and \(E(T) = 1/p\).
- Interpret the conditions \(s < 1\), \(s = 1\) and \(s > 1\) in terms of these expected values.

i[exer 11.3.22] In Exercise [exer 11.3.19] the service time \(S\) has a geometric distribution with \(E(S) = 1/r\). Assume that the service time is, instead, a constant time of \(t\) seconds. Modify your computer program of Exercise [exer 11.3.20] so that it simulates a constant time service distribution. Compare the average queue length for the two types of distributions when they have the same expected service time (i.e., take \(t = 1/r\)). Which distribution leads to the longer queues on the average?

i[exer 11.3.23] A certain experiment is believed to be described by a two-state Markov chain with the transition matrix \(\mat{P}\), where \[\mat {P} = \pmatrix{ .5 & .5 \cr p & 1 - p}\] and the parameter \(p\) is not known. When the experiment is performed many times, the chain ends in state one approximately 20 percent of the time and in state two approximately 80 percent of the time. Compute a sensible estimate for the unknown parameter \(p\) and explain how you found it.

i[exer 11.3.24] Prove that, in an \(r\)-state ergodic chain, it is possible to go from any state to any other state in at most \(r - 1\) steps.

i[exer 11.3.25] Let \(\mat{P}\) be the transition matrix of an \(r\)-state ergodic chain. Prove that, if the diagonal entries \(p_{ii}\) are positive, then the chain is regular.

i[exer 11.3.26] Prove that if \(\mat{P}\) is the transition matrix of an ergodic chain, then \((1/2)(\mat {I} + \mat {P})\) is the transition matrix of a regular chain. : Use Exercise [exer 11.3.25].

i[exer 11.3.27] Prove that \(\mat{P}\) and \((1/2)(\mat {I} + \mat {P})\) have the same fixed vectors.

i[exer 11.3.28] In his book, ^{15} A. Engle proposes an algorithm for finding the fixed vector for an ergodic Markov chain when the transition probabilities are rational numbers. Here is his algorithm: For each state \(i\), let \(a_i\) be the least common multiple of the denominators of the non-zero entries in the \(i\)th row. Engle describes his algorithm in terms of moving chips around on the states—indeed, for small examples, he recommends implementing the algorithm this way. Start by putting \(a_i\) chips on state \(i\) for all \(i\). Then, at each state, redistribute the \(a_i\) chips, sending \(a_i p_{ij}\) to state \(j\). The number of chips at state \(i\) after this redistribution need not be a multiple of \(a_i\). For each state \(i\), add just enough chips to bring the number of chips at state \(i\) up to a multiple of \(a_i\). Then redistribute the chips in the same manner. This process will eventually reach a point where the number of chips at each state, after the redistribution, is the same as before redistribution. At this point, we have found a fixed vector. Here is an example: \[\mat {P} = \begingroup \m@th \@tempdima 8.75\p@ \setbox\z@\vbox{% \def\cr{\crcr\noalign{\kern 2\p@\global\let\cr\endline}}% \ialign{$##$\hfil\kern 2\p@\kern\@tempdima&\thinspace\hfil$##$\hfil &&\quad\hfil$##$\hfil\crcr \omit\strut\hfil\crcr\noalign{\kern-\snellbaselineskip}% & 1 & 2 & 3 \cr 1 & 1/2 & 1/4 & 1/4 \cr 2 & 1/2 & 0 & 1/2 \cr 3 & 1/2 & 1/4 & 1/4\crcr\omit\strut\cr}}% \setbox\tw@\vbox{\unvcopy\z@\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{\unhbox\@ne\unskip\global\setbox\@ne\lastbox}% \setbox\tw@\hbox{$\kern\wd\@ne\kern-\@tempdima\left(\kern-\wd\@ne \global\setbox\@ne\vbox{\box\@ne\kern 2\p@}% \vcenter{\kern-\ht\@ne\unvbox\z@\kern-\snellbaselineskip}\,\right)$}% \null\;\vbox{\kern\ht\@ne\box\tw@}\endgroup\ .\] We start with \(\mat {a} = (4,2,4)\). The chips after successive redistributions are shown in Table [table 11.4].

\[\begin{array}{lll} (4 & 2\;\; & 4)\\ (5 & 2 & 3)\\ (8 & 2 & 4)\\ (7 & 3 & 4)\\ (8 & 4 & 4)\\ (8 & 3 & 5)\\ (8 & 4 & 8)\\ (10 & 4 & 6)\\ (12 & 4 & 8)\\ (12 & 5 & 7)\\ (12 & 6 & 8)\\ (13 & 5 & 8)\\ (16 & 6 & 8)\\ (15 & 6 & 9)\\ (16 & 6 & 12)\\ (17 & 7 & 10)\\ (20 & 8 & 12)\\ (20 & 8 & 12)\ . \end{array}\]

We find that \(\mat {a} = (20,8,12)\) is a fixed vector.

- Write a computer program to implement this algorithm.
- Prove that the algorithm will stop. : Let \(\mat{b}\) be a vector with integer components that is a fixed vector for \(\mat{P}\) and such that each coordinate of the starting vector \(\mat{a}\) is less than or equal to the corresponding component of \(\mat{b}\). Show that, in the iteration, the components of the vectors are always increasing, and always less than or equal to the corresponding component of \(\mat{b}\).

i[exer 11.3.29] (Coffman, Kaduta, and Shepp^{16}) A computing center keeps information on a tape in positions of unit length. During each time unit there is one request to occupy a unit of tape. When this arrives the first free unit is used. Also, during each second, each of the units that are occupied is vacated with probability \(p\). Simulate this process, starting with an empty tape. Estimate the expected number of sites occupied for a given value of \(p\). If \(p\) is small, can you choose the tape long enough so that there is a small probability that a new job will have to be turned away (i.e., that all the sites are occupied)? Form a Markov chain with states the number of sites occupied. Modify the program **FixedVector** to compute the fixed vector. Use this to check your conjecture by simulation.

[exer 11.3.30] (Alternate proof of Theorem [thm 11.3.8]) Let \(\mat{P}\) be the transition matrix of an ergodic Markov chain. Let \(\mat{x}\) be any column vector such that \(\mat{P} \mat{x} = \mat{ x}\). Let \(M\) be the maximum value of the components of \(\mat{x}\). Assume that \(x_i = M\). Show that if \(p_{ij} > 0\) then \(x_j = M\). Use this to prove that \(\mat{x}\) must be a constant vector.

i[exer 11.3.31] Let \(\mat{P}\) be the transition matrix of an ergodic Markov chain. Let \(\mat{w}\) be a fixed probability vector (i.e., \(\mat{w}\) is a row vector with \(\mat {w}\mat {P} = \mat {w}\)). Show that if \(w_i = 0\) and \(p_{ji} > 0\) then \(w_j = 0\). Use this to show that the fixed probability vector for an ergodic chain cannot have any 0 entries.

i[exer 11.3.32] Find a Markov chain that is neither absorbing or ergodic.