# 16.16: Transition Matrices and Generators of Continuous-Time Chains

- Page ID
- 10389

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)## 16. Transition Matrices and Generators of Continuous-Time Chains

### Preliminaries

This is the second of the three introductory sections on continuous-time Markov chains. Thus, suppose that \( \bs{X} = \{X_t: t \in [0, \infty)\} \) is a continuous-time Markov chain defined on an underlying probability space \( (\Omega, \mathscr{F}, \P) \) and with state space \( (S, \mathscr{S}) \). By the very meaning of *Markov chain*, the set of states \( S \) is countable and the \( \sigma \)-algebra \( \mathscr{S} \) is the collection of all subsets of \( S \). So every subset of \( S \) is measurable, as is every function from \( S \) to another measurable space. Recall that \( \mathscr{S} \) is also the Borel \( \sigma \) algebra corresponding to the discrete topology on \( S \). With this topology, every function from \( S \) to another topological space is continuous. Counting measure \( \# \) is the natural measure on \( (S, \mathscr{S}) \), so in the context of the general introduction, integrals over \( S \) are simply sums. Also, kernels on \( S \) can be thought of as matrices, with rows and sums indexed by \( S \). The left and right kernel operations are generalizations of matrix multiplication.

A space of functions on \( S \) plays an important role. Let \( \mathscr{B} \) denote the collection of bounded functions \( f: S \to \R \). With the usual pointwise definitions of addition and scalar multiplication, \( \mathscr{B} \) is a vector space. The supremum norm on \( \mathscr{B} \) is given by \[ \|f\| = \sup\{\left|f(x)\right|: x \in S\}, \quad f \in \mathscr{B} \] Of course, if \( S \) is finite, \( \mathscr{B} \) is the set of all real-valued functions on \( S \), and \( \|f\| = \max\{\left|f(x)\right|: x \in S\}\) for \(f \in \mathscr{B} \).

In the last section, we studied \( \bs{X} \) in terms of *when* and *how* the state changes. To review briefly, let \( \tau = \inf\{t \in (0, \infty): X_t \ne X_0\} \). Assuming that \( \bs{X} \) is right continuous, the Markov property of \( \bs{X} \) implies the memoryless property of \( \tau \), and hence the distribution of \( \tau \) given \( X_0 = x \) is exponential with parameter \( \lambda(x) \in [0, \infty) \) for each \( x \in S \). The assumption of right continuity rules out the pathological possibility that \( \lambda(x) = \infty \), which would mean that \( x \) is an instantaneous state so that \( \P(\tau = 0 \mid X_0 = x) = 1 \). On the other hand, if \( \lambda(x) \in (0, \infty) \) then \( x \) is a stable state, so that \( \tau \) has a proper exponential distribution given \( X_0 = x \) with \( \P(0 \lt \tau \lt \infty \mid X_0 = x) = 1 \). Finally, if \( \lambda(x) = 0 \) then \( x \) is an absorbing state, so that \( \P(\tau = \infty \mid X_0 = x) = 1 \). Next we define a sequence of stopping times: First \( \tau_0 = 0 \) and \( \tau_1 = \tau\). Recursively, if \( \tau_n \lt \infty \) then \( \tau_n = \inf\left\{t \gt \tau_n: X_t \ne X_{\tau_n}\right\} \), while if \( \tau_n = \infty \) then \( \tau_{n+1} = \infty \). With \( M = \sup\{n \in \N: \tau_n \lt \infty\} \) we define \( Y_n = X_{\tau_n} \) if \( n \in \N \) with \( n \le M \) and \( Y_n = Y_M \) if \( n \in \N \) with \( n \gt M \). The sequence \( \bs{Y} = (Y_0, Y_1, \ldots) \) is a discrete-time Markov chain on \( S \) with one-step transition matrix \( Q \) given by \(Q(x, y) = \P(X_\tau = y \mid X_0 = x)\) if \( x, \, y \in S \) with \( x \) stable, and \( Q(x, x) = 1\) if \( x \in S \) is absorbing. Assuming that \( \bs{X} \) is regular, which means that \( \tau_n \to \infty \) as \( n \to \infty \) with probability 1 (ruling out the explosion event of infinitely many transitions in finite time), the structure of \( \bs{X} \) is completely determined by the sequence of stopping times \( \bs{\tau} = (\tau_0, \tau_1, \ldots) \) and the discrete-time jump chain \( \bs{Y} = (Y_0, Y_1, \ldots) \). Analytically, the distribution \( \bs{X} \) is determined by the exponential parameter function \( \lambda \) and the one-step transition matrix \( Q \) of the jump chain.

In this section, we sill study the Markov chain \( \bs{X} \) in terms of the transition matrices in continuous time and a fundamentally important matrix known as the generator. Naturally, the connections between the two points of view are particularly interesting.

### The Transition Semigroup

#### Definition and basic Properties

The first part of our discussion is very similar to the treatment for a general Markov processes, except for simplifications caused by the discrete state space. We assume that \( \bs{X} = \{X_t: t \in [0, \infty)\} \) is a Markov chain on \( S \).

The transition probability matrix \( P_t \) of \( \bs{X} \) corresponding to \( t \in [0, \infty) \) is \[ P_t(x, y) = \P(X_t = y \mid X_0 = x), \quad (x, y) \in S^2 \] In particular, \( P_0 = I \), the identity matrix on \( S \)

## Proof

The mapping \( y \mapsto P_t(x, y) \) is the PDF of \( X_t \) given \( X_0 = x \). Hence \( P_t \) is a probability matrix. That is, \( P_t(x, y) \ge 0 \) for \( (x, y) \in S^2 \) and \( \sum_{y \in S} P_t(x, y) = 1 \) for \( x \in S \). Trivially, \( P_0 = I \) by definition.

Note that since we are assuming that the Markov chain is homogeneous, \[ P_t(x, y) = \P(X_{s + t} = y \mid X_s = x), \quad (x, y) \in S^2 \] for every \( s, \, t \in [0, \infty) \). The Chapman-Kolmogorov equation given next is essentially yet another restatement of the Markov property. The equation is named for Andrei Kolmogorov and Sydney Chapman,

Suppose that \( \bs{P} = \{P_t: t \in [0, \infty)\} \) is the collection of transition matrices for the chain \( \bs{X} \). Then \( P_s P_t = P_{s+t} \) for \( s, \, t \in [0, \infty) \). Explicitly, \[ P_{s+t}(x, z) = \sum_{y \in S} P_s(x, y) P_t(y, z), \quad x, \, z \in S \]

## Proof

We condition on \( X_s \). \[ P_{s+t}(x, z) = \P(X_{s + t} = z \mid X_0 = x) = \sum_{y \in S} \P(X_{s+t} = z \mid X_s = y, X_0 = x) \P(X_s = y \mid X_0 = x) \] But by the Markov and time homogeneous properties, \[ \P(X_{s+t} = z \mid X_s = y, X_0 = x) = \P(X_{s+t} = z \mid X_s = y) = P_t(y, z) \] Of course by definition, \( \P(X_s = y \mid X_0 = x) = P_s(x, y) \). So the first displayed equation above becomes \[ P_{s+t}(x, y) = \sum_{y \in S} P_s(x, y) P_t(y, z) = P_s P_t(x, z) \]

Restated in another form of jargon, the collection \( \bs{P} = \{P_t: t \in [0, \infty)\} \) is a semigroup of probability matrices. The semigroup of transition matrices \( \bs{P}\), along with the initial distribution, determine the finite-dimensional distributions of \( \bs{X} \).

Suppose that \( X_0 \) has probability density function \( f \). If \( (t_1, t_2, \ldots, t_n) \in [0, \infty)^n \) is a time sequence with \( 0 \lt t_1 \lt \cdots \lt t_n \) and \( (x_0, x_1, \ldots, x_n) \in S^{n+1} \) is a state sequence, then \[ \P\left(X_0 = x_0, X_{t_1} = x_1, \ldots X_{t_n} = x_n\right) = f(x_0) P_{t_1}(x_0, x_1) P_{t_2 - t_1}(x_1, x_2) \cdots P_{t_n - t_{n-1}}(x_{n-1}, x_n) \]

## Proof

To simplify the notation, we will just give the cases \( n = 1 \) and \( n = 2 \), which capture the essence of the proof. First suppose \( x, \, y \in S \) and \( t \in [0, \infty) \). Then \[ \P(X_0 = x, X_t = y) = \P(X_0 = x) \P(X_t = y \mid X_0 = x) = f(x) P_t(x, y) \] Next suppose that \( x, \, y, \, z \in S \) and \( s, \, t \in [0, \infty) \) with \( s \lt t \). Then \[ \P(X_0 = x, X_s = y, X_t = z) = \P(X_t = z \mid X_0 = x, X_s = y) \P(X_0 = x, X_s = y) \] But by the Markov and time homogeneous properties, \( \P(X_t = z \mid X_0 = x, X_s = y) = P_{t - s}(y, z) \). By the \( n = 1 \) case, \( \P(X_0 = x, X_s = y) = f(x) P_s(x, y) \). Hence \[ \P(X_0 = x, X_s = y, X_t = z) = f(x) P_s(x, y) P_{t-s}(y, z) \]

As with any matrix on \( S \), the transition matrices define left and right operations on functions which are generalizations of matrix multiplication. For a transition matrix, both have natural interpretations.

Suppose that \( f: S \to \R \), and that either \( f \) is nonnegative or \( f \in \mathscr{B} \). Then for \( t \in [0, \infty) \), \[ P_t f(x) = \sum_{y \in S} P_t(x, y) f(y) = \E[f(X_t) \mid X_0 = x], \quad x \in S \] The mapping \( f \mapsto P_t f \) is a bounded, linear operator on \( \mathscr{B} \) and \( \|P_t\| = 1 \).

## Proof

Since \( P_t(x, \cdot) \) is the conditional probability density function of \( X_t \) given \( X_0 = x \), it follows that \( P_t f(x) = \E[f(X_t) \mid X_0 = x] \). The statement about \( f \mapsto P_t f \) follows from general results on probability kernels.

If \( f \) is nonnegative and \( S \) is infinte, then it's possible that \( P_t f(x) = \infty \). In general, the left operation of a positive kernel acts on positive measures on the state space. In the setting here, if \( \mu \) is a positive (Borel) measure on \( (S, \mathscr{S}) \), then the function \( f: S \to [0, \infty) \) given by \( f(x) = \mu\{x\} \) for \( x \in S \) is the density function of \( \mu \) with respect to counting measure \( \# \) on \( (S, \mathscr{S}) \). This simply means that \( \mu(A) = \sum_{x \in A} f(x) \) for \( A \subseteq S \). Conversely, given \( f: S \to [0, \infty) \), the set function \( \mu(A) = \sum_{x \in A} f(x) \) for \( A \subseteq S \) defines a positive measure on \( (S, \mathscr{S}) \) with \( f \) as its density function. So for the left operation of \( P_t \), it's natural to consider only nonnegative functions.

If \( f: S \to [0, \infty) \) then \[ f P_t(y) = \sum_{x \in S} f(x) P_t(x, y), \quad y \in S\] If \( X_0 \) has probability density function \( f \) then \( X_t \) has probability density function \( f P_t \).

## Proof

If \( X_0 \) has PDF \( f \), then conditioning gives \[ \P(X_t = y) = \sum_{x \in S} \P(X_t = y \mid X_0 = x) \P(X_0 = x) = \sum_{x \in S} P_t(x, y) f(x) = f P_t(x), \quad y \in S \]

More generally, if \( f \) is the density function of a positive measure \( \mu \) on \( (S, \mathscr{S}) \) then \( f P_t \) is the density function of the measure \( \mu P_t \), defined by \[ \mu P_t(A) = \sum_{x \in S} \mu\{x\} P_t(x, A) = \sum_{x \in S} f(x) P_t(x, A), \quad A \subseteq S \]

A function \( f : S \to [0, \infty) \) is invariant for the Markov chain \(\bs{X}\) (or for the transition semigroup \( \bs{P} \)) if \( f P_t = f \) for every \( t \in [0, \infty) \).

It follows that if \( X_0 \) has an invariant probability density function \( f \), then \( X_t \) has probability density function \( f \) for every \( t \in [0, \infty) \), so \( \bs{X} \) is identically distributed. Invariant and limiting distributions are fundamentally important for continuous-time Markov chains.

#### Standard Semigroups

Suppose again that \( \bs{X} = \{X_t: t \in [0, \infty)\} \) is a Markov chain on \( S \) with transition semigroup \( \bs{P} = \{P_t: t \in [0, \infty)\} \). Once again, continuity assumptions need to be imposed on \( \bs{X} \) in order to rule out strange behavior that would otherwise greatly complicate the theory. In terms of the transition semigroup \( \bs{P} \), here is the basic assumption:

The transition semigroup \( \bs{P} \) is standard if \( P_t(x, x) \to 1 \) as \( t \downarrow 0 \) for each \( x \in S \).

Since \( P_0(x, x) = 1 \) for \( x \in S \), the standard assumption is clearly a continuity assumption. It actually implies much stronger smoothness properties that we will build up by stages.

If the transition semigroup \( \bs{P} = \{P_t: t \in [0, \infty)\} \) is standard, then the function \( t \mapsto P_t(x, y) \) is right continuous for each \( (x, y) \in S^2 \).

## Proof

First note that if \( (x, y) \in S^2 \) with \( x \ne y \) then \( P_h(x, y) \le 1 - P_h(x, x) \to 0 \) as \( h \downarrow 0 \). Hence \( P_h(x, y) \to I(x, y) \) as \( h \downarrow 0 \) for all \( (x, y) \in S^2 \). Suppose next that \( t \in (0, \infty) \) and \( (x, y) \in S^2 \). By the semigroup property, \[ P_{t+h}(x, y) = P_t P_h(x, y) = \sum_{z \in S} P_t(x, z) P_h(z, y) \] But \( P_h(z, y) \to I(z, y) \) as \( h \downarrow 0 \) so by the bounded convergence theorem, \( P_{t+h}(x, y) \to P_t(x, y) \) as \( h \downarrow 0 \).

Our next result connects one of the basic assumptions in the section on transition times and the embedded chain with the standard assumption here.

If the Markov chain \( \bs{X} \) has no instantaneous states then the transition semigroup \( \bs{P}\) is standard.

## Proof

Given \( X_0 = x \in S \) note that \( \tau \gt t \) implies \( X_t = x \). Hence \[ P_t(x, x) = \P(X_t = x \mid X_0 = x) \ge \P(\tau \gt t \mid X_0 = x) = e^{-\lambda(x) t} \] Since \( \bs{X} \) has no instantaneous states, \( 0 \le \lambda(x) \lt \infty\) so \( e^{-\lambda(x) t} \to 1 \) as \( t \downarrow 0 \).

Recall that the non-existence of instantaneous states is essentially equivalent to the right continuity of \( \bs{X} \). So we have the nice result that if \( \bs{X} \) is right continuous, then so is \( \bs{P} \). For the remainder of our discussion, we assume that \( \bs{X} = \{X_t: t \in [0, \infty)\} \) is a regular Markov chain on \( S \) with transition semigroup \( \bs{P} = \{P_t: t \in [0, \infty)\} \), exponential function \( \lambda \) and one-step transition matrix \( Q \) for the jump chain. Our next result is the fundamental integral equations relating \( \bs{P} \), \( \lambda \), and \( Q \).

For \( t \in [0, \infty) \), \[ P_t(x, y) = I(x, y) e^{-\lambda(x) t} + \int_0^t \lambda(x) e^{-\lambda(x) s} Q P_{t - s} (x, y) \, ds, \quad (x, y) \in S^2 \]

## Proof

If \( x \) is an absorbing state, then the equation trivially holds, since \( \lambda(x) = 0 \) and \( P_t(x, y) = I(x, y) \). So suppose that \( x \) is a stable state, and as above, let \( \tau = \inf\{t \in [0, \infty): X_t \ne X_0\} \). Given \( X_0 = x \), \( \tau \) has a proper exponential distribution with parameter \( \lambda(x) \in (0, \infty) \). Taking cases, \[ P_t(x, y) = \P(X_t = y \mid X_0 = x) = \P(X_t = y, \tau \gt t \mid X_0 = x) + \P(X_t = y, \tau \le t \mid X_0 = x) \] The first term on the right is 0 if \( y \ne x \) and is \( \P(\tau \gt t \mid X_0 = x) = e^{-\lambda(x) t} \) if \( y = x \). In short, \[ \P(X_t = y, \tau \gt t \mid X_0 = x) = I(x, y) e^{-\lambda(x)s} \] For the second term on the right in the displayed equation, we condition on \( \tau \) and \( Y_1 = X_\tau \). By a result in the last section on transition times and the embedded chain, the joint PDF of \( (\tau, Y_1) \) at \( s \in [0, \infty) \) and \( z \in S \), given \( X_0 = x \), is \( \lambda(x) e^{-\lambda(x) s} Q(x, z) \) (continuous in time, discrete in space). Also, given \(\tau = s \in [0, t] \) and \( Y_1 = z \in S \), we can use the strong Markov property to restart the clock

at \( s \) giving \[ \P(X_t = y \mid X_0 = x, \tau = s, Y_1 = z) = \P(X_{t-s} = y \mid X_0 = z) = P_{t-s}(z, y) \] Putting the pieces together we have \[ \P(X_t = y, \tau \le t \mid X_0 = x) = \int_0^t \lambda(x) e^{-\lambda(x) s} \sum_{z \in S} Q(x, z) P_{t-s}(z, y) \, ds = \int_0^t \lambda(x) e^{-\lambda(x) s} QP_{t - s} (x, y) \, ds\]

We can now improve on the continuity result that we got earlier. First recall the leads to relation for the jump chain \( \bs{Y} \): For \( (x, y) \in S^2 \), \( x \) leads to \( y \) if \( Q^n(x, y) \gt 0 \) for some \( n \in \N \). So by definition, \( x \) leads to \( x \) for each \( x \in S \), and for \( (x, y) \in S^2 \) with \( x \ne y \), \( x \) leads to \( y \) if and only if the discrete-time chain starting in \( x \) eventually reaches \( y \) with positive probability.

For \( (x, y) \in S^2 \),

- \( t \mapsto P_t(x, y) \) is continuous.
- If \( x \) leads to \( y \) then \( P_t(x, y) \gt 0 \) for every \( t \in (0, \infty) \).
- If \( x \) does not lead to \( y \) then \( P_t(x, y) = 0 \) for every \( t \in (0, \infty) \).

For \( t \in [0, \infty) \), we can use the change of variables \( r = t - s \) in the fundamental integral equation to get \[ P_t(x, y) = I(x, y) e^{-\lambda(x) t} + \lambda(x) e^{-\lambda(x) t} \int_0^t e^{\lambda(x) r} Q P_r (x, y) \, dr, \quad (x, y) \in S^2 \]

## Proof

- In the displayed equation, \( r \mapsto P_r(x, y) \) is right continuous for every \( (x, y) \in S^2 \), and hence by the bounded convergence theorem again, so is \( r \mapsto QP_r(x, y) \). Since the integrand in the displayed equation is bounded and right continuous, the integral is a continuous function of \( t \). Hence \( t \mapsto P_t(x, y) \) is continuous for \( (x, y) \in S^2 \).
- For \( x \in S \), note that \( P_t(x, x) \ge e^{-\lambda(x) t} \gt 0 \) for \( t \in [0, \infty) \). If \( x \) leads to \( y \) and \( x \ne y \) then there exists \( n \in \N_+ \) and \( (x_1, x_2, \ldots, x_{n-1}) \in S^{n-1} \) such that \( Q(x, x_1) \gt 0, \, \ldots Q(x_{n-1}, y) \gt 0\). Then \[ P_t(x, y) = \P(X_t = y \mid X_0 = x) \ge \P(Y_1 = x_1, \ldots, Y_{n-1} = x_{n-1}, Y_n = y, \tau_n \le t \lt \tau_{n+1}) \gt 0 \]
- This is clear from the definition of the embedded chain \( \bs{Y} \).

Parts (b) and (c) are known as the Lévy dichotomy, named for Paul Lévy. It's possible to prove the Lévy dichotomy just from the semigroup property of \( \bs{P} \), but this proof is considerably more complicated. In light of the dichotomy, the *leads to* relation clearly makes sense for the continuous-time chain \( \bs{X} \) as well as the discrete-time embedded chain \( \bs{Y} \).

### The Generator Matrix

#### Definition and Basic Properties

In this discussion, we assume again that \( \bs{X} = \{X_t: t \in [0, \infty)\} \) is a regular Markov chain on \( S \) with transition semigroup \( \bs{P} = \{P_t: t \in [0, \infty)\} \), exponential parameter function \( \bs{\lambda} \) and one-step transition matrix \( Q \) for the embedded jump chain. The fundamental integral equation above now implies that the transition probability matrix \( P_t \) is differentiable in \( t \). The derivative at \( 0 \) is particularly important.

The matrix function \( t \mapsto P_t \) has a (right) derivative at 0: \[ \frac{P_t - I}{t} \to G \text { as } t \downarrow 0 \] where the infinitesimal generator matrix \( G \) is given by \( G(x, y) = -\lambda(x) I(x, y) + \lambda(x) Q(x, y) \) for \( (x, y) \in S^2 \).

## Proof

As before the change of variables \( r = t - s \) in the fundamental integral equation gives \[ P_t(x, y) = I(x, y) e^{-\lambda(x) t} + \lambda(x) e^{-\lambda(x) t} \int_0^t e^{\lambda(x) r} Q P_r (x, y) \, dr \] The first term is clearly differentiable in \( t \), and the second term is also differentiable in \( t \) since we now know that the integrand is a continuous function of \( r \). The result then follows from standard calculus.

Note that \( \lambda(x) Q(x, x) = 0 \) for every \( x \in S \), since \( \lambda(x) = 0 \) is \( x \) is absorbing, while \( Q(x, x) = 0 \) if \( x \) is stable. So \( G(x, x) = -\lambda(x) \) for \( x \in S \), and \( G(x, y) = \lambda(x) Q(x, y) \) for \( (x, y) \in S^2 \) with \( y \ne x \). Thus, the generator matrix \( G \) determines the exponential parameter function \( \lambda \) and the jump transition matrix \( Q \), and thus determines the distribution of the Markov chain \( \bs{X} \).

Given the generator matrix \( G \) of \( \bs{X} \),

- \( \lambda(x) = -G(x, x) \) for \( x \in S \)
- \( Q(x, y) = - G(x, y) \big/ G(x, x)\) if \( x \in S \) is stable and \( y \in S - \{x\} \)

The infinitesimal generator has a nice interpretation in terms of our discussion in the last section. Recall that when the chain first enters a stable state \( x \), we set independent, exponentially distributed timers

on (x, y), for each \( y \in S - \{x\} \). Note that \( G(x, y) \) is the exponential parameter for the timer on \( (x, y) \). As soon as an alarm sounds for a particular \( (x, y) \), the chain moves to state \( y \) and the process continues.

The generator matrix \( G \) satisfies the following properties for every \( x \in S \):

- \( G(x, x) \le 0 \)
- \( \sum_{y \in S} G(x, y) = 0 \)

The matrix function \( t \mapsto P_t \) is differentiable on \( [0, \infty) \), and satisfies the Kolmogorov backward equation: \( P^\prime_t = G P_t \). Explicitly, \[ P^\prime_t(x, y) = -\lambda(x) P_t(x, y) + \sum_{z \in S} \lambda(x) Q(x, z) P_t(z, y), \quad (x, y) \in S^2 \]

## Proof

The proof is just like before, and follows from standard calculus and the integral equation \[ P_t(x, y) = I(x, y) e^{-\lambda(x) t} + \lambda(x) e^{-\lambda(x) t} \int_0^t e^{\lambda(x) r} Q P_r (x, y) \, dr \]

The backward equation is named for Andrei Kolmogorov. In continuous time, the transition semigroup \( \bs{P} = \{P_t: t \in [0, \infty)\} \) can be obtained from the single, generator matrix \( G \) in a way that is reminiscent of the fact that in discrete time, the transition semigroup \( \bs{P} = \{P^n: n \in \N\} \) can be obtained from the single, one-step matrix \( P \). From a modeling point of view, we often start with the generator matrix \( G \) and then solve the the backward equation, subject to the initial condition \( P_0 = I \), to obtain the semigroup of transition matrices \( \bs{P} \).

As with any matrix on \( S \), the generator matrix \( G \) defines left and right operations on functions that are analogous to ordinary matrix multiplication. The right operation is defined for functions in \( \mathscr{B} \).

If \( f \in \mathscr{B} \) then \( Gf \) is given by \[ G f(x) = -\lambda(x) f(x) + \sum_{y \in S} \lambda(x) Q(x, y) f(y), \quad x \in S \]

## Proof

By definition, \[ G f(x) = \sum_{y \in S} G(x, y) f(y) = -\lambda(x) f(x) + \sum_{y \in S - \{x\}} \lambda(x) Q(x, y) f(y) \] In the second term, we can sum over all \( y \in S \) since \( \lambda(x) = 0 \) if \( x \) is absorbing and \( Q(x, x) = 0 \) if \( x \) is stable. Note that \( G f \) is well defined since \[ \sum_{y \in S-\{x\}} \lambda(x) Q(x, y) \left|f(x)\right| \le \sum_{y \in S-\{x\}} \lambda(x) Q(x, y) \|f\| = \lambda(x) \|f\| \]

But note that \( G f \) is not in \( \mathscr{B} \) unless \( \lambda \in \mathscr{B} \). Without this additional assumption, \( G \) is a linear operator from the vector space \( \mathscr{B} \) of bounded functions from \( S \) to \( \R \) into the vector space of *all* functions from \( S \) to \( \R \). We will return to this point in our next discussion.

#### Uniform Transition Semigroups

We can obtain stronger results for the generator matrix if we impose stronger continuity assumptions on \( \bs{P} \).

The transition semigroup \( \bs{P} = \{P_t: t \in [0, \infty)\} \) is uniform if \( P_t(x, x) \to 1 \) as \( t \downarrow 0 \) uniformly in \( x \in S \).

If \( \bs{P} \) is uniform, then the operator function \( t \mapsto P_t \) is continuous on the vector space \( \mathscr{B} \).

## Proof

The statement means that for \( f \in \mathscr{B} \), the function \( t \mapsto P_t f \) is continuous with respect to the supremum norm on \( \mathscr{B} \).

As usual, we want to look at this new assumption from different points of view.

The following are equivalent:

- The transition semigroup \( \bs{P} \) is uniform.
- The exponential parameter function \( \lambda \) is bounded.
- The generator matrix \( G \) defiens a bounded linear operator on \( \mathscr{B} \).

## Proof

From our remarks above we know that \( \lambda \in \mathscr{B} \) if and only if the generator matrix \( G \) defines a bounded linear operator on \( \mathscr{B} \). So we just need to show the equivalence of (a) and (b). If \( \lambda \in \mathscr{B} \) then \[ P_t(x, x) = \P(X_t = x \mid X_0 = x) \ge \P(\tau \gt t \mid X_0 = x) = \exp[-\lambda(x) t] \ge \exp(-\|\lambda\|t) \] The last term converges to 1 as \( t \downarrow 0 \) uniformly in \( x \).

So when the equivalent conditions are satisfied, the Markov chain \( \bs X = \{X_t: t \in [0, \infty)\} \) is also said to be uniform. As we will see in a later section, a uniform, continuous-time Markov chain can be constructed from a discrete-time Markov chain and an independent Poisson process. For a uniform transition semigroup, we have a companion to the backward equation.

Suppose that \( \bs{P} \) is a uniform transition semigroup. Then \( t \mapsto P_t \) satisfies the Kolmogorov forward equation \( P^\prime_t = P_t G \). Explicitly, \[ P^\prime_t(x,y) = -\lambda(y) P_t(x, y) + \sum_{z \in S} P_t(x, z) \lambda(z) Q(z, y), \quad (x, y) \in S^2 \]

The backward equation holds with more generality than the forward equation, since we only need the transition semigroup \( \bs{P} \) to be *standard* rather than *uniform*. It would seem that we need stronger conditions on \( \lambda \) for the forward equation to hold, for otherwise it's not even obvious that \( \sum_{z \in S} P_t(x, z) \lambda(z) Q(z, y) \) is finite for \( (x, y) \in S \). On the other hand, the forward equation is sometimes easier to solve than the backward equation, and the assumption that \( \lambda \) is bounded is met in many applications (and of course holds automatically if \( S \) is finite).

As a simple corollary, the transition matrices and the generator matrix commute for a uniform semigroup: \( P_t G = G P_t \) for \( t \in [0, \infty) \). The forward and backward equations formally look like the differential equations for the exponential function. This actually holds with the operator exponential.

Suppose again that \( \bs{P} = \{P_t: t \in [0, \infty)\} \) is a uniform transition semigroup with generator \( G \). Then \[ P_t = e^{t G} = \sum_{n=0}^\infty \frac{t^n}{n!} G^n, \quad t \in [0, \infty) \]

## Proof

First \( e^{t G} \) is well defined as a bounded linear operator on \( \mathscr{B} \) for \( t \in [0, \infty) \) (and hence also simply as a matrix), since \( G \) is a bounded linear operator on \( \mathscr{B} \). Trivially \( e^{0 G} = I\), and by basic properties of the matrix exponential, \[ \frac{d}{dt} e^{t G} = G e^{t G}, \quad t \in (0, \infty) \] It follows that \( P_t = e^{t G} \) for \( t \in [0, \infty) \).

We can characterize the generators of uniform transition semigroups. We just need the minimal conditions that the diagonal entries are nonpositive and the row sums are 0.

Suppose that \( G \) a matrix on \( S \) with \( \|G\| \lt \infty \). Then \( G \) is the generator of a uniform transition semigroup \( \bs{P} = \{P_t: t \in [0, \infty)\} \) if and only if for every \( x \in S \),

- \( G(x, x) \le 0 \)
- \(\sum_{y \in S} G(x, y) = 0\)

## Proof

We know of course that if \( G \) is the generator of a transition semigroup, then conditions (a) and (b) hold. For the converse, we can use the previous result. Let \[ P_t = e^{t G} = \sum_{n=0}^\infty \frac{t^n}{n!} G^n, \quad t \in [0, \infty) \] which makes sense since \( G \) is bounded in norm. Then \( P_t(x, y) \ge 0 \) for \( (x, y) \in S^2 \). By part (b), \( \sum_{y \in S} G^n(x, y) = 0 \) for every \( x \in S \) and \( n \in \N_+ \), and hence \( \sum_{y \in S} P_t(x, y) = \sum_{y \in S} I(x, y) = 1 \) for \( x \in S \). Finally, the semigroup property is a consequence of the law of exponents, which holds for the exponential of a matrix. \[ P_s P_t = e^{s G} e^{t G} = e^{(s+t) G} = P_{s+t} \]

### Examples and Exercises

#### The Two-State Chain

Let \( \bs{X} = \{X_t: t \in [0, \infty)\} \) be the Markov chain on the set of states \( S = \{0, 1\} \), with transition rate \( a \in [0, \infty) \) from 0 to 1 and transition rate \( b \in [0, \infty) \) from 1 to 0. This two-state Markov chain was studied in the previous section. To avoid the trivial case with both states absorbing, we will assume that \( a + b \gt 0 \).

The generator matrix is \[ G = \left[\begin{matrix} -a & a \\ b & -b\end{matrix}\right] \]

Show that for \( t \in [0, \infty) \), \[ P_t = \frac{1}{a + b} \left[\begin{matrix} b & a \\ b & a \end{matrix} \right] - \frac{1}{a + b} e^{-(a + b)t} \left[\begin{matrix} -a & a \\ b & -b\end{matrix}\right] \]

- By solving the Kolmogorov backward equation.
- By solving the Kolmogorov forward equation.
- By computing \( P_t = e^{t G} \).

You probably noticed that the forward equation is easier to solve because there is less coupling of terms than in the backward equation.

Define the probability density function \( f \) on \( S \) by \( f(0) = \frac{b}{a + b} \), \( f(1) = \frac{a}{a + b} \). Show that

- \( P_t \to \frac{1}{a + b} \left[\begin{matrix} b & a \\ b & a \end{matrix} \right] \) as \(t \to \infty \), the matrix with \( f \) in both rows.
- \( f P_t = f \) for all \( t \in [0, \infty) \), so that \( f \) is invariant for \( \bs{P} \).
- \( f G = 0 \).

#### Computational Exercises

Consider the Markov chain \( \bs{X} = \{X_t: t \in [0, \infty)\} \) on \( S = \{0, 1, 2\} \) with exponential parameter function \( \lambda = (4, 1, 3) \) and embedded transition matrix \[ Q = \left[\begin{matrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & 0 & 0 \\ \frac{1}{3} & \frac{2}{3} & 0\end{matrix}\right] \]

- Draw the state graph and classify the states.
- Find the generator matrix \( G \).
- Find the transition matrix \( P_t \) for \( t \in [0, \infty) \).
- Find \( \lim_{t \to \infty} P_t \).

## Answer

- The edge set is \( E = \{(0, 1), (0, 2), (1, 0), (2, 0), (2, 1)\} \). All states are stable.
- The generator matrix is \[ G = \left[\begin{matrix} -4 & 2 & 2 \\ 1 & -1 & 0 \\ 1 & 2 & -3 \end{matrix}\right] \]
- For \( t \in [0, \infty) \), \[ P_t = \frac{1}{15} \left[\begin{matrix} 3 + 12 e^{-5 t} & 10 - 10 e^{-3 t} & 2 - 12 e^{-5 t} + 10 e^{-3 t} \\ 3 - 3 e^{-5 t} & 10 + 5 e^{-3 t} & 2 + 3 e^{-5t} - 5 e^{-3 t} \\ 3 - 3 e^{-5 t} & 10 - 10 e^{-3 t} & 2 + 3 e^{-5 t} + 10 e^{-3 t} \end{matrix}\right] \]
- \[ P_t \to \frac{1}{15} \left[\begin{matrix} 3 & 10 & 2 \\ 3 & 10 & 2 \\ 3 & 10 & 2 \end{matrix}\right] \]

#### Special Models

Read the discussion of generator and transition matrices for chains subordinate to the Poisson process.

Read the discussion of the infinitesimal generator for continuous-time birth-death chains.

Read the discussion of the infinitesimal generator for continuous-time queuing chains.

Read the discussion of the infinitesimal generator for continuous-time branching chains.