# 2.10: Stochastic Processes


## Introduction

This section requires measure theory, so you may need to review the advanced sections in the chapter on Foundations and in this chapter. In particular, recall that a set \( E \) almost always comes with a \( \sigma \)-algebra \( \mathscr E \) of admissible subsets, so that \( (E, \mathscr E) \) is a measurable space. Usually in fact, \( E \) has a topology and \( \mathscr E \) is the corresponding Borel \( \sigma \)-algebra, that is, the \( \sigma \)-algebra generated by the topology. If \( E \) is countable, we almost always take \( \mathscr E \) to be the collection of all subsets of \( E \), and in this case \( (E, \mathscr E) \) is a discrete space. The other common case is when \( E \) is an uncountable measurable subset of \( \R^n \) for some \( n \in \N \), in which case \( \mathscr E \) is the collection of measurable subsets of \( E \). If \( (E_1, \mathscr E_1), \, (E_2, \mathscr E_2), \ldots, (E_n, \mathscr E_n) \) are measurable spaces for some \( n \in \N_+ \), then the Cartesian product \( E_1 \times E_2 \times \cdots \times E_n \) is given the product \( \sigma \)-algebra \( \mathscr E_1 \otimes \mathscr E_2 \otimes \cdots \otimes \mathscr E_n \). As a special case, the Cartesian power \( E^n \) is given the corresponding power \( \sigma \)-algebra \( \mathscr E^n \).

With these preliminary remarks out of the way, suppose that \( (\Omega, \mathscr F, \P) \) is a probability space, so that \( \Omega \) is the set of outcomes, \( \mathscr F \) the \( \sigma \)-algebra of events, and \( \P \) is the probability measure on the sample space \( (\Omega, \mathscr F) \). Suppose also that \( (S, \mathscr S) \) and \( (T, \mathscr T) \) are measurable spaces. Here is our main definition:

A random process or stochastic process on \( (\Omega, \mathscr F, \P) \) with state space \( (S, \mathscr S) \) and index set \( T \) is a collection of random variables \( \bs X = \{X_t: t \in T\} \) such that \( X_t \) takes values in \( S \) for each \( t \in T \).

Sometimes it's notationally convenient to write \( X(t) \) instead of \( X_t \) for \( t \in T \). Often \( T = \N \) or \( T = [0, \infty) \) and the elements of \( T \) are interpreted as points in time (discrete time in the first case and continuous time in the second). So then \( X_t \in S \) is the state of the random process at time \( t \in T \), and the index space \( (T, \mathscr T) \) becomes the time space.

Since \( X_t \) is itself a function from \( \Omega \) into \( S \), it follows that ultimately, a stochastic process is a function from \( \Omega \times T \) into \( S \). Stated another way, \( t \mapsto X_t \) is a random function on the probability space \( (\Omega, \mathscr F, \P) \). To make this precise, recall that \( S^T \) is the notation sometimes used for the collection of functions from \( T \) into \( S \). Recall also that a natural \( \sigma \)-algebra used for \( S^T \) is the one generated by sets of the form \[ \left\{f \in S^T: f(t) \in A_t \text{ for all } t \in T\right\}, \text{ where } A_t \in \mathscr S \text{ for every } t \in T \text{ and } A_t = S \text{ for all but finitely many } t \in T \] This \( \sigma \)-algebra, denoted \( \mathscr S^T \), generalizes the ordinary power \( \sigma \)-algebra \( \mathscr S^n \) mentioned in the opening paragraph and will be important in the discussion of existence below.

Suppose that \( \bs X = \{X_t: t \in T\} \) is a stochastic process on the probability space \( (\Omega, \mathscr F, \P) \) with state space \( (S, \mathscr S) \) and index set \( T \). Then the mapping that takes \( \omega \) into the function \( t \mapsto X_t(\omega) \) is measurable with respect to \( (\Omega, \mathscr F) \) and \( (S^T, \mathscr S^T) \).

## Proof

Recall that a mapping with values in \( S^T \) is measurable if and only if each of its coordinate functions is measurable. In the present context that means that we must show that the function \( X_t \) is measurable with respect to \( (\Omega, \mathscr F) \) and \( (S, \mathscr S) \) for each \( t \in T \). But of course, that follows from the very meaning of the term *random variable*.

For \( \omega \in \Omega \), the function \( t \mapsto X_t(\omega) \) is known as a sample path of the process. So \( S^T \), the set of functions from \( T \) into \( S \), can be thought of as a *set of outcomes* of the stochastic process \( \bs X \), a point we will return to in our discussion of existence below.
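To make the idea of a sample path concrete, a simulation can treat each random seed as an outcome \( \omega \): fixing the seed fixes the entire function \( t \mapsto X_t(\omega) \). Here is a minimal sketch for a simple random walk (the choice of process, step count, and seeds are ours, purely for illustration):

```python
import random

def sample_path(n_steps, seed=None):
    """One sample path t -> X_t(omega) of a simple random walk.
    Fixing omega (here, the seed) yields a single deterministic function of t."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.choice([-1, 1]))
    return path

# Each choice of omega (seed) gives a different function of t.
for s in range(3):
    print(sample_path(10, seed=s))
```

Note that the same seed always reproduces the same path, mirroring the fact that the randomness lives entirely in \( \omega \), not in \( t \).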

As noted in the proof of the last theorem, \( X_t \) is a measurable function from \( \Omega \) into \( S \) for each \( t \in T \), by the very meaning of the term *random variable*. But it does not follow in general that \( (\omega, t) \mapsto X_t(\omega) \) is measurable as a function from \( \Omega \times T \) into \( S \). In fact, the \( \sigma \)-algebra on \( T \) has played no role in our discussion so far. Informally, a statement about \( X_t \) for a *fixed* \( t \in T \) or even a statement about \( X_t \) for countably many \( t \in T \) defines an event. But it does not follow that a statement about \( X_t \) for uncountably many \( t \in T \) defines an event. We often want to make such statements, so the following definition is inevitable:

A stochastic process \( \bs X = \{X_t: t \in T\} \) defined on the probability space \( (\Omega, \mathscr F, \P) \) and with index space \( (T, \mathscr T) \) and state space \( (S, \mathscr S) \) is measurable if \( (\omega, t) \mapsto X_t(\omega) \) is a measurable function from \( \Omega \times T \) into \( S \).

Every stochastic process indexed by a countable set \( T \) is measurable, so the definition is only important when \( T \) is uncountable, and in particular for \( T = [0, \infty) \).

## Equivalent Processes

Our next goal is to study different ways that two stochastic processes, with the same state and index spaces, can be equivalent. We will assume that the diagonal \( D = \{(x, x): x \in S\} \in \mathscr S^2 \), an assumption that almost always holds in applications, and in particular for the discrete and Euclidean spaces that are most important to us. Sufficient conditions are that \( \mathscr S \) have a sub \( \sigma \)-algebra that is countably generated and contains all of the singleton sets, properties that hold for the Borel \( \sigma \)-algebra when the topology on \( S \) is locally compact, Hausdorff, and has a countable base.

First, we often feel that we understand a random process \( \bs X = \{X_t: t \in T\} \) well if we know the finite dimensional distributions, that is, if we know the distribution of \( \left(X_{t_1}, X_{t_2}, \ldots, X_{t_n}\right) \) for every choice of \( n \in \N_+ \) and \( (t_1, t_2, \ldots, t_n) \in T^n \). Thus, we can compute \( \P\left[\left(X_{t_1}, X_{t_2}, \ldots, X_{t_n}\right) \in A\right] \) for every \( n \in \N_+ \), \( (t_1, t_2, \ldots, t_n) \in T^n \), and \( A \in \mathscr S^n \). Using various rules of probability, we can compute the probabilities of many events involving infinitely many values of the index parameter \( t \) as well. With this idea in mind, we have the following definition:

Random processes \( \bs X = \{X_t: t \in T\} \) and \( \bs{Y} = \{Y_t: t \in T\} \) with state space \( (S, \mathscr S) \) and index set \( T \) are equivalent in distribution if they have the same finite dimensional distributions. This defines an equivalence relation on the collection of stochastic processes with this state space and index set. That is, if \( \bs X \), \( \bs Y \), and \( \bs Z \) are such processes then

- \( \bs X \) is equivalent in distribution to \( \bs X \) (the reflexive property)
- If \( \bs X \) is equivalent in distribution to \( \bs{Y} \) then \( \bs{Y} \) is equivalent in distribution to \( \bs X \) (the symmetric property)
- If \( \bs X \) is equivalent in distribution to \( \bs{Y} \) and \( \bs{Y} \) is equivalent in distribution to \( \bs{Z} \) then \( \bs X \) is equivalent in distribution to \( \bs{Z} \) (the transitive property)

Note that since only the finite-dimensional distributions of the processes \( \bs X \) and \( \bs Y \) are involved in the definition, the processes need not be defined on the same probability space. Thus, *equivalence in distribution* partitions the collection of all random processes with a given state space and index set into mutually disjoint equivalence classes. But of course, we already know that two random variables can have the same *distribution* but be very different as variables (functions on the sample space). Clearly, the same statement applies to random processes.

Suppose that \( \bs X = (X_1, X_2, \ldots) \) is a sequence of Bernoulli trials with success parameter \( p = \frac{1}{2} \). Let \( Y_n = 1 - X_n \) for \( n \in \N_+ \). Then \( \bs{Y} = (Y_1, Y_2, \ldots) \) is equivalent in distribution to \( \bs X \) but \[ \P(X_n \ne Y_n \text{ for every } n \in \N_+) = 1 \]

## Proof

By the meaning of *Bernoulli trials*, \( \bs X \) is a sequence of independent indicator random variables with \( \P(X_n = 1) = \frac{1}{2} \) for each \( n \in \N_+ \). It follows that \( \bs{Y} \) is also a Bernoulli trials sequence with success parameter \( \frac{1}{2} \), so \( \bs X \) and \( \bs{Y} \) are equivalent in distribution. Also, of course, the state set is \( \{0, 1\} \) and \( Y_n = 1 \) if and only if \( X_n = 0 \).
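The point of the example shows up clearly in a quick simulation. This sketch (our own, with an arbitrary seed and sample size) checks that the empirical success frequencies of \( \bs X \) and \( \bs Y \) agree, while the two sequences disagree at every single index:

```python
import random

rng = random.Random(2)

# One realization of a Bernoulli(1/2) trials sequence X, and Y_n = 1 - X_n.
n = 10_000
X = [rng.randint(0, 1) for _ in range(n)]
Y = [1 - x for x in X]

# X and Y have the same finite dimensional distributions...
print(sum(X) / n, sum(Y) / n)  # both empirical frequencies are close to 1/2

# ...but as functions on the sample space they disagree everywhere:
print(all(x != y for x, y in zip(X, Y)))  # True, since 1 - x != x on {0, 1}
```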

Motivated by this example, let's look at another, stronger way that random processes can be equivalent. First recall that random variables \( X \) and \( Y \) on \( (\Omega, \mathscr F, \P) \), with values in \( S \), are equivalent if \( \P(X = Y) = 1 \).

Suppose that \( \bs X = \{X_t: t \in T\} \) and \( \bs{Y} = \{Y_t: t \in T\} \) are stochastic processes defined on the same probability space \( (\Omega, \mathscr F, \P) \) and both with state space \( (S, \mathscr S) \) and index set \( T \). Then \( \bs{Y} \) is a version of \( \bs X \) if \( Y_t \) is equivalent to \( X_t \) (so that \( \P(X_t = Y_t) = 1 \)) for every \( t \in T \). This defines an equivalence relation on the collection of stochastic processes on the same probability space and with the same state space and index set. That is, if \( \bs X \), \( \bs Y \), and \( \bs Z \) are such processes then

- \( \bs X \) is a version of \( \bs X \) (the reflexive property)
- If \( \bs X \) is a version of \( \bs{Y} \) then \( \bs{Y} \) is a version of \( \bs X \) (the symmetric property)
- If \( \bs X \) is a version of \( \bs{Y} \) and \( \bs{Y} \) is a version of \( \bs{Z} \) then \( \bs X \) is a version of \( \bs{Z} \) (the transitive property)

## Proof

Note that \( (X_t, Y_t) \) is a random variable with values in \( S^2 \) (and so the function \( \omega \mapsto (X_t(\omega), Y_t(\omega)) \) is measurable). The event \( \{X_t = Y_t\} \) is the inverse image of the diagonal \( D \in \mathscr S^2 \) under this mapping, and so the definition makes sense.

So the *version of* relation partitions the collection of stochastic processes on a given probability space and with a given state space and index set into mutually disjoint equivalence classes.

Suppose again that \( \bs X = \{X_t: t \in T\} \) and \( \bs{Y} = \{Y_t: t \in T\} \) are random processes on \( (\Omega, \mathscr F, \P) \) with state space \( (S, \mathscr S) \) and index set \( T \). If \( \bs{Y} \) is a version of \( \bs X \) then \( \bs{Y} \) and \( \bs X \) are equivalent in distribution.

## Proof

Suppose that \( (t_1, t_2, \ldots, t_n) \in T^n \) and that \( A \in \mathscr S^n \). Recall that the intersection of a finite (or even countably infinite) collection of events with probability 1 still has probability 1. Hence \begin{align} \P\left[\left(X_{t_1}, X_{t_2}, \ldots, X_{t_n}\right) \in A\right] & = \P\left[\left(X_{t_1}, X_{t_2}, \ldots, X_{t_n}\right) \in A, \, X_{t_1} = Y_{t_1}, X_{t_2} = Y_{t_2}, \ldots, X_{t_n} = Y_{t_n} \right] \\ & = \P\left[\left(Y_{t_1}, Y_{t_2}, \ldots, Y_{t_n}\right) \in A, \, X_{t_1} = Y_{t_1}, X_{t_2} = Y_{t_2}, \ldots, X_{t_n} = Y_{t_n} \right] = \P\left[\left(Y_{t_1}, Y_{t_2}, \ldots, Y_{t_n}\right) \in A\right] \end{align}

As noted in the proof, a countable intersection of events with probability 1 still has probability 1. Hence if \( T \) is countable and the random process \( \bs X \) is a version of \( \bs{Y} \) then \[ \P(X_t = Y_t \text{ for all } t \in T) = 1 \] so \( \bs X \) and \( \bs{Y} \) really are essentially the same random process. But when \( T \) is uncountable the result in the displayed equation may not be true, and \( \bs X \) and \( \bs{Y} \) may be very different as random functions on \( T \). Here is a simple example:

Suppose that \( \Omega = T = [0, \infty) \), \( \mathscr F = \mathscr T \) is the \( \sigma \)-algebra of Borel measurable subsets of \( [0, \infty) \), and \( \P \) is any continuous probability measure on \( (\Omega, \mathscr F) \). Let \( S = \{0, 1\} \) (with all subsets measurable, of course). For \( t \in T \) and \( \omega \in \Omega \), define \( X_t(\omega) = \bs{1}_t(\omega) \) (the indicator that \( \omega = t \)) and \( Y_t(\omega) = 0 \). Then \( \bs X = \{X_t: t \in T\} \) is a version of \( \bs{Y} = \{Y_t: t \in T\} \), but \( \P(X_t = Y_t \text{ for all } t \in T) = 0 \).

## Proof

For \( t \in [0, \infty) \), \( \P(X_t \ne Y_t) = \P\{t\} = 0 \) since \( \P \) is a continuous measure. But \( \{\omega \in \Omega: X_t(\omega) = Y_t(\omega) \text{ for all } t \in T\} = \emptyset \), since \( X_\omega(\omega) = 1 \ne 0 = Y_\omega(\omega) \) for every \( \omega \in \Omega \).

Motivated by this example, we have our strongest form of equivalence:

Suppose that \( \bs X = \{X_t: t \in T\} \) and \( \bs{Y} = \{Y_t: t \in T\} \) are measurable random processes on the probability space \( (\Omega, \mathscr F, \P) \) and with state space \( (S, \mathscr S) \) and index space \( (T, \mathscr T) \). Then \( \bs X \) is indistinguishable from \( \bs{Y} \) if \( \P(X_t = Y_t \text{ for all } t \in T) = 1 \). This defines an equivalence relation on the collection of measurable stochastic processes defined on the same probability space and with the same state and index spaces. That is, if \( \bs X \), \( \bs Y \), and \( \bs Z \) are such processes then

- \( \bs X \) is indistinguishable from \( \bs X \) (the reflexive property)
- If \( \bs X \) is indistinguishable from \( \bs{Y} \) then \( \bs{Y} \) is indistinguishable from \( \bs X \) (the symmetric property)
- If \( \bs X \) is indistinguishable from \( \bs{Y} \) and \( \bs{Y} \) is indistinguishable from \( \bs{Z} \) then \( \bs X \) is indistinguishable from \( \bs{Z} \) (the transitive property)

## Details

The measurability requirement for the stochastic processes is needed to ensure that \( \{X_t = Y_t \text{ for all } t \in T\} \) is a valid event. To see this, note that \( (\omega, t) \mapsto (X_t(\omega), Y_t(\omega)) \) is measurable, as a function from \( \Omega \times T \) into \( S^2 \). As before, let \( D = \{(x, x): x \in S\} \) denote the diagonal. Then \( D^c \in \mathscr S^2 \) and the inverse image of \( D^c \) under our mapping satisfies \[\{(\omega, t) \in \Omega \times T: X_t(\omega) \ne Y_t(\omega)\} \in \mathscr F \otimes \mathscr T\] The projection of this set onto \( \Omega \) satisfies \[ \{\omega \in \Omega: X_t(\omega) \ne Y_t(\omega) \text{ for some } t \in T\} \in \mathscr F \] since the projection of a measurable set in the product space is also measurable. Hence the complementary event \[ \{\omega \in \Omega: X_t(\omega) = Y_t(\omega) \text{ for all } t \in T\} \in \mathscr F \]

So the *indistinguishable from* relation partitions the collection of measurable stochastic processes on a given probability space and with given state space and index space into mutually disjoint equivalence classes. Trivially, if \( \bs X \) is indistinguishable from \( \bs{Y} \), then \( \bs X \) is a version of \( \bs{Y} \). As noted above, when \( T \) is countable, the converse is also true, but not, as our previous example shows, when \( T \) is uncountable. So to summarize, *indistinguishable from* implies *version of* implies *equivalent in distribution*, but none of the converse implications hold in general.

## The Kolmogorov Construction

In applications, a stochastic process is often modeled by giving various distributional properties that the process should satisfy. So the basic existence problem is to construct a process that has these properties. More specifically, how can we construct random processes with specified finite dimensional distributions? Let's start with the simplest case, one that we have seen several times before, and build up from there. Our simplest case is to construct a single random variable with a specified distribution.

Suppose that \( (S, \mathscr S, P) \) is a probability space. Then there exists a random variable \( X \) on a probability space \( (\Omega, \mathscr F, \P) \) such that \( X \) takes values in \( S \) and has distribution \( P \).

## Proof

The proof is utterly trivial. Let \( (\Omega, \mathscr F, \P) = (S, \mathscr S, P) \) and define \( X: \Omega \to S \) by \( X(\omega) = \omega \), so that \( X \) is the identity function. Then \( \{X \in A\} = A \) and so \( \P(X \in A) = P(A) \) for \( A \in \mathscr S \).

In spite of its triviality the last result contains the seeds of everything else we will do in this discussion. Next, let's see how to construct a sequence of independent random variables with specified distributions.

Suppose that \( P_i \) is a probability measure on the measurable space \( (S, \mathscr S) \) for \( i \in \N_+ \). Then there exists an independent sequence of random variables \( (X_1, X_2, \ldots) \) on a probability space \( (\Omega, \mathscr F, \P) \) such that \( X_i \) takes values in \(S\) and has distribution \( P_i \) for \( i \in \N_+ \).

## Proof

Let \( \Omega = S^\infty = S \times S \times \cdots \). Next let \( \mathscr F = \mathscr S^\infty \), the corresponding product \( \sigma \)-algebra. Recall that this is the \( \sigma \)-algebra generated by sets of the form \[ A_1 \times A_2 \times \cdots \text{ where } A_i \in \mathscr S \text{ for each } i \in \N_+ \text{ and } A_i = S \text{ for all but finitely many } i \in \N_+ \] Finally, let \( \P = P_1 \otimes P_2 \otimes \cdots \), the corresponding product measure on \( (\Omega, \mathscr F) \). Recall that this is the unique probability measure that satisfies \[ \P(A_1 \times A_2 \times \cdots) = P_1(A_1) P_2(A_2) \cdots \] where \( A_1 \times A_2 \times \cdots \) is a set of the type in the first displayed equation. Now define \( X_i \) on \( \Omega \) by \( X_i(\omega_1, \omega_2, \ldots) = \omega_i\), for \( i \in \N_+ \), so that \( X_i \) is simply the coordinate function for index \( i \). If \( A_1 \times A_2 \times \cdots \) is a set of the type in the first displayed equation then \[ \{X_1 \in A_1, X_2 \in A_2, \ldots\} = A_1 \times A_2 \times \cdots \] and so by the definition of the product measure, \[ \P(X_1 \in A_1, X_2 \in A_2, \cdots) = P_1(A_1) P_2(A_2) \cdots \] It follows that \( (X_1, X_2, \ldots) \) is a sequence of independent variables and that \( X_i \) has distribution \( P_i \) for \( i \in \N_+ \).

If you looked at the proof of the last two results you might notice that the last result can be viewed as a special case of the one before, since \( \bs X = (X_1, X_2, \ldots) \) is simply the identity function on \( \Omega = S^\infty \). The important step is the existence of the product measure \( \P \) on \( (\Omega, \mathscr F) \).
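In computational terms, the product-measure construction just says: sample each coordinate independently from its own distribution, and let \( X_i \) read off the \( i \)th coordinate. A minimal sketch (the particular distributions \( P_1, P_2, P_3 \) below are hypothetical choices for illustration):

```python
import random

def independent_sequence(distributions, rng):
    """Sample one outcome omega = (omega_1, omega_2, ...) from the product
    measure P_1 (x) P_2 (x) ...; each coordinate is drawn independently."""
    return tuple(d(rng) for d in distributions)

rng = random.Random(0)

# Three coordinates with different specified distributions.
dists = [
    lambda r: r.randint(0, 1),      # P_1: Bernoulli(1/2) on {0, 1}
    lambda r: r.choice([1, 2, 3]),  # P_2: uniform on {1, 2, 3}
    lambda r: r.random(),           # P_3: uniform on [0, 1)
]

omega = independent_sequence(dists, rng)

# The random variables are just the coordinate functions: X_i(omega) = omega_i.
X = lambda i: omega[i]
print(omega)
```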

The full generalization of these results is known as the Kolmogorov existence theorem (named for Andrei Kolmogorov). We start with the state space \( (S, \mathscr S) \) and the index set \( T \). The theorem states that if we specify the finite dimensional distributions in a consistent way, then there exists a stochastic process defined on a suitable probability space that has the given finite dimensional distributions. The consistency condition is a bit clunky to state in full generality, but the basic idea is very easy to understand. Suppose that \( s \) and \( t \) are distinct elements in \( T \) and that we specify the distribution (probability measure) \( P_s \) of \( X_s \), \( P_t \) of \( X_t \), \( P_{s,t} \) of \( (X_s, X_t) \), and \( P_{t,s} \) of \( (X_t, X_s) \). Then clearly we must specify these so that \[ P_s(A) = P_{s,t}(A \times S), \quad P_t(B) = P_{s,t}(S \times B) \] for all \( A, \, B \in \mathscr S \). Clearly we also must have \( P_{s,t}(C) = P_{t,s}(C^\prime) \) for all measurable \( C \in \mathscr S^2 \), where \( C^\prime = \{(y, x): (x, y) \in C\} \).

To state the consistency conditions in general, we need some notation. For \( n \in \N_+ \), let \( T^{(n)} \subset T^n\) denote the set of \( n \)-tuples of distinct elements of \( T \), and let \( \bs{T} = \bigcup_{n=1}^\infty T^{(n)} \) denote the set of all finite sequences of distinct elements of \( T \). If \( n \in \N_+ \), \( \bs t = (t_1, t_2, \ldots, t_n) \in T^{(n)} \) and \( \pi \) is a permutation of \( \{1, 2, \ldots, n\} \), let \( \bs t \pi \) denote the element of \( T^{(n)} \) with coordinates \( (\bs t \pi)_i = t_{\pi(i)} \). That is, we permute the coordinates of \( \bs t \) according to \( \pi \). If \( C \in \mathscr S^n \), let \[ \pi C = \left\{(x_1, x_2, \ldots, x_n) \in S^n: \left(x_{\pi(1)}, x_{\pi(2)}, \ldots, x_{\pi(n)}\right) \in C\right\} \in \mathscr S^n \] Finally, if \( n \gt 1 \), let \( \bs t_- \) denote the vector \( (t_1, t_2, \ldots, t_{n-1}) \in T^{(n-1)} \).

Now suppose that \( P_{\bs t} \) is a probability measure on \( (S^n, \mathscr S^n) \) for each \( n \in \N_+ \) and \( \bs t \in T^{(n)} \). The idea, of course, is that we want the collection \( \mathscr P = \{P_{\bs t}: \bs t \in \bs{T}\} \) to be the finite dimensional distributions of a random process with index set \( T \) and state space \( (S, \mathscr S) \). Here is the critical definition:

The collection of probability distributions \( \mathscr P \) relative to \( T \) and \( (S, \mathscr S) \) is consistent if

- \( P_{\bs t \pi}(C) = P_{\bs t}(\pi C) \) for every \( n \in \N_+ \), \( \bs t \in T^{(n)} \), permutation \( \pi \) of \( \{1, 2, \ldots, n\} \), and measurable \( C \subseteq S^n \)
- \( P_{\bs t_-}(C) = P_{\bs t}(C \times S) \) for every \( n > 1 \), \( \bs t \in T^{(n)} \), and measurable \( C \subseteq S^{n-1} \)
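Both conditions can be checked numerically for a concrete family. The sketch below (our own example) uses the i.i.d. Bernoulli(\( p \)) finite dimensional distributions on \( S = \{0, 1\} \), which are consistent because \( P_{\bs t} \) depends only on the length of \( \bs t \), and verifies (a) and (b) for one choice of \( \bs t \), \( \pi \), and \( C \):

```python
from itertools import product

p = 0.3  # parameter of a hypothetical i.i.d. Bernoulli(p) family on S = {0, 1}

def mass(x):
    """Product probability mass of the tuple x under i.i.d. Bernoulli(p)."""
    m = 1.0
    for xi in x:
        m *= p if xi == 1 else 1 - p
    return m

def P(t, C):
    """Finite dimensional distribution P_t(C); for the i.i.d. family this
    depends only on len(t), which is exactly why the family is consistent."""
    return sum(mass(x) for x in product((0, 1), repeat=len(t)) if x in C)

t = (0.5, 1.7, 3.0)                    # three distinct indices in T = [0, inf)
C = {(0, 1, 1), (1, 0, 0), (1, 1, 1)}  # a measurable subset of S^3
pi = (2, 0, 1)                         # a permutation of {0, 1, 2}

t_pi = tuple(t[pi[i]] for i in range(3))
pi_C = {x for x in product((0, 1), repeat=3)
        if tuple(x[pi[i]] for i in range(3)) in C}

# Condition (a): P_{t pi}(C) == P_t(pi C)
print(abs(P(t_pi, C) - P(t, pi_C)) < 1e-12)   # True

# Condition (b): P_{t_-}(C') == P_t(C' x S)
C_minus = {(0, 1), (1, 1)}
C_times_S = {c + (s,) for c in C_minus for s in (0, 1)}
print(abs(P(t[:2], C_minus) - P(t, C_times_S)) < 1e-12)   # True
```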

With the proper definition of consistency, we can state the fundamental theorem.

**Kolmogorov Existence Theorem**. If \( \mathscr P \) is a consistent collection of probability distributions relative to the index set \( T \) and the state space \( (S, \mathscr S) \), then there exists a probability space \( (\Omega, \mathscr F, \P) \) and a stochastic process \( \bs X = \{X_t: t \in T\} \) on this probability space such that \( \mathscr P \) is the collection of finite dimensional distributions of \( \bs X \).

## Proof sketch

Let \( \Omega = S^T \), the set of functions from \( T \) to \( S \). Such functions are the outcomes of the stochastic process. Let \( \mathscr F = \mathscr S^T \), the product \( \sigma \)-algebra, generated by sets of the form \[ B = \{\omega \in \Omega: \omega(t) \in A_t \text{ for all } t \in T\} \] where \( A_t \in \mathscr S \) for all \( t \in T \) and \( A_t = S \) for all but finitely many \( t \in T \). We know how our desired probability measure \( \P \) should work on the sets that generate \( \mathscr F \). Specifically, suppose that \( B \) is a set of the type in the displayed equation, and \( A_t = S \) except for \( \bs t = (t_1, t_2, \ldots, t_n) \in T^{(n)} \). Then we want \[ \P(B) = P_{\bs t}(A_{t_1} \times A_{t_2} \times \cdots \times A_{t_n}) \] Basic existence and uniqueness theorems in measure theory that we discussed earlier, and the consistency of \( \mathscr P \), guarantee that \( \P \) can be extended to a probability measure on all of \( \mathscr F \). Finally, for \( t \in T \) we define \( X_t: \Omega \to S \) by \( X_t(\omega) = \omega(t) \) for \( \omega \in \Omega \), so that \( X_t \) is simply the coordinate function of index \( t \). Thus, we have a stochastic process \( \bs X = \{X_t: t \in T\} \) with state space \( (S, \mathscr S) \), defined on the probability space \( (\Omega, \mathscr F, \P) \), with \( \mathscr P \) as the collection of finite dimensional distributions.

Note that except for the more complicated notation, the construction is very similar to the one for a sequence of independent variables. Again, \( \bs X \) is essentially the identity function on \( \Omega = S^T \). The important and more difficult part is the construction of the probability measure \( \P \) on \( (\Omega, \mathscr F) \).

## Applications

Our last discussion is a summary of the stochastic processes that are studied in this text. All are classics and are immensely important in applications.

Random processes associated with Bernoulli trials include

- the Bernoulli trials sequence itself
- the sequence of binomial variables
- the sequence of geometric variables
- the sequence of negative binomial variables
- the simple random walk

## Construction

The Bernoulli trials sequence in (a) is a sequence of independent, identically distributed indicator random variables, and so can be constructed as in the result above on independent sequences. The random processes in (b)–(e) are constructed from the Bernoulli trials sequence.
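As a sketch of these constructions (our own code, with arbitrary parameters), each derived process is a deterministic function of a single underlying trials sequence:

```python
import random
from itertools import accumulate

rng = random.Random(1)
n = 20

# (a) Bernoulli trials sequence with success parameter p = 1/2
X = [rng.randint(0, 1) for _ in range(n)]

# (b) binomial variables: number of successes in the first k trials
Y = list(accumulate(X))

# (c) geometric variable: trial number of the first success
#     (None if no success has occurred yet in these n trials)
N1 = next((k + 1 for k, x in enumerate(X) if x == 1), None)

# (e) simple random walk: partial sums of the steps 2 X_k - 1 in {-1, 1}
Z = list(accumulate(2 * x - 1 for x in X))

print(X, Y, N1, Z, sep="\n")
```

The negative binomial variables in (d) arise the same way, as the trial numbers of the successive successes.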

Random processes associated with the Poisson model include

- the sequence of inter-arrival times
- the sequence of arrival times
- the counting process on \( [0, \infty) \), both in the homogeneous and non-homogeneous cases
- the compound Poisson process
- the counting process on a general measure space

## Constructions

The random process in (a) is a sequence of independent random variables with a common exponential distribution, and so can be constructed as in the result above on independent sequences. The processes in (b) and (c) can be constructed from the sequence in (a).
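A sketch of that construction (rate and seed are arbitrary choices of ours): exponential inter-arrival times, arrival times as partial sums, and the counting process \( N_t \) as the number of arrivals in \( [0, t] \):

```python
import random
import bisect
from itertools import accumulate

rng = random.Random(3)
rate = 2.0
n = 1000

# (a) inter-arrival times: independent Exponential(rate) variables
gaps = [rng.expovariate(rate) for _ in range(n)]

# (b) arrival times: partial sums T_k = gap_1 + ... + gap_k
arrivals = list(accumulate(gaps))

# (c) counting process: N_t = number of arrivals in [0, t]
def N(t):
    return bisect.bisect_right(arrivals, t)

# For a homogeneous Poisson process, E(N_t) = rate * t
print(N(10.0))  # should be near rate * 10 = 20
```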

Random processes associated with renewal theory include

- the sequence of inter-arrival times
- the sequence of arrival times
- the counting process on \( [0, \infty) \)
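The renewal construction is the same recipe with an arbitrary inter-arrival distribution in place of the exponential. A sketch (our own, with uniform inter-arrival times on \( [0, 2] \), a hypothetical choice with mean 1) illustrating that \( N_t / t \) approaches the reciprocal of the mean inter-arrival time:

```python
import random
import bisect
from itertools import accumulate

rng = random.Random(4)

# (a) i.i.d. inter-arrival times; here uniform on [0, 2], with mean 1
gaps = [rng.uniform(0.0, 2.0) for _ in range(1000)]

# (b) arrival times: partial sums of the inter-arrival times
arrivals = list(accumulate(gaps))

# (c) renewal counting process: N_t = #{k: T_k <= t}
def N(t):
    return bisect.bisect_right(arrivals, t)

# Law of large numbers for renewal processes: N_t / t -> 1 / E(gap) = 1
print(N(500.0) / 500.0)
```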

Markov chains form a very important family of random processes as do Brownian motion and related processes. We will study these in subsequent chapters.