4.12: Uniformly Integrable Variables
Two of the most important modes of convergence in probability theory are convergence with probability 1 and convergence in mean. As we have noted several times, neither mode of convergence implies the other. However, if we impose an additional condition on the sequence of variables, convergence with probability 1 will imply convergence in mean. The purpose of this brief but advanced section is to explore the additional condition that is needed. This section is particularly important for the theory of martingales.
Basic Theory
As usual, our starting point is a random experiment modeled by a probability space \( (\Omega, \mathscr{F}, \P) \). So \( \Omega \) is the set of outcomes, \( \mathscr{F} \) is the \( \sigma \)-algebra of events, and \( \P \) is the probability measure on the sample space \( (\Omega, \mathscr F) \). In this section, all random variables that are mentioned are assumed to be real valued, unless otherwise noted. Next, recall from the section on vector spaces that for \( k \in [1, \infty) \), \( \mathscr{L}_k \) is the vector space of random variables \( X \) with \( \E(|X|^k) \lt \infty \), endowed with the norm \( \|X\|_k = \left[\E(|X|^k)\right]^{1/k} \). In particular, \( X \in \mathscr{L}_1 \) simply means that \( \E(|X|) \lt \infty \) so that \( \E(X) \) exists as a real number. From the section on expected value as an integral, recall the following notation, assuming of course that the expected value makes sense: \[ \E(X; A) = \E(X \bs{1}_A) = \int_A X \, d\P \]
Definition
The following result is motivation for the main definition in this section.
If \( X \) is a random variable then \( \E(|X|) \lt \infty \) if and only if \( \E(|X|; |X| \gt x) \to 0 \) as \( x \to \infty \).
Proof
Note that \( |X| \bs{1}(|X| \le x) \) is nonnegative, increasing in \( x \in [0, \infty) \) and \( |X| \bs{1}(|X| \le x) \to |X| \) as \( x \to \infty \). From the monotone convergence theorem, \( \E(|X|; |X| \le x) \to \E(|X|) \) as \( x \to \infty \). On the other hand, \[ \E(|X|) = \E(|X|; |X| \le x) + \E(|X|; |X| \gt x) \] If \( \E(|X|) \lt \infty \) then taking limits in the displayed equation shows that \( \E(|X|; |X| \gt x) \to 0 \) as \( x \to \infty \). On the other hand, \( \E(|X|; |X| \le x) \le x \). So if \( \E(|X|) = \infty \) then \( \E(|X|; |X| \gt x) = \infty \) for every \( x \in [0, \infty) \).
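For a concrete instance of this result, the following sketch takes \( X \) to have the standard exponential distribution (an illustrative assumption, not an example from the text), for which integration by parts gives \( \E(X; X \gt x) = (x + 1) e^{-x} \). The function names are hypothetical. The code compares the closed form with a simple midpoint-rule approximation and prints the tail expectation for increasing \( x \).

```python
import math

def tail_mean_exact(x):
    """E(X; X > x) for X standard exponential: integral of t*exp(-t) over (x, inf)."""
    return (x + 1.0) * math.exp(-x)

def tail_mean_numeric(x, width=50.0, steps=20_000):
    """Midpoint-rule approximation of the same integral, truncated at x + width."""
    h = width / steps
    total = 0.0
    for i in range(steps):
        t = x + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += t * math.exp(-t)
    return total * h

# The two columns should agree closely and shrink toward 0 as x grows.
for x in (0.0, 1.0, 5.0, 10.0, 20.0):
    print(x, tail_mean_exact(x), tail_mean_numeric(x))
```

At \( x = 0 \) the tail expectation is just \( \E(X) = 1 \), which gives a quick sanity check on both functions.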
Suppose now that \( X_i \) is a random variable for each \( i \) in a nonempty index set \( I \) (not necessarily countable). The critical definition for this section is to require the convergence in the previous theorem to hold uniformly for the collection of random variables \( \bs X = \{X_i: i \in I\} \).
The collection \( \bs X = \{X_i: i \in I\} \) is uniformly integrable if for each \( \epsilon \gt 0 \) there exists \( x \gt 0 \) such that for all \( i \in I \), \[ \E(|X_i|; |X_i| \gt x) \lt \epsilon \] Equivalently, \( \E(|X_i|; |X_i| \gt x) \to 0 \) as \( x \to \infty \) uniformly in \( i \in I \).
Properties
Our next discussion centers on conditions that ensure that the collection of random variables \( \bs X = \{X_i: i \in I\} \) is uniformly integrable. Here is an equivalent characterization:
The collection \( \bs X = \{X_i: i \in I\} \) is uniformly integrable if and only if the following conditions hold:
- \( \{\E(|X_i|): i \in I\} \) is bounded.
- For each \( \epsilon \gt 0 \) there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|X_i|; A) \lt \epsilon \) for all \( i \in I \).
Proof
Suppose that \( \bs X \) is uniformly integrable. With \( \epsilon = 1 \) there exists \( x \gt 0 \) such that \( \E(|X_i|; |X_i| \gt x) \lt 1 \) for all \( i \in I \). Hence \[ \E(|X_i|) = \E(|X_i|; |X_i| \le x) + \E(|X_i|; |X_i| \gt x) \le x + 1, \quad i \in I \] so (a) holds. For (b), let \( \epsilon \gt 0 \). There exists \( x \gt 0 \) such that \( \E(|X_i|; |X_i| \gt x) \lt \epsilon / 2 \) for all \( i \in I \). Let \( \delta = \epsilon / (2 x) \). If \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \[ \E(|X_i|; A) = \E(|X_i|; A \cap \{|X_i| \le x\}) + \E(|X_i|; A \cap \{|X_i| \gt x\}) \le x \P(A) + \E(|X_i|; |X_i| \gt x) \lt \epsilon / 2 + \epsilon / 2 = \epsilon\] Conversely, suppose that (a) and (b) hold. By (a), there exists \( c \gt 0 \) such that \( \E(|X_i|) \le c \) for all \( i \in I \). Let \( \epsilon \gt 0 \). By (b) there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) with \( \P(A) \lt \delta \) then \( \E(|X_i|; A) \lt \epsilon \) for all \( i \in I \). Next, by Markov's inequality, \[ \P(|X_i| \gt x) \le \frac{\E(|X_i|)}{x} \le \frac{c}{x}, \quad i \in I \] Pick \( x \gt 0 \) such that \( c / x \lt \delta \), so that \(\P(|X_i| \gt x) \lt \delta\) for each \( i \in I \). Applying (b) with \( A = \{|X_j| \gt x\} \) shows that for each \( j \in I \), \( \E(|X_i|; |X_j| \gt x) \lt \epsilon \) for all \( i \in I \), and so in particular, \( \E(|X_i|; |X_i| \gt x) \lt \epsilon \) for all \( i \in I \). Hence \( \bs X \) is uniformly integrable.
Condition (a) means that \( \bs X \) is bounded (in norm) as a subset of the vector space \( \mathscr{L}_1 \). Trivially, a finite collection of integrable random variables is uniformly integrable.
Suppose that \( I \) is finite and that \( \E(|X_i|) \lt \infty \) for each \( i \in I \). Then \( \bs X = \{X_i: i \in I\} \) is uniformly integrable.
A subset of a uniformly integrable set of variables is also uniformly integrable.
If \( \{X_i: i \in I\} \) is uniformly integrable and \( J \) is a nonempty subset of \( I \), then \( \{X_j: j \in J\} \) is uniformly integrable.
If the random variables in the collection are dominated in absolute value by a random variable with finite mean, then the collection is uniformly integrable.
Suppose that \( Y \) is a nonnegative random variable with \( \E(Y) \lt \infty \) and that \( |X_i| \le Y \) for each \( i \in I \). Then \( \bs X = \{X_i: i \in I\} \) is uniformly integrable.
Proof
Clearly \( \E(|X_i|; |X_i| \gt x) \le \E(Y; Y \gt x) \) for \( x \in [0, \infty) \) and for all \( i \in I \). The right side is independent of \( i \in I \), and by the theorem above, converges to 0 as \( x \to \infty \). Hence \( \bs X \) is uniformly integrable.
The following result is more general, but essentially the same proof works.
Suppose that \( \bs Y = \{Y_j: j \in J\} \) is uniformly integrable, and \( \bs X = \{X_i: i \in I\} \) is a set of variables with the property that for each \( i \in I \) there exists \( j \in J \) such that \( |X_i| \le |Y_j| \). Then \( \bs X \) is uniformly integrable.
As a simple corollary, if the variables are bounded in absolute value then the collection is uniformly integrable.
If there exists \( c \gt 0 \) such that \( |X_i| \le c \) for all \( i \in I \) then \( \bs X = \{X_i: i \in I\} \) is uniformly integrable.
Just having \( \E(|X_i|) \) bounded in \( i \in I \) (condition (a) in the characterization above) is not sufficient for \( \bs X = \{X_i: i \in I\} \) to be uniformly integrable; a counterexample is given below. However, if \( \E(|X_i|^k) \) is bounded in \( i \in I \) for some \( k \gt 1 \), then \( \bs X \) is uniformly integrable. This condition means that \( \bs X \) is bounded (in norm) as a subset of the vector space \( \mathscr{L}_k \).
If \( \left\{\E(|X_i|^k): i \in I\right\} \) is bounded for some \( k \gt 1 \), then \( \{X_i: i \in I\} \) is uniformly integrable.
Proof
Suppose that for some \( k \gt 1 \) and \( c \gt 0 \), \( \E(|X_i|^k) \le c \) for all \( i \in I \). Then \( k - 1 \gt 0 \) and so \( t \mapsto t^{k-1} \) is increasing on \( (0, \infty) \). So if \( |X_i| \gt x \) for \( x \gt 0 \) then \[ |X_i|^k = |X_i| |X_i|^{k-1} \ge |X_i| x^{k-1} \] Hence \( |X_i| \le |X_i|^k / x^{k-1} \) on the event \( |X_i| \gt x \). Therefore \[ \E(|X_i|; |X_i| \gt x) \le \E\left(\frac{|X_i|^k}{x^{k-1}}; |X_i| \gt x\right) \le \frac{\E(|X_i|^k)}{x^{k-1}} \le \frac{c}{x^{k-1}} \] The last expression is independent of \( i \in I \) and converges to 0 as \( x \to \infty \). Hence \( \bs X \) is uniformly integrable.
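To see the bound \( c / x^{k-1} \) at work, here is a minimal sketch with a family chosen purely for illustration (it is not from the text): \( X_n = \sqrt{n} \, \bs{1}(U \le 1/n) \), where \( U \) has the standard uniform distribution. Then \( \E(X_n^2) = n \cdot (1/n) = 1 \) for every \( n \), so the proof with \( k = 2 \) and \( c = 1 \) gives \( \E(X_n; X_n \gt x) \le 1 / x \). The function names are hypothetical.

```python
import math

def tail_mean(n, x):
    """E(X_n; X_n > x) for X_n = sqrt(n) * 1(U <= 1/n), U standard uniform.

    X_n takes the value sqrt(n) with probability 1/n and 0 otherwise.
    """
    value = math.sqrt(n)
    return value * (1.0 / n) if value > x else 0.0

def sup_tail_mean(x, n_max=10_000):
    """Largest tail expectation over n = 1, ..., n_max.

    The tail expectation equals n**(-1/2) once sqrt(n) > x and decreases in n,
    so this finite maximum equals the supremum over all n when n_max > x**2.
    """
    return max(tail_mean(n, x) for n in range(1, n_max + 1))

# Compare the worst case over the family with the bound c / x**(k-1) = 1 / x.
for x in (1, 5, 10, 50):
    print(x, sup_tail_mean(x), 1.0 / x)
```

The supremum over \( n \) sits just below \( 1 / x \) in each case, so the tail expectations go to 0 uniformly in \( n \).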
Uniform integrability is closed under the operations of addition and scalar multiplication.
Suppose that \( \bs X = \{X_i: i \in I\} \) and \( \bs Y = \{Y_i: i \in I\} \) are uniformly integrable and that \( c \in \R \). Then each of the following collections is also uniformly integrable.
- \( \bs X + \bs Y = \{X_i + Y_i: i \in I\} \)
- \( c \bs X = \{c X_i: i \in I\} \)
Proof
We use the characterization above. The proofs use standard techniques, so try them yourself.
- There exist \( a, \, b \in (0, \infty) \) such that \( \E(|X_i|) \le a \) and \( \E(|Y_i|) \le b \) for all \( i \in I \). Hence \[ \E(|X_i + Y_i|) \le \E(|X_i| + |Y_i|) = \E(|X_i|) + \E(|Y_i|) \le a + b, \quad i \in I \] Next let \( \epsilon \gt 0 \). There exists \( \delta_1 \gt 0 \) such that if \( A \in \mathscr{F} \) with \( \P(A) \lt \delta_1 \) then \( \E(|X_i|; A) \lt \epsilon / 2 \) for all \( i \in I \), and similarly, there exists \( \delta_2 \gt 0 \) such that if \( A \in \mathscr{F} \) with \( \P(A) \lt \delta_2 \) then \( \E(|Y_i|; A) \lt \epsilon / 2 \) for all \( i \in I \). Hence if \( A \in \mathscr{F} \) with \( \P(A) \lt \delta_1 \wedge \delta_2 \) then \[ \E(|X_i + Y_i|; A) \le \E(|X_i| + |Y_i|; A) = \E(|X_i|; A) + \E(|Y_i|; A) \lt \epsilon / 2 + \epsilon / 2 = \epsilon, \quad i \in I \]
- There exists \( a \in (0, \infty) \) such that \( \E(|X_i|) \le a \) for all \( i \in I \). Hence \[ \E(|c X_i|) = |c| \E(|X_i|) \le |c| a, \quad i \in I \] The second condition is trivial if \( c = 0 \), so suppose \( c \ne 0 \). For \( \epsilon \gt 0 \) there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|X_i|; A) \lt \epsilon / |c| \) for all \( i \in I \). Hence \( \E(|c X_i|; A) = |c| \E(|X_i|; A) \lt \epsilon \).
The following corollary is trivial, but will be needed in our discussion of convergence below.
Suppose that \( \{X_i: i \in I\} \) is uniformly integrable and that \( X \) is a random variable with \( \E(|X|) \lt \infty \). Then \( \{X_i - X: i \in I\} \) is uniformly integrable.
Proof
Let \( Y_i = X \) for each \( i \in I \). Then \( \{Y_i: i \in I\} \) is uniformly integrable, so the result follows from the previous theorem.
Convergence
We now come to the main results, and the reason for the definition of uniform integrability in the first place. To set up the notation, suppose that \( X_n \) is a random variable for \( n \in \N_+ \) and that \( X \) is a random variable. We know that if \( X_n \to X \) as \( n \to \infty \) in mean then \( X_n \to X \) as \( n \to \infty \) in probability. The converse is also true if and only if the sequence is uniformly integrable. Here is the first half:
If \( X_n \to X \) as \( n \to \infty \) in mean, then \( \{X_n: n \in \N_+\} \) is uniformly integrable.
Proof
The hypothesis means that \( X_n \to X \) as \( n \to \infty \) in the vector space \( \mathscr{L}_1 \). That is, \( \E(|X_n|) \lt \infty \) for \( n \in \N_+ \), \( \E(|X|) \lt \infty \), and \( \E(|X_n - X|) \to 0 \) as \( n \to \infty \). From the last section, we know that this implies that \( \E(|X_n|) \to \E(|X|) \) as \( n \to \infty \), so \( \E(|X_n|) \) is bounded in \( n \in \N_+ \). Let \( \epsilon \gt 0 \). Then there exists \( N \in \N_+ \) such that if \( n \gt N \) then \( \E(|X_n - X|) \lt \epsilon/2 \). Since all of our variables are in \( \mathscr{L}_1 \), for each \( n \in \N_+ \) there exists \( \delta_n \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta_n \) then \( \E(|X_n - X|; A) \lt \epsilon / 2 \). Similarly, there exists \( \delta_0 \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta_0 \) then \( \E(|X|; A) \lt \epsilon / 2 \). Let \( \delta = \min\{\delta_n: n \in \{0, 1, \ldots, N\}\} \) so \( \delta \gt 0 \). If \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \[\E(|X_n|; A) = \E(|X_n - X + X|; A) \le \E(|X_n - X|; A) + \E(|X|; A), \quad n \in \N_+\] If \( n \le N \) then \( \E(|X_n - X|; A) \lt \epsilon / 2 \) since \( \delta \le \delta_n \). If \( n \gt N \) then \( \E(|X_n - X|; A) \le \E(|X_n - X|) \lt \epsilon / 2 \). For all \( n \), \( \E(|X|; A) \lt \epsilon / 2 \) since \( \delta \le \delta_0 \). So for all \( n \in \N_+ \), \( \E(|X_n|; A) \lt \epsilon \) and hence \( \{X_n: n \in \N_+\} \) is uniformly integrable.
Here is the more important half, known as the uniform integrability theorem:
If \( \{X_n: n \in \N_+\} \) is uniformly integrable and \( X_n \to X \) as \( n \to \infty \) in probability, then \( X_n \to X \) as \( n \to \infty \) in mean.
Proof
Since \( X_n \to X \) as \( n \to \infty \) in probability, we know that there exists a subsequence \( \left(X_{n_k}: k \in \N_+\right) \) of \( (X_n: n \in \N_+) \) such that \( X_{n_k} \to X \) as \( k \to \infty \) with probability 1. By the uniform integrability, \( \E(|X_n|) \) is bounded in \( n \in \N_+ \). Hence by Fatou's lemma \[ \E(|X|) = \E\left(\liminf_{k \to \infty} \left|X_{n_k}\right|\right) \le \liminf_{k \to \infty} \E\left(\left|X_{n_k}\right|\right) \le \limsup_{k \to \infty} \E\left(\left|X_{n_k}\right|\right) \lt \infty \] Let \( Y_n = X_n - X \) for \( n \in \N_+ \). From the corollary above, we know that \( \{Y_n: n \in \N_+\} \) is uniformly integrable, and we also know that \( Y_n \) converges to 0 as \( n \to \infty \) in probability. Hence we need to show that \( Y_n \to 0 \) as \( n \to \infty \) in mean. Let \( \epsilon \gt 0 \). By uniform integrability, there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|Y_n|; A) \lt \epsilon / 2 \) for all \( n \in \N_+ \). Since \( Y_n \to 0 \) as \( n \to \infty \) in probability, there exists \( N \in \N_+ \) such that if \( n \gt N \) then \( \P(|Y_n| \gt \epsilon / 2) \lt \delta \). Hence if \( n \gt N \) then \[ \E(|Y_n|) = \E(|Y_n|; |Y_n| \le \epsilon / 2) + \E(|Y_n|; |Y_n| \gt \epsilon / 2) \lt \epsilon / 2 + \epsilon / 2 = \epsilon \] Hence \( Y_n \to 0 \) as \( n \to \infty \) in mean.
As a corollary, recall that if \( X_n \to X \) as \( n \to \infty \) with probability 1, then \( X_n \to X \) as \( n \to \infty \) in probability. Hence if \( \bs X = \{X_n: n \in \N_+\} \) is uniformly integrable and \( X_n \to X \) as \( n \to \infty \) with probability 1, then \( X_n \to X \) as \( n \to \infty \) in mean.
Examples
Our first example shows that bounded \( \mathscr{L}_1 \) norm is not sufficient for uniform integrability.
Suppose that \( U \) is uniformly distributed on the interval \( (0, 1) \) (so \( U \) has the standard uniform distribution). For \( n \in \N_+ \), let \( X_n = n \bs{1}(U \le 1 / n) \). Then
- \( \E(|X_n|) = 1 \) for all \( n \in \N_+ \)
- \( \E(|X_n|; |X_n| \gt x) = 1 \) for \( x \gt 0 \), \( n \in \N_+ \) with \( n \gt x \)
Proof
First note that \( |X_n| = X_n \) since \( X_n \ge 0 \).
- By definition, \( \E(X_n) = n \P(U \le 1 / n) = n / n = 1 \) for \( n \in \N_+ \).
- If \( n \gt x \gt 0 \) then \( X_n \gt x \) if and only if \( X_n = n \) if and only if \( U \le 1/n \). Hence \( \E(X_n; X_n \gt x) = n \P(U \le 1/n) = 1 \) as before.
By part (b), \( \E(|X_n|; |X_n| \gt x) \) does not converge to 0 as \( x \to \infty \) uniformly in \( n \in \N_+ \), so \( \bs X = \{X_n: n \in \N_+\} \) is not uniformly integrable.
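The computation above can be checked with exact rational arithmetic. The following sketch (the function names are hypothetical) evaluates \( \E(X_n) \) and \( \E(X_n; X_n \gt x) \) directly from the two-point distribution of \( X_n \), and prints the largest tail expectation over a range of \( n \) for several cutoffs \( x \). Note also that \( X_n \to 0 \) as \( n \to \infty \) with probability 1, since \( X_n = 0 \) once \( n \gt 1 / U \), yet \( \E(|X_n - 0|) = 1 \) for every \( n \). So convergence in mean fails, which is consistent with the results in the convergence subsection.

```python
from fractions import Fraction

def mean(n):
    """E(X_n) for X_n = n * 1(U <= 1/n): value n with probability 1/n, else 0."""
    return n * Fraction(1, n)

def tail_mean(n, x):
    """E(X_n; X_n > x): the value n contributes only when n > x."""
    return n * Fraction(1, n) if n > x else Fraction(0)

def sup_tail_mean(x, n_max):
    """Largest tail expectation over n = 1, ..., n_max."""
    return max(tail_mean(n, x) for n in range(1, n_max + 1))

for x in (1, 10, 100, 1000):
    # Taking n_max > x guarantees that some n exceeds the cutoff x.
    print(x, sup_tail_mean(x, n_max=2 * x))
```

However large the cutoff, some member of the family still carries its entire mean above it, which is exactly the failure of uniformity.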
The next example gives an important application to conditional expected value. Recall that if \( X \) is a random variable with \( \E(|X|) \lt \infty \) and \( \mathscr{G} \) is a sub \( \sigma \)-algebra of \( \mathscr{F} \) then \( \E(X \mid \mathscr{G}) \) is the expected value of \( X \) given the information in \( \mathscr{G} \), and is the \( \mathscr{G} \)-measurable random variable closest to \( X \) in a sense. Indeed if \( X \in \mathscr{L}_2(\mathscr{F}) \) then \( \E(X \mid \mathscr{G}) \) is the projection of \( X \) onto \( \mathscr{L}_2(\mathscr{G}) \). The collection of all conditional expected values of \( X \) is uniformly integrable:
Suppose that \( X \) is a real-valued random variable with \( \E(|X|) \lt \infty \). Then \( \{\E(X \mid \mathscr{G}): \mathscr{G} \text{ is a sub }\sigma\text{-algebra of } \mathscr{F}\}\) is uniformly integrable.
Proof
We use the characterization above. Let \( \mathscr{G} \) be a sub \( \sigma \)-algebra of \( \mathscr{F} \). Recall that \( \left|\E(X \mid \mathscr{G})\right| \le \E(|X| \mid \mathscr{G})\) and hence \[ \E[|\E(X \mid \mathscr{G})|] \le \E[\E(|X| \mid \mathscr{G})] = \E(|X|) \] So property (a) holds. Next let \( \epsilon \gt 0 \). Since \( \E(|X|) \lt \infty \), there exists \( \delta \gt 0 \) such that if \( A \in \mathscr{F} \) and \( \P(A) \lt \delta \) then \( \E(|X|; A) \lt \epsilon \). Suppose that \( A \in \mathscr{G} \) with \( \P(A) \lt \delta \). Then \(|\E(X \mid \mathscr{G})| \bs{1}_A \le \E(|X| \mid \mathscr{G}) \bs{1}_A\) so \[ \E[|\E(X \mid \mathscr{G})|; A] \le \E[\E(|X| \mid \mathscr{G}); A] = \E[\E(|X| \bs{1}_A \mid \mathscr{G})] = \E(|X|; A) \lt \epsilon \] Note that the first equality in the displayed equation holds since \( A \in \mathscr{G} \). So condition (b) holds for events in \( \mathscr{G} \), with \( \delta \) not depending on \( \mathscr{G} \). This is enough, because the proof of the characterization applies (b) only to events of the form \( \{|\E(X \mid \mathscr{G})| \gt x\} \), which are in \( \mathscr{G} \).
Note that the collection of sub \( \sigma \)-algebras of \( \mathscr{F} \), and so also the collection of conditional expected values above, might well be uncountable. The conditional expected values range from \( \E(X) \), when \( \mathscr{G} = \{\Omega, \emptyset\} \) to \( X \) itself, when \( \mathscr{G} = \mathscr{F} \).
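On a finite sample space where every subset is an event, each sub \( \sigma \)-algebra is generated by a partition of \( \Omega \), and \( \E(X \mid \mathscr{G}) \) is the probability-weighted average of \( X \) over each block of the partition. The sketch below uses a hypothetical four-point space with equally likely outcomes and arbitrarily chosen values of \( X \) (neither is from the text). It enumerates all partitions and computes \( \E[|\E(X \mid \mathscr{G})|] \) for each, so that the inequality \( \E[|\E(X \mid \mathscr{G})|] \le \E(|X|) \) used for property (a) can be checked directly. A finite example cannot exhibit uniformity over an uncountable collection, so this only illustrates the mechanics.

```python
from fractions import Fraction

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def conditional_expectation(x, prob, partition):
    """E(X | G) as a dict outcome -> value, where G is generated by `partition`."""
    result = {}
    for block in partition:
        mass = sum(prob[w] for w in block)
        average = sum(x[w] * prob[w] for w in block) / mass
        for w in block:
            result[w] = average
    return result

def mean_abs(y, prob):
    """E(|Y|) for a random variable given as a dict outcome -> value."""
    return sum(abs(y[w]) * prob[w] for w in y)

# Hypothetical example: four equally likely outcomes, arbitrary values of X.
outcomes = [0, 1, 2, 3]
prob = {w: Fraction(1, 4) for w in outcomes}
x = {0: Fraction(-3), 1: Fraction(1), 2: Fraction(2), 3: Fraction(4)}

partitions = list(set_partitions(outcomes))
norms = [mean_abs(conditional_expectation(x, prob, p), prob) for p in partitions]

print("number of sub sigma-algebras:", len(partitions))
print("E|X| =", mean_abs(x, prob))
print("largest E|E(X | G)| =", max(norms))
print("smallest E|E(X | G)| =", min(norms))
```

The trivial partition with a single block reproduces the constant \( \E(X) \), and the partition into singletons reproduces \( X \) itself, matching the two extreme cases noted above.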