3.12: General Measures


\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \) \( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)\(\newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\) \( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\) \( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\) \( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\) \(\newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \renewcommand{\P}{\mathbb{P}} \) \( \newcommand{\C}{\mathbb{C}} \) \( \newcommand{\R}{\mathbb{R}} \) \( \newcommand{\Q}{\mathbb{Q}} \) \( \newcommand{\N}{\mathbb{N}} \) \( \newcommand{\bs}{\boldsymbol} \)

Basic Theory

Our starting point in this section is a measurable space \( (S, \mathscr{S}) \). That is, \( S \) is a set and \( \mathscr{S} \) is a \( \sigma \)-algebra of subsets of \( S \). So far, we have only considered positive measures on such spaces. Positive measures have applications, as we know, to length, area, volume, mass, probability, counting, and similar concepts of the nonnegative size of a set. Moreover, we have defined the integral of a measurable function \( f: S \to \R \) with respect to a positive measure, and we have studied properties of the integral.

Definition

But now we will consider measures that can take negative values as well as positive values. These measures have applications to electric charge, monetary value, and other similar concepts of the content of a set that might be positive or negative. Also, this generalization will help in our study of density functions in the next section. The definition is exactly the same as for a positive measure, except that values in \( \R^* = \R \cup \{-\infty, \infty\} \) are allowed.

A measure on \( (S, \mathscr{S}) \) is a function \( \mu: \mathscr{S} \to \R^* \) that satisfies the following properties:

(a) \( \mu(\emptyset) = 0 \)
(b) If \( \{A_i: i \in I\} \) is a countable, disjoint collection of sets in \( \mathscr{S} \) then \( \mu\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \mu(A_i) \)

As before, (b) is known as countable additivity and is the critical assumption: the measure of a set that consists of a countable number of disjoint pieces is the sum of the measures of the pieces. Implicit in the statement of this assumption is that the sum in (b) exists for every countable disjoint collection \( \{A_i: i \in I\} \). That is, either the sum of the positive terms is finite or the sum of the negative terms is finite. In turn, this means that the order of the terms in the sum does not matter (a good thing, since there is no implied order). The term signed measure is used by many, but we will just use the simple term measure, and add appropriate adjectives for the special cases. Note that if \( \mu(A) \ge 0 \) for all \( A \in \mathscr{S} \), then \( \mu \) is a positive measure, the kind we have already studied (and so the new definition really is a generalization). In this case, the sum in (b) always exists in \( [0, \infty] \). If \( \mu(A) \in \R \) for all \( A \in \mathscr{S} \) then \( \mu \) is a finite measure. Note that in this case, the sum in (b) is absolutely convergent for every countable disjoint collection \( \{A_i: i \in I\} \). If \( \mu \) is a positive measure and \( \mu(S) = 1 \) then \( \mu \) is a probability measure, our favorite kind. Finally, as with positive measures, \( \mu \) is \( \sigma \)-finite if there exists a countable collection \( \{A_i: i \in I\} \) of sets in \( \mathscr{S} \) such that \( S = \bigcup_{i \in I} A_i \) and \( \mu(A_i) \in \R \) for \( i \in I \).
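As a small illustration of the definition, the following sketch builds a finite measure on a four-point set from signed point masses and checks the two axioms by brute force. The set, the weights, and the helper names `mu` and `subsets` are hypothetical choices for this example, and the \( \sigma \)-algebra is assumed to be the full power set.

```python
from itertools import combinations

# Hypothetical example: signed point masses on S = {a, b, c, d}.
weights = {"a": 2, "b": -3, "c": 0, "d": 5}

def mu(A):
    """Measure of a subset A of S: the sum of the point masses in A."""
    return sum(weights[x] for x in A)

def subsets(S):
    """All subsets of S as frozensets (the power set sigma-algebra)."""
    S = list(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

S = frozenset(weights)

# Property (a): the empty set has measure 0.
empty_ok = mu(frozenset()) == 0

# Property (b), in its finite form: additivity on every disjoint pair.
additive_ok = all(
    mu(A | B) == mu(A) + mu(B)
    for A in subsets(S) for B in subsets(S) if not (A & B)
)
```

On a finite set, countable additivity reduces to finite additivity, which is why checking disjoint pairs is enough here.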

Basic Properties

We give a few simple properties of general measures; hopefully many of these will look familiar. Throughout, we assume that \( \mu \) is a measure on \( (S, \mathscr{S}) \). Our first result is that although \( \mu \) can take the value \( \infty \) or \( -\infty \), it turns out that it cannot take both of these values.

Either \( \mu(A) \gt -\infty \) for all \( A \in \mathscr{S} \) or \( \mu(A) \lt \infty \) for all \( A \in \mathscr{S} \).

Proof

Suppose that there exist \( A, \, B \in \mathscr{S} \) with \( \mu(A) = \infty \) and \( \mu(B) = -\infty \). Then \( A = (A \cap B) \cup (A \setminus B) \) and the sets in the union are disjoint. By the additivity assumption, \( \mu(A) = \mu(A \cap B) + \mu(A \setminus B) \). Similarly, \( \mu(B) = \mu(A \cap B) + \mu(B \setminus A) \). The only way that both of these equations can make sense is for \( \mu(A \setminus B) = \infty \), \( \mu(B \setminus A) = -\infty \), and \( \mu(A \cap B) \in \R \). But then \( \mu(A \bigtriangleup B) = \mu(A \setminus B) + \mu(B \setminus A) \) is undefined, and so we have a contradiction.

We will say that two measures are of the same type if neither takes the value \( \infty \) or if neither takes the value \( -\infty \). Being of the same type is trivially an equivalence relation on the collection of measures on \( (S, \mathscr{S}) \).

The difference rule holds, as long as the sets have finite measure:

Suppose that \( A, \, B \in \mathscr{S} \). If \( \mu(B) \in \R \) then \( \mu(B \setminus A) = \mu(B) - \mu(A \cap B) \).

Proof

Note that \( B = (A \cap B) \cup (B \setminus A) \) and the sets in the union are disjoint. Thus \( \mu(B) = \mu(A \cap B) + \mu(B \setminus A) \). Since \( \mu(B) \in \R \), we must have \( \mu(A \cap B) \in \R \) and \( \mu(B \setminus A) \in \R \) also, and then the difference rule holds by subtraction.

The following corollary is the difference rule for subsets, and will be needed below.

Suppose that \( A, \, B \in \mathscr{S} \) and \( A \subseteq B \). If \( \mu(B) \in \R \) then \( \mu(A) \in \R\) and \( \mu(B \setminus A) = \mu(B) - \mu(A) \).

Proof

Note that \( B = A \cup (B \setminus A) \) and the sets in the union are disjoint. Thus \( \mu(B) = \mu(A) + \mu(B \setminus A) \). Since \( \mu(B) \in \R \), we must have \( \mu(A) \in \R \) and \( \mu(B \setminus A) \in \R \) also, and then the difference rule holds by subtraction.

As a consequence, suppose that \( A, \, B \in \mathscr{S} \) and \( A \subseteq B \). If \( \mu(A) = \infty \), then by the infinity rule we cannot have \( \mu(B) = -\infty \) and by the difference rule we cannot have \( \mu(B) \in \R \), so we must have \( \mu(B) = \infty \). Similarly, if \( \mu(A) = -\infty \) then \( \mu(B) = -\infty \). The inclusion-exclusion rules hold for general measures, as long as the sets have finite measure.

Suppose that \(A_i \in \mathscr{S}\) for each \(i \in I\) where \(\#(I) = n\), and that \( \mu(A_i) \in \R \) for \( i \in I \). Then

\[\mu \left( \bigcup_{i \in I} A_i \right) = \sum_{k = 1}^n (-1)^{k - 1} \sum_{J \subseteq I, \; \#(J) = k} \mu \left( \bigcap_{j \in J} A_j \right)\]

Proof

For \( n = 2 \), note that \( A_1 \cup A_2 = A_1 \cup (A_2 \setminus A_1) \) and the sets in the last union are disjoint. By the additivity axiom and the difference rule above, \[ \mu(A_1 \cup A_2) = \mu(A_1) + \mu(A_2 \setminus A_1) = \mu(A_1) + \mu(A_2) - \mu(A_1 \cap A_2) \] The general result then follows by induction, just like the proof for probability measures.
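The inclusion-exclusion formula can be checked numerically for a signed measure. The sketch below does this for \( n = 3 \) on a five-point set; the point masses and the sets \( A_1, A_2, A_3 \) are hypothetical values chosen for illustration.

```python
from itertools import combinations

# Hypothetical signed point masses on S = {1, 2, 3, 4, 5}.
weights = {1: 4, 2: -1, 3: -6, 4: 3, 5: 2}

def mu(A):
    """Measure of A: the sum of the point masses in A."""
    return sum(weights[x] for x in A)

def inclusion_exclusion(sets):
    """Right side of the inclusion-exclusion formula for a list of sets."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):
        for J in combinations(range(n), k):
            inter = frozenset.intersection(*(sets[j] for j in J))
            total += (-1) ** (k - 1) * mu(inter)
    return total

A = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 5})]
lhs = mu(frozenset.union(*A))   # measure of the union
rhs = inclusion_exclusion(A)    # alternating sum over intersections
```

Both sides equal 2 in this example, even though every individual set \( A_i \) has negative measure.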

The continuity properties hold for general measures. Part (a) is the continuity property for increasing sets, and part (b) is the continuity property for decreasing sets.

Suppose that \( A_n \in \mathscr{S} \) for \( n \in \N_+ \).

(a) If \( A_n \subseteq A_{n+1} \) for \( n \in \N_+ \) then \( \lim_{n \to \infty} \mu(A_n) = \mu\left(\bigcup_{i=1}^\infty A_i\right) \).
(b) If \( A_{n+1} \subseteq A_n \) for \( n \in \N_+ \) and \( \mu(A_1) \in \R \), then \( \lim_{n \to \infty} \mu(A_n) = \mu\left(\bigcap_{i=1}^\infty A_i\right) \).

Proof

The proofs are almost the same as for positive measures, except for technicalities involving \( \infty \) and \( -\infty \).

(a) Let \( A = \bigcup_{i=1}^\infty A_i \). From the infinity rule and the difference rule, if \( \mu(A_m) = \infty \) (respectively \( -\infty \)) for some \( m \in \N_+ \), then \( \mu(A_n) = \infty \) (\( -\infty \)) for \( n \ge m \) and \( \mu(A) = \infty \) (\( -\infty \)), so the result trivially holds. Thus, assume that \( \mu(A_n) \in \R \) for all \( n \in \N_+ \). Let \( B_1 = A_1 \) and let \( B_i = A_i \setminus A_{i-1} \) for \( i \in \{2, 3, \ldots\} \). Then \( \{B_i: i \in \N_+\} \) is a disjoint collection of sets and also has union \( A \). Moreover, from the difference rule, \( \mu(B_i) = \mu(A_i) - \mu(A_{i-1}) \) for \( i \in \{2, 3, \ldots\} \). Thus \[ \mu(A) = \sum_{i=1}^\infty \mu(B_i) = \lim_{n \to \infty} \sum_{i=1}^n \mu(B_i) = \lim_{n \to \infty} \left(\mu(A_1) + \sum_{i=2}^n [\mu(A_i) - \mu(A_{i-1})]\right) = \lim_{n \to \infty} \mu(A_n) \]
(b) Let \( C_n = A_1 \setminus A_n \) for \( n \in \N_+ \). Then \( C_n \subseteq C_{n+1} \) for \( n \in \N_+ \) and \( \bigcup_{i=1}^\infty C_i = A_1 \setminus \bigcap_{i=1}^\infty A_i \). Part (a) applies, so \( \lim_{n \to \infty} \mu(C_n) = \mu\left(\bigcup_{i=1}^\infty C_i \right) \). But by the difference rule, \( \mu(C_n) = \mu(A_1) - \mu(A_n) \) for \( n \in \N_+ \) and \( \mu\left(\bigcup_{i=1}^\infty C_i\right) = \mu(A_1) - \mu\left(\bigcap_{i=1}^\infty A_i\right) \). All of these are real numbers, so subtracting \( \mu(A_1) \) gives the result.

Recall that a positive measure is an increasing function, relative to the subset partial order on \( \mathscr{S} \) and the ordinary order on \( [0, \infty] \), and this property follows from the difference rule. But for general measures, the increasing property fails, and so do other properties that flow from it, including the subadditive property (Boole's inequality in probability) and the Bonferroni inequalities.
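A three-point example makes the failure of the increasing and subadditive properties concrete. The point masses below are hypothetical values chosen for illustration.

```python
# Hypothetical signed point masses on S = {x, y, z}.
weights = {"x": 3, "y": -5, "z": 1}

def mu(A):
    """Measure of A: the sum of the point masses in A."""
    return sum(weights[p] for p in A)

A1 = {"x", "y"}
A2 = {"y", "z"}

# The increasing property fails: {x} is a subset of A1, yet mu({x}) > mu(A1).
small, large = mu({"x"}), mu(A1)      # 3 and -2

# Subadditivity fails: mu(A1 | A2) exceeds mu(A1) + mu(A2), because the
# overlap {y} has negative measure and is counted twice in the sum.
union_value = mu(A1 | A2)             # -1
sum_value = mu(A1) + mu(A2)           # -6
```

The overlap \( A_1 \cap A_2 = \{y\} \) carries negative mass, which is exactly the situation that cannot occur for a positive measure.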

Constructions

It's easy to construct general measures as differences of positive measures.

Suppose that \( \mu \) and \( \nu \) are positive measures on \( (S, \mathscr{S}) \) and that at least one of them is finite. Then \( \delta = \mu - \nu \) is a measure.

Proof

Suppose that \( \nu \) is a finite measure; the proof when \( \mu \) is finite is similar. First, \( \delta(\emptyset) = \mu(\emptyset) - \nu(\emptyset) = 0 \). Suppose that \( \{A_i: i \in I\} \) is a countable, disjoint collection of sets in \( \mathscr{S} \) and let \( A = \bigcup_{i \in I} A_i \). Then \[ \delta(A) = \mu(A) - \nu(A) = \sum_{i \in I} \mu(A_i) - \sum_{i \in I} \nu(A_i) \] Since \( \nu(A_i) \lt \infty \) for \( i \in I \), we can combine terms to get \[ \delta(A) = \sum_{i \in I} [\mu(A_i) - \nu(A_i)] = \sum_{i \in I} \delta(A_i) \]
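The role of the finiteness assumption can be seen in a small numerical sketch. Here \( \mu \) is a positive measure with one infinite point mass and \( \nu \) is a finite positive measure; all values and the helper `measure` are hypothetical choices, and Python's floating-point `math.inf` stands in for \( \infty \).

```python
import math

mu_w = {"a": 1, "b": math.inf, "c": 4}   # positive measure, not finite
nu_w = {"a": 3, "b": 2, "c": 0}          # finite positive measure

def measure(w):
    """Turn a dictionary of point masses into a set function."""
    return lambda A: sum(w[x] for x in A)

mu, nu = measure(mu_w), measure(nu_w)

def delta(A):
    """The difference measure delta = mu - nu."""
    return mu(A) - nu(A)

d_a = delta({"a"})           # 1 - 3 = -2
d_ab = delta({"a", "b"})     # infinite, since mu({b}) is infinite

# If both measures were infinite on the same set, the difference would
# be the indeterminate form inf - inf, which floating point reports as nan.
undefined = math.inf - math.inf
```

Because \( \nu \) is finite, every value \( \delta(A) \) is either a real number or \( \infty \), never an indeterminate form.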

The collection of measures on our space is closed under scalar multiplication.

If \( \mu \) is a measure on \( (S, \mathscr{S}) \) and \( c \in \R \), then \( c \mu \) is a measure on \( (S, \mathscr{S}) \)

Proof

First, \( (c \mu)(\emptyset) = c \mu(\emptyset) = c 0 = 0 \). Next suppose that \( \{A_i: i \in I\} \) is a countable, disjoint collection of sets in \( \mathscr S \). Then \[(c \mu) \left(\bigcup_{i \in I} A_i \right) = c \mu \left(\bigcup_{i \in I} A_i\right) = c \sum_{i \in I} \mu(A_i) = \sum_{i \in I} c \mu(A_i) = \sum_{i \in I} (c \mu)(A_i)\] The last step is the important one, and holds since the sum exists.

If \( \mu \) is a finite measure, then so is \( c \mu \) for \( c \in \R \). If \( \mu \) is not finite then \( \mu \) and \( c \mu \) are of the same type if \( c \gt 0 \) and are of opposite types if \( c \lt 0 \). We can add two measures to get another measure, as long as they are of the same type. In particular, the collection of finite measures is closed under addition as well as scalar multiplication, and hence forms a vector space.

If \( \mu \) and \( \nu \) are measures on \( (S, \mathscr{S}) \) of the same type then \( \mu + \nu \) is a measure on \( (S, \mathscr{S}) \).

Proof

First, \( (\mu + \nu)(\emptyset) = \mu(\emptyset) + \nu(\emptyset) = 0 + 0 = 0 \). Next suppose that \( \{A_i: i \in I\} \) is a countable, disjoint collection of sets in \( \mathscr S \). Then \begin{align*} (\mu + \nu) \left(\bigcup_{i \in I} A_i \right) & = \mu \left(\bigcup_{i \in I} A_i\right) + \nu\left(\bigcup_{i \in I} A_i\right)\\ & = \sum_{i \in I} \mu(A_i) + \sum_{i \in I} \nu(A_i) = \sum_{i \in I} [\mu(A_i) + \nu(A_i)] = \sum_{i \in I} (\mu + \nu)(A_i) \end{align*} The sums can be combined because the measures are of the same type. That is, either the sum of all of the positive terms is finite or the sum of all the negative terms is finite. In short, we don't have to worry about the dreaded indeterminate form \( \infty - \infty \).

Finally, it is easy to explicitly construct measures on a \( \sigma \)-algebra generated by a countable partition. Such \( \sigma \)-algebras are important for counterexamples and to gain insight, and also because many \( \sigma \)-algebras that occur in applications can be constructed from them.

Suppose that \( \mathscr{A} = \{A_i: i \in I\} \) is a countable partition of \( S \) into nonempty sets, and that \( \mathscr{S} = \sigma(\mathscr{A}) \). For \( i \in I \), define \( \mu(A_i) \in \R^* \) arbitrarily, subject only to the condition that the sum of the positive terms is finite, or the sum of the negative terms is finite. For \( A = \bigcup_{j \in J} A_j \) where \( J \subseteq I \), define \[ \mu(A) = \sum_{j \in J} \mu(A_j) \] Then \( \mu \) is a measure on \( (S, \mathscr{S}) \).

Proof

Recall that every \( A \in \mathscr{S} \) has a unique representation of the form \( A = \bigcup_{j \in J} A_j \) where \( J \subseteq I \).

\( J = \emptyset \) in the representation gives \( A = \emptyset \). The sum over an empty index set is 0, so \( \mu(\emptyset) = 0 \).
Suppose that \( \{B_k: k \in K\} \) is a countable, disjoint collection of sets in \( \mathscr{S} \). Then for each \( k \in K \) there exists \( J_k \subseteq I \) and \( \left\{A^k_j: j \in J_k\right\} \subseteq \mathscr{A} \) such that \( B_k = \bigcup_{j \in J_k} A^k_j \). Hence \[ \mu\left(\bigcup_{k \in K} B_k\right) = \mu\left(\bigcup_{k \in K} \bigcup_{j \in J_k} A^k_j\right) = \sum_{k \in K}\sum_{j \in J_k} \mu(A^k_j) = \sum_{k \in K} \mu(B_k) \] The fact that either the sum of all positive terms is finite or the sum of all the negative terms is finite means that we do not have to worry about the order of summation.
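The partition construction is easy to sketch in code when the partition is finite. A set \( A = \bigcup_{j \in J} A_j \) in \( \mathscr{S} \) is represented by its index set \( J \); the atom values below, including the infinite one, are hypothetical.

```python
import math

# Hypothetical values mu(A_i) on the atoms of a partition indexed by I = {0, 1, 2, 3}.
atom = {0: 2.0, 1: -math.inf, 2: -1.5, 3: 0.0}

pos_sum = sum(v for v in atom.values() if v > 0)
neg_sum = sum(v for v in atom.values() if v < 0)

# The construction requires at least one of the two sums to be finite.
well_defined = math.isfinite(pos_sum) or math.isfinite(neg_sum)

def mu(J):
    """mu of the set A = union of A_j for j in J, where J is a set of indices."""
    return sum(atom[j] for j in J)
```

For instance, `mu({0, 2})` is `0.5` and `mu({0, 1})` is `-inf`. The empty index set gives the empty sum 0, matching the first axiom.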

Positive, Negative, and Null Sets

To understand the structure of general measures, we need some basic definitions and properties. As before, we assume that \( \mu \) is a measure on \( (S, \mathscr{S}) \).

Definitions

\( A \in \mathscr{S} \) is a positive set for \( \mu \) if \( \mu(B) \ge 0 \) for every \( B \in \mathscr{S} \) with \( B \subseteq A \).
\( A \in \mathscr{S} \) is a negative set for \( \mu \) if \( \mu(B) \le 0 \) for every \( B \in \mathscr{S} \) with \( B \subseteq A \).
\( A \in \mathscr{S} \) is a null set for \( \mu \) if \( \mu(B) = 0 \) for every \( B \in \mathscr{S} \) with \( B \subseteq A \).

Note that positive and negative are used in the weak sense (just as we use the terms increasing and decreasing in this text). Of course, if \( \mu \) is a positive measure, then every \( A \in \mathscr{S} \) is positive for \( \mu \), and \( A \in \mathscr{S} \) is negative for \( \mu \) if and only if \( A \) is null for \( \mu \) if and only if \( \mu(A) = 0 \). For a general measure, \( A \in \mathscr{S} \) is both positive and negative for \( \mu \) if and only if \( A \) is null for \( \mu \). In particular, \( \emptyset \) is null for \( \mu \). A set \( A \in \mathscr{S} \) is a support set for \( \mu \) if and only if \( A^c \) is a null set for \( \mu \). A support set is a set where the measure lives in a sense. Positive, negative, and null sets for \( \mu \) have a basic inheritance property that is essentially equivalent to the definition.

Suppose \( A \in \mathscr{S} \).

If \( A \) is positive for \( \mu \) then \( B \) is positive for \( \mu \) for every \( B \in \mathscr{S} \) with \( B \subseteq A \).
If \( A \) is negative for \( \mu \) then \( B \) is negative for \( \mu \) for every \( B \in \mathscr{S} \) with \( B \subseteq A \).
If \( A \) is null for \( \mu \) then \( B \) is null for \( \mu \) for every \( B \in \mathscr{S} \) with \( B \subseteq A \).

The collections of positive sets, negative sets, and null sets for \( \mu \) are closed under countable unions.

Suppose that \( \{A_i: i \in I\} \) is a countable collection of sets in \( \mathscr{S} \).

(a) If \( A_i \) is positive for \( \mu \) for \( i \in I \) then \( \bigcup_{i \in I} A_i \) is positive for \( \mu \).
(b) If \( A_i \) is negative for \( \mu \) for \( i \in I \) then \( \bigcup_{i \in I} A_i \) is negative for \( \mu \).
(c) If \( A_i \) is null for \( \mu \) for \( i \in I \) then \( \bigcup_{i \in I} A_i \) is null for \( \mu \).

Proof

We will prove (a); the proofs for (b) and (c) are analogous. Without loss of generality, we can suppose that \( I = \N_+ \). Let \( A = \bigcup_{n=1}^\infty A_n \). Now let \( B_1 = A_1 \) and \( B_n = A_n \setminus \left(\bigcup_{i=1}^{n-1} A_i\right) \) for \( n \in \{2, 3, \ldots\} \). Then \( \{B_n: n \in \N_+\} \) is a countable, disjoint collection in \( \mathscr{S} \), and \( \bigcup_{n=1}^\infty B_n = A \). If \( C \in \mathscr{S} \) and \( C \subseteq A \) then \( C = \bigcup_{n=1}^\infty (C \cap B_n) \) and the sets in this union are disjoint. Hence by additivity, \( \mu(C) = \sum_{n=1}^\infty \mu(C \cap B_n) \). But \( C \cap B_n \subseteq B_n \subseteq A_n \) so \( \mu(C \cap B_n) \ge 0 \). Hence \( \mu(C) \ge 0 \).

It's easy to see what happens to the positive, negative, and null sets when a measure is multiplied by a non-zero constant.

Suppose that \( \mu \) is a measure on \( (S, \mathscr{S}) \), \( c \in \R \), and \( A \in \mathscr{S} \).

If \( c \gt 0 \) then \( A \) is positive (negative) for \( \mu \) if and only if \( A \) is positive (negative) for \( c \mu \).
If \( c \lt 0 \) then \( A \) is positive (negative) for \( \mu \) if and only if \( A \) is negative (positive) for \( c \mu \).
If \( c \ne 0 \) then \( A \) is null for \( \mu \) if and only if \( A \) is null for \( c \mu \)

Positive, negative, and null sets are also preserved under countable sums, assuming that the measures make sense.

Suppose that \( \mu_i \) is a measure on \( (S, \mathscr{S}) \) for each \( i \) in a countable index set \( I \), and that \( \mu = \sum_{i \in I} \mu_i \) is a well-defined measure on \( (S, \mathscr{S}) \). Let \( A \in \mathscr{S} \).

If \( A \) is positive for \( \mu_i \) for every \( i \in I \) then \( A \) is positive for \( \mu \).
If \( A \) is negative for \( \mu_i \) for every \( i \in I \) then \( A \) is negative for \( \mu \).
If \( A \) is null for \( \mu_i \) for every \( i \in I \) then \( A \) is null for \( \mu \).

In particular, note that \( \mu = \sum_{i \in I} \mu_i \) is a well-defined measure if \( \mu_i \) is a positive measure for each \( i \in I \), or if \( I \) is finite and \( \mu_i \) is a finite measure for each \( i \in I \). It's easy to understand the positive, negative, and null sets for a \( \sigma \)-algebra generated by a countable partition.

Suppose that \( \mathscr{A} = \{A_i: i \in I\} \) is a countable partition of \( S \) into nonempty sets, and that \( \mathscr{S} = \sigma(\mathscr{A}) \). Suppose that \( \mu \) is a measure on \( (S, \mathscr{S}) \). Define \[ I_+ = \{i \in I: \mu(A_i) \gt 0\}, \; I_- = \{i \in I: \mu(A_i) \lt 0\}, \; I_0 = \{i \in I: \mu(A_i) = 0\} \] Let \( A \in \mathscr{S} \), so that \( A = \bigcup_{j \in J} A_j \) for some \( J \subseteq I \) (and this representation is unique). Then

\( A \) is positive for \( \mu \) if and only if \( J \subseteq I_+ \cup I_0 \).
\( A \) is negative for \( \mu \) if and only if \( J \subseteq I_- \cup I_0 \).
\( A \) is null for \( \mu \) if and only if \( J \subseteq I_0 \).
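For a finite partition, the index criteria above can be compared directly with the definitions of positive and negative sets. The sketch below does this by brute force; the atom values are hypothetical, and sets in \( \mathscr{S} \) are represented by their index sets.

```python
from itertools import combinations

atom = {0: 2, 1: -3, 2: 0, 3: 5}   # hypothetical values mu(A_i)

I_pos = {i for i, v in atom.items() if v > 0}
I_neg = {i for i, v in atom.items() if v < 0}
I_zero = {i for i, v in atom.items() if v == 0}

def mu(J):
    """mu of the union of the atoms A_j, j in J."""
    return sum(atom[j] for j in J)

def index_subsets(J):
    """All subsets of the index collection J."""
    J = sorted(J)
    return [set(c) for r in range(len(J) + 1) for c in combinations(J, r)]

def is_positive(J):
    """Definition: every measurable subset has nonnegative measure."""
    return all(mu(K) >= 0 for K in index_subsets(J))

def is_negative(J):
    """Definition: every measurable subset has nonpositive measure."""
    return all(mu(K) <= 0 for K in index_subsets(J))

# Compare the definitions with the index criteria for every set in the sigma-algebra.
criterion_matches = all(
    is_positive(J) == (J <= I_pos | I_zero)
    and is_negative(J) == (J <= I_neg | I_zero)
    for J in index_subsets(atom)
)
```

Note that the set with index set `{0, 1, 3}` has measure 4 but is not a positive set, since it contains the atom \( A_1 \) of negative measure.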

The Hahn Decomposition

The fundamental results in this section and the next are two decomposition theorems that show precisely the relationship between general measures and positive measures. First we show that if a set has finite, positive measure, then it has a positive subset with at least that measure.

If \( A \in \mathscr{S} \) and \( 0 \le \mu(A) \lt \infty \) then there exists \( P \in \mathscr{S} \) with \( P \subseteq A \) such that \( P \) is positive for \( \mu \) and \( \mu(P) \ge \mu(A) \).

Proof

The proof is recursive, and works by successively removing sets of negative measure from \( A \). Note that every measurable subset of \( A \) has finite measure, by the difference rule for subsets, since \( \mu(A) \in \R \). For the initialization step, let \( A_1 = A \). Then trivially, \( A_1 \subseteq A \) and \( \mu(A_1) \ge \mu(A) \). For the recursive step, suppose that \( A_n \in \mathscr{S} \) has been defined with \( A_n \subseteq A \) and \( \mu(A_n) \ge \mu(A) \). If \( A_n \) is positive for \( \mu \), let \( P = A_n \). Otherwise let \( a_n = \inf\{\mu(B): B \in \mathscr{S}, B \subseteq A_n, \mu(B) \lt 0\} \). Note that since \( A_n \) is not positive for \( \mu \), the set in the infimum is nonempty and hence \( a_n \lt 0 \) (and possibly \( -\infty \)). Let \( b_n = a_n / 2 \) if \( -\infty \lt a_n \lt 0 \) and let \( b_n = -1 \) if \( a_n = -\infty \). Since \( b_n \gt a_n \), by definition of the infimum, there exists \( B_n \in \mathscr{S} \) with \( B_n \subseteq A_n \) and \(\mu(B_n) \le b_n \). Let \( A_{n+1} = A_n \setminus B_n \). Then \( A_{n+1} \subseteq A_n \subseteq A \) and \[ \mu(A_{n+1}) = \mu(A_n) - \mu(B_n) \ge \mu(A_n) - b_n \ge \mu(A_n) \ge \mu(A) \] Now, if the recursive process terminates after a finite number of steps, \( P \) is well defined and is positive for \( \mu \), with \( \mu(P) \ge \mu(A) \). Otherwise, we have a disjoint sequence of sets \( (B_1, B_2, \ldots) \). Let \( P = A \setminus \left(\bigcup_{i=1}^\infty B_i\right) \). Then \( P \subseteq A \), and by countable additivity and the difference rule, \[ \mu(P) = \mu(A) - \sum_{n=1}^\infty \mu(B_n) \ge \mu(A) - \sum_{n=1}^\infty b_n \ge \mu(A) \] Suppose that \( B \in \mathscr{S} \), \( B \subseteq P \), and \( \mu(B) \lt 0 \). Then \( B \subseteq A_n \) and by definition, \( a_n \le \mu(B) \) for every \( n \in \N_+ \). It follows that \( b_n \le \frac{1}{2} \mu(B) \) or \( b_n = -1 \) for every \( n \in \N_+ \). Hence \( \sum_{n=1}^\infty b_n = -\infty \) and therefore \( \mu(P) = \infty \). This is a contradiction, since \( P \subseteq A \) and \( \mu(A) \in \R \) imply \( \mu(P) \in \R \). Hence we must have \( \mu(B) \ge 0 \) and thus \( P \) is positive for \( \mu \).

The assumption that \( \mu(A) \lt \infty \) is used in the proof, both in the difference rule and in the final contradiction; an example with \( \mu(A) = \infty \) is examined in the exercises below. Our first decomposition result is the Hahn decomposition theorem, named for the Austrian mathematician Hans Hahn. It states that \( S \) can be partitioned into a positive set and a negative set, and this decomposition is essentially unique.

Hahn Decomposition Theorem. There exists \( P \in \mathscr{S} \) such that \( P \) is positive for \( \mu \) and \( P^c \) is negative for \( \mu \). The pair \( (P, P^c) \) is a Hahn decomposition of \( S \). If \( (Q, Q^c) \) is another Hahn decomposition, then \( P \bigtriangleup Q \) is null for \( \mu \).

Proof

Suppose first that \( \mu \) does not take the value \( \infty \). As with the previous result, the proof is recursive. For the initialization step, let \( P_1 = \emptyset \). Then trivially, \( P_1 \) is positive for \( \mu \). For the recursive step, suppose that \( P_n \in \mathscr{S} \) is positive for \( \mu \). If \( P_n^c \) is negative for \( \mu \), let \( P = P_n \). Otherwise let \( a_n = \sup\{\mu(A): A \in \mathscr{S}, A \subseteq P_n^c\} \). Since \( P_n^c \) is not negative for \( \mu \), it follows that \( a_n \gt 0 \) (and possibly \( \infty \)). Let \( b_n = a_n / 2 \) if \( 0 \lt a_n \lt \infty \) and \( b_n = 1 \) if \( a_n = \infty \). Then \( b_n \lt a_n \) so there exists \( B_n \in \mathscr{S} \) with \( B_n \subseteq P_n^c \) and \( \mu(B_n) \ge b_n \gt 0 \). Since \( \mu \) does not take the value \( \infty \), the previous lemma applies to \( B_n \), so there exists \( A_n \in \mathscr{S} \) with \( A_n \subseteq B_n \), \( A_n \) positive for \( \mu \), and \( \mu(A_n) \ge \mu(B_n) \). Let \( P_{n+1} = P_n \cup A_n \). Then \( P_{n+1} \in \mathscr{S} \) is positive for \( \mu \).

If the recursive process ends after a finite number of steps, then \( P \) is well-defined and \( (P, P^c) \) is a Hahn decomposition. Otherwise we generate an infinite sequence \( (A_1, A_2, \ldots) \) of disjoint sets in \( \mathscr{S} \), each positive for \( \mu \). Let \( P = \bigcup_{n=1}^\infty A_n\). Then \( P \in \mathscr{S} \) is positive for \( \mu \) by the closure result above. Let \( A \subseteq P^c \). If \( \mu(A) \gt 0 \) then \( \mu(A) \le a_n \) for every \( n \in \N_+ \). Hence \( b_n \ge \frac{1}{2} \mu(A) \) or \( b_n = 1 \) for every \( n \in \N_+ \). But then \[ \mu(P) = \sum_{n=1}^\infty \mu(A_n) \ge \sum_{n=1}^\infty \mu(B_n) \ge \sum_{n=1}^\infty b_n = \infty \] a contradiction. Hence \( \mu(A) \le 0 \) so \( P^c \) is negative for \( \mu \) and thus \( (P, P^c) \) is a Hahn decomposition.

Suppose that \( (Q, Q^c) \) is another Hahn decomposition of \( S \). Then \( P \cap Q^c \) and \( Q \cap P^c \) are both positive and negative for \( \mu \) and hence are null for \( \mu \). Hence \( P \bigtriangleup Q = (P \cap Q^c) \cup (Q \cap P^c) \) is null for \( \mu \).

Finally, suppose that \( \mu \) takes the value \( \infty \). Then \( \mu \) does not take the value \( -\infty \) by the infinity rule and hence \( -\mu \) does not take the value \( \infty \). By our proof so far, there exists a Hahn decomposition \( (P, P^c) \) for \( -\mu \) that is essentially unique. But then \( (P^c, P) \) is a Hahn decomposition for \( \mu \).

It's easy to see the Hahn decomposition for a measure on a \( \sigma \)-algebra generated by a countable partition.

Suppose that \( \mathscr{A} = \{A_i: i \in I\} \) is a countable partition of \( S \) into nonempty sets, and that \( \mathscr{S} = \sigma(\mathscr{A}) \). Suppose that \( \mu \) is a measure on \( (S, \mathscr{S}) \). Let \( I_+ = \{i \in I: \mu(A_i) \gt 0\} \) and \( I_0 = \{ i \in I: \mu(A_i) = 0 \} \). Then \( (P, P^c) \) is a Hahn decomposition of \( \mu \) if and only if the positive set \( P \) has the form \( P = \bigcup_{j \in J} A_j \) where \( J = I_+ \cup K \) and \( K \subseteq I_0 \).
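For a finite partition, the Hahn decomposition described above can be computed directly. In this sketch the atom values and the function name `hahn` are hypothetical, and sets in \( \mathscr{S} \) are again represented by index sets.

```python
from itertools import combinations

atom = {0: 2, 1: -3, 2: 0, 3: 5, 4: -1}   # hypothetical values mu(A_i)

def mu(J):
    """mu of the union of the atoms A_j, j in J."""
    return sum(atom[j] for j in J)

def index_subsets(J):
    """All subsets of the index collection J."""
    J = sorted(J)
    return [set(c) for r in range(len(J) + 1) for c in combinations(J, r)]

def hahn(include_null=()):
    """Index sets (J, J^c) of a Hahn decomposition (P, P^c).

    include_null is any set K of indices with mu(A_k) = 0.
    """
    J = {i for i, v in atom.items() if v > 0} | set(include_null)
    return J, set(atom) - J

J, Jc = hahn()
J2, Jc2 = hahn(include_null={2})

# Check the defining properties by brute force.
P_positive = all(mu(K) >= 0 for K in index_subsets(J))
Pc_negative = all(mu(K) <= 0 for K in index_subsets(Jc))

# Two Hahn decompositions differ only on a null set.
sym_diff = J ^ J2
```

Here the two decompositions differ only in where the null atom \( A_2 \) is placed, which illustrates the essential uniqueness in the theorem.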

The Jordan Decomposition

The Hahn decomposition leads to another decomposition theorem called the Jordan decomposition theorem, named for the French mathematician Camille Jordan. This one shows that every measure is the difference of positive measures. Once again we assume that \( \mu \) is a measure on \( (S, \mathscr{S}) \).

Jordan Decomposition Theorem. The measure \( \mu \) can be written uniquely in the form \( \mu = \mu_+ - \mu_- \) where \( \mu_+ \) and \( \mu_- \) are positive measures, at least one finite, and with the property that if \( (P, P^c) \) is any Hahn decomposition of \( S \), then \( P^c \) is a null set of \( \mu_+ \) and \( P \) is a null set of \( \mu_- \). The pair \( (\mu_+, \mu_-) \) is the Jordan decomposition of \( \mu \).

Proof

Let \( (P, P^c) \) be a Hahn decomposition of \( S \) relative to \( \mu \). Define \( \mu_+(A) = \mu(A \cap P) \) and \( \mu_-(A) = -\mu(A \cap P^c) \) for \( A \in \mathscr{S} \). Then \( \mu_+ \) and \( \mu_- \) are positive measures and \( \mu = \mu_+ - \mu_- \). Moreover, since \( \mu \) cannot take both \( \infty \) and \( -\infty \) as values by the infinity rule, one of these two positive measures is finite.

Suppose that \( (Q, Q^c) \) is an arbitrary Hahn decomposition. If \( A \subseteq Q^c \), then \( \mu_+(A) = \mu(P \cap A) = 0 \) since \( P \cap Q^c \) is a null set of \( \mu \) by the Hahn decomposition theorem. Similarly if \( A \subseteq Q \) then \(\mu_-(A) = -\mu(P^c \cap A) = 0 \) since \( P^c \cap Q \) is a null set of \( \mu \).

Suppose that \( \mu = \nu_+ - \nu_- \) is another decomposition with the same properties. If \( A \in \mathscr{S} \) then \( \mu_+(A) = \mu(A \cap P) = \nu_+(A \cap P) - \nu_-(A \cap P) = \nu_+(A \cap P) \), since \( P \) is a null set of \( \nu_- \). But also \( \nu_+(A) = \nu_+(A \cap P) + \nu_+(A \cap P^c) = \nu_+(A \cap P) \), since \( P^c \) is a null set of \( \nu_+ \). Hence \( \nu_+ = \mu_+ \) and therefore also \( \nu_- = \mu_- \).

The Jordan decomposition leads to an important set of new definitions.

Suppose that \( \mu \) has Jordan decomposition \( \mu = \mu_+ - \mu_- \).

The positive measure \( \mu_+ \) is called the positive variation measure of \( \mu \).
The positive measure \( \mu_- \) is called the negative variation measure of \( \mu \).
The positive measure \( \left| \mu \right| = \mu_+ + \mu_- \) is called the total variation measure of \( \mu \).
\( \| \mu \| = \left|\mu\right|(S) \) is the total variation of \( \mu \).

Note that, in spite of the similarity in notation, \( \mu_+(A) \) and \( \mu_-(A) \) are not simply the positive and negative parts of the (extended) real number \( \mu(A) \), nor is \( \left| \mu \right|(A) \) the absolute value of \( \mu(A) \). Also, be careful not to confuse the total variation of \( \mu \), a number in \( [0, \infty] \), with the total variation measure. The positive, negative, and total variation measures can be written directly in terms of \( \mu \).
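A small discrete example shows the distinction. The sketch below computes the Jordan decomposition of a measure given by signed point masses, using the construction \( \mu_+(A) = \mu(A \cap P) \), \( \mu_-(A) = -\mu(A \cap P^c) \) from the proof; the weights and function names are hypothetical.

```python
weights = {"a": 2, "b": -3, "c": 0, "d": 5}   # hypothetical signed point masses

def mu(A):
    """Measure of A: the sum of the point masses in A."""
    return sum(weights[x] for x in A)

# A Hahn decomposition: P collects the points of nonnegative mass.
P = {x for x, w in weights.items() if w >= 0}
Pc = set(weights) - P

def mu_plus(A):
    """Positive variation measure: mu restricted to P."""
    return mu(set(A) & P)

def mu_minus(A):
    """Negative variation measure: minus mu restricted to P^c."""
    return -mu(set(A) & Pc)

def total_variation_measure(A):
    """|mu|(A) = mu_+(A) + mu_-(A)."""
    return mu_plus(A) + mu_minus(A)

A = {"a", "b"}
values = (mu(A), mu_plus(A), mu_minus(A), total_variation_measure(A))
norm = total_variation_measure(weights)    # total variation ||mu|| = |mu|(S)
```

For \( A = \{a, b\} \) we get \( \mu(A) = -1 \), but \( \mu_+(A) = 2 \), \( \mu_-(A) = 3 \), and \( \left|\mu\right|(A) = 5 \). The positive part of the number \( \mu(A) \) is 0 and its absolute value is 1, so neither agrees with the corresponding variation measure.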

For \( A \in \mathscr{S} \),

\( \mu_+(A) = \sup\{\mu(B): B \in \mathscr{S}, B \subseteq A\} \)
\( \mu_-(A) = -\inf\{\mu(B): B \in \mathscr{S}, B \subseteq A\}\)
\( \left| \mu \right|(A) = \sup\left\{ \sum_{i \in I} \left|\mu(A_i)\right|: \{A_i: i \in I\} \text{ is a finite, measurable partition of } A \right\}\)
\( \left\| \mu \right\| = \sup\left\{ \sum_{i \in I} \left|\mu(A_i)\right|: \{A_i: i \in I\} \text{ is a finite, measurable partition of } S \right\}\)
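On a small finite set, the supremum formulas can be verified by enumeration. The sketch below assumes the usual characterization of \( \left|\mu\right|(A) \) as the supremum of \( \sum_i \left|\mu(A_i)\right| \) over finite partitions of \( A \); the weights and the helper `partitions` are hypothetical.

```python
from itertools import combinations

weights = {"a": 2, "b": -3, "c": 0, "d": 5}   # hypothetical signed point masses

def mu(A):
    """Measure of A: the sum of the point masses in A."""
    return sum(weights[x] for x in A)

def subsets(A):
    """All subsets of A."""
    A = sorted(A)
    return [set(c) for r in range(len(A) + 1) for c in combinations(A, r)]

def partitions(items):
    """Yield every partition of the list items into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        yield [[first]] + p                          # first in its own block
        for i in range(len(p)):                      # or joined to an existing block
            yield p[:i] + [[first] + p[i]] + p[i + 1:]

A = {"a", "b", "d"}
sup_plus = max(mu(B) for B in subsets(A))            # mu_+(A)
sup_minus = -min(mu(B) for B in subsets(A))          # mu_-(A)
sup_total = max(sum(abs(mu(block)) for block in p)
                for p in partitions(sorted(A)))      # |mu|(A)
```

The absolute values inside the sum matter: without them, every partition of \( A \) would give the same sum \( \mu(A) \) by additivity.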

The total variation measure is related to sums and scalar multiples of measures in a natural way.

Suppose that \( \mu \) and \( \nu \) are measures of the same type and that \( c \in \R \). Then

(a) \( \left| \mu \right| = 0 \) if and only if \( \mu = 0 \) (the zero measure).
(b) \( \left| c \mu \right| = \left|c\right| \left| \mu \right| \)
(c) \( \left| \mu + \nu \right| \le \left| \mu \right| + \left| \nu \right| \)

Proof

(a) Since \( \mu_+ \), \( \mu_- \) and \( |\mu| = \mu_+ + \mu_- \) are positive measures, \( |\mu| = 0 \) if and only if \( \mu_+ = \mu_- = 0 \) if and only if \( \mu = 0 \).
(b) If \( c \gt 0 \) then \( (c \mu)_+ = c \mu_+ \) and \( (c \mu)_- = c \mu_- \). If \( c \lt 0 \) then \( (c \mu)_+ = -c \mu_- \) and \( (c \mu)_- = - c \mu_+ \). Of course, if \( c = 0 \) then \( (c \mu)_+ = (c \mu)_- = 0 \). In all cases, \[|c \mu| = (c \mu)_+ + (c \mu)_- = |c| (\mu_+ + \mu_-) = |c| |\mu|\]
(c) From the theorem above, \( (\mu + \nu)_+ \le \mu_+ + \nu_+ \) and \( (\mu + \nu)_- \le \mu_- + \nu_- \). So \begin{align*} |\mu + \nu| & = (\mu + \nu)_+ + (\mu + \nu)_- \le (\mu_+ + \nu_+) + (\mu_- + \nu_-)\\ & = (\mu_+ + \mu_-) + (\nu_+ + \nu_-) = |\mu| + |\nu| \end{align*}

You may have noticed that the properties in the last result look a bit like norm properties. In fact, total variation really is a norm on the vector space of finite measures on \( (S, \mathscr{S}) \):

Suppose that \( \mu \) and \( \nu \) are measures of the same type and that \( c \in \R \). Then

(a) \( \| \mu \| = 0 \) if and only if \( \mu = 0 \) (the zero property)
(b) \( \| c \mu \| = \left|c\right| \| \mu \| \) (the scaling property)
(c) \( \| \mu + \nu \| \le \| \mu \| + \| \nu \| \) (the triangle inequality)

Proof

(a) Since \( |\mu| \) is a positive measure, \( \|\mu\| = |\mu|(S) = 0 \) if and only if \( |\mu| = 0 \). From part (a) of the previous theorem, \( |\mu| = 0 \) if and only if \( \mu = 0 \).
(b) From part (b) of the previous theorem, \( \|c \mu\| = | c \mu|(S) = |c| |\mu|(S) = |c| \|\mu\| \).
(c) From part (c) of the previous theorem, \( \|\mu + \nu\| = |\mu + \nu|(S) \le |\mu|(S) + |\nu|(S) = \|\mu\| + \|\nu\| \).

Every norm on a vector space leads to a corresponding measure of distance (a metric). Let \( \mathscr{M} \) denote the collection of finite measures on \( (S, \mathscr{S}) \). Then \( \mathscr{M} \), under the usual definition of addition and scalar multiplication of measures, is a vector space, and as the last theorem shows, \( \| \cdot \| \) is a norm on \( \mathscr{M} \). Here are the corresponding metric space properties:

Suppose that \( \mu, \, \nu, \, \rho \in \mathscr{M} \) and \( c \in \R \). Then

\( \| \mu - \nu \| = \| \nu - \mu\| \), the symmetric property
\( \| \mu - \nu \| = 0 \) if and only if \( \mu = \nu \), the zero property
\( \| \mu - \rho\| \le \| \mu - \nu\| + \|\nu - \rho\| \), the triangle inequality

Now that we have a metric, we have a corresponding criterion for convergence.

Suppose that \( \mu_n \in \mathscr{M} \) for \( n \in \N_+ \) and \( \mu \in \mathscr{M} \). We say that \( \mu_n \to \mu \) as \( n \to \infty \) in total variation if \( \|\mu_n - \mu\| \to 0\) as \( n \to \infty \).
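For measures on a finite set with the power set \( \sigma \)-algebra, the total variation norm is the sum of the absolute values of the point masses, so convergence in total variation is easy to compute. The sketch below uses a hypothetical sequence of probability measures on \( \{0, 1\} \) and exact rational arithmetic.

```python
from fractions import Fraction

def tv_norm(w):
    """Total variation ||mu|| of a measure given by point masses w."""
    return sum(abs(v) for v in w.values())

def tv_distance(w1, w2):
    """||mu - nu|| for two measures on the same finite set."""
    return tv_norm({x: w1[x] - w2[x] for x in w1})

# Limit measure: the uniform distribution on {0, 1}.
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}

def mu_n(n):
    """Hypothetical sequence of probability measures converging to mu."""
    eps = Fraction(1, 2 * n)
    return {0: Fraction(1, 2) - eps, 1: Fraction(1, 2) + eps}

distances = [tv_distance(mu_n(n), mu) for n in (1, 10, 100)]
```

In this example \( \|\mu_n - \mu\| = 1 / n \to 0 \). Note that with the definition used here, the distance between two probability measures lies in \( [0, 2] \); some texts divide by 2 so that it lies in \( [0, 1] \).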

Of course, \( \mathscr{M} \) includes the probability measures on \( (S, \mathscr{S}) \), so we have a new notion of convergence to go along with the others we have studied or will study. Here is a list:

convergence with probability 1
convergence in probability
convergence in distribution
convergence in \( k \)th mean
convergence in total variation

The Integral

Armed with the Jordan decomposition, the integral can be extended to general measures in a natural way.

Suppose that \( \mu \) is a measure on \( (S, \mathscr{S}) \) and that \( f: S \to \R \) is measurable. We define \[ \int_S f \, d\mu = \int_S f \, d\mu_+ - \int_S f \, d\mu_- \] assuming that the integrals on the right exist and that the right side is not of the form \( \infty - \infty \).
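In the discrete case the definition reduces to a weighted sum, which the following sketch checks. The point masses, the function values, and the helper `integral` are hypothetical; the measure is assumed to live on a finite set with the power set \( \sigma \)-algebra.

```python
weights = {"a": 2, "b": -3, "c": 0, "d": 5}   # hypothetical signed point masses
f = {"a": 1, "b": 4, "c": -2, "d": -1}        # hypothetical function values f(x)

# Point masses of the positive and negative variation measures.
plus = {x: max(w, 0) for x, w in weights.items()}
minus = {x: max(-w, 0) for x, w in weights.items()}

def integral(f, w):
    """Integral of f with respect to a positive discrete measure with masses w."""
    return sum(f[x] * w[x] for x in f)

int_plus = integral(f, plus)      # integral with respect to mu_+
int_minus = integral(f, minus)    # integral with respect to mu_-
int_mu = int_plus - int_minus     # the definition of the integral with respect to mu

# Direct weighted sum with the signed masses, for comparison.
direct = sum(f[x] * weights[x] for x in f)
```

Both computations give \( -15 \) here, as expected, since all quantities are finite and no \( \infty - \infty \) issue can arise.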

We will not pursue this extension, but as you might guess, the essential properties of the integral hold.

Complex Measures

Again, suppose that \( (S, \mathscr{S}) \) is a measurable space. The same axioms that work for general measures can be used to define complex measures. Recall that \( \C = \{x + i y: x, \, y \in \R\} \) denotes the set of complex numbers, where \( i \) is the imaginary unit.

A complex measure on \( (S, \mathscr{S}) \) is a function \( \mu: \mathscr{S} \to \C \) that satisfies the following properties:

\( \mu(\emptyset) = 0 \)
If \( \{A_i: i \in I\} \) is a countable, disjoint collection of sets in \( \mathscr{S} \) then \( \mu\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \mu(A_i) \)

Clearly a complex measure \( \mu \) can be decomposed as \( \mu = \nu + i \rho \) where \( \nu \) and \( \rho \) are finite (real) measures on \( (S, \mathscr{S}) \). We will have no use for complex measures in this text, but from the decomposition into finite measures, it's easy to see how to develop the theory.

Computational Exercises

Counterexamples

The lemma needed for the Hahn decomposition theorem was stated under the assumption that \( \mu(A) \lt \infty \). The following exercise examines a set with \( \mu(A) = \infty \).

Let \( S \) be a set with subsets \( A \) and \( B \) satisfying \( \emptyset \subset B \subset A \subset S \). Let \( \mathscr{S} = \sigma\{A, B\} \) be the \( \sigma \)-algebra generated by \( \{A, B\} \). Define \( \mu(B) = -1 \), \( \mu(A \setminus B) = \infty \), \( \mu(A^c) = 1 \).

Draw the Venn diagram of \( A \), \( B \), \( S \).
List the sets in \( \mathscr{S} \).
Using additivity, give the value of \( \mu \) on each set in \( \mathscr{S} \).
Find the subsets \(P \in \mathscr{S} \) of \( A \) that are positive for \( \mu \), and compare \( \mu(P) \) with \( \mu(A) \).