# 3.13: Absolute Continuity and Density Functions

## Basic Theory

Our starting point is a measurable space \( (S, \ms{S}) \). That is, \( S \) is a set and \( \ms{S} \) is a \( \sigma \)-algebra of subsets of \( S \). In the last section, we discussed general measures on \( (S, \ms{S}) \) that can take positive and negative values. Special cases are positive measures, finite measures, and our favorite kind, probability measures. In particular, we studied properties of general measures, ways to construct them, special sets (positive, negative, and null), and the Hahn and Jordan decompositions.

In this section, we see how to construct a new measure from a given positive measure using a density function, and we answer the fundamental question of when a measure has a density function relative to the given positive measure.

### Relations on Measures

The answer to the question involves two important relations on the collection of measures on \( (S, \ms{S}) \) that are defined in terms of null sets. Recall that \( A \in \ms{S} \) is null for a measure \( \mu \) on \( (S, \ms{S}) \) if \( \mu(B) = 0 \) for every \( B \in \ms{S} \) with \( B \subseteq A \). At the other extreme, \( A \in \ms S \) is a support set for \( \mu \) if \( A^c \) is a null set. Here are the basic definitions:

Suppose that \( \mu \) and \( \nu \) are measures on \( (S, \ms{S}) \).

- \( \nu \) is absolutely continuous with respect to \( \mu \) if every null set of \( \mu \) is also a null set of \( \nu \). We write \( \nu \ll \mu \).
- \( \mu \) and \( \nu \) are mutually singular if there exists \( A \in \ms{S} \) such that \( A \) is null for \( \mu \) and \( A^c \) is null for \( \nu \). We write \( \mu \perp \nu \).

Thus \( \nu \ll \mu \) if every support set of \( \mu \) is a support set of \( \nu \). At the opposite end, \( \mu \perp \nu \) if \( \mu \) and \( \nu \) have disjoint support sets.
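To make the definitions concrete, here is a small numerical sketch in Python (the four-point space and all weights are made up for illustration). On a finite set, a positive measure is determined by its values on singletons, so both relations can be checked directly from the definitions.

```python
# Positive measures on a finite set S, given by their values on singletons.
# All weights below are illustrative choices, not from the text.

S = {0, 1, 2, 3}

mu  = {0: 2.0, 1: 1.0, 2: 0.0, 3: 0.0}   # supported on {0, 1}
nu1 = {0: 5.0, 1: 0.0, 2: 0.0, 3: 0.0}   # supported on {0}
nu2 = {0: 0.0, 1: 0.0, 2: 3.0, 3: 1.0}   # supported on {2, 3}

def is_null(A, m):
    """A is null for m iff m vanishes on every subset of A;
    for a positive measure on a finite set, this means m{x} = 0 for x in A."""
    return all(m[x] == 0 for x in A)

def abs_continuous(nu, mu):
    """nu << mu: every mu-null set is nu-null (finite positive case)."""
    return all(nu[x] == 0 for x in S if mu[x] == 0)

def mutually_singular(mu, nu):
    """mu and nu are singular: some A is mu-null while its complement is nu-null."""
    A = {x for x in S if mu[x] == 0}      # the largest mu-null set
    return is_null(A, mu) and is_null(S - A, nu)

print(abs_continuous(nu1, mu))      # True:  nu1 << mu
print(abs_continuous(nu2, mu))      # False: nu2 charges the mu-null set {2, 3}
print(mutually_singular(mu, nu2))   # True:  mu and nu2 have disjoint supports
```

The check for singularity uses the largest \( \mu \)-null set, which suffices for positive measures on a finite space.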

Suppose that \( \mu \), \( \nu \), and \( \rho \) are measures on \( (S, \ms{S})\). Then

- \( \mu \ll \mu \), the reflexive property.
- If \( \mu \ll \nu \) and \( \nu \ll \rho \) then \( \mu \ll \rho \), the transitive property.

Recall that every relation that is reflexive and transitive leads to an equivalence relation, and then in turn, the original relation can be extended to a partial order on the collection of equivalence classes. This general theorem on relations leads to the following two results.

Measures \( \mu \) and \( \nu \) on \( (S, \ms{S}) \) are equivalent if \( \mu \ll \nu \) and \( \nu \ll \mu \), and we write \( \mu \equiv \nu \). The relation \(\equiv\) is an equivalence relation on the collection of measures on \((S, \ms S)\). That is, if \( \mu \), \( \nu \), and \( \rho \) are measures on \( (S, \ms{S}) \) then

- \( \mu \equiv \mu \), the reflexive property
- If \( \mu \equiv \nu \) then \( \nu \equiv \mu \), the symmetric property
- If \( \mu \equiv \nu \) and \( \nu \equiv \rho \) then \( \mu \equiv \rho \), the transitive property

Thus, \( \mu \) and \( \nu \) are equivalent if they have the same null sets and thus the same support sets. This equivalence relation is rather weak: equivalent measures have the same support sets, but the values assigned to these sets can be very different. As usual, we will write \( [\mu] \) for the equivalence class of a measure \( \mu \) on \( (S, \ms{S}) \), under the equivalence relation \( \equiv \).

If \( \mu \) and \( \nu \) are measures on \( (S, \ms{S}) \), we write \( [\mu] \preceq [\nu] \) if \( \mu \ll \nu \). The definition is consistent, and defines a partial order on the collection of equivalence classes. That is, if \( \mu \), \( \nu \), and \( \rho \) are measures on \( (S, \ms{S}) \) then

- \( [\mu] \preceq [\mu] \), the reflexive property.
- If \( [\mu] \preceq [\nu] \) and \( [\nu] \preceq [\mu] \) then \( [\mu] = [\nu] \), the antisymmetric property.
- If \( [\mu] \preceq [\nu] \) and \( [\nu] \preceq [\rho] \) then \( [\mu] \preceq [\rho] \), the transitive property

The singularity relation is trivially symmetric and is almost anti-reflexive.

Suppose that \( \mu \) and \( \nu \) are measures on \( (S, \ms{S}) \). Then

- If \( \mu \perp \nu \) then \( \nu \perp \mu \), the symmetric property.
- \( \mu \perp \mu \) if and only if \( \mu = \bs 0 \), the zero measure.

## Proof

Part (a) is trivial from the symmetry of the definition. For part (b), note that \( S \) is null for \( 0 \) and \( \emptyset \) is null for \( 0 \), so \( 0 \perp 0 \). Conversely, suppose that \( \mu \) is a measure and \( \mu \perp \mu \). Then there exists \( A \in \ms{S} \) such that \( A \) is null for \( \mu \) and \( A^c \) is null for \( \mu \). But then \( S = A \cup A^c \) is null for \( \mu \), so \( \mu(B) = 0 \) for every \( B \in \ms{S} \).

Absolute continuity and singularity are preserved under multiplication by nonzero constants.

Suppose that \( \mu \) and \( \nu \) are measures on \( (S, \ms{S}) \) and that \( a, \, b \in \R \setminus \{0\} \). Then

- \( \nu \ll \mu \) if and only if \( a \nu \ll b \mu \).
- \( \nu \perp \mu \) if and only if \( a \nu \perp b \mu \).

## Proof

Recall that if \( c \ne 0 \), then \( A \in \ms{S} \) is null for \( \mu \) if and only if \( A \) is null for \( c \mu \).

There is a corresponding result for sums of measures.

Suppose that \( \mu \) is a measure on \( (S, \ms{S}) \) and that \( \nu_i \) is a measure on \( (S, \ms{S}) \) for each \( i \) in a countable index set \( I \). Suppose also that \( \nu = \sum_{i \in I} \nu_i \) is a well-defined measure on \( (S, \ms{S}) \).

- If \( \nu_i \ll \mu \) for every \( i \in I \) then \( \nu \ll \mu \).
- If \( \nu_i \perp \mu \) for every \( i \in I \) then \( \nu \perp \mu \).

## Proof

Recall that if \( A \in \ms{S} \) is null for \( \nu_i \) for each \(i \in I \), then \( A \) is null for \( \nu = \sum_{i \in I} \nu_i \), assuming that this is a well-defined measure.

As before, note that \( \nu = \sum_{i \in I} \nu_i \) is well-defined if \( \nu_i \) is a positive measure for each \( i \in I \), or if \( I \) is finite and \( \nu_i \) is a finite measure for each \( i \in I \). We close this subsection with a couple of results that involve both the absolute continuity relation and the singularity relation.

Suppose that \( \mu \), \( \nu \), and \( \rho \) are measures on \( (S, \ms{S}) \). If \( \nu \ll \mu \) and \( \mu \perp \rho \) then \( \nu \perp \rho \).

## Proof

Since \( \mu \perp \rho \), there exists \( A \in \ms{S} \) such that \( A \) is null for \( \mu \) and \( A^c \) is null for \( \rho \). But \( \nu \ll \mu \) so \( A \) is null for \( \nu \). Hence \( \nu \perp \rho \).

Suppose that \( \mu \) and \( \nu \) are measures on \( (S, \ms{S}) \). If \( \nu \ll \mu \) and \( \nu \perp \mu \) then \( \nu = \bs 0 \).

## Proof

From the previous theorem (with \( \rho = \nu \)) we have \( \nu \perp \nu \), and hence by the result above on self-singularity, \( \nu = \bs 0 \).

### Density Functions

We are now ready for our study of density functions. Throughout this subsection, we assume that \( \mu \) is a positive, \( \sigma \)-finite measure on our measurable space \( (S, \ms{S}) \). Recall that if \(f: S \to \R\) is measurable, then the integral of \(f\) with respect to \(\mu\) may exist as a number in \(\R^* = \R \cup \{-\infty, \infty\}\) or may fail to exist.

Suppose that \( f: S \to \R \) is a measurable function whose integral with respect to \( \mu \) exists. Then the function \( \nu \) defined by \[ \nu(A) = \int_A f \, d\mu, \quad A \in \ms{S} \] is a \( \sigma \)-finite measure on \( (S, \ms{S}) \) that is absolutely continuous with respect to \( \mu \). The function \( f \) is a density function of \( \nu \) relative to \( \mu \).

## Proof

To say that the integral exists means that either \( \int_S f^+ \, d \mu \lt \infty \) or \( \int_S f^- \, d\mu \lt \infty \), where as usual, \( f^+ \) and \( f^- \) are the positive and negative parts of \( f \). So \( \nu(A) = \nu_+(A) - \nu_-(A) \) for \( A \in \ms S \) where \( \nu_+(A) = \int_A f^+ \, d\mu \) and \( \nu_-(A) = \int_A f^- \, d\mu \). Both \( \nu_+ \) and \( \nu_- \) are positive measures by basic properties of the integral: Generically, suppose \( g: S \to [0, \infty) \) is measurable. The integral over the empty set is always 0, so \( \int_\emptyset g \, d\mu = 0 \). Next, if \( \{A_i: i \in I\} \) is a countable, disjoint collection of sets in \( \ms{S} \) and \( A = \bigcup_{i \in I} A_i \), then by the additivity property of the integral over disjoint domains, \[ \int_A g \, d\mu = \sum_{i \in I} \int_{A_i} g \, d\mu \] By the assumption that the integral exists, either \( \nu_+ \) or \( \nu_- \) is a finite positive measure, and hence \( \nu \) is a measure. As you might guess, \( \nu_+ \) and \( \nu_- \) form the Jordan decomposition of \( \nu \), a point that we will revisit below.

Again, either \( \nu_+ \) or \( \nu_- \) is a finite measure. By symmetry, let's suppose that \( \nu_- \) is finite. Then to show that \( \nu \) is \( \sigma \)-finite, we just need to show that \( \nu_+ \) is \( \sigma \)-finite. Since \( \mu \) has this property, there exists a collection \( \{A_n: n \in \N_+\} \) with \( A_n \in \ms S \), \( \mu(A_n) \lt \infty \), and \( \bigcup_{n=1}^\infty A_n = S \). Let \( B_n = \{x \in S: f^+(x) \le n\} \) for \( n \in \N_+ \). Then \( B_n \in \ms S \) for \( n \in \N_+ \) and \( \bigcup_{n=1}^\infty B_n = S \). Hence \( \{A_m \cap B_n: (m, n) \in \N_+^2\} \) is a countable collection of measurable sets whose union is also \( S \). Moreover, \[ \nu_+(A_m \cap B_n) = \int_{A_m \cap B_n} f^+ \, d\mu \le n \mu(A_m \cap B_n) \lt \infty \] Finally, suppose \( A \in \ms{S} \) is a null set of \( \mu \). If \( B \in \ms{S} \) and \( B \subseteq A \) then \( \mu(B) = 0 \) so \( \nu(B) = \int_B f \, d\mu = 0 \). Hence \( \nu \ll \mu \).

The following three special cases are the most important:

- If \( f \) is nonnegative (so that the integral exists in \(\R \cup \{\infty\}\)) then \( \nu \) is a positive measure since \( \nu(A) \ge 0 \) for \( A \in \ms{S} \).
- If \( f \) is integrable (so that the integral exists in \(\R\)), then \( \nu \) is a finite measure since \( \nu(A) \in \R \) for \( A \in \ms{S} \).
- If \( f \) is nonnegative and \( \int_S f \, d\mu = 1 \) then \( \nu \) is a probability measure since \( \nu(A) \ge 0 \) for \( A \in \ms{S} \) and \( \nu(S) = 1 \).

In case 3, \( f \) is the probability density function of \( \nu \) relative to \( \mu \), our favorite kind of density function. When they exist, density functions are essentially unique.
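The three special cases can be illustrated with a quick Python sketch on a finite set with counting measure; the set and the density \( f \) are made-up choices, not from the text.

```python
# nu(A) = integral of f over A w.r.t. counting measure on a finite set S.
# The space and the function f are illustrative assumptions.

S = [0, 1, 2, 3]

def make_measure(f):
    """Return A -> integral of f over A w.r.t. counting measure on S."""
    return lambda A: sum(f(x) for x in A)

f = lambda x: x + 1            # nonnegative, so nu is a positive measure
nu = make_measure(f)
print(nu(S))                   # 10: the total mass is finite, so nu is also finite

p = lambda x: f(x) / nu(S)     # nonnegative with total integral 1
P = make_measure(p)            # so P is a probability measure
print(round(P(S), 10))         # 1.0

print(nu([]))                  # 0: the empty set is mu-null, hence nu-null
```

Here `p` is the probability density function of `P` relative to counting measure, as in case 3.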

Suppose that \( \nu \) is a \( \sigma \)-finite measure on \( (S, \ms{S}) \) and that \( \nu \) has density function \( f \) with respect to \( \mu \). Then \( g: S \to \R \) is a density function of \( \nu \) with respect to \( \mu \) if and only if \( f = g \) almost everywhere on \( S \) with respect to \( \mu \).

## Proof

These results also follow from basic properties of the integral. Suppose that \( f, \, g: S \to \R \) are measurable functions whose integrals with respect to \( \mu \) exist. If \( g = f \) almost everywhere on \( S \) with respect to \( \mu \) then \( \int_A f \, d\mu = \int_A g \, d\mu \) for every \( A \in \ms{S} \). Hence if \( f \) is a density function for \( \nu \) with respect to \( \mu \) then so is \( g \). For the converse, if \( \int_A f \, d\mu = \int_A g \, d\mu \) for every \( A \in \ms{S} \), then since \( \mu \) is \( \sigma \)-finite, it follows that \( f = g \) almost everywhere on \( S \) with respect to \( \mu \).

The essential uniqueness of density functions can fail if the positive measure space \( (S, \ms S, \mu) \) is not \( \sigma \)-finite. A simple example is given below. Our next result answers the question of when a measure has a density function with respect to \( \mu \), and is the fundamental theorem of this section. The theorem is in two parts: Part (a) is the Lebesgue decomposition theorem, named for our old friend Henri Lebesgue. Part (b) is the Radon-Nikodym theorem, named for Johann Radon and Otto Nikodym. We combine the theorems because our proofs of the two results are inextricably linked.

Suppose that \( \nu \) is a \( \sigma \)-finite measure on \( (S, \ms{S}) \).

- **Lebesgue Decomposition Theorem**. \( \nu \) can be uniquely decomposed as \( \nu = \nu_c + \nu_s \) where \( \nu_c \ll \mu \) and \( \nu_s \perp \mu \).
- **Radon-Nikodym Theorem**. \( \nu_c \) has a density function with respect to \( \mu \).

## Proof

The proof proceeds in stages. We first prove the result for finite, positive measures, then for \( \sigma \)-finite, positive measures, and finally for general \( \sigma \)-finite measures. The first stage is the most complicated.

**Part 1**. Suppose that \( \mu \) and \( \nu \) are positive, finite measures. Let \( \ms{F} \) denote the collection of measurable functions \( g: S \to [0, \infty) \) with \( \int_A g \, d\mu \le \nu(A) \) for all \( A \in \ms{S} \). Note that \( \ms{F} \ne \emptyset\) since the constant function \( 0 \) is in \( \ms{F} \). The proof works by finding a maximal element of \( \ms{F} \) and using this function as the density function of the absolutely continuous part of \( \nu \).

Our first step is to show that \( \ms{F} \) is closed under the max operator. Let \( g_1, \; g_2 \in \ms{F} \). For \( A \in \ms{S} \), let \( A_1 = \{x \in A: g_1(x) \ge g_2(x)\} \) and \( A_2 = \{x \in A: g_1(x) \lt g_2(x)\} \). Then \( A_1, \; A_2 \in \ms{S} \) partition \( A \) so \[ \int_A \max\{g_1, g_2\} \, d\mu = \int_{A_1} \max\{g_1, g_2\} \, d\mu + \int_{A_2} \max\{g_1, g_2\} d\mu = \int_{A_1} g_1 \, d\mu + \int_{A_2} g_2 \, d\mu \le \nu(A_1) + \nu(A_2) = \nu(A) \] Hence \( \max\{g_1, g_2\} \in \ms{F} \).

Our next step is to show that \( \ms{F} \) is closed with respect to increasing limits. Thus suppose that \( g_n \in \ms{F} \) for \( n \in \N_+ \) and that \( g_n \) is increasing in \( n \) on \( S \). Let \( g = \lim_{n \to \infty} g_n \). Then \( g: S \to [0, \infty] \) is measurable, and by the monotone convergence theorem, \( \int_A g \, d\mu = \lim_{n \to \infty} \int_A g_n \, d\mu \) for every \( A \in \ms{S} \). But \( \int_A g_n \, d\mu \le \nu(A) \) for every \( n \in \N_+ \) so \( \int_A g \, d\mu \le \nu(A) \). In particular, \( \int_S g \, d\mu \le \nu(S) \lt \infty \) so \( g \lt \infty \) almost everywhere on \( S \) with respect to \( \mu \). Thus, by redefining \( g \) on a \( \mu \)-null set if necessary, we can assume \( g \lt \infty \) on \( S \). Hence \( g \in \ms{F} \).

Now let \( \alpha = \sup\left\{\int_S g \, d\mu: g \in \ms{F}\right\} \). Note that \( \alpha \le \nu(S) \lt \infty\). By definition of the supremum, for each \( n \in \N_+ \) there exist \( g_n \in \ms{F} \) such that \( \int_S g_n \, d\mu \gt \alpha - \frac{1}{n} \). Now let \( f_n = \max\{g_1, g_2, \ldots, g_n\} \) for \( n \in \N_+ \). Then \( f_n \in \ms{F} \) and \( f_n \) is increasing in \( n \in \N_+ \) on \( S \). Hence \( f = \lim_{n \to \infty} f_n \in \ms{F} \) and \( \int_S f \, d\mu = \lim_{n \to \infty} \int_S f_n \, d\mu \). But \( \int_S f_n \, d\mu \ge \int_S g_n \, d\mu \gt \alpha - \frac{1}{n} \) for each \( n \in \N_+ \) and hence \( \int_S f \, d\mu \ge \alpha \).

Define \( \nu_c(A) = \int_A f \, d\mu \) and \( \nu_s(A) = \nu(A) - \nu_c(A) \) for \( A \in \ms{S} \). Then \( \nu_c \) and \( \nu_s \) are finite, positive measures and by our previous theorem, \( \nu_c \) is absolutely continuous with respect to \( \mu \) and has density function \( f \). Our next step is to show that \( \nu_s \) is singular with respect to \( \mu \). For \( n \in \N_+ \), let \( (P_n, P_n^c) \) denote a Hahn decomposition of the measure \( \nu_s - \frac{1}{n} \mu \). Then \[ \int_A \left(f + \frac{1}{n} \bs{1}_{P_n}\right) \, d\mu = \nu_c(A) + \frac{1}{n} \mu(P_n \cap A) = \nu(A) - \left[\nu_s(A) - \frac{1}{n} \mu(P_n \cap A)\right] \] But \( \nu_s(A) - \frac{1}{n} \mu(P_n \cap A) \ge \nu_s(A \cap P_n) - \frac{1}{n} \mu(A \cap P_n) \ge 0 \) since \( \nu_s \) is a positive measure and \( P_n \) is positive for \( \nu_s - \frac{1}{n} \mu \). Thus we have \( \int_A \left(f + \frac{1}{n} \bs{1}_{P_n} \right) \, d\mu \le \nu(A) \) for every \( A \in \ms{S} \), so \( f + \frac{1}{n} \bs{1}_{P_n} \in \ms{F} \) for every \( n \in \N_+ \). If \( \mu(P_n) \gt 0 \) then \( \int_S \left(f + \frac{1}{n} \bs{1}_{P_n}\right) \, d\mu = \alpha + \frac{1}{n} \mu(P_n) \gt \alpha \), which contradicts the definition of \( \alpha \). Hence we must have \( \mu(P_n) = 0 \) for every \( n \in \N_+ \). Now let \( P = \bigcup_{n=1}^\infty P_n \). Then \( \mu(P) = 0 \). If \( \nu_s(P^c) \gt 0 \) then \( \nu_s(P^c) - \frac{1}{n} \mu(P^c) \gt 0 \) for \( n \) sufficiently large. But this is a contradiction since \( P^c \subseteq P_n^c \) which is negative for \( \nu_s - \frac{1}{n} \mu \) for every \( n \in \N_+ \). Thus we must have \( \nu_s(P^c) = 0 \), so \( \mu \) and \( \nu_s \) are singular.

**Part 2**. Suppose that \( \mu \) and \( \nu \) are \( \sigma \)-finite, positive measures. Then there exists a countable partition \( \{S_i: i \in I\} \) of \( S \) where \( S_i \in \ms{S} \) for \( i \in I \), and \( \mu(S_i) \lt \infty \) and \( \nu(S_i) \lt \infty \) for \( i \in I \). Let \( \mu_i(A) = \mu(A \cap S_i) \) and \( \nu_i(A) = \nu(A \cap S_i) \) for \( i \in I \). Then \( \mu_i \) and \( \nu_i \) are finite, positive measures for \( i \in I \), and \( \mu = \sum_{i \in I} \mu_i \) and \( \nu = \sum_{i \in I} \nu_i \). By part 1, for each \( i \in I \), there exists a measurable function \( f_i: S \to [0, \infty) \) such that \( \nu_i = \nu_{i,c} + \nu_{i,s} \) where \( \nu_{i,c}(A) = \int_A f_i \, d\mu_i \) for \( A \in \ms{S} \) and \( \nu_{i,s} \perp \mu \). Let \( f = \sum_{i \in I} \bs{1}_{S_i} f_i \). Then \( f: S \to [0, \infty) \) is measurable. Define \( \nu_c(A) = \int_A f \, d\mu \) and \( \nu_s(A) = \nu(A) - \nu_c(A) \) for \( A \in \ms{S} \). Note that \( \nu_c = \sum_{i \in I} \nu_{i,c} \) and \( \nu_s = \sum_{i \in I} \nu_{i,s} \). Then \( \nu_c \ll \mu \) and has density function \( f \), and \( \nu_s \perp \mu \).

**Part 3**. Suppose that \( \nu \) is a \( \sigma \)-finite measure (not necessarily positive). By the Jordan decomposition theorem, \( \nu = \nu_+ - \nu_- \) where \( \nu_+ \) and \( \nu_- \) are \( \sigma \)-finite, positive measures, and at least one is finite. By part 2, there exist measurable functions \( f_+: S \to [0, \infty) \) and \( f_-: S \to [0, \infty) \) such that \( \nu_+ = \nu_{+,c} + \nu_{+,s} \) and \( \nu_- = \nu_{-,c} + \nu_{-,s} \) where \( \nu_{+,c}(A) = \int_A f_+ \, d\mu \), \( \nu_{-,c}(A) = \int_A f_- \, d\mu \) for \( A \in \ms{S} \), and \( \nu_{+,s} \perp \mu \), \( \nu_{-,s} \perp \mu \). Let \( f = f_+ - f_- \), \( \nu_c(A) = \int_A f \, d\mu \), \(\nu_s(A) = \nu(A) - \nu_c(A) \) for \( A \in \ms{S} \). Then \( \nu = \nu_c + \nu_s \) and \( \nu_s = \nu_{+,s} - \nu_{-,s} \perp \mu \).

**Uniqueness**. Suppose that \( \nu = \nu_{c,1} + \nu_{s,1} = \nu_{c,2} + \nu_{s,2} \) where \( \nu_{c,i} \ll \mu \) and \( \nu_{s,i} \perp \mu \) for \( i \in \{1, 2\} \). Then \( \nu_{c,1} - \nu_{c,2} = \nu_{s,2} - \nu_{s,1} \). But \( \nu_{c,1} - \nu_{c,2} \ll \mu \) and \( \nu_{s,2} - \nu_{s,1} \perp \mu \), so \( \nu_{c,1} - \nu_{c,2} = \nu_{s,2} - \nu_{s,1} = \bs 0 \) by the theorem above.
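For measures on a finite set, the Lebesgue decomposition and the Radon-Nikodym density can be computed explicitly. The Python sketch below (with made-up weights) splits \( \nu \) into the part carried on the support of \( \mu \) and the part carried on the \( \mu \)-null set.

```python
# Lebesgue decomposition on a finite set: nu = nu_c + nu_s where nu_c << mu
# and nu_s is singular with respect to mu. All weights are illustrative.

S = [0, 1, 2, 3]
mu = {0: 1.0, 1: 2.0, 2: 0.0, 3: 0.0}
nu = {0: 4.0, 1: 0.0, 2: 5.0, 3: 1.0}

supp_mu = {x for x in S if mu[x] > 0}

nu_c = {x: (nu[x] if x in supp_mu else 0.0) for x in S}   # nu_c << mu
nu_s = {x: nu[x] - nu_c[x] for x in S}                     # nu_s carried on the mu-null set

# Radon-Nikodym density of nu_c w.r.t. mu (arbitrary on the mu-null set)
f = {x: (nu_c[x] / mu[x] if mu[x] > 0 else 0.0) for x in S}

# checks: nu = nu_c + nu_s, and nu_c(A) = sum over A of f * mu
A = [0, 2, 3]
print(sum(nu_c[x] + nu_s[x] for x in A))       # 10.0, which is nu(A)
print(sum(f[x] * mu[x] for x in A))            # 4.0, which is nu_c(A)
```

The density `f` is unique only up to the \( \mu \)-null set \( \{2, 3\} \), where we set it to 0 by convention.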

In particular, a measure \( \nu \) on \( (S, \ms{S}) \) has a density function with respect to \( \mu \) if and only if \( \nu \ll \mu \). The density function in this case is also referred to as the Radon-Nikodym derivative of \( \nu \) with respect to \( \mu \) and is sometimes written in derivative notation as \( d\nu / d\mu \). This notation, however, can be a bit misleading because we need to remember that a density function is unique only up to a \( \mu \)-null set. Also, the Radon-Nikodym theorem can fail if the positive measure space \( (S, \ms S, \mu) \) is not \( \sigma \)-finite. A couple of examples are given below. Next we characterize the Hahn decomposition and the Jordan decomposition of \( \nu \) in terms of the density function.

Suppose that \( \nu \) is a measure on \( (S, \ms{S}) \) with \( \nu \ll \mu \), and that \( \nu \) has density function \( f \) with respect to \( \mu \). Let \( P = \{x \in S: f(x) \ge 0\} \), and let \( f^+ \) and \( f^- \) denote the positive and negative parts of \( f \).

- A Hahn decomposition of \( \nu \) is \( (P, P^c) \).
- The Jordan decomposition is \( \nu = \nu_+ - \nu_- \) where \( \nu_+(A) = \int_A f^+ \, d\mu \) and \( \nu_-(A) = \int_A f^- \, d\mu\), for \( A \in \ms{S} \).

## Proof

Of course \(P^c = \{x \in S: f(x) \lt 0\}\). The proofs are simple.

- Suppose that \(A \in \ms S\). If \(A \subseteq P\) then \(f(x) \ge 0\) for \(x \in A\) and hence \(\nu(A) = \int_A f \, d\mu \ge 0\). If \(A \subseteq P^c\) then \(\nu(A) = \int_A f \, d\mu \le 0\).
- This follows immediately from (a) and the Jordan decomposition theorem, since \(\nu_+(A) = \nu(A \cap P)\) and \(\nu_-(A) = -\nu(A \cap P^c)\) for \(A \in \ms S\). Note that \( f^+ = \bs 1_P f \) and \( f^- = -\bs 1_{P^c} f \).
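The characterization above is easy to see numerically. In this hedged Python sketch, the space, the signed density, and the use of counting measure are all illustrative assumptions.

```python
# With nu(A) = integral of f d(mu), the set P = {f >= 0} gives a Hahn
# decomposition, and f+, f- give the Jordan decomposition.

S = [0, 1, 2, 3]
mu = {x: 1.0 for x in S}                       # counting measure
f  = {0: 2.0, 1: -3.0, 2: 0.0, 3: 1.0}        # a signed density (made up)

def nu(A):
    return sum(f[x] * mu[x] for x in A)

P      = [x for x in S if f[x] >= 0]           # positive set for nu
P_comp = [x for x in S if f[x] < 0]            # negative set for nu

nu_plus  = lambda A: sum(max(f[x], 0.0) * mu[x] for x in A)   # density f+
nu_minus = lambda A: sum(max(-f[x], 0.0) * mu[x] for x in A)  # density f-

print(nu(P), nu(P_comp))            # 3.0 -3.0: nonnegative on P, nonpositive on P^c
print(nu_plus(S) - nu_minus(S))     # 0.0, which equals nu(S)
```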

The following result is a basic change of variables theorem for integrals.

Suppose that \( \nu \) is a positive measure on \( (S, \ms{S}) \) with \( \nu \ll \mu \) and that \( \nu \) has density function \( f \) with respect to \( \mu \). If \( g: S \to \R \) is a measurable function whose integral with respect to \( \nu \) exists, then \[ \int_S g \, d\nu = \int_S g f \, d\mu \]

## Proof

The proof is a classical bootstrapping argument. Suppose first that \( g = \sum_{i \in I} a_i \bs{1}_{A_i} \) is a nonnegative simple function. That is, \( I \) is a finite index set, \( a_i \in [0, \infty) \) for \( i \in I \), and \( \{A_i: i \in I\} \) is a disjoint collection of sets in \( \ms{S} \). Then \( \int_S g \, d\nu = \sum_{i \in I} a_i \nu(A_i) \). But \( \nu(A_i) = \int_{A_i} f \, d\mu = \int_S \bs{1}_{A_i} f \, d\mu \) for each \( i \in I \) so \[ \int_S g \, d\nu = \sum_{i \in I} a_i \int_S \bs{1}_{A_i} f \, d\mu = \int_S \left(\sum_{i \in I} a_i \bs{1}_{A_i}\right) f \, d\mu = \int_S g f \, d\mu \] Suppose next that \( g: S \to [0, \infty) \) is measurable. There exists a sequence of nonnegative simple functions \( (g_1, g_2, \ldots) \) such that \( g_n \) is increasing in \( n \in \N_+ \) on \( S \) and \( g_n \to g \) as \( n \to \infty \) on \( S \). Since \( f \) is nonnegative, \( g_n f \) is increasing in \( n \in \N_+ \) on \( S \) and \( g_n f \to g f \) as \( n \to \infty \) on \( S \). By the first step, \( \int_S g_n \, d\nu = \int_S g_n f \, d\mu \) for each \( n \in \N_+ \). But by the monotone convergence theorem, \( \int_S g_n \, d\nu \to \int_S g \, d\nu \) and \( \int_S g_n f \, d\mu \to \int_S g f \, d\mu \) as \( n \to \infty \). Hence \( \int_S g \, d\nu = \int_S g f \, d\mu \).

Finally, suppose that \( g: S \to \R \) is a measurable function whose integral with respect to \( \nu \) exists. By the previous step, \( \int_S g^+ \, d\nu = \int_S g^+ f \, d\mu \) and \( \int_S g^- \, d\nu = \int_S g^- f \, d\mu \), and at least one of these integrals is finite. Hence by the additive property \[ \int_S g \, d\nu = \int_S g^+ \, d\nu - \int_S g^- \, d\nu = \int_S g^+ f \, d\mu - \int_S g^- f \, d\mu = \int_S (g^+ - g^-) f \, d\mu = \int_S g f \, d\mu \]
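Here is a minimal numeric check of the change of variables theorem on a finite space; the weights \( \mu\{x\} \), the density \( f \), and the integrand \( g \) are all made-up values.

```python
# Change of variables: if nu has density f w.r.t. mu, then
# integral of g d(nu) = integral of g*f d(mu). Finite space, made-up values.

mu = {0: 2.0, 1: 1.0, 2: 4.0, 3: 0.5}          # base measure on singletons
f  = {0: 1.0, 1: 2.0, 2: 0.5, 3: 0.0}          # density of nu w.r.t. mu
g  = {0: 3.0, 1: -1.0, 2: 4.0, 3: 7.0}         # integrand

nu = {x: f[x] * mu[x] for x in mu}             # nu on singletons: nu{x} = f(x) mu{x}

lhs = sum(g[x] * nu[x] for x in nu)            # integral of g d(nu)
rhs = sum(g[x] * f[x] * mu[x] for x in mu)     # integral of g*f d(mu)
print(lhs, rhs)                                # 12.0 12.0
```

The two sides are computed through different intermediates (via `nu` on the left, directly against `mu` on the right), which is exactly the content of the theorem.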

In differential notation, the change of variables theorem has the familiar form \( d\nu = f \, d\mu \), and this is really the justification for the derivative notation \( f = d\nu / d\mu \) in the first place. The following result gives the scalar multiple rule for density functions.

Suppose that \( \nu \) is a measure on \( (S, \ms{S}) \) with \( \nu \ll \mu \) and that \( \nu \) has density function \( f \) with respect to \( \mu \). If \( c \in \R \), then \( c \nu \) has density function \( c f \) with respect to \( \mu \).

## Proof

If \( A \in \ms{S} \) then \( \int_A c f \, d\mu = c \int_A f \, d\mu = c \nu(A) \).

Of course, we already knew that \( \nu \ll \mu \) implies \( c \nu \ll \mu \) for \( c \in \R \), so the new information is the relation between the density functions. In derivative notation, the scalar multiple rule has the familiar form \[ \frac{d(c \nu)}{d\mu} = c \frac{d\nu}{d\mu} \]

The following result gives the sum rule for density functions. Recall that two measures are of the same type if neither takes the value \( \infty \) or if neither takes the value \( -\infty \).

Suppose that \( \nu \) and \( \rho \) are measures on \( (S, \ms{S}) \) of the same type with \( \nu \ll \mu \) and \( \rho \ll \mu \), and that \( \nu \) and \( \rho \) have density functions \( f \) and \( g \) with respect to \( \mu \), respectively. Then \( \nu + \rho \) has density function \( f + g \) with respect to \( \mu \).

## Proof

If \( A \in \ms{S} \) then \[ \int_A (f + g) \, d\mu = \int_A f \, d\mu + \int_A g \, d\mu = \nu(A) + \rho(A) \] The additive property holds because we know that the integrals in the middle of the displayed equation are not of the form \( \infty - \infty \).

Of course, we already knew that \( \nu \ll \mu \) and \( \rho \ll \mu \) imply \( \nu + \rho \ll \mu \), so the new information is the relation between the density functions. In derivative notation, the sum rule has the familiar form \[ \frac{d(\nu + \rho)}{d\mu} = \frac{d\nu}{d\mu} + \frac{d\rho}{d\mu} \] The following result is the chain rule for density functions.

Suppose that \( \nu \) is a positive measure on \( (S, \ms{S}) \) with \( \nu \ll \mu \) and that \( \nu \) has density function \( f \) with respect to \( \mu \). Suppose \( \rho \) is a measure on \( (S, \ms{S}) \) with \( \rho \ll \nu \) and that \( \rho \) has density function \( g \) with respect to \( \nu \). Then \( \rho \) has density function \( g f \) with respect to \( \mu \).

## Proof

This is a simple consequence of the change of variables theorem above. If \( A \in \ms{S} \) then \( \rho(A) = \int_A g \, d\nu = \int_A g f \, d\mu \).

Of course, we already knew that \( \nu \ll \mu \) and \( \rho \ll \nu \) imply \( \rho \ll \mu \), so once again the new information is the relation between the density functions. In derivative notation, the chain rule has the familiar form \[ \frac{d\rho}{d\mu} = \frac{d\rho}{d\nu} \frac{d\nu}{d\mu}\] The following related result is the inverse rule for density functions.

Suppose that \( \nu \) is a positive measure on \( (S, \ms{S}) \) with \( \nu \ll \mu \) and \( \mu \ll \nu \) (so that \( \nu \equiv \mu \)). If \( \nu \) has density function \( f \) with respect to \( \mu \) then \( \mu \) has density function \( 1 / f \) with respect to \( \nu \).

## Proof

Let \( f \) be a density function of \( \nu \) with respect to \( \mu \) and let \( Z = \{x \in S: f(x) = 0\} \). Then \( \nu(Z) = \int_Z f \, d\mu = 0 \) so \( Z \) is a null set of \( \nu \) and hence is also a null set of \( \mu \). Thus, we can assume that \( f \ne 0 \) on \( S \). Let \( g \) be a density of \( \mu \) with respect to \( \nu \). Since \( \mu \ll \nu \ll \mu \), it follows from the chain rule that \( f g \) is a density of \( \mu \) with respect to \( \mu \). But of course the constant function \( 1 \) is also a density of \( \mu \) with respect to itself so we have \( f g = 1 \) almost everywhere on \( S \). Thus \( 1 / f \) is a density of \( \mu \) with respect to \( \nu \).

In derivative notation, the inverse rule has the familiar form \[ \frac{d\mu}{d\nu} = \frac{1}{d\nu / d\mu}\]
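Both rules are transparent on a finite set where all three measures are strictly positive (hence equivalent), since every density is then a pointwise ratio. A hedged Python sketch with made-up weights:

```python
# Chain rule and inverse rule for Radon-Nikodym derivatives on a finite set
# where mu, nu, rho are equivalent (all strictly positive). Weights are made up.

S = [0, 1, 2]
mu  = {0: 1.0, 1: 2.0, 2: 4.0}
nu  = {0: 3.0, 1: 1.0, 2: 2.0}
rho = {0: 6.0, 1: 4.0, 2: 1.0}

dnu_dmu  = {x: nu[x] / mu[x] for x in S}
drho_dnu = {x: rho[x] / nu[x] for x in S}
drho_dmu = {x: rho[x] / mu[x] for x in S}
dmu_dnu  = {x: mu[x] / nu[x] for x in S}

# chain rule: d(rho)/d(mu) = d(rho)/d(nu) * d(nu)/d(mu)
print(all(abs(drho_dmu[x] - drho_dnu[x] * dnu_dmu[x]) < 1e-12 for x in S))  # True
# inverse rule: d(mu)/d(nu) = 1 / (d(nu)/d(mu))
print(all(abs(dmu_dnu[x] - 1.0 / dnu_dmu[x]) < 1e-12 for x in S))           # True
```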

## Examples and Special Cases

### Discrete Spaces

Recall that a discrete measure space \((S, \ms S, \#)\) consists of a countable set \( S \) with the \(\sigma\)-algebra \( \ms{S} = \ms{P}(S) \) of all subsets of \( S \), and with counting measure \( \# \). Of course \( \# \) is a positive measure and is trivially \( \sigma \)-finite since \( S \) is countable. Note also that \( \emptyset \) is the only set that is null for \( \# \). If \( \nu \) is a measure on \( (S, \ms{S}) \), then by definition, \( \nu(\emptyset) = 0 \), so \( \nu \) is absolutely continuous relative to \( \# \). Thus, by the Radon-Nikodym theorem, \( \nu \) can be written in the form \[ \nu(A) = \sum_{x \in A} f(x), \quad A \subseteq S \] for a unique \( f: S \to \R \). Of course, this is obvious by a direct argument: if we define \( f(x) = \nu\{x\} \) for \( x \in S \), then the displayed equation follows by the countable additivity of \( \nu \).
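A quick Python sketch of this observation, with made-up values of \( \nu \) on singletons:

```python
# On a countable S with counting measure, the density of any measure nu is
# simply x -> nu{x}. Finite illustration with made-up values.

nu_singleton = {"a": 0.5, "b": 1.5, "c": 0.0}   # nu{x} for each x in S

def nu(A):
    return sum(nu_singleton[x] for x in A)

# recover the density w.r.t. counting measure from the measure itself
f = {x: nu({x}) for x in nu_singleton}
print(f == nu_singleton)                         # True: f(x) = nu{x}
print(nu(["a", "b"]))                            # 2.0
```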

### Spaces Generated by Countable Partitions

We can generalize the last discussion to spaces generated by countable partitions. Suppose that \( S \) is a set and that \( \ms{A} = \{A_i: i \in I\} \) is a countable partition of \( S \) into nonempty sets. Let \( \ms{S} = \sigma(\ms{A}) \) and recall that every \( A \in \ms{S} \) has a unique representation of the form \( A = \bigcup_{j \in J} A_j \) where \( J \subseteq I \). Suppose now that \( \mu \) is a positive measure on \( \ms{S} \) with \( 0 \lt \mu(A_i) \lt \infty \) for every \( i \in I \). Then once again, the measure space \( (S, \ms{S}, \mu) \) is \( \sigma \)-finite and \( \emptyset \) is the only null set. Hence if \( \nu \) is a measure on \( (S, \ms{S}) \) then \( \nu \) is absolutely continuous with respect to \( \mu \) and hence has a unique density function \( f \) with respect to \( \mu \): \[ \nu(A) = \int_A f \, d\mu, \quad A \in \ms{S} \] Once again, we can construct the density function explicitly.

In the setting above, define \( f: S \to \R \) by \( f(x) = \nu(A_i) / \mu(A_i) \) for \( x \in A_i \) and \( i \in I \). Then \( f \) is the density of \( \nu \) with respect to \( \mu \).

## Proof

Suppose that \( A \in \ms{S} \) so that \( A = \bigcup_{j \in J} A_j \) for some \( J \subseteq I \). Then \[ \int_A f \, d\mu = \sum_{j \in J} \int_{A_j} f \, d\mu = \sum_{j \in J} \frac{\nu(A_j)}{\mu(A_j)} \mu(A_j) = \sum_{j \in J} \nu(A_j) = \nu(A) \]
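The computation in the proof can be sketched numerically. In this hypothetical setup, three partition cells carry given \( \mu \) and \( \nu \) masses; the density is constant on each cell, equal to \( \nu(A_i) / \mu(A_i) \), and integrating it recovers \( \nu \) on any union of cells.

```python
# Partition density sketch: cells A_0, A_1, A_2 with given measures.
# The cell measures below are illustrative choices.

mu = {0: 2.0, 1: 1.0, 2: 4.0}    # mu(A_i), each in (0, infinity)
nu = {0: 1.0, 1: 3.0, 2: 2.0}    # nu(A_i)

f = {i: nu[i] / mu[i] for i in mu}   # density value of nu w.r.t. mu on A_i

def integral_over(J):
    """Integral of f w.r.t. mu over the union of the cells {A_j : j in J}."""
    return sum(f[j] * mu[j] for j in J)

# The integral recovers nu on every measurable set (a union of cells):
print(integral_over({0, 2}))      # nu(A_0) + nu(A_2) = 3.0
print(integral_over({0, 1, 2}))   # nu(S) = 6.0
```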

Often positive measure spaces that occur in applications can be decomposed into spaces generated by countable partitions. In the section on Convergence in the chapter on Martingales, we show that more general density functions can be obtained as limits of density functions of the type in the last theorem.

### Probability Spaces

Suppose that \( (\Omega, \ms{F}, \P) \) is a probability space and that \( X \) is a random variable taking values in a measurable space \( (S, \ms{S}) \). Recall that the distribution of \( X \) is the probability measure \( P_X \) on \( (S, \ms{S}) \) given by \[ P_X(A) = \P(X \in A), \quad A \in \ms{S} \] If \( \mu \) is a positive, \( \sigma \)-finite measure on \( (S, \ms{S}) \), then the theory of this section applies, of course. The Radon-Nikodym theorem tells us precisely when (the distribution of) \( X \) has a probability density function with respect to \( \mu \): we need the distribution to be absolutely continuous with respect to \( \mu \), that is, if \( \mu(A) = 0 \) then \(P_X(A) = \P(X \in A) = 0 \) for \( A \in \ms{S} \).

Suppose that \( r: S \to \R \) is measurable, so that \( r(X) \) is a real-valued random variable. The integral of \( r(X) \) (assuming that it exists) is of fundamental importance, and is known as the expected value of \( r(X) \). We will study expected values in detail in the next chapter, but here we just note different ways to write the integral. By the change of variables theorem in the last section, we have \[ \int_\Omega r[X(\omega)] d\P(\omega) = \int_S r(x) dP_X(x) \] Assuming that \( P_X \), the distribution of \( X \), is absolutely continuous with respect to \( \mu \), with density function \( f \), we can add to our chain of integrals using Theorem (14): \[ \int_\Omega r[X(\omega)] d\P(\omega) = \int_S r(x) dP_X(x) = \int_S r(x) f(x) d\mu(x)\]

Specializing, suppose that \( (S, \ms S, \#) \) is a discrete measure space. Thus \( X \) has a discrete distribution and (as noted in the previous subsection), the distribution of \( X \) is absolutely continuous with respect to \(\#\), with probability density function \( f \) given by \( f(x) = \P(X = x) \) for \( x \in S \). In this case the integral simplifies: \[ \int_\Omega r[X(\omega)] d\P(\omega) = \sum_{x \in S} r(x) f(x) \]
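In the discrete case, the expected value of \( r(X) \) is just a weighted sum, as in this Python sketch. The distribution used (a fair six-sided die) is an illustrative choice.

```python
# Discrete expected value: E[r(X)] = sum over x of r(x) f(x), where
# f(x) = P(X = x) is the density w.r.t. counting measure.
# The fair-die distribution below is an illustrative choice.

f = {x: 1 / 6 for x in range(1, 7)}   # P(X = x) for a fair six-sided die

def expected_value(r):
    """E[r(X)] as a weighted sum over the support of X."""
    return sum(r(x) * p for x, p in f.items())

print(expected_value(lambda x: x))        # ≈ 3.5
print(expected_value(lambda x: x ** 2))   # ≈ 91/6
```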

Recall next that for \(n \in \N_+\), the \(n\)-dimensional Euclidean measure space is \((\R^n, \ms R_n, \lambda_n)\) where \(\ms R_n\) is the \(\sigma\)-algebra of Lebesgue measurable sets and \(\lambda_n\) is Lebesgue measure. Suppose now that \( S \in \ms R_n \), that \( \ms{S} \) is the \( \sigma \)-algebra of Lebesgue measurable subsets of \( S \), and that once again, \(X\) is a random variable with values in \(S\). By definition, \( X \) has a continuous distribution if \( \P(X = x) = 0 \) for \( x \in S \). But we now know that this is not enough to ensure that the distribution of \( X \) has a density function with respect to \( \lambda_n \). We need the distribution to be *absolutely* continuous, so that if \( \lambda_n(A) = 0 \) then \( \P(X \in A) = 0 \) for \( A \in \ms{S} \). Of course \( \lambda_n\{x\} = 0 \) for \( x \in S \), so absolute continuity implies continuity, but not conversely: continuity of the distribution is a (much) weaker condition than absolute continuity. If the distribution of \( X \) is continuous but not absolutely continuous, then the distribution will not have a density function with respect to \( \lambda_n \).

For example, suppose that \(\lambda_n(S) = 0\). Then the distribution of \( X \) and \( \lambda_n \) are mutually singular since \( \P(X \in S) = 1 \), and so \(X\) will not have a density function with respect to \(\lambda_n\). This will always be the case if \(S\) is countable, so that the distribution of \(X\) is discrete. But it is also possible for \(X\) to have a continuous distribution on an uncountable set \( S \in \ms R_n \) with \(\lambda_n(S) = 0\). In such a case, the continuous distribution of \( X \) is said to be degenerate. There are a couple of natural ways in which this can happen that are illustrated in the following exercises.

Suppose that \(\Theta\) is uniformly distributed on the interval \([0, 2 \pi)\). Let \(X = \cos \Theta\), \(Y = \sin \Theta\).

- \((X, Y)\) has a continuous distribution on the circle \(C = \{(x, y): x^2 + y^2 = 1\}\).
- The distribution of \((X, Y)\) and \(\lambda_2\) are mutually singular.
- Find \(\P(Y \gt X)\).

## Solution

- If \((x, y) \in C\) then there exists a unique \(\theta \in [0, 2 \pi)\) with \(x = \cos \theta\) and \(y = \sin \theta\). Hence \(\P[(X, Y) = (x, y)] = \P(\Theta = \theta) = 0\).
- \(\P[(X, Y) \in C] = 1\) but \(\lambda_2(C) = 0\).
- \(\frac{1}{2}\)
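The probability in the last part can be checked by simulation. Note that \( Y \gt X \) exactly when \( \sin \Theta \gt \cos \Theta \), i.e. when \( \Theta \in (\pi/4, 5\pi/4) \), an arc of length \( \pi \), giving probability \( 1/2 \). A minimal Monte Carlo sketch (sample size and seed are arbitrary choices):

```python
import math
import random

# Monte Carlo check of P(Y > X) for (X, Y) = (cos Theta, sin Theta)
# with Theta uniform on [0, 2*pi). The exact answer is 1/2.

random.seed(1)
n = 100_000
count = 0
for _ in range(n):
    theta = random.uniform(0, 2 * math.pi)
    count += math.sin(theta) > math.cos(theta)

est = count / n
print(est)   # ≈ 0.5
```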

The last example is artificial since \((X, Y)\) has a one-dimensional distribution in a sense, in spite of taking values in \(\R^2\). And of course \(\Theta\) has a probability density function \(f\) with respect to \(\lambda_1\) given by \(f(\theta) = 1 / (2 \pi)\) for \(\theta \in [0, 2 \pi)\).

Suppose that \(X\) is uniformly distributed on the set \(\{0, 1, 2\}\), \(Y\) is uniformly distributed on the interval \([0, 2]\), and that \(X\) and \(Y\) are independent.

- \((X, Y)\) has a continuous distribution on the product set \(S = \{0, 1, 2\} \times [0, 2]\).
- The distribution of \((X, Y)\) and \(\lambda_2\) are mutually singular.
- Find \(\P(Y \gt X)\).

## Solution

- The variables are independent and \(Y\) has a continuous distribution, so \(\P[(X, Y) = (x, y)] = \P(X = x) \P(Y = y) = 0\) for \((x, y) \in S\).
- \(\P[(X, Y) \in S] = 1\) but \(\lambda_2(S) = 0\).
- \(\frac{1}{2}\)
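Here too the answer can be checked by simulation. Conditioning on \( X \) gives \( \P(Y \gt X) = \frac{1}{3}\left(1 + \frac{1}{2} + 0\right) = \frac{1}{2} \). A minimal Monte Carlo sketch (sample size and seed are arbitrary choices):

```python
import random

# Monte Carlo check of P(Y > X) with X uniform on {0, 1, 2} and
# Y uniform on [0, 2], independent. The exact answer is 1/2.

random.seed(2)
n = 100_000
count = 0
for _ in range(n):
    x = random.choice([0, 1, 2])
    y = random.uniform(0, 2)
    count += y > x

est = count / n
print(est)   # ≈ 0.5
```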

The last exercise is artificial since \(X\) has a discrete distribution on \(\{0, 1, 2\}\) (with all subsets measurable and with counting measure \(\#\)), and \(Y\) has a continuous distribution on the Euclidean space \([0, 2]\) (with Lebesgue measurable subsets and with \(\lambda\)). Both are absolutely continuous; \( X \) has density function \( g \) given by \( g(x) = 1/3 \) for \( x \in \{0, 1, 2\} \) and \( Y \) has density function \( h \) given by \( h(y) = 1 / 2 \) for \( y \in [0, 2] \). So really, the proper measure space on \(S\) is the product measure space formed from these two spaces. Relative to this product space, \((X, Y)\) has density \(f\) given by \(f(x, y) = 1/6\) for \((x, y) \in S\).

It is also possible to have a continuous distribution on \(S \subseteq \R^n\) with \(\lambda_n(S) \gt 0\), yet still with no probability density function, a much more interesting situation. We will give a classical construction. Let \((X_1, X_2, \ldots)\) be a sequence of Bernoulli trials with success parameter \(p \in (0, 1)\). We will indicate the dependence of the probability measure \(\P\) on the parameter \(p\) with a subscript. Thus, we have a sequence of independent indicator variables with

\[\P_p(X_i = 1) = p, \quad \P_p(X_i = 0) = 1 - p\]

We interpret \(X_i\) as the \(i\)th binary digit (bit) of a random variable \(X\) taking values in \((0, 1)\). That is, \(X = \sum_{i=1}^\infty X_i / 2^i\). Conversely, recall that every number \(x \in (0, 1)\) can be written in binary form as \(x = \sum_{i=1}^\infty x_i / 2^i \) where \( x_i \in \{0, 1\} \) for each \( i \in \N_+ \). This representation is unique except when \(x \) is a binary rational of the form \(x = k / 2^n\) for \( n \in \N_+ \) and \(k \in \{1, 3, \ldots, 2^n - 1\}\). In this case, there are two representations, one in which the bits are eventually 0 and one in which the bits are eventually 1. Note, however, that the set of binary rationals is countable. Finally, note that the uniform distribution on \( (0, 1) \) is the same as Lebesgue measure on \( (0, 1) \).
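The binary expansion is easy to compute: doubling \( x \) shifts the bits left, so the integer part of \( 2x \) is the next bit. A small Python sketch of this (the helper name `bits` is our own):

```python
# Binary expansion x = sum_{i >= 1} x_i / 2^i for x in (0, 1).
# Doubling shifts the expansion: the integer part of 2x is the next bit.

def bits(x, n):
    """First n binary digits of x in (0, 1)."""
    out = []
    for _ in range(n):
        x *= 2
        b = int(x)      # integer part of 2x is the next bit
        out.append(b)
        x -= b
    return out

print(bits(0.625, 4))   # [1, 0, 1, 0], since 0.625 = 1/2 + 1/8

# Reconstructing the partial sum recovers x to within 2^(-n):
approx = sum(b / 2 ** (i + 1) for i, b in enumerate(bits(0.3, 20)))
print(approx)           # ≈ 0.3
```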

\(X\) has a continuous distribution on \( (0, 1) \) for every value of the parameter \( p \in (0, 1) \). Moreover,

- If \( p, \, q \in (0, 1) \) and \( p \ne q \) then the distribution of \( X \) with parameter \( p \) and the distribution of \( X \) with parameter \( q \) are mutually singular.
- If \( p = \frac{1}{2} \), \( X \) has the uniform distribution on \( (0, 1) \).
- If \( p \ne \frac{1}{2} \), then the distribution of \( X \) is singular with respect to Lebesgue measure on \( (0, 1) \), and hence has no probability density function in the usual sense.

## Proof

If \(x \in (0, 1)\) is not a binary rational, then \[ \P_p(X = x) = \P_p(X_i = x_i \text{ for all } i \in \N_+) = \lim_{n \to \infty} \P_p(X_i = x_i \text{ for } i = 1, 2, \ldots, n) = \lim_{n \to \infty} p^y (1 - p)^{n - y} \] where \( y = \sum_{i=1}^n x_i \). Let \(q = \max\{p, 1 - p\}\). Then \(p^y (1 - p)^{n - y} \le q^n \to 0\) as \(n \to \infty\). Hence, \(\P_p(X = x) = 0\). If \(x \in (0, 1)\) is a binary rational, then there are two bit strings that represent \(x\), say \((x_1, x_2, \ldots)\) (with bits eventually 0) and \((y_1, y_2, \ldots)\) (with bits eventually 1). Hence \(\P_p(X = x) = \P_p(X_i = x_i \text{ for all } i \in \N_+) + \P_p(X_i = y_i \text{ for all } i \in \N_+)\). But both of these probabilities are 0 by the same argument as before.

Next, we define the set of numbers for which the limiting relative frequency of 1's is \(p\). Let \(C_p = \left\{ x \in (0, 1): \frac{1}{n} \sum_{i = 1}^n x_i \to p \text{ as } n \to \infty \right\} \). Note that since limits are unique, \(C_p \cap C_q = \emptyset\) for \(p \ne q\). Next, by the strong law of large numbers, \(\P_p(X \in C_p) = 1\). Although we have not yet studied the law of large numbers, the basic idea is simple: in a sequence of Bernoulli trials with success probability \( p \), the long-term relative frequency of successes is \( p \). Thus the distributions of \(X\), as \(p\) varies from 0 to 1, are mutually singular; that is, as \(p\) varies, \(X\) takes values with probability 1 in mutually disjoint sets.

For the case \( p = \frac{1}{2} \), let \(F\) denote the distribution function of \(X\), so that \(F(x) = \P_{1/2}(X \le x) = \P_{1/2}(X \lt x)\) for \(x \in (0, 1)\). If \(x \in (0, 1)\) is not a binary rational, then \(X \lt x\) if and only if there exists \(n \in \N_+\) such that \(X_i = x_i\) for \(i \in \{1, 2, \ldots, n - 1\}\) and \(X_n = 0\) while \(x_n = 1\). Hence \( \P_{1/2}(X \lt x) = \sum_{n=1}^\infty \frac{x_n}{2^n} = x \). Since the distribution function of a continuous distribution is continuous, it follows that \(F(x) = x\) for all \(x \in [0, 1]\). This means that \(X\) has the uniform distribution on \((0, 1)\). If \(p \ne \frac{1}{2}\), the distribution of \(X\) and the uniform distribution are mutually singular, so in particular, \( X \) does not have a probability density function with respect to Lebesgue measure.
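A simulation makes the construction concrete. Since \( \E(X_i) = p \), we have \( \E(X) = \sum_{i=1}^\infty p / 2^i = p \); for \( p = \frac{1}{2} \) the samples are uniform on \( (0, 1) \), while for \( p \ne \frac{1}{2} \) the samples concentrate in the singular set \( C_p \). The truncation depth, sample size, and seed in this sketch are arbitrary choices:

```python
import random

# Simulate X = sum_i X_i / 2^i with Bernoulli(p) bits, truncated at
# n_bits terms. E(X) = p, so the sample means distinguish the (mutually
# singular) distributions for different values of p.

random.seed(3)

def sample_X(p, n_bits=32):
    """One (truncated) draw of X from Bernoulli(p) bits."""
    return sum((random.random() < p) / 2 ** (i + 1) for i in range(n_bits))

n = 10_000
mean_half = sum(sample_X(0.5) for _ in range(n)) / n
mean_skew = sum(sample_X(0.9) for _ in range(n)) / n
print(mean_half)   # ≈ 0.5 (uniform distribution)
print(mean_skew)   # ≈ 0.9 (mass concentrates near 1)
```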

For an application of some of the ideas in this example, see Bold Play in the game of Red and Black.

### Counterexamples

The essential uniqueness of density functions can fail if the underlying positive measure \( \mu \) is not \( \sigma \)-finite. Here is a trivial counterexample:

Suppose that \( S \) is a nonempty set and that \( \ms{S} = \{S, \emptyset\} \) is the trivial \( \sigma \)-algebra. Define the positive measure \( \mu \) on \( (S, \ms{S}) \) by \( \mu(\emptyset) = 0 \), \( \mu(S) = \infty \). For \( c \in [0, \infty) \), let \( \nu_c \) denote the measure on \( (S, \ms{S}) \) with constant density function \( c \) with respect to \( \mu \).

- \( (S, \ms{S}, \mu) \) is not \( \sigma \)-finite.
- \( \nu_c = \mu \) for every \( c \in (0, \infty) \).

The Radon-Nikodym theorem can fail if the measure \( \mu \) is not \( \sigma \)-finite, even if \( \nu \) is finite. Here are a couple of standard counterexamples:

Suppose that \( S \) is an uncountable set and \( \ms{S} \) is the \( \sigma \)-algebra of countable and co-countable sets: \[\ms{S} = \{A \subseteq S: A \text{ is countable or } A^c \text{ is countable} \} \] As usual, let \( \# \) denote counting measure on \( \ms{S} \), and define \( \nu \) on \( \ms{S} \) by \( \nu(A) = 0 \) if \( A \) is countable and \( \nu(A) = 1 \) if \( A^c \) is countable. Then

- \( (S, \ms{S}, \#) \) is not \( \sigma \)-finite.
- \( \nu \) is a finite, positive measure on \( (S, \ms{S}) \).
- \( \nu \) is absolutely continuous with respect to \( \# \).
- \( \nu \) does not have a density function with respect to \( \# \).

## Proof

- Recall that a countable union of countable sets is countable, and so \( S \) cannot be written as such a union.
- Note that \( \nu(\emptyset) = 0 \). Suppose that \( \{A_i: i \in I\} \) is a countable, disjoint collection of sets in \( \ms{S} \). If \( A_i \) is countable for every \( i \in I \) then \( \bigcup_{i \in I} A_i \) is countable. Hence \( \nu\left(\bigcup_{i \in I} A_i\right) = 0 \) and \( \nu(A_i) = 0 \) for every \( i \in I \). Next suppose that \( A_j^c \) and \( A_k^c \) are countable for distinct \( j, \; k \in I \). Since \( A_j \cap A_k = \emptyset \), we have \( A_j^c \cup A_k^c = S \). But then \( S \) would be countable, which is a contradiction. Hence it is only possible to have \( A_j^c \) countable for a single \( j \in I \). In this case, \( \nu(A_j) = 1 \) and \( \nu(A_i) = 0 \) for \( i \ne j \). But also \( \left(\bigcup_{i \in I} A_i\right)^c = \bigcap_{i \in I} A_i^c \) is countable, so \( \nu\left(\bigcup_{i \in I} A_i\right) = 1 \). Hence in all cases, \( \nu\left(\bigcup_{i \in I} A_i \right) = \sum_{i \in I} \nu(A_i) \) so \( \nu \) is a measure on \( (S, \ms{S}) \). It is clearly positive and finite.
- Recall that any measure is absolutely continuous with respect to counting measure, since \( \#(A) = 0 \) if and only if \( A = \emptyset \).
- Suppose that \( \nu \) has density function \( f \) with respect to \( \# \). Then \(0 = \nu\{x\} = \int_{\{x\}} f \, d\# = f(x) \) for every \( x \in S \). But then \( \nu(S) = \int_S f \, d\# = 0 \), which is a contradiction.

Let \( \ms R \) denote the standard Borel \( \sigma \)-algebra on \( \R \). Let \( \# \) and \( \lambda \) denote counting measure and Lebesgue measure on \( (\R, \ms R) \), respectively. Then

- \( (\R, \ms R, \#) \) is not \( \sigma \)-finite.
- \( \lambda \) is absolutely continuous with respect to \( \# \).
- \( \lambda \) does not have a density function with respect to \( \# \).

## Proof

- \( \R \) is uncountable and hence cannot be written as a countable union of finite sets.
- Since \( \emptyset \) is the only null set of \( \# \), \( \lambda \ll \# \).
- Suppose that \( \lambda \) has density function \( f \) with respect to \( \# \). Then \[ 0 = \lambda\{x\} = \int_{\{x\}} f \, d\# = f(x), \quad x \in \R \] But then also \( \lambda(\R) = \int_\R f \, d\# = 0 \), a contradiction.