3.2: Continuous Distributions
In the previous section, we considered discrete distributions. In this section, we study a complementary type of distribution. As usual, if you are a new student of probability, you may want to skip the technical details.
Basic Theory
Definitions and Basic Properties
As usual, our starting point is a random experiment modeled by a probability space \((S, \mathscr S, \P)\). So to review, \(S\) is the set of outcomes, \(\mathscr S\) the collection of events, and \(\P\) the probability measure on the sample space \((S, \mathscr S)\). We use the terms probability measure and probability distribution synonymously in this text. Also, since we use a general definition of random variable, every probability measure can be thought of as the probability distribution of a random variable, so we can always take this point of view if we like. Indeed, most probability measures naturally have random variables associated with them.
In this section, we assume that \(S \subseteq \R^n\) for some \(n \in \N_+\).
Details
Technically, \( S \) is a measurable subset of \( \R^n \) and \( \mathscr S \) is the \( \sigma \)-algebra of measurable subsets of \( S \). Typically in applications, \( S \) is defined by a finite number of inequalities involving elementary functions.
Here is our first fundamental definition.
The probability measure \( \P \) is continuous if \(\P(\{x\}) = 0\) for all \(x \in S\).
The fact that each point is assigned probability 0 might seem impossible or paradoxical at first, but soon we will see very familiar analogies.
If \(\P\) is a continuous distribution then \(\P(C) = 0\) for every countable \(C \subseteq S\).
Proof
Since \(C\) is countable, it follows from the additivity axiom of probability that
\[ \P(C) = \sum_{x \in C} \P(\{x\}) = 0 \]
Thus, continuous distributions are in complete contrast with discrete distributions, for which all of the probability mass is concentrated on the points in a discrete set. For a continuous distribution, the probability mass is continuously spread over \(S\) in some sense. In the picture below, the light blue shading is intended to suggest a continuous distribution of probability.
Typically, \( S \) is a region of \( \R^n \) defined by inequalities involving elementary functions, for example an interval in \(\R\), a circular region in \(\R^2\), or a conical region in \(\R^3\). Suppose that \(\P\) is a continuous probability measure on \(S\). The fact that each point in \( S \) has probability 0 is conceptually the same as the fact that an interval of \(\R\) can have positive length even though it is composed of points each of which has 0 length. Similarly, a region of \(\R^2\) can have positive area even though it is composed of points (or curves) each of which has area 0. In the one-dimensional case, continuous distributions are used to model random variables that take values in intervals of \( \R \), variables that can, in principle, be measured with any degree of accuracy. Such variables abound in applications and include
 length, area, volume, and distance
 time
 mass and weight
 charge, voltage, and current
 resistance, capacitance, and inductance
 velocity and acceleration
 energy, force, and work
A continuous distribution can usually be described by a certain type of function.
Suppose again that \(\P\) is a continuous distribution on \(S\). A function \(f: S \to [0, \infty)\) is a probability density function for \(\P\) if \[\P(A) = \int_A f(x) \, dx, \quad A \in \mathscr S\]
Details
Technically, \( f \) must be measurable and is a probability density function of \( \P \) with respect to Lebesgue measure, the standard measure on \( \R^n \). Moreover, the integral is the Lebesgue integral, but the ordinary Riemann integral of calculus suffices for the sets that occur in typical applications.
So the probability distribution \(\P\) is completely determined by the probability density function \(f\). As a special case, note that \(\int_S f(x) \, dx = \P(S) = 1\). Conversely, a nonnegative function on \(S\) with this property defines a probability measure.
A function \(f: S \to [0, \infty)\) that satisfies \(\int_S f(x) \, dx = 1\) is a probability density function on \(S\) and then \(\P\) defined as follows is a continuous probability measure on \(S\): \[\P(A) = \int_A f(x) \, dx, \quad A \in \mathscr S\]
Proof
Note that we can always extend \(f\) to a probability density function on a subset of \(\R^n\) that contains \(S\), or to all of \(\R^n\), by defining \(f(x) = 0\) for \(x \notin S\). This extension sometimes simplifies notation. Put another way, we can be a bit sloppy about the set of values of the random variable. So for example if \(a, \, b \in \R\) with \(a \lt b\) and \(X\) has a continuous distribution on the interval \([a, b]\), then we could also say that \(X\) has a continuous distribution on \((a, b)\), or \([a, b)\), or \((a, b]\).
The points \( x \in S \) that maximize the probability density function \( f \) are important, just as in the discrete case.
Suppose that \(\P\) is a continuous distribution on \(S\) with probability density function \(f\). An element \(x \in S\) that maximizes \(f\) is a mode of the distribution.
If there is only one mode, it is sometimes used as a measure of the center of the distribution.
You have probably noticed that probability density functions for continuous distributions are analogous to probability density functions for discrete distributions, with integrals replacing sums. However, there are essential differences. First, every discrete distribution has a unique probability density function \(f\) given by \(f(x) = \P(\{x\})\) for \(x \in S\). For a continuous distribution, the existence of a probability density function is not guaranteed. The advanced section on absolute continuity and density functions has several examples of continuous distributions that do not have density functions, and gives conditions that are necessary and sufficient for the existence of a probability density function. Even if a probability density function \(f\) exists, it is never unique. Note that the values of \(f\) on a finite (or even countably infinite) set of points could be changed to other nonnegative values and the new function would still be a probability density function for the same distribution. The critical fact is that only integrals of \(f\) are important. Second, the values of the PDF \(f\) for a discrete distribution are probabilities, and in particular \(f(x) \le 1\) for \(x \in S\). For a continuous distribution the values are not probabilities and in fact it's possible that \(f(x) \gt 1\) for some or even all \(x \in S\). Further, \(f\) can be unbounded on \(S\). In the typical calculus interpretation, \(f(x)\) really is probability density at \(x\). That is, \(f(x) \, dx\) is approximately the probability of a small region of size \(dx\) about \(x\).
Constructing Probability Density Functions
Just as in the discrete case, a nonnegative function on \( S \) can often be scaled to produce a probability density function.
Suppose that \(g: S \to [0, \infty)\) and let \[c = \int_S g(x) \, dx\] If \(0 \lt c \lt \infty\) then \(f\) defined by \(f(x) = \frac{1}{c} g(x)\) for \(x \in S\) defines a probability density function for a continuous distribution on \(S\).
Proof
Technically, the function \( g \) is measurable. Technicalities aside, the proof is trivial. Clearly \( f(x) \ge 0 \) for \( x \in S \) and \[ \int_S f(x) \, dx = \frac{1}{c} \int_S g(x) \, dx = \frac{c}{c} = 1 \]
Note again that \(f\) is just a scaled version of \(g\). So this result can be used to construct probability density functions with desired properties (domain, shape, symmetry, and so on). The constant \(c\) is sometimes called the normalizing constant of \(g\).
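The scaling result can be illustrated numerically. The following is a minimal sketch, not part of the original text: it approximates the normalizing constant with a midpoint Riemann sum on an interval of \(\R\). The helper names `midpoint_integral` and `normalize`, and the choice \(g(x) = x^2 (1 - x)\) on \([0, 1]\), are assumptions made only for illustration.

```python
def midpoint_integral(g, a, b, n=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def normalize(g, a, b, n=10_000):
    """Return (c, f), where c approximates the integral of g over [a, b]
    and f = g / c is the corresponding probability density function."""
    c = midpoint_integral(g, a, b, n)
    if not 0 < c < float("inf"):
        raise ValueError("g cannot be normalized: need 0 < c < infinity")
    return c, (lambda x: g(x) / c)


# Illustrative choice: g(x) = x^2 (1 - x) on [0, 1], for which c = 1/12.
c, f = normalize(lambda x: x**2 * (1 - x), 0.0, 1.0)
print(c)                                # close to 1/12
print(midpoint_integral(f, 0.0, 1.0))   # close to 1
```

The check \(0 \lt c \lt \infty\) mirrors the hypothesis of the result: a function with zero or infinite integral cannot be scaled into a probability density function.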
Conditional Densities
Suppose now that \(X\) is a random variable defined on a probability space \( (\Omega, \mathscr F, \P) \) and that \( X \) has a continuous distribution on \(S\). A probability density function for \(X\) is based on the underlying probability measure on the sample space \((\Omega, \mathscr F)\). This measure could be a conditional probability measure, conditioned on a given event \(E \in \mathscr F\) with \(\P(E) \gt 0\). Assuming that the conditional probability density function exists, the usual notation is \[f(x \mid E), \quad x \in S\] Note, however, that except for notation, no new concepts are involved. The defining property is \[\int_A f(x \mid E) \, dx = \P(X \in A \mid E), \quad A \in \mathscr S\] and all results that hold for probability density functions in general hold for conditional probability density functions. The event \( E \) could be an event described in terms of the random variable \( X \) itself:
Suppose that \( X \) has a continuous distribution on \(S\) with probability density function \( f \) and that \(B \in \mathscr S\) with \(\P(X \in B) \gt 0\). The conditional probability density function of \(X\) given \(X \in B\) is the function on \(B\) given by \[f(x \mid X \in B) = \frac{f(x)}{\P(X \in B)}, \quad x \in B \]
Proof
For \(A \in \mathscr S\) with \(A \subseteq B\), \[ \int_A \frac{f(x)}{\P(X \in B)} \, dx = \frac{1}{\P(X \in B)} \int_A f(x) \, dx = \frac{\P(X \in A)}{\P(X \in B)} = \P(X \in A \mid X \in B) \]
Of course, \( \P(X \in B) = \int_B f(x) \, dx \) and hence is the normalizing constant for the restriction of \( f \) to \( B \), as in the scaling result above.
Examples and Applications
As always, try the problems yourself before looking at the answers.
The Exponential Distribution
Let \(f\) be the function defined by \(f(t) = r e^{-r t}\) for \(t \in [0, \infty) \), where \(r \in (0, \infty)\) is a parameter.
 Show that \( f \) is a probability density function.
 Draw a careful sketch of the graph of \( f \), and state the important qualitative features.
Proof
 Note that \( f(t) \gt 0 \) for \( t \ge 0 \). Also \( \int_0^\infty e^{-r t} \, dt = \frac{1}{r} \) so \( f \) is a PDF.
 \( f \) is decreasing and concave upward so the mode is 0. \( f(t) \to 0 \) as \( t \to \infty \).
The distribution defined by the probability density function in the previous exercise is called the exponential distribution with rate parameter \(r\). This distribution is frequently used to model random times, under certain assumptions. Specifically, in the Poisson model of random points in time, the times between successive arrivals have independent exponential distributions, and the parameter \(r\) is the average rate of arrivals. The exponential distribution is studied in detail in the chapter on Poisson Processes.
The lifetime \(T\) of a certain device (in 1000 hour units) has the exponential distribution with parameter \(r = \frac{1}{2}\). Find
 \(\P(T \gt 2)\)
 \(\P(T \gt 3 \mid T \gt 1)\)
Answer
 \(e^{-1} \approx 0.3679\)
 \(e^{-1} \approx 0.3679\)
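For the exponential distribution with rate \(r\), integrating the density over \((t, \infty)\) gives \(\P(T \gt t) = e^{-r t}\) for \(t \ge 0\). The sketch below (the function name `exp_survival` is illustrative, not from the text) evaluates both probabilities for \(r = \frac{1}{2}\); the second uses the definition of conditional probability together with \(\{T \gt 3\} \subseteq \{T \gt 1\}\).

```python
import math


def exp_survival(t, r):
    """P(T > t) for the exponential distribution with rate r, for t >= 0."""
    return math.exp(-r * t)


r = 0.5
p_a = exp_survival(2, r)                       # P(T > 2)
p_b = exp_survival(3, r) / exp_survival(1, r)  # P(T > 3 | T > 1)
print(round(p_a, 4), round(p_b, 4))
```

The two values agree because \(e^{-3 r} / e^{-r} = e^{-2 r}\): given survival to time 1, the remaining lifetime has the same exponential distribution.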
In the gamma experiment, set \( n = 1 \) to get the exponential distribution. Vary the rate parameter \( r \) and note the shape of the probability density function. For various values of \(r\), run the simulation 1000 times and compare the empirical density function with the probability density function.
A Random Angle
In Bertrand's problem, a certain random angle \(\Theta\) has probability density function \(f\) given by \(f(\theta) = \sin \theta\) for \(\theta \in \left[0, \frac{\pi}{2}\right]\).
 Show that \(f\) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
 Find \(\P\left(\Theta \lt \frac{\pi}{4}\right)\).
Answer
 Note that \( \sin \theta \ge 0 \) for \( 0 \le \theta \le \frac{\pi}{2} \) and \( \int_0^{\pi/2} \sin \theta \, d\theta = 1 \).
 \( f \) is increasing and concave downward so the mode is \(\frac{\pi}{2}\).
 \(1 - \frac{1}{\sqrt{2}} \approx 0.2929\)
Bertrand's problem is named for Joseph Louis Bertrand and is studied in more detail in the chapter on Geometric Models.
In Bertrand's experiment, select the model with uniform distance. Run the simulation 1000 times and compute the empirical probability of the event \(\left\{\Theta \lt \frac{\pi}{4}\right\}\). Compare with the true probability in the previous exercise.
Gamma Distributions
Let \(g_n\) be the function defined by \(g_n(t) = e^{-t} \frac{t^n}{n!}\) for \(t \in [0, \infty)\) where \(n \in \N\) is a parameter.
 Show that \(g_n\) is a probability density function for each \(n \in \N\).
 Draw a careful sketch of the graph of \(g_n\), and state the important qualitative features.
Proof
 Note that \( g_n(t) \ge 0 \) for \( t \ge 0 \). Also, \( g_0 \) is the probability density function of the exponential distribution with parameter 1. For \( n \in \N_+ \), integration by parts with \( u = t^n / n! \) and \( dv = e^{-t} dt \) gives \( \int_0^\infty g_n(t) \, dt = \int_0^\infty g_{n-1}(t) \, dt \). Hence it follows by induction that \( g_n \) is a PDF for each \( n \in \N \).
 \( g_0 \) is decreasing and concave upward, with mode \( t = 0 \). For \( n \gt 0 \), \( g_n \) increases and then decreases, with mode \( t = n \). \( g_1 \) is concave downward and then upward, with inflection point at \( t = 2 \). For \( n \gt 1 \), \( g_n \) is concave upward, then downward, then upward again, with inflection points at \( n \pm \sqrt{n} \). For all \( n \in \N \), \( g_n(t) \to 0 \) as \( t \to \infty \).
Interestingly, we showed in the last section on discrete distributions, that \(f_t(n) = g_n(t)\) is a probability density function on \(\N\) for each \(t \ge 0\) (it's the Poisson distribution with parameter \(t\)). The distribution defined by the probability density function \(g_n\) belongs to the family of Erlang distributions, named for Agner Erlang; \( n + 1 \) is known as the shape parameter. The Erlang distribution is studied in more detail in the chapter on the Poisson Process. In turn the Erlang distribution belongs to the more general family of gamma distributions. The gamma distribution is studied in more detail in the chapter on Special Distributions.
In the gamma experiment, keep the default rate parameter \(r = 1\). Vary the shape parameter and note the shape and location of the probability density function. For various values of the shape parameter, run the simulation 1000 times and compare the empirical density function with the probability density function.
Suppose that the lifetime of a device \(T\) (in 1000 hour units) has the gamma distribution above with \(n = 2\). Find each of the following:
 \(\P(T \gt 3)\)
 \( \P(T \le 2) \)
 \( \P(1 \le T \le 4) \)
Answer
 \(\frac{17}{2} e^{-3} \approx 0.4232\)
 \( 1 - 5 e^{-2} \approx 0.3233 \)
 \( \frac{5}{2} e^{-1} - 13 e^{-4} \approx 0.6816 \)
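Repeated integration by parts, as in the proof that \(g_n\) is a probability density function, gives the closed form \(\P(T \gt t) = e^{-t} \sum_{k=0}^n \frac{t^k}{k!}\) for \(t \ge 0\). The following sketch uses that formula to evaluate the three probabilities for \(n = 2\); the function name `erlang_survival` is an illustrative assumption.

```python
import math


def erlang_survival(t, n):
    """P(T > t) when T has density g_n(t) = e^{-t} t^n / n! on [0, inf)."""
    return math.exp(-t) * sum(t**k / math.factorial(k) for k in range(n + 1))


n = 2
p_a = erlang_survival(3, n)                          # P(T > 3)
p_b = 1 - erlang_survival(2, n)                      # P(T <= 2)
p_c = erlang_survival(1, n) - erlang_survival(4, n)  # P(1 <= T <= 4)
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Note that the sum is a partial sum of Poisson probabilities with parameter \(t\), which reflects the connection between \(g_n(t)\) and the Poisson distribution mentioned above.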
Beta Distributions
Let \(f\) be the function defined by \(f(x) = 6 x (1 - x)\) for \(x \in [0, 1]\).
 Show that \( f \) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer
 Note that \( f(x) \ge 0 \) for \( x \in [0, 1] \). Also, \( \int_0^1 x (1 - x) \, dx = \frac{1}{6} \), so \( f \) is a PDF.
 \( f \) increases and then decreases, with mode at \(x = \frac{1}{2} \). \( f \) is concave downward. \( f \) is symmetric about \( x = \frac{1}{2} \) (in fact, the graph is a parabola).
Let \(f\) be the function defined by \(f(x) = 12 x^2 (1 - x)\) for \(x \in [0, 1]\).
 Show that \( f \) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer
 Note that \( f(x) \ge 0 \) for \( 0 \le x \le 1 \). Also \( \int_0^1 x^2 (1 - x) \, dx = \frac{1}{12} \), so \( f \) is a PDF.
 \( f \) increases and then decreases, with mode at \(x = \frac{2}{3}\). \( f \) is concave upward and then downward, with inflection point at \(x = \frac{1}{3}\).
The distributions defined in the last two exercises are examples of beta distributions. These distributions are widely used to model random proportions and probabilities, and physical quantities that take values in bounded intervals (which, after a change of units, can be taken to be \( [0, 1] \)). Beta distributions are studied in detail in the chapter on Special Distributions.
In the special distribution simulator, select the beta distribution. For the following parameter values, note the shape of the probability density function. Run the simulation 1000 times and compare the empirical density function with the probability density function.
Suppose that \( P \) is a random proportion. Find \( \P\left(\frac{1}{4} \le P \le \frac{3}{4}\right) \) in each of the following cases:
 \( P \) has the first beta distribution above.
 \( P \) has the second beta distribution above.
Answer
 \(\frac{11}{16}\)
 \(\frac{11}{16}\)
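These probabilities can be confirmed by numerical integration. The sketch below takes the two cases to be the beta densities \(6 x (1 - x)\) and \(12 x^2 (1 - x)\) defined above, and uses a hand-written composite Simpson's rule; the names `simpson`, `beta_1`, and `beta_2` are illustrative.

```python
def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return total * h / 3


def beta_1(x):
    """First beta density: 6 x (1 - x) on [0, 1]."""
    return 6 * x * (1 - x)


def beta_2(x):
    """Second beta density: 12 x^2 (1 - x) on [0, 1]."""
    return 12 * x**2 * (1 - x)


p_1 = simpson(beta_1, 0.25, 0.75)
p_2 = simpson(beta_2, 0.25, 0.75)
print(p_1, p_2, 11 / 16)
```

Simpson's rule is exact (up to rounding) for polynomials of degree at most 3, so both approximations should match \(\frac{11}{16}\) very closely.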
Let \( f \) be the function defined by \[f(x) = \frac{1}{\pi \sqrt{x (1 - x)}}, \quad x \in (0, 1)\]
 Show that \( f \) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer
 Note that \( f(x) \gt 0 \) for \( 0 \lt x \lt 1 \). The substitution \( u = \sqrt{x} \) gives \[ \int_0^1 \frac{1}{\sqrt{x (1 - x)}} \, dx = \int_0^1 \frac{2}{\sqrt{1 - u^2}} \, du = 2 \arcsin u \biggm|_0^1 = \pi \] Thus \( f \) is a PDF.
 \( f \) is symmetric about \( x = \frac{1}{2} \). \( f \) decreases and then increases, with minimum at \( x = \frac{1}{2} \). \( f(x) \to \infty \) as \( x \downarrow 0 \) and as \( x \uparrow 1 \) so the distribution has no mode. \( f \) is concave upward.
The distribution defined in the last exercise is also a member of the beta family of distributions. But it is also known as the (standard) arcsine distribution, because of the arcsine function that arises in the proof that \( f \) is a probability density function. The arcsine distribution has applications to a very important random process known as Brownian motion, named for the Scottish botanist Robert Brown. Arcsine distributions are studied in more generality in the chapter on Special Distributions.
In the special distribution simulator, select the (continuous) arcsine distribution and keep the default parameter values. Run the simulation 1000 times and compare the empirical density function with the probability density function.
Suppose that \( X_t \) represents the change in the price of a stock at time \( t \), relative to the value at an initial reference time 0. We treat \( t \) as a continuous variable measured in weeks. Let \( T = \max\left\{t \in [0, 1]: X_t = 0\right\} \), the last time during the first week that the stock price was unchanged over its initial value. Under certain ideal conditions, \( T \) will have the arcsine distribution. Find each of the following:
 \( \P\left(T \lt \frac{1}{4}\right)\)
 \( \P\left(T \ge \frac{1}{2}\right) \)
 \( \P\left(T \le \frac{3}{4}\right) \)
Answer
 \( \frac{1}{3} \)
 \( \frac{1}{2} \)
 \( \frac{2}{3} \)
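The antiderivative found in the proof that \(f\) is a probability density function gives \(\P(T \le x) = \frac{2}{\pi} \arcsin\left(\sqrt{x}\right)\) for \(x \in [0, 1]\). A minimal sketch that evaluates the three probabilities from this formula follows; the function name `arcsine_cdf` is illustrative.

```python
import math


def arcsine_cdf(x):
    """P(T <= x) for the standard arcsine distribution, 0 <= x <= 1."""
    return 2 / math.pi * math.asin(math.sqrt(x))


p_a = arcsine_cdf(0.25)      # P(T < 1/4)
p_b = 1 - arcsine_cdf(0.5)   # P(T >= 1/2)
p_c = arcsine_cdf(0.75)      # P(T <= 3/4)
print(p_a, p_b, p_c)
```

Since the distribution is continuous, strict and non-strict inequalities give the same probabilities.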
Open the Brownian motion experiment and select the last zero variable. Run the experiment in single step mode a few times. The random process that you observe models the price of the stock in the previous exercise. Now run the experiment 1000 times and compute the empirical probability of each event in the previous exercise.
The Pareto Distribution
Let \(g\) be the function defined by \(g(x) = 1 /x^b\) for \(x \in [1, \infty)\), where \(b \in (0, \infty)\) is a parameter.
 Draw a careful sketch of the graph of \(g\), and state the important qualitative features.
 Find the values of \( b \) for which there exists a probability density function \( f \) proportional to \(g\). Identify the mode.
Answer
 \( g \) is decreasing and concave upward, with \( g(x) \to 0 \) as \( x \to \infty \).
 Note that if \( b \ne 1 \) \[\int_1^\infty x^{-b} \, dx = \frac{x^{1 - b}}{1 - b} \biggm|_1^\infty = \begin{cases} \infty, & 0 \lt b \lt 1 \\ \frac{1}{b - 1}, & 1 \lt b \lt \infty \end{cases} \] When \( b = 1 \) we have \( \int_1^\infty x^{-1} \, dx = \ln x \biggm|_1^\infty = \infty \). Thus, when \( 0 \lt b \le 1 \), there is no PDF proportional to \( g \). When \( b \gt 1 \), the PDF proportional to \( g \) is \( f(x) = \frac{b - 1}{x^b} \) for \( x \in [1, \infty) \). The mode is 1.
Note that the qualitative features of \( g \) are the same, regardless of the value of the parameter \( b \gt 0 \), but only when \( b \gt 1 \) can \( g \) be normalized into a probability density function. In this case, the distribution is known as the Pareto distribution, named for Vilfredo Pareto. The parameter \(a = b - 1\), so that \(a \gt 0\), is known as the shape parameter. Thus, the Pareto distribution with shape parameter \(a\) has probability density function \[f(x) = \frac{a}{x^{a+1}}, \quad x \in [1, \infty)\] The Pareto distribution is widely used to model certain economic variables and is studied in detail in the chapter on Special Distributions.
In the special distribution simulator, select the Pareto distribution. Leave the scale parameter fixed, but vary the shape parameter, and note the shape of the probability density function. For various values of the shape parameter, run the simulation 1000 times and compare the empirical density function with the probability density function.
Suppose that the income \(X\) (in appropriate units) of a person randomly selected from a population has the Pareto distribution with shape parameter \(a = 2\). Find each of the following:
 \(\P(X \gt 2)\)
 \( \P(X \le 4) \)
 \( \P(3 \le X \le 5) \)
Answer
 \(\frac{1}{4}\)
 \( \frac{15}{16} \)
 \( \frac{16}{225} \)
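Integrating the Pareto density over \((x, \infty)\) gives \(\P(X \gt x) = 1 / x^a\) for \(x \ge 1\). Because the endpoints here are integers and \(a = 2\), the probabilities can be computed exactly with rational arithmetic. This is only a sketch: the function name `pareto_survival` is illustrative, and the restriction to integer arguments is an assumption made to keep the arithmetic exact.

```python
from fractions import Fraction


def pareto_survival(x, a):
    """P(X > x) = 1 / x^a as an exact fraction.

    Assumes x is an integer >= 1 and the shape a is a positive integer.
    """
    return Fraction(1, x**a)


a = 2
p_a = pareto_survival(2, a)                          # P(X > 2)
p_b = 1 - pareto_survival(4, a)                      # P(X <= 4)
p_c = pareto_survival(3, a) - pareto_survival(5, a)  # P(3 <= X <= 5)
print(p_a, p_b, p_c)
```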
The Cauchy Distribution
Let \( f \) be the function defined by \[f(x) = \frac{1}{\pi (x^2 + 1)}, \quad x \in \R\]
 Show that \( f \) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer
 Note that \( f(x) \gt 0 \) for \( x \in \R \). Also \[ \int_{-\infty}^\infty \frac{1}{1 + x^2} \, dx = \arctan x \biggm|_{-\infty}^\infty = \pi \] and hence \( f \) is a PDF.
 \( f \) increases and then decreases, with mode \(x = 0\). \( f \) is concave upward, then downward, then upward again, with inflection points at \(x = \pm \frac{1}{\sqrt{3}}\). \( f \) is symmetric about \( x = 0 \).
The distribution constructed in the previous exercise is known as the (standard) Cauchy distribution, named after Augustin Cauchy. It might also be called the arctangent distribution, because of the appearance of the arctangent function in the proof that \( f \) is a probability density function. In this regard, note the similarity to the arcsine distribution above. The Cauchy distribution is studied in more generality in the chapter on Special Distributions. Note also that the Cauchy distribution is obtained by normalizing the function \(x \mapsto \frac{1}{1 + x^2}\); the graph of this function is known as the witch of Agnesi, in honor of Maria Agnesi.
In the special distribution simulator, select the Cauchy distribution with the default parameter values. Run the simulation 1000 times and compare the empirical density function with the probability density function.
A light source is 1 meter away from position 0 on an infinite, straight wall. The angle \( \Theta \) that the light beam makes with the perpendicular to the wall is randomly chosen from the interval \( \left(-\frac{\pi}{2}, \frac{\pi}{2}\right) \). The position \( X = \tan(\Theta) \) of the light beam on the wall has the standard Cauchy distribution. Find each of the following:
 \( \P(-1 \lt X \lt 1) \)
 \( \P\left(X \ge \frac{1}{\sqrt{3}}\right)\)
 \( \P(X \le \sqrt{3}) \)
Answer
 \( \frac{1}{2} \)
 \( \frac{1}{3} \)
 \(\frac{5}{6}\)
The Cauchy experiment (with the default parameter values) is a simulation of the experiment in the last exercise.
 Run the experiment a few times in single step mode.
 Run the experiment 1000 times and compare the empirical density function with the probability density function.
 Using the data from (b), compute the relative frequency of each event in the previous exercise, and compare with the true probability.
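If the simulation app is not at hand, the light beam experiment can be imitated in a few lines of code. The sketch below takes "randomly chosen" to mean uniformly distributed on \(\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)\), and compares relative frequencies with probabilities computed from \(\P(X \le x) = \frac{1}{2} + \frac{1}{\pi} \arctan x\). The function names, the sample size, and the seed are illustrative assumptions.

```python
import math
import random


def simulate_light_beam(n, seed=0):
    """Simulate n positions X = tan(Theta), Theta uniform on (-pi/2, pi/2)."""
    rng = random.Random(seed)
    return [math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)]


def cauchy_cdf(x):
    """P(X <= x) for the standard Cauchy distribution."""
    return 0.5 + math.atan(x) / math.pi


xs = simulate_light_beam(100_000)

# Each event is paired with its indicator function and its exact probability.
events = {
    "-1 < X < 1": (lambda x: -1 < x < 1, cauchy_cdf(1) - cauchy_cdf(-1)),
    "X >= 1/sqrt(3)": (lambda x: x >= 1 / math.sqrt(3),
                       1 - cauchy_cdf(1 / math.sqrt(3))),
    "X <= sqrt(3)": (lambda x: x <= math.sqrt(3), cauchy_cdf(math.sqrt(3))),
}
for name, (indicator, exact) in events.items():
    freq = sum(map(indicator, xs)) / len(xs)
    print(f"{name}: relative frequency {freq:.4f}, exact {exact:.4f}")
```

With a larger sample than the 1000 runs suggested above, the relative frequencies should typically agree with the exact values to about two decimal places.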
The Standard Normal Distribution
Let \(\phi\) be the function defined by \(\phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-z^2/2}\) for \(z \in \R\).
 Show that \( \phi \) is a probability density function.
 Draw a careful sketch of the graph of \(\phi\), and state the important qualitative features.
Proof
 Note that \( \phi(z) \gt 0 \) for \( z \in \R \). Let \(c = \int_{-\infty}^\infty e^{-z^2 / 2} \, dz\). Then \[ c^2 = \int_{-\infty}^\infty e^{-x^2/2} \, dx \int_{-\infty}^\infty e^{-y^2/2} \, dy = \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2 + y^2) / 2} \, dx \, dy \] Change to polar coordinates: \(x = r \cos \theta\), \(y = r \sin \theta\) where \(r \in [0, \infty)\) and \(\theta \in [0, 2 \pi)\). Then \(x^2 + y^2 = r^2\) and \(dx \, dy = r \, dr \, d\theta\). Hence \[ c^2 = \int_0^{2 \pi} \int_0^\infty r e^{-r^2 / 2} \, dr \, d\theta \] Using the simple substitution \(u = r^2 / 2\), the inner integral is \(\int_0^\infty e^{-u} du = 1\). Then the outer integral is \(\int_0^{2\pi} 1 \, d\theta = 2 \pi\). Hence \( c = \sqrt{2 \pi} \) and so \( \phi \) is a PDF.
 Note that \( \phi \) is symmetric about 0. \( \phi \) increases and then decreases, with mode \( z = 0 \). \( \phi \) is concave upward, then downward, then upward again, with inflection points at \(z = \pm 1 \). \( \phi(z) \to 0 \) as \( z \to \infty \) and as \( z \to -\infty \).
The distribution defined in the last exercise is the standard normal distribution, perhaps the most important distribution in probability and statistics. Its importance stems largely from the central limit theorem, one of the fundamental theorems in probability. In particular, normal distributions are widely used to model physical measurements that are subject to small, random errors. The family of normal distributions is studied in more generality in the chapter on Special Distributions.
In the special distribution simulator, select the normal distribution and keep the default parameter values. Run the simulation 1000 times and compare the empirical density function and the probability density function.
The function \(z \mapsto e^{-z^2 / 2}\) is a notorious example of an integrable function that does not have an antiderivative that can be expressed in closed form in terms of other elementary functions. (That's why we had to resort to the polar coordinate trick to show that \(\phi\) is a probability density function.) So probabilities involving the normal distribution are usually computed using mathematical or statistical software.
Suppose that the error \( Z \) in the length of a certain machined part (in millimeters) has the standard normal distribution. Use mathematical software to approximate each of the following:
 \( \P(-1 \le Z \le 1) \)
 \( \P(Z \gt 2) \)
 \( \P(Z \lt -3) \)
Answer
 0.6827
 0.0228
 0.0013
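One way to carry out the software computation is with the error function, which is available in most standard libraries. The sketch below uses the identity \(\P(Z \le z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(z / \sqrt{2}\right)\right]\) and evaluates the events \(\{-1 \le Z \le 1\}\), \(\{Z \gt 2\}\), and \(\{Z \lt -3\}\); the function name `std_normal_cdf` is illustrative.

```python
import math


def std_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


p_a = std_normal_cdf(1) - std_normal_cdf(-1)  # P(-1 <= Z <= 1)
p_b = 1 - std_normal_cdf(2)                   # P(Z > 2)
p_c = std_normal_cdf(-3)                      # P(Z < -3)
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```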
The Extreme Value Distribution
Let \(f\) be the function defined by \(f(x) = e^{-x} e^{-e^{-x}}\) for \(x \in \R\).
 Show that \(f\) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
 Find \(\P(X \gt 0)\), where \(X\) has probability density function \(f\).
Answer
 Note that \( f(x) \gt 0 \) for \( x \in \R \). Using the substitution \( u = e^{-x} \), \[ \int_{-\infty}^\infty e^{-x} e^{-e^{-x}} \, dx = \int_0^\infty e^{-u} \, du = 1 \] (note that the integrand in the last integral is the exponential PDF with parameter 1).
 \( f \) increases and then decreases, with mode \(x = 0\). \( f \) is concave upward, then downward, then upward again, with inflection points at \(x = \pm \ln\left[\left(3 + \sqrt{5}\right)\middle/2\right] \). Note however that \( f \) is not symmetric about 0. \( f(x) \to 0 \) as \( x \to \infty \) and as \( x \to -\infty \).
 \(1 - e^{-1} \approx 0.6321\)
The distribution in the last exercise is the (standard) type 1 extreme value distribution, also known as the Gumbel distribution in honor of Emil Gumbel. Extreme value distributions are studied in more generality in the chapter on Special Distributions.
In the special distribution simulator, select the extreme value distribution. Keep the default parameter values and note the shape and location of the probability density function. Run the simulation 1000 times and compare the empirical density function with the probability density function.
The Logistic Distribution
Let \( f \) be the function defined by \[f(x) = \frac{e^x}{(1 + e^x)^2}, \quad x \in \R\]
 Show that \(f\) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
 Find \(\P(X \gt 1)\), where \(X\) has probability density function \(f\).
Answer
 Note that \( f(x) \gt 0 \) for \( x \in \R \). The substitution \( u = e^x \) gives \[ \int_{-\infty}^\infty f(x) \, dx = \int_0^\infty \frac{1}{(1 + u)^2} \, du = 1 \]
 \( f \) is symmetric about 0. \( f \) increases and then decreases with mode \(x = 0\). \( f \) is concave upward, then downward, then upward again, with inflection points at \(x = \pm \ln\left(2 + \sqrt{3}\right)\). \( f(x) \to 0 \) as \( x \to \infty \) and as \( x \to -\infty \).
 \(\frac{1}{1 + e} \approx 0.2689\)
The distribution in the last exercise is the (standard) logistic distribution. Logistic distributions are studied in more generality in the chapter on Special Distributions.
In the special distribution simulator, select the logistic distribution. Keep the default parameter values and note the shape and location of the probability density function. Run the simulation 1000 times and compare the empirical density function with the probability density function.
Weibull Distributions
Let \(f\) be the function defined by \(f(t) = 2 t e^{-t^2}\) for \( t \in [0, \infty) \).
 Show that \(f\) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer
 Note that \( f(t) \ge 0 \) for \( t \ge 0 \). The substitution \( u = t^2 \) gives \( \int_0^\infty f(t) \, dt = \int_0^\infty e^{-u} \, du = 1 \).
 \( f \) increases and then decreases, with mode \(t = 1/\sqrt{2} \). \( f \) is concave downward and then upward, with inflection point at \(t = \sqrt{3/2}\). \( f(t) \to 0 \) as \( t \to \infty \).
Let \(f\) be the function defined by \(f(t) = 3 t^2 e^{-t^3}\) for \(t \ge 0\).
 Show that \(f\) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
Answer
 Note that \( f(t) \ge 0 \) for \( t \ge 0 \). The substitution \( u = t^3 \) gives \[ \int_0^\infty f(t) \, dt = \int_0^\infty e^{-u} \, du = 1 \]
 \( f \) increases and then decreases, with mode \(t = \left(\frac{2}{3}\right)^{1/3}\). \( f \) is concave upward, then downward, then upward again, with inflection points at \( t = \left(1 \pm \frac{1}{3}\sqrt{7}\right)^{1/3} \). \( f(t) \to 0 \) as \( t \to \infty \).
The distributions in the last two exercises are examples of Weibull distributions, named for Waloddi Weibull. Weibull distributions are studied in more generality in the chapter on Special Distributions. They are often used to model random failure times of devices (in appropriately scaled units).
In the special distribution simulator, select the Weibull distribution. For each of the following values of the shape parameter \(k\), note the shape and location of the probability density function. Run the simulation 1000 times and compare the empirical density function with the probability density function.
Suppose that \( T \) is the failure time of a device (in 1000 hour units). Find \( \P\left(T \gt \frac{1}{2}\right) \) in each of the following cases:
 \( T \) has the first Weibull distribution above.
 \( T \) has the second Weibull distribution above.
Answer
 \(e^{-1/4} \approx 0.7788\)
 \(e^{-1/8} \approx 0.8825\)
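The same substitution used to show that these functions are probability density functions, \(u = t^k\) with \(k = 2\) or \(k = 3\), gives \(\P(T \gt t) = e^{-t^k}\) for \(t \ge 0\). A short sketch that evaluates both cases follows; the function name `weibull_survival` and the parameter name `k` are illustrative.

```python
import math


def weibull_survival(t, k):
    """P(T > t) when T has density k t^(k-1) exp(-t^k) on [0, inf)."""
    return math.exp(-(t**k))


p_a = weibull_survival(0.5, 2)  # first Weibull density,  2 t exp(-t^2)
p_b = weibull_survival(0.5, 3)  # second Weibull density, 3 t^2 exp(-t^3)
print(round(p_a, 4), round(p_b, 4))
```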
Additional Examples
Let \(f\) be the function defined by \(f(x) = -\ln x\) for \(x \in (0, 1]\).
 Show that \(f\) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and state the important qualitative features.
 Find \(\P\left(\frac{1}{3} \le X \le \frac{1}{2}\right)\) where \(X\) has the probability density function in (a).
Answer
 Note that \( -\ln x \ge 0 \) for \(0 \lt x \le 1\). Integration by parts with \( u = -\ln x \) and \( dv = dx \) gives \[ \int_0^1 -\ln x \, dx = -x \ln x \biggm|_0^1 + \int_0^1 1 \, dx = 1 \]
 \( f \) is decreasing and concave upward, with \( f(x) \to \infty \) as \( x \downarrow 0 \), so there is no mode.
 \(\frac{1}{2} \ln 2  \frac{1}{3} \ln 3 + \frac{1}{6} \approx 0.147\)
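The computations in this answer can be checked numerically. The sketch below uses the antiderivative \(x - x \ln x\) of \(-\ln x\) for part (c), and a midpoint rule for the total mass in part (a); the midpoint rule is convenient here because it never evaluates the density at \(x = 0\), where the density is unbounded. The function names are our own.

```python
import math

def pdf(x):
    """Density f(x) = -ln(x) on (0, 1]."""
    return -math.log(x)

def cdf(x):
    """Antiderivative x - x ln(x), extended by continuity so that cdf(0) = 0."""
    return 0.0 if x == 0 else x - x * math.log(x)

def midpoint_integral(g, a, b, n=200_000):
    """Midpoint rule; g is evaluated only at interior points of [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

total = midpoint_integral(pdf, 0.0, 1.0)   # total mass, should be close to 1
prob = cdf(1 / 2) - cdf(1 / 3)             # P(1/3 <= X <= 1/2)
print(round(total, 4), round(prob, 4))
```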
Let \(f\) be the function defined by \(f(x) = 2 e^{-x} (1 - e^{-x})\) for \(x \in [0, \infty)\).
 Show that \( f \) is a probability density function.
 Draw a careful sketch of the graph of \(f\), and give the important qualitative features.
 Find \(\P(X \ge 1)\) where \(X\) has the probability density function in (a).
Answer
 Note that \( f(x) \gt 0 \) for \( 0 \lt x \lt \infty \). Also, \( \int_0^\infty \left(e^{-x} - e^{-2 x}\right) \, dx = \frac{1}{2} \), so \( \int_0^\infty f(x) \, dx = 1 \) and \( f \) is a PDF.
 \( f \) increases and then decreases, with mode \( x = \ln(2) \). \( f \) is concave downward and then upward, with an inflection point at \( x = \ln(4) \). \( f(x) \to 0 \) as \( x \to \infty \).
 \(2 e^{-1} - e^{-2} \approx 0.6004 \)
The following problems deal with two- and three-dimensional random vectors having continuous distributions. The idea of normalizing a function to form a probability density function is important for some of the problems. The relationship between the distribution of a vector and the distribution of its components will be discussed later, in the section on joint distributions.
Let \(f\) be the function defined by \(f(x, y) = x + y\) for \(0 \le x \le 1\), \(0 \le y \le 1\).
 Show that \(f\) is a probability density function, and identify the mode.
 Find \(\P(Y \ge X)\) where \((X, Y)\) has the probability density function in (a).
 Find the conditional density of \((X, Y)\) given \(\left\{X \lt \frac{1}{2}, Y \lt \frac{1}{2}\right\}\).
Answer
 mode \( (1, 1) \)
 \(\frac{1}{2}\)
 \(f\left(x, y \bigm| X \lt \frac{1}{2}, Y \lt \frac{1}{2}\right) = 8 (x + y)\) for \(0 \lt x \lt \frac{1}{2}\), \(0 \lt y \lt \frac{1}{2}\)
Let \(g\) be the function defined by \(g(x, y) = x + y\) for \(0 \le x \le y \le 1\).
 Find the probability density function \(f\) that is proportional to \(g\).
 Find \(\P(Y \ge 2 X)\) where \((X, Y)\) has the probability density function in (a).
Answer
 \(f(x,y) = 2(x + y)\), \(0 \le x \le y \le 1\)
 \(\frac{5}{12}\)
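The normalization step in problems of this type can be sketched numerically. The Python code below approximates an integral over a region of the form \(\{(x, y): 0 \le y \le 1, \, a(y) \le x \le b(y)\}\) by applying the midpoint rule in each variable, and then uses it to recover the normalizing constant and the probability in the last exercise. The helper `iterated_integral` is a hypothetical name of our own, not a library function.

```python
def iterated_integral(func, x_lo, x_hi, n=400):
    """Approximate the integral of func(x, y) over the region
    {(x, y): 0 <= y <= 1, x_lo(y) <= x <= x_hi(y)} with the midpoint rule in y and in x."""
    dy = 1.0 / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * dy
        a, b = x_lo(y), x_hi(y)
        dx = (b - a) / n
        inner = sum(func(a + (i + 0.5) * dx, y) for i in range(n)) * dx
        total += inner * dy
    return total

g = lambda x, y: x + y

# Integral of g over the triangle 0 <= x <= y <= 1; the normalizing constant is its reciprocal.
mass = iterated_integral(g, lambda y: 0.0, lambda y: y)
c = 1.0 / mass

# P(Y >= 2X): integrate the density c * g over the sub-region 0 <= x <= y / 2.
prob = c * iterated_integral(g, lambda y: 0.0, lambda y: y / 2)
print(round(c, 4), round(prob, 4), round(5 / 12, 4))
```

The same helper handles the other triangular regions in this group of exercises by changing `g` and the boundary functions.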
Let \(g\) be the function defined by \(g(x, y) = x^2 y\) for \(0 \le x \le 1\), \(0 \le y \le 1\).
 Find the probability density function \(f\) that is proportional to \(g\).
 Find \(\P(Y \ge X)\) where \((X, Y)\) has the probability density function in (a).
Answer
 \(f(x,y) = 6 x^2 y\) for \(0 \le x \le 1\), \(0 \le y \le 1\)
 \(\frac{2}{5}\)
Let \(g\) be the function defined by \(g(x, y) = x^2 y\) for \(0 \le x \le y \le 1\).
 Find the probability density function \(f\) that is proportional to \(g\).
 Find \(\P(Y \ge 2 X)\) where \((X, Y)\) has the probability density function in (a).
Answer
 \(f(x,y) = 15 x^2 y\) for \(0 \le x \le y \le 1\)
 \(\frac{1}{8}\)
Let \(g\) be the function defined by \(g(x, y, z) = x + 2 y + 3 z\) for \(0 \le x \le 1\), \(0 \le y \le 1\), \(0 \le z \le 1\).
 Find the probability density function \(f\) that is proportional to \(g\).
 Find \(\P(X \le Y \le Z)\) where \((X, Y, Z)\) has the probability density function in (a).
Answer
 \(f(x, y, z) = \frac{1}{3}(x + 2 y + 3 z)\) for \(0 \le x \le 1\), \(0 \le y \le 1\), \(0 \le z \le 1\)
 \(\frac{7}{36}\)
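Since the cube \([0, 1]^3\) has volume 1, the probability of an event \(A\) under the density \(f\) equals the expected value of \(f(U) \bs 1_A(U)\), where \(U\) is uniformly distributed on the cube. The following sketch uses this observation to estimate \(\P(X \le Y \le Z)\) by simulation; the seed and sample size are arbitrary choices made only for reproducibility.

```python
import random

def f(x, y, z):
    """Normalized density (x + 2y + 3z) / 3 on the unit cube."""
    return (x + 2 * y + 3 * z) / 3

def estimate(n=200_000, seed=1):
    """Estimate P(X <= Y <= Z) as the average of f(U) * 1{U1 <= U2 <= U3},
    where U is uniformly distributed on the unit cube."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x, y, z = rng.random(), rng.random(), rng.random()
        if x <= y <= z:
            total += f(x, y, z)
    return total / n

print(estimate(), 7 / 36)   # simulation estimate next to the exact answer
```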
Let \(g\) be the function defined by \(g(x, y) = e^{-x} e^{-y}\) for \(0 \le x \le y \lt \infty\).
 Find the probability density function \(f\) that is proportional to \( g \).
 Find \(\P(X + Y \lt 1)\) where \((X, Y)\) has the probability density function in (a).
Answer
 \(f(x,y) = 2 e^{-x} e^{-y}\), \(0 \le x \le y \lt \infty\)
 \(1 - 2 e^{-1} \approx 0.2642\)
Continuous Uniform Distributions
Our next discussion will focus on an important class of continuous distributions that are defined purely in terms of geometry. We need a preliminary definition.
For \(n \in \N_+\), the standard measure \(\lambda_n\) on \(\R^n\) is given by \[\lambda_n(A) = \int_A 1 \, dx, \quad A \subseteq \R^n\] In particular, \(\lambda_1(A)\) is the length of \(A \subseteq \R\), \(\lambda_2(A)\) is the area of \(A \subseteq \R^2\), and \(\lambda_3(A)\) is the volume of \(A \subseteq \R^3\).
Details
Technically, \( \lambda_n \) is Lebesgue measure on the \( \sigma \)-algebra of measurable subsets of \( \R^n \). The name is in honor of Henri Lebesgue. The representation above in terms of the standard Riemann integral of calculus works for the sets that occur in typical applications. For the remainder of this discussion, we assume that all subsets of \( \R^n \) that are mentioned are measurable.
Note that if \(n \gt 1\), the integral above is a multiple integral. Generally, \(\lambda_n(A)\) is referred to as the \(n\)-dimensional volume of \(A \subseteq \R^n\).
Suppose that \(S \subseteq \R^n\) for some \( n \in \N_+ \) with \(0 \lt \lambda_n(S) \lt \infty\).
 The function \(f\) defined by \(f(x) = 1 \big/ \lambda_n(S)\) for \(x \in S\) is a probability density function on \(S\).
 The probability measure associated with \( f \) is given by \(\P(A) = \lambda_n(A) \big/ \lambda_n(S) \) for \(A \subseteq S\), and is known as the uniform distribution on \( S \).
Proof
The proof is simple: Clearly \( f(x) \gt 0 \) for \( x \in S \) and \[ \int_A f(x) \, dx = \frac{1}{\lambda_n(S)} \int_A 1 \, dx = \frac{\lambda_n(A)}{\lambda_n(S)}, \quad A \subseteq S \] In particular, when \( A = S \) we have \( \int_S f(x) \, dx = 1 \).
Note that the probability assigned to a set \(A \subseteq S\) is proportional to the size of \(A\), as measured by \(\lambda_n\). Note also that in both the discrete and continuous cases, the uniform distribution on a set \(S\) has constant probability density function on \(S\). The uniform distribution on a set \( S \) governs a point \( X \) chosen at random from \( S \), and in the continuous case, such distributions play a fundamental role in various Geometric Models. Uniform distributions are studied in more generality in the chapter on Special Distributions.
The most important special case is the uniform distribution on an interval \([a, b]\) where \(a, b \in \R\) and \(a \lt b\). In this case, the probability density function is \[f(x) = \frac{1}{b - a}, \quad a \le x \le b\] This distribution models a point chosen at random from the interval. In particular, the uniform distribution on \([0, 1]\) is known as the standard uniform distribution, and is very important because of its simplicity and the fact that it can be transformed into a variety of other probability distributions on \(\R\). Almost all computer languages have procedures for simulating independent, standard uniform variables, which are called random numbers in this context.
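As a simple illustration of transforming random numbers, the sketch below uses the standard fact that if \(U\) has the standard uniform distribution then \(a + (b - a) U\) is uniformly distributed on \([a, b]\). The interval \([15, 60]\), the seed, and the function name are arbitrary choices for the illustration.

```python
import random

def uniform_on_interval(a, b, rng):
    """Transform a random number U into a + (b - a) U, which is uniform on [a, b]."""
    return a + (b - a) * rng.random()

rng = random.Random(2024)   # seeded only so that the run is reproducible
sample = [uniform_on_interval(15, 60, rng) for _ in range(100_000)]

# Empirical check: the fraction of points at or below x should be close to (x - a) / (b - a).
frac_below_30 = sum(x <= 30 for x in sample) / len(sample)
print(round(frac_below_30, 3), round((30 - 15) / (60 - 15), 3))
```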
Conditional distributions corresponding to a uniform distribution are also uniform.
Suppose that \(R \subseteq S \subseteq \R^n\) for some \( n \in \N_+ \), and that \(\lambda_n(R) \gt 0\) and \(\lambda_n(S) \lt \infty\). If \(\P\) is the uniform distribution on \(S\), then the conditional distribution given \(R\) is uniform on \(R\).
Proof
The proof is very simple: For \(A \subseteq R\),
\[ \P(A \mid R) = \frac{\P(A \cap R)}{\P(R)} = \frac{\P(A)}{\P(R)} = \frac{\lambda_n(A) \big/ \lambda_n(S)}{\lambda_n(R) \big/ \lambda_n(S)} = \frac{\lambda_n(A)}{\lambda_n(R)} \]
The last theorem has important implications for simulations. If we can simulate a random variable that is uniformly distributed on a set, we can simulate a random variable that is uniformly distributed on a subset.
Suppose again that \(R \subseteq S \subseteq \R^n\) for some \( n \in \N_+ \), and that \(\lambda_n(R) \gt 0\) and \(\lambda_n(S) \lt \infty\). Suppose further that \(\bs X = (X_1, X_2, \ldots)\) is a sequence of independent random variables, each uniformly distributed on \(S\). Let \(N = \min\{k \in \N_+: X_k \in R\}\). Then
 \(N\) has the geometric distribution on \(\N_+\) with success parameter \(p = \lambda_n(R) \big/ \lambda_n(S)\).
 \(X_N\) is uniformly distributed on \(R\).
Proof
 Since the variables are uniformly distributed on \(S\), \(\P(X_k \in R) = \lambda_n(R) / \lambda_n(S)\) for each \(k \in \N_+\). Since the variables are independent, each point is in \(R\) or not independently. Hence \(N\), the index of the first point to fall in \(R\), has the geometric distribution on \(\N_+\) with success probability \(p = \lambda_n(R) / \lambda_n(S)\). That is, \(\P(N = k) = (1 - p)^{k-1} p\) for \(k \in \N_+\).
 Note that \(p \in (0, 1]\), so \(\P(N \in \N_+) = 1\) and hence \(X_N\) is well defined. We know from our work on independence and conditional probability that the distribution of \(X_N\) is the same as the conditional distribution of \(X_1\) given \(X_1 \in R\), which by the previous theorem, is uniform on \(R\).
Suppose in particular that \(S\) is a Cartesian product of \(n\) bounded intervals. It turns out to be quite easy to simulate a sequence of independent random variables \(\bs X = (X_1, X_2, \ldots)\) each of which is uniformly distributed on \(S\). Thus, the last theorem gives an algorithm for simulating a random variable that is uniformly distributed on an irregularly shaped region \(R \subseteq S\) (assuming that we have an algorithm for recognizing when a point \(x \in \R^n\) falls in \(R\)). This method of simulation is known as the rejection method, and as we will see in subsequent sections, is more important than might first appear.
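Here is a minimal Python sketch of the rejection method, under the assumption that \(S\) is given as a list of bounded intervals and that membership in \(R\) is decided by a user-supplied function. The names `rejection_sample`, `in_region`, and `box` are hypothetical, and the unit disk inside the square \([-1, 1]^2\) is chosen only as an example. The function also returns the number of trials \(N\), whose sample mean can be compared with \(1 / p\), the mean of the geometric distribution on \(\N_+\).

```python
import math
import random

def rejection_sample(in_region, box, rng):
    """Draw points uniformly from the box (a list of (low, high) intervals) until one
    lands in the region. Returns the accepted point and the number of trials N needed."""
    trials = 0
    while True:
        trials += 1
        point = tuple(lo + (hi - lo) * rng.random() for lo, hi in box)
        if in_region(point):
            return point, trials

# Example (our choice): R is the unit disk and S is the square [-1, 1]^2, so p = pi / 4.
in_disk = lambda pt: pt[0] ** 2 + pt[1] ** 2 <= 1
box = [(-1.0, 1.0), (-1.0, 1.0)]

rng = random.Random(7)   # seeded only for reproducibility
results = [rejection_sample(in_disk, box, rng) for _ in range(50_000)]

mean_trials = sum(n for _, n in results) / len(results)
print(round(mean_trials, 3), round(4 / math.pi, 3))   # sample mean of N next to 1 / p
```

The method is efficient only when \(p = \lambda_n(R) \big/ \lambda_n(S)\) is not too small, since the expected number of trials per accepted point is \(1 / p\).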
In the simple probability experiment, random points are uniformly distributed on the rectangular region \( S \). Move and resize the events \( A \) and \( B \) and note how the probabilities of the 16 events that can be constructed from \( A \) and \( B \) change. Run the experiment 1000 times and note the agreement between the relative frequencies of the events and the probabilities of the events.
Suppose that \( (X, Y) \) is uniformly distributed on the circular region of radius 5, centered at the origin. We can think of \( (X, Y) \) as the position of a dart thrown randomly at a target. Let \( R = \sqrt{X^2 + Y^2} \), the distance from the center to \( (X, Y) \).
 Give the probability density function of \( (X, Y) \).
 Find \( \P(n \le R \le n + 1) \) for \( n \in \{0, 1, 2, 3, 4\} \).
Answer
 \( f(x, y) = \frac{1}{25 \pi} \) for \( (x, y) \in \R^2 \) with \( x^2 + y^2 \le 25 \)
 \( \P(n \le R \le n + 1) = \frac{2 n + 1}{25} \) for \( n \in \{0, 1, 2, 3, 4\} \)
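Since the distribution is uniform, each probability in part (b) is a ratio of areas: the area \(\pi \left[(n + 1)^2 - n^2\right]\) of the ring divided by the area \(25 \pi\) of the target. The short sketch below carries out this computation with exact fractions and confirms that the five probabilities sum to 1; the function name is our own.

```python
from fractions import Fraction

def ring_probability(n, radius=5):
    """P(n <= R <= n + 1) for a point uniform on the disk of the given radius:
    the ring area pi * ((n + 1)^2 - n^2) divided by the disk area pi * radius^2.
    The factors of pi cancel, so the result is an exact fraction."""
    return Fraction((n + 1) ** 2 - n ** 2, radius ** 2)

probs = [ring_probability(n) for n in range(5)]
print(probs, sum(probs))
```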
Suppose that \((X, Y, Z)\) is uniformly distributed on the cube \(S = [0, 1]^3\). Find \(\P(X \lt Y \lt Z)\) in two ways:
 Using the probability density function.
 Using a combinatorial argument.
Answer
 \( \P(X \lt Y \lt Z) = \int_0^1 \int_0^z \int_0^y 1 \, dx \, dy \, dz = \frac{1}{6} \)
 The 6 strict orderings of \( (X, Y, Z) \) are equally likely by symmetry, and ties have probability 0, so \( \P(X \lt Y \lt Z) = \frac{1}{6} \)
The time \(T\) (in minutes) required to perform a certain job is uniformly distributed over the interval \([15, 60]\).
 Find the probability that the job requires more than 30 minutes.
 Given that the job is not finished after 30 minutes, find the probability that the job will require more than 15 additional minutes.
Answer
 \(\frac{2}{3}\)
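 \(\P(T \gt 45 \mid T \gt 30) = \frac{15}{30} = \frac{1}{2}\), since the conditional distribution of \(T\) given \(T \gt 30\) is uniform on \((30, 60]\)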
Data Analysis Exercises
If \(D\) is a data set from a variable \(X\) with a continuous distribution, then an empirical density function can be computed by partitioning the data range into subsets of small size, and then computing the density of data points in each subset, that is, the relative frequency of the subset divided by its size. Empirical probability density functions are studied in more detail in the chapter on Random Samples.
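The computation just described can be sketched in a few lines of Python, assuming the convention that the empirical density on an interval \((a, b]\) is the fraction of data points in the interval divided by \(b - a\). The data values below are made up for illustration and are not the cicada data; the function name is hypothetical.

```python
def empirical_density(data, edges):
    """Empirical density on the intervals (edges[i], edges[i + 1]]: the fraction of
    data points in each interval divided by the length of the interval."""
    n = len(data)
    densities = []
    for lo, hi in zip(edges, edges[1:]):
        count = sum(lo < x <= hi for x in data)
        densities.append(count / (n * (hi - lo)))
    return densities

# Hypothetical body weights in grams (made-up values, not the cicada data).
weights = [0.08, 0.12, 0.15, 0.17, 0.19, 0.21, 0.22, 0.26, 0.28, 0.31]
edges = [0.0, 0.1, 0.2, 0.3, 0.4]
dens = empirical_density(weights, edges)
print([round(d, 2) for d in dens])
```

When every data point falls in one of the intervals, the densities multiplied by the interval lengths sum to 1, which is a useful check on a table of empirical densities.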
For the cicada data, \(BW\) denotes body weight (in grams), \(BL\) body length (in millimeters), and \(G\) gender (0 for female and 1 for male). Construct an empirical density function for each of the following and display each as a bar graph:
 \(BW\)
 \(BL\)
 \(BW\) given \(G = 0\)
Answer

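 \[ \begin{array}{l|cccc} BW & (0, 0.1] & (0.1, 0.2] & (0.2, 0.3] & (0.3, 0.4] \\ \hline \text{Density} & 0.8654 & 5.8654 & 3.0769 & 0.1923 \end{array} \]
 \[ \begin{array}{l|cccc} BL & (15, 20] & (20, 25] & (25, 30] & (30, 35] \\ \hline \text{Density} & 0.0058 & 0.1577 & 0.0346 & 0.0019 \end{array} \]
 \[ \begin{array}{l|cccc} BW & (0, 0.1] & (0.1, 0.2] & (0.2, 0.3] & (0.3, 0.4] \\ \hline \text{Density given } G = 0 & 0.3390 & 4.4068 & 5.0847 & 0.1695 \end{array} \]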