Statistics LibreTexts

2.2: Continuous Density Functions

In the previous section we have seen how to simulate experiments with a whole continuum of possible outcomes and have gained some experience in thinking about such experiments. Now we turn to the general problem of assigning probabilities to the outcomes and events in such experiments. We shall restrict our attention here to those experiments whose sample space can be taken as a suitably chosen subset of the line, the plane, or some other Euclidean space. We begin with some simple examples.


Example \(\PageIndex{1}\)

The spinner experiment described in Example 2.1 has the interval [0, 1) as the set of possible outcomes. We would like to construct a probability model in which each outcome is equally likely to occur. We saw that in such a model, it is necessary to assign the probability 0 to each outcome. This does not at all mean that the probability of every event must be zero. On the contrary, if we let the random variable X denote the outcome, then the probability

\[P(0\leq X\leq 1)\]

that the head of the spinner comes to rest somewhere in the circle, should be equal to 1. Also, the probability that it comes to rest in the upper half of the circle should be the same as for the lower half, so that

\[P\bigg( 0\leq X < \frac{1}{2} \bigg)=P\bigg( \frac{1}{2} \leq X <1\bigg) = \frac{1}{2}\]

More generally, in our model, we would like the equation

\[P(c\leq X < d) = d-c\]

to be true for every choice of c and d with \(0 \leq c \leq d \leq 1\).

If we let E = [c, d], then we can write the above formula in the form

\[P(E) =\int_E f(x)dx\]

where f(x) is the constant function with value 1. This should remind the reader of the corresponding formula in the discrete case for the probability of an event:

\[P(E) = \sum_{\omega \in E} m(\omega)\]

The difference is that in the continuous case, the quantity being integrated, f(x), is not the probability of the outcome x. (However, if one uses infinitesimals, one can consider f(x) dx as the probability of the outcome x.)

In the continuous case, we will use the following convention. If the set of outcomes is a set of real numbers, then the individual outcomes will be referred to by small Roman letters such as x. If the set of outcomes is a subset of \(\mathbb{R}^2\), then the individual outcomes will be denoted by (x, y). In either case, it may be more convenient to refer to an individual outcome by using ω, as in Chapter 1.

Figure 2.11 shows the results of 1000 spins of the spinner. The function f(x) is also shown in the figure. The reader will note that the area under f(x) and above a given interval is approximately equal to the fraction of outcomes that fell in that interval. The function f(x) is called the density function of the random variable X. The fact that the area under f(x) and above an interval corresponds to a probability is the defining property of density functions. A precise definition of density functions will be given shortly.
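
The agreement between interval length and observed frequency can be checked with a short simulation. The following Python sketch is only an illustration, not the program used to produce Figure 2.11; the function name `spinner_fraction`, the seed, and the chosen intervals are assumptions made here. It models each spin as a uniform draw from [0, 1).

```python
import random

def spinner_fraction(c, d, n=1000, seed=0):
    """Fraction of n simulated spins that land in the interval [c, d)."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    hits = sum(1 for _ in range(n) if c <= rng.random() < d)
    return hits / n

if __name__ == "__main__":
    # Compare the simulated fraction with the model value d - c.
    for c, d in [(0.0, 0.5), (0.25, 0.4), (0.9, 1.0)]:
        simulated = spinner_fraction(c, d)
        print(f"[{c}, {d}): simulated {simulated:.3f}, model {d - c:.3f}")
```

With 1000 spins the simulated fractions should be close to, but not exactly equal to, d − c; increasing `n` tightens the agreement.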


Example \(\PageIndex{2}\)

A game of darts involves throwing a dart at a circular target of unit radius. Suppose we throw a dart once so that it hits the target, and we observe where it lands. To describe the possible outcomes of this experiment, it is natural to take as our sample space the set Ω of all the points in the target. It is convenient to describe these points by their rectangular coordinates, relative to a coordinate system with origin at the center of the target, so that each pair (x, y) of coordinates with \(x^2+y^2 ≤ 1\) describes a possible outcome of the experiment. Then \(Ω = \{ (x, y) : x^2 + y^2 ≤ 1 \}\) is a subset of the Euclidean plane, and the event E = { (x, y) : y > 0 }, for example, corresponds to the statement that the dart lands in the upper half of the target, and so forth. Unless there is reason to believe otherwise (and with experts at the game there may well be!), it is natural to assume that the coordinates are chosen at random. (When doing this with a computer, each coordinate is chosen uniformly from the interval [−1, 1]. If the resulting point does not lie inside the unit circle, the point is not counted.) Then the arguments used in the preceding example show that the probability of any elementary event, consisting of a single outcome, must be zero, and suggest that the probability of the event that the dart lands in any subset E of the target should be determined by what fraction of the target area lies in E. Thus,

\[ P(E) = \frac{\text{area of E}}{\text{area of target}} = \frac{\text{area of E}}{\pi}\]

This can be written in the form

\[P(E) = \int_E f(x)dx\]

where f(x) is the constant function with value 1/π. In particular, if \(E = \{ (x, y) : x^2 + y^2 \leq a^2 \}\) is the event that the dart lands within distance a < 1 of the center of the target, then

\[P(E) = \frac{\pi a^2}{\pi}=a^2\]

For example, the probability that the dart lies within a distance 1/2 of the center is 1/4.
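
The computer procedure described above (choose each coordinate uniformly from [−1, 1] and discard points outside the unit circle) can be sketched in Python as follows. The helper names `throw_dart` and `fraction_within` are hypothetical, chosen here for illustration, and the seed is arbitrary.

```python
import random

def throw_dart(rng):
    """Return a point (x, y) chosen at random from the unit disk.

    Each coordinate is drawn uniformly from [-1, 1]; points falling
    outside the unit circle are not counted and the draw is repeated.
    """
    while True:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            return x, y

def fraction_within(a, n=10000, seed=0):
    """Fraction of n darts that land within distance a of the center."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = throw_dart(rng)
        if x * x + y * y <= a * a:
            hits += 1
    return hits / n

if __name__ == "__main__":
    a = 0.5
    print(f"simulated: {fraction_within(a):.3f}, model a^2: {a * a:.3f}")
```

For a = 1/2 the simulated fraction should be near the model value 1/4, up to sampling error.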

Example \(\PageIndex{3}\)

In the dart game considered above, suppose that, instead of observing where the dart lands, we observe how far it lands from the center of the target. In this case, we take as our sample space the set Ω of all circles with centers at the center of the target. It is convenient to describe these circles by their radii, so that each circle is identified by its radius r, 0 ≤ r ≤ 1. In this way, we may regard Ω as the subset [0, 1] of the real line.

What probabilities should we assign to the events E of Ω? If

\[E = \{ r:0 \leq r\leq a \}, \]

then E occurs if the dart lands within a distance a of the center, that is, within the circle of radius a, and we saw in the previous example that under our assumptions the probability of this event is given by

\[P(E) = P([0,a]) = a^2\]

More generally, if

\[ E = \{r:a\leq r \leq b \}\]

then by our basic assumptions,

\[\begin{array}{rcl} P(E) = P([a,b]) &=& P([0,b]) - P([0,a]) \\ &=& b^2-a^2 \\ &=& (b-a)(b+a) \\ &=& 2(b-a)\frac{(b+a)}{2}\end{array}\]

Thus, P(E) = 2(length of E)(midpoint of E). Here we see that the probability assigned to the interval E depends not only on its length but also on its midpoint (i.e., not only on how long it is, but also on where it is). Roughly speaking, in this experiment, events of the form E = [a, b] are more likely if they are near the rim of the target and less likely if they are near the center. (A common experience for beginners! The conclusion might well be different if the beginner is replaced by an expert.)
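
As a quick check of this formula, compare two intervals of the same length 0.1, one at the center of the target and one at the rim (the particular intervals are chosen here only for illustration):

\[P([0, 0.1]) = 2(0.1)(0.05) = 0.01, \qquad P([0.9, 1]) = 2(0.1)(0.95) = 0.19\]

Both values agree with \(b^2 - a^2\), and the outer interval is nineteen times as likely as the inner one although the two have the same length.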

Again we can simulate this by computer. We divide the target area into ten concentric regions of equal thickness. The computer program Darts throws n darts and records what fraction of the total falls in each of these concentric regions.

The program Areabargraph then plots a bar graph with the area of the ith bar equal to the fraction of the total falling in the ith region. Running the program for 1000 darts resulted in the bar graph of Figure 2.12.

Note that here the heights of the bars are not all equal, but grow approximately linearly with r. In fact, the linear function y = 2r appears to fit our bar graph quite well. This suggests that the probability that the dart falls within a distance a of the center should be given by the area under the graph of the function y = 2r between 0 and a. This area is \(a^2\), which agrees with the probability we have assigned above to this event.
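
A simulation of this kind can be sketched in Python. This is not the text's Darts or Areabargraph program; the function names `ring_fractions` and `bar_heights` are illustrative assumptions. The sketch prints the bar heights as numbers instead of plotting them, scaling each fraction by the ring thickness so that the area of each bar equals the fraction of darts in that ring.

```python
import math
import random

def ring_fractions(n=1000, rings=10, seed=0):
    """Throw n darts at the unit disk and return the fraction of darts
    landing in each of `rings` concentric regions of equal thickness."""
    rng = random.Random(seed)
    counts = [0] * rings
    thrown = 0
    while thrown < n:
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        r = math.hypot(x, y)
        if r > 1.0:
            continue  # the point missed the target, so it is not counted
        k = min(int(r * rings), rings - 1)  # r == 1.0 goes in the outer ring
        counts[k] += 1
        thrown += 1
    return [c / n for c in counts]

def bar_heights(fractions):
    """Heights of bars whose areas equal the given fractions."""
    width = 1.0 / len(fractions)
    return [f / width for f in fractions]

if __name__ == "__main__":
    heights = bar_heights(ring_fractions())
    for k, h in enumerate(heights):
        mid = (k + 0.5) / len(heights)  # midpoint radius of the k-th ring
        print(f"ring {k}: height {h:.2f}, 2r at midpoint {2 * mid:.2f}")
```

Each printed height should be roughly 2r at the midpoint of its ring, with the scatter shrinking as the number of darts grows.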

Sample Space Coordinates

These examples suggest that for continuous experiments of this sort we should assign probabilities for the outcomes to fall in a given interval by means of the area under a suitable function.

More generally, we suppose that suitable coordinates can be introduced into the sample space Ω, so that we can regard Ω as a subset of \(\mathbb{R}^n\) . We call such a sample space a continuous sample space. We let X be a random variable which represents the outcome of the experiment. Such a random variable is called a continuous random variable. We then define a density function for X as follows.

Density Functions of Continuous Random Variables

Definition \(\PageIndex{1}\)

Let X be a continuous real-valued random variable. A density function for X is a real-valued function f which satisfies

\[P(a\leq X\leq b) = \int_a^bf(x)dx\]

for all \(a, b \in \mathbb{R}\).

We note that it is not the case that all continuous real-valued random variables possess density functions. However, in this book, we will only consider continuous random variables for which density functions exist. In terms of the density \(f(x)\), if E is a subset of \(\mathbb{R}\), then

\[P(X \in E) = \int_Ef(x)dx\]

The notation here assumes that E is a subset of \(\mathbb{R}\) for which \(\int_E f(x)dx\) makes sense.
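
The definition can be checked numerically against the two densities met in the examples above: the constant density 1 for the spinner and the density 2r suggested by the dart-distance experiment. The Python sketch below uses a midpoint Riemann sum; the function names and the number of steps are assumptions made for illustration.

```python
def integrate(f, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def spinner_density(x):
    """Density of the spinner outcome: constant 1 on [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def dart_distance_density(r):
    """Density of the dart's distance from the center: 2r on [0, 1]."""
    return 2.0 * r if 0.0 <= r <= 1.0 else 0.0

if __name__ == "__main__":
    a, b = 0.2, 0.5
    print("spinner:", integrate(spinner_density, a, b), "model:", b - a)
    print("darts:  ", integrate(dart_distance_density, a, b),
          "model:", b * b - a * a)
```

Up to rounding, the first integral equals b − a and the second equals \(b^2 - a^2\), matching the probabilities assigned in the examples.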

 Example \(\PageIndex{10}\)