
3.6: Expected Value of Discrete Random Variables

    In this section, and the next, we look at various numerical characteristics of random variables. These give us a way of classifying and comparing random variables.

     

    Expected Value of Discrete Random Variables

    We begin with the formal definition.

     

    Definition \(\PageIndex{1}\)

    If \(X\) is a discrete random variable with possible values \(x_1, x_2, \ldots, x_i, \ldots\), and probability mass function \(p(x)\), then the expected value (or mean) of \(X\) is denoted \(E[X]\) and given by

    $$E[X] = \sum_i x_i\cdot p(x_i).\label{expvalue}$$

    The expected value of \(X\) may also be denoted as \(\mu_X\) or simply \(\mu\) if the context is clear.

     

    The expected value of a random variable has many interpretations. First, looking at the formula in Definition 3.6.1 for computing expected value (Equation \ref{expvalue}), note that it is essentially a weighted average. Specifically, for a discrete random variable, the expected value is computed by "weighting", or multiplying, each value of the random variable, \(x_i\), by the probability that the random variable takes that value, \(p(x_i)\), and then summing over all possible values. This interpretation of the expected value as a weighted average explains why it is also referred to as the mean of the random variable.
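The weighted-average computation is easy to carry out in code. Here is a minimal sketch in Python, using an illustrative pmf (one roll of a fair six-sided die, whose mean is 3.5); the function name is our own choice, not part of any library:

```python
# Expected value of a discrete random variable: weight each possible
# value by its probability and sum, per the definition above.
def expected_value(values, probs):
    assert abs(sum(probs) - 1.0) < 1e-9, "pmf must sum to 1"
    return sum(x * p for x, p in zip(values, probs))

# Illustrative pmf: one roll of a fair six-sided die.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1/6] * 6
print(expected_value(die_values, die_probs))  # approximately 3.5
```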

    The expected value of a random variable is also interpreted as the long-run average value of the random variable. In other words, if we repeat the underlying random experiment many times and take the average of the values of the random variable corresponding to the outcomes, we would get the expected value, approximately. (Note: This interpretation of expected value is similar to the relative frequency approximation for probability discussed in Section 1.2.) Again, we see that the expected value is related to an average value of the random variable. Given the interpretation of the expected value as an average, either "weighted" or "long-run", the expected value is often referred to as a measure of center of the random variable.
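The long-run interpretation can be illustrated by simulation. The sketch below (Python, with a fixed seed for reproducibility) repeats the fair-die experiment many times; the sample average settles near the expected value 3.5:

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

# Roll a fair die many times; the running average of the outcomes
# approximates the expected value E[X] = 3.5 in the long run.
n_trials = 100_000
total = sum(random.randint(1, 6) for _ in range(n_trials))
print(total / n_trials)  # close to 3.5
```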

    Finally, the expected value of a random variable has a graphical interpretation. The expected value gives the center of mass of the probability mass function, which the following example demonstrates.

     

    Example \(\PageIndex{1}\)

    Consider again the context of Example 1.1.1, where we recorded the sequence of heads and tails in two tosses of a fair coin. In Example 3.1.1 we defined the discrete random variable \(X\) to denote the number of heads obtained. In Example 3.2.2 we found the pmf of \(X\). We now apply Equation \ref{expvalue} from Definition 3.6.1 and compute the expected value of \(X\):
    $$E[X] = 0\cdot p(0) + 1\cdot p(1) + 2\cdot p(2) = 0\cdot(0.25) + 1\cdot(0.5) + 2\cdot(0.25) = 0.5 + 0.5 = 1.\notag$$
    Thus, we expect that the number of heads obtained in two tosses of a fair coin will be 1 on average, or in the long run. Figure 1 demonstrates the graphical representation of the expected value as the center of mass of the probability mass function.


    Figure 1: Histogram of \(X\): The red arrow represents the center of mass, or the expected value of \(X\)

     


     

    Example \(\PageIndex{2}\)

    Suppose we toss a fair coin three times and define the random variable \(X\) to be our winnings on a single play of a game where

    • we win $\(x\) if the first head occurs on the \(x^{\text{th}}\) toss, for \(x=1,2,3\),
    • and we lose $1 if we get no heads in all three tosses.

    Then \(X\) is a discrete random variable, with possible values \(x=-1,1,2,3\), and pmf given by the following table:

    \(x\) \(p(x) = P(X=x)\)
    -1 \(\frac{1}{8}\)
    1 \(\frac{1}{2}\)
    2 \(\frac{1}{4}\)
    3 \(\frac{1}{8}\)
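Applying Equation \ref{expvalue} from Definition 3.6.1 to this pmf gives the expected winnings on a single play of the game:
$$E[X] = (-1)\cdot\frac{1}{8} + 1\cdot\frac{1}{2} + 2\cdot\frac{1}{4} + 3\cdot\frac{1}{8} = \frac{-1 + 4 + 4 + 3}{8} = \frac{10}{8} = 1.25.\notag$$
Thus, on average, we win $1.25 per play of the game.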

     


     

     

    For many of the common probability distributions, the expected value is given by a parameter of the distribution. For example, if discrete random variable \(X\) has a Poisson distribution with parameter \(\lambda\), then \(E[X] = \lambda\). This can be derived directly from Definition 3.6.1, but we will derive it another way in Section 3.8 below.
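For instance, the direct computation from Definition 3.6.1 uses the Taylor series \(e^\lambda = \sum_{k=0}^\infty \lambda^k/k!\):
$$E[X] = \sum_{k=0}^{\infty} k\cdot\frac{e^{-\lambda}\lambda^k}{k!} = \lambda e^{-\lambda}\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda}\cdot e^{\lambda} = \lambda.\notag$$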

     

    Expected Value of Functions of Random Variables

    In many applications, we may not be interested in the value of a random variable itself, but rather in a function applied to the random variable or a collection of random variables. For example, we may be interested in the value of \(X^2\). The following theorems, which we state without proof, demonstrate how to calculate the expected value of functions of random variables.

     

    Theorem \(\PageIndex{1}\)

    Let \(X\) be a random variable and let \(g\) be a real-valued function. Define the random variable \(Y = g(X)\).

    If \(X\) is a discrete random variable with possible values \(x_1, x_2, \ldots, x_i, \ldots\), and probability mass function \(p(x)\), then the expected value of \(Y\) is given by

    $$E[Y] = \sum_i g(x_i)\cdot p(x_i).\notag$$

     

    To put it simply, Theorem 3.6.1 states that to find the expected value of a function of a random variable, just apply the function to the possible values of the random variable in the definition of expected value. Before stating an important special case of Theorem 3.6.1, a word of caution regarding order of operations. Note that, in general,
    $$E[g(X)] \neq g\left(E[X]\right)\text{!}\notag$$
    However, as the next theorem states, there are exceptions.
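To see the caution concretely, here is a quick numerical check (Python, fair-die pmf, with \(g(x) = x^2\)); \(E[X^2]\) and \(\left(E[X]\right)^2\) come out clearly different:

```python
# Compare E[g(X)] and g(E[X]) for g(x) = x**2 and a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

e_x = sum(x * p for x, p in zip(values, probs))        # E[X] = 3.5
e_x2 = sum(x**2 * p for x, p in zip(values, probs))    # E[X^2] = 91/6
print(e_x2, e_x**2)  # about 15.17 versus 12.25: not equal
```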

     

    Special Case of Theorem 3.6.1

    Let \(X\) be a random variable. If \(g\) is a linear function, i.e., \(g(x) = ax + b\), then
    $$E[g(X)] = E[aX + b] = aE[X] + b.\notag$$

     

    The above special case is referred to as the linearity of expected value.
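A quick numerical check of this linearity, using the winnings pmf from Example 2 and illustrative constants \(a = 2\), \(b = 5\) (our own choices), shows that computing \(E[aX+b]\) directly agrees with \(aE[X] + b\):

```python
# Verify E[aX + b] = a*E[X] + b on a small pmf (winnings from Example 2).
values = [-1, 1, 2, 3]
probs = [1/8, 1/2, 1/4, 1/8]
a, b = 2, 5  # illustrative constants

e_x = sum(x * p for x, p in zip(values, probs))              # E[X] = 1.25
e_ax_b = sum((a * x + b) * p for x, p in zip(values, probs))
print(e_ax_b, a * e_x + b)  # both 7.5
```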

     

    Theorem \(\PageIndex{2}\)

    Suppose \(X_1, \ldots, X_n\) are jointly distributed random variables, and let \(Y = g(X_1, \ldots, X_n)\).

    If \(X_1, \ldots, X_n\) are discrete random variables with joint frequency function \(p(x_1, \ldots, x_n)\), then the expected value of \(Y\) is given by

    $$E[Y] = \sum_{x_1, \ldots, x_n} g(x_1, \ldots, x_n)\cdot p(x_1, \ldots, x_n),\notag$$
    where the sum is over all possible combinations of possible values for the random variables \(X_1, \ldots, X_n\).

    The preceding theorem allows us to extend the linearity property of expected value to linear combinations of jointly distributed random variables.


    Extension of Special Case of Theorem \(\PageIndex{1}\): Let \(X_1, \ldots, X_n\) be jointly distributed random variables, and let \(a_1, \ldots, a_n, b\) be constants. Then, the following holds:
    $$E[a_1X_1 + \cdots + a_nX_n + b] = a_1E[X_1] + \cdots + a_nE[X_n] + b.\notag$$
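For example, the expected value of the sum of two fair dice can be computed directly from the joint pmf, or immediately by linearity as \(E[X_1] + E[X_2] = 3.5 + 3.5 = 7\). A sketch of both routes in Python:

```python
import itertools

# Direct computation over the joint pmf: each of the 36 ordered pairs
# of faces is equally likely for two independent fair dice.
outcomes = list(itertools.product(range(1, 7), repeat=2))
e_sum_direct = sum((x1 + x2) / 36 for x1, x2 in outcomes)

# Shortcut via linearity: E[X1 + X2] = E[X1] + E[X2].
e_die = sum(x / 6 for x in range(1, 7))
print(e_sum_direct, e_die + e_die)  # both 7, up to float rounding
```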

    As a corollary to the preceding theorem, we obtain an easy way of finding the expected value of products of functions of independent random variables.

    Corollary \(\PageIndex{1}\)

    If \(X\) and \(Y\) are independent random variables, then
    $$E[g(X)\cdot h(Y)] = E[g(X)] \cdot E[h(Y)].\notag$$

    This corollary implies that, for independent random variables, \(E[XY] = E[X]E[Y]\).
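As a numerical check of the product rule (Python, two independent fair dice), \(E[XY]\) matches \(E[X]\,E[Y] = 3.5^2 = 12.25\):

```python
import itertools

# For independent fair dice X and Y, the joint pmf is uniform on the
# 36 ordered pairs, so E[XY] is a sum of x*y/36 over all pairs.
e_xy = sum(x * y / 36 for x, y in itertools.product(range(1, 7), repeat=2))

e_x = sum(x / 6 for x in range(1, 7))  # E[X] = E[Y] = 3.5
print(e_xy, e_x * e_x)  # both 12.25, up to float rounding
```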