5.1: Basics of Probability Distributions
- Define a discrete probability distribution as a list of all possible outcomes of a random variable with their probabilities.
- Use discrete probability distributions to model real-world scenarios such as dice rolls or item counts.
- Calculate key measures from the distribution, including mean, variance, standard deviation, and expected value.
As a reminder, a variable (from now on called the random variable) is represented by the letter x and is a quantitative (numerical) variable that is measured or observed in an experiment.
Also, remember that there are two types of quantitative variables: discrete and continuous. What is the difference between discrete and continuous data? Discrete data can take on only particular values in a range, while continuous data can take on any value in a range. Discrete data usually arises from counting, while continuous data usually arises from measuring.
Probability Distribution
How tall is a plant given a new fertilizer? Continuous. This is something you measure. How many fleas are on prairie dogs in a colony? Discrete. The number of fleas can be counted.
A random variable has outcomes that are determined by chance, and it is what is being measured or counted. A discrete random variable is a random variable whose outcomes are countable, usually whole-number, values. For example, in the problem "How tall is a plant given a new fertilizer?", the random variable is the height of the plant given the new fertilizer; since height is measured, this is a continuous random variable. For "How many fleas are on prairie dogs in a colony?", the discrete random variable is the number of fleas on a prairie dog in the colony.
Suppose all the outcomes of a discrete random variable are listed together with their corresponding probabilities. This list forms a distribution, and because it involves probabilities, it is called a probability distribution. A probability distribution is an assignment of probabilities to the values of the random variable. Furthermore, the sum of all the probabilities in the distribution must equal 1.
Note: The abbreviation pdf is used for a probability distribution function.
For probability distributions, \(0 \leq P(x) \leq 1\) and \(\sum P(x)=1\).
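The two conditions can be checked mechanically. Below is a minimal Python sketch; the function name `is_probability_distribution` and the tolerance `tol` are illustrative assumptions. Because listed probabilities are often rounded decimals, the sum is compared with 1 within a small tolerance rather than tested for exact equality.

```python
def is_probability_distribution(probs, tol=1e-9):
    """Return True if every P(x) is in [0, 1] and the P(x) values sum to 1.

    probs: mapping from each value x to its probability P(x).
    tol: allowed floating-point error in the sum (an assumed choice).
    """
    values = list(probs.values())
    each_in_range = all(0 <= p <= 1 for p in values)
    sums_to_one = abs(sum(values) - 1) <= tol
    return each_in_range and sums_to_one


# A fair six-sided die: each face 1..6 has probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
print(is_probability_distribution(die))  # True

# Not a distribution: the probabilities sum to 1.1.
print(is_probability_distribution({0: 0.5, 1: 0.6}))  # False
```

If a table's probabilities were rounded so that they add to something like 0.999, `tol` would need to be loosened to accept it.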
The 2010 U.S. Census found the chance of a household being a certain size. Is the variable x a discrete random variable? Also, determine whether the distribution is a probability distribution. If it is not, explain why. The data are in the table below ("Households by age," 2013).
| Size of household | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more |
|---|---|---|---|---|---|---|---|
| Probability | 26.7% | 33.6% | 15.8% | 13.7% | 6.3% | 2.4% | 1.5% |
Solution
In this case, the random variable is x = the number of people in a household. Since you are counting the number of people in a household, this is a discrete random variable.
This is a probability distribution: each x value is paired with its probability, all of the probabilities are between zero and one, and they sum to one (26.7% + 33.6% + 15.8% + 13.7% + 6.3% + 2.4% + 1.5% = 100%).
You can give a probability distribution in table form (as in the Example above) or as a graph. The graph looks like a histogram. A probability distribution is a relative frequency distribution based on a very large sample.
The 2010 U.S. Census found the chance of a household being a certain size. The data is in the table ("Households by age," 2013). Draw a histogram of the probability distribution.
| Size of household | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more |
|---|---|---|---|---|---|---|---|
| Probability | 26.7% | 33.6% | 15.8% | 13.7% | 6.3% | 2.4% | 1.5% |
Solution
State random variable:
x = number of people in a household
Draw a histogram with the x values on the horizontal axis as the classes (treat the "7 or more" category as 7) and the probabilities on the vertical axis.
Notice this graph is skewed right.
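For a quick look at the shape without graphing software, a rough text histogram can be printed. This is a minimal Python sketch; the function name `text_histogram` and the scale of one `#` per percentage point are illustrative choices, and "7 or more" is treated as 7, as in the solution. The long bars on the left and the short tail on the right show the right skew.

```python
# Census household-size distribution, with "7 or more" treated as 7.
household = {1: 0.267, 2: 0.336, 3: 0.158, 4: 0.137, 5: 0.063, 6: 0.024, 7: 0.015}


def text_histogram(probs, scale=100):
    """Return one text bar per x value; bar length is round(P(x) * scale)."""
    lines = []
    for x, p in sorted(probs.items()):
        bar = "#" * round(p * scale)
        lines.append(f"{x} | {bar} {p:.3f}")
    return lines


for line in text_histogram(household):
    print(line)
```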
Just as with any data set, the mean and standard deviation can be computed. In problems involving a probability distribution function (PDF), the probability distribution is considered the population, even though the PDF, in most cases, comes from repeating an experiment many times. This is because you are using the data from repeated experiments to estimate the true probability. Since a PDF is a population, the mean and standard deviation that are calculated are the population parameters and not the sample statistics. The notation used is the same as the notation for population mean and population standard deviation that was used in Chapter 3.
Expected Value, Variance, and Standard Deviation
The mean can be thought of as the expected value. It is the long-run average value you would expect to get if the trials were repeated over and over, an infinite number of times. The mean or expected value does not need to be a whole number, even if the possible values of x are whole numbers.
For a discrete probability distribution function,
The mean or expected value is \(\mu=\sum x P(x)\)
The variance is \(\sigma^2=\left\lbrack\sum x^2 P(x)\right\rbrack-\mu^2\)
The standard deviation is \(\sigma=\sqrt{\left\lbrack\sum x^2 P(x)\right\rbrack-\mu^2}\)
where x = the value of the random variable and P(x) = the probability corresponding to a particular x value.
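These formulas translate directly into code. The following Python sketch represents a distribution as a dictionary mapping each x to P(x); the function names are hypothetical, and the fair-die example is only an illustration. Results are floating-point, so they may differ from exact values in the last few digits.

```python
import math


def expected_value(probs):
    """Mean: mu = sum of x * P(x)."""
    return sum(x * p for x, p in probs.items())


def variance(probs):
    """Variance (shortcut form): sigma^2 = [sum of x^2 * P(x)] - mu^2."""
    mu = expected_value(probs)
    return sum(x**2 * p for x, p in probs.items()) - mu**2


def standard_deviation(probs):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(probs))


# Fair six-sided die: mu = 3.5 and sigma^2 = 35/12 (about 2.917).
die = {face: 1 / 6 for face in range(1, 7)}
print(expected_value(die))      # approximately 3.5
print(variance(die))            # approximately 2.9167
print(standard_deviation(die))  # approximately 1.7078
```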
The 2010 U.S. Census found the chance of a household being a certain size. The data is in the table ("Households by age," 2013).
| Size of household | 1 | 2 | 3 | 4 | 5 | 6 | 7 or more |
|---|---|---|---|---|---|---|---|
| Probability | 26.7% | 33.6% | 15.8% | 13.7% | 6.3% | 2.4% | 1.5% |
- Find the mean
- Find the variance
- Find the standard deviation
- Use a TI-83/84 to calculate the mean and standard deviation
Solution
State random variable:
x= number of people in a household
a. To find the mean, it is easiest to use a table, as shown below. Treat the category "7 or more" as 7. The formula for the mean says to multiply each x value by its P(x) value, so add a column to the table for this calculation. Also, convert all P(x) values to decimal form.
| \(X\) | \(P(X)\) | \(X \cdot P(X)\) |
|---|---|---|
| 1 | 0.267 | 0.267 |
| 2 | 0.336 | 0.672 |
| 3 | 0.158 | 0.474 |
| 4 | 0.137 | 0.548 |
| 5 | 0.063 | 0.315 |
| 6 | 0.024 | 0.144 |
| 7 | 0.015 | 0.105 |
| No sum needed | \(\Sigma = 1\) | \(\Sigma X \cdot P(X) = 2.525\) |
The mean, or expected value, is the sum of the last column: \(\mu\) = 2.525 people. This means that you expect a U.S. household to have 2.525 people in it, on average. Of course, no household can contain half a person; the expected value is a long-run average over many households, not a size that any single household must have.
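The table calculation can also be reproduced in Python: build the x·P(x) column and add it up. This sketch, like the solution, treats "7 or more" as 7; the variable names are illustrative.

```python
# Census household-size distribution ("7 or more" treated as 7).
household = {1: 0.267, 2: 0.336, 3: 0.158, 4: 0.137, 5: 0.063, 6: 0.024, 7: 0.015}

# The x * P(x) column of the table.
products = {x: x * p for x, p in household.items()}
for x, xp in products.items():
    print(f"{x}  {household[x]:.3f}  {xp:.3f}")

mu = sum(products.values())
print(f"mean = {mu:.3f}")  # mean = 2.525
```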
b. To find the variance, again, it is easier to use a table than to work the formula on a single line. The formula \(\sigma^2=\left\lbrack\sum x^2 P(x)\right\rbrack-\mu^2\) says to first square each x value and then multiply each square by the probability of that x value. Add up these products, and finally subtract the square of the mean.
| \(X\) | \(P(X)\) | \(X^2 \cdot P(X)\) |
|---|---|---|
| 1 | 0.267 | 0.267 |
| 2 | 0.336 | 1.344 |
| 3 | 0.158 | 1.422 |
| 4 | 0.137 | 2.192 |
| 5 | 0.063 | 1.575 |
| 6 | 0.024 | 0.864 |
| 7 | 0.015 | 0.735 |
| No sum needed | \(\Sigma = 1\) | \(\Sigma X^2 \cdot P(X) = 8.399\) |
To find the variance, add the values in the last column and subtract the square of the mean from the sum: \(\sigma^{2}=8.399-2.525^{2}=8.399-6.375625=2.023375 \approx 2.023\). (Note: try not to round your numbers too much, so you don't introduce rounding errors into your answer. The numbers in the table above were rounded because of space limitations, but the answer was calculated using many decimal places.)
c. To find the standard deviation, just take the square root of the variance, \(\sigma=\sqrt{2.023375} \approx 1.422454\) people. This means that you can expect a U.S. household to have 2.525 people in it, with a standard deviation of 1.42 people.
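The shortcut formula used in the table and the definitional form \(\sum (x-\mu)^2 P(x)\) give the same variance. The Python sketch below computes both for the census data and then takes the square root; it again treats "7 or more" as 7 and keeps full precision until the final printout, as the note above recommends.

```python
import math

# Census household-size distribution ("7 or more" treated as 7).
household = {1: 0.267, 2: 0.336, 3: 0.158, 4: 0.137, 5: 0.063, 6: 0.024, 7: 0.015}

mu = sum(x * p for x, p in household.items())

# Shortcut form: [sum of x^2 * P(x)] - mu^2.
var_shortcut = sum(x**2 * p for x, p in household.items()) - mu**2

# Definitional form: sum of (x - mu)^2 * P(x).
var_definition = sum((x - mu) ** 2 * p for x, p in household.items())

sigma = math.sqrt(var_shortcut)
print(f"variance (shortcut)   = {var_shortcut:.6f}")    # 2.023375
print(f"variance (definition) = {var_definition:.6f}")  # 2.023375
print(f"standard deviation    = {sigma:.6f}")           # 1.422454
```

The two forms agree because the probabilities sum to 1; the shortcut is simply easier to tabulate by hand.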
d. Go into the [STAT] menu, then the [EDIT] menu. Type the x values into [L1] and the P(x) values into [L2]. Then go into the [STAT] menu again, then the [CALC] menu. Choose [1:1-Var Stats]. This will put 1-Var Stats on the home screen. Now type [L1], [L2] (with a comma between [L1] and [L2]) and then press [ENTER]. If you have the newer operating system on the TI-84, a prompt appears instead: enter [L1] for List and [L2] for FreqList, then select Calculate. You will see the output in the figure below.
The mean is 2.525 people, and the standard deviation is 1.422 people.
To compute the variance from the calculator output, square the standard deviation: type the standard deviation into the calculator, including all of its digits, press the square key [x²], and then press [ENTER]. Rounded to three decimal places, the variance is 2.023.
Expected Value for a Game of Chance
The expected value can also be used to find the average winnings for a game of chance. The probability distribution must be rewritten to include win and loss rows. Wins are treated as positive numbers, and losses are treated as negative numbers. Also, the following rules hold for each game of chance.
- If the expected value = 0, then the game is fair.
- If the expected value < 0, then the game favors the house (whoever is holding the game).
- If the expected value > 0, then the game favors the player.
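The three rules amount to checking the sign of the expected value. In this Python sketch, `classify_game` is a hypothetical helper, the expected value is taken from the player's point of view, and the small tolerance `tol` (an assumed choice) keeps floating-point noise near 0 from being misread. The sample inputs are arbitrary.

```python
def classify_game(expected_value, tol=1e-9):
    """Apply the three rules to a game's expected value (player's view).

    tol treats tiny floating-point noise around 0 as exactly 0 (an assumed choice).
    """
    if abs(expected_value) <= tol:
        return "fair"
    if expected_value < 0:
        return "favors the house"
    return "favors the player"


print(classify_game(0))      # fair
print(classify_game(-0.50))  # favors the house
print(classify_game(1.25))   # favors the player
```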
In the Arizona lottery game called Pick 3, a player pays $1 and then picks a three-digit number. If those three digits are drawn in that specific order, the player wins $500. What is the expected value of this game?
Solution
To find the expected value, first create the probability distribution. In this case, the random variable x = net winnings. If the numbers are picked in the right order, the player wins $500, but the player paid $1 to play, so the net winnings are $499.
If the player does not pick the right numbers, the player loses $1, so the x value is -$1.
The probabilities of winning and losing must also be accounted for. The player picks a three-digit number, and there are 10 possible numbers for each digit. Also, each of these three numbers is independent of the others. The multiplication rule will be used to compute the total number of possibilities. To win, you have to pick the right numbers in the right order. For the first digit, you pick 1 number out of 10, the second digit you pick 1 number out of 10, and the third digit you pick 1 number out of 10. The probability of picking the right number in the right order is \(\dfrac{1}{10} \cdot \dfrac{1}{10} \cdot \dfrac{1}{10}=\dfrac{1}{1000}=0.001\). The probability of losing (not winning) would be \(1-\dfrac{1}{1000}=\dfrac{999}{1000}=0.999\). Putting this information into a table will help to calculate the expected value.
| Win or lose | x | P(x) | xP(x) |
|---|---|---|---|
| Win | $499 | 0.001 | $0.499 |
| Lose | -$1 | 0.999 | -$0.999 |
Now add the two values together to get the expected value: \(\$ 0.499+(-\$ 0.999)=-\$ 0.50\). In the long run, you can expect to lose $0.50 per game. Since the expected value is less than 0, this game is not fair; it favors the house. The money players lose on average is what Arizona makes, which is why the state runs the lottery.
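Both the multiplication-rule probability and the expected value can be double-checked in Python. This sketch enumerates all 1,000 ordered three-digit draws, counts the ones matching a hypothetical pick (4, 7, 2), and uses exact fractions so that no rounding creeps in.

```python
from fractions import Fraction
from itertools import product

pick = (4, 7, 2)  # hypothetical player's pick; any three digits give the same result

# Every ordered draw of three digits, each from 0 to 9.
all_draws = list(product(range(10), repeat=3))
wins = sum(1 for draw in all_draws if draw == pick)

p_win = Fraction(wins, len(all_draws))  # 1/1000
p_lose = 1 - p_win                      # 999/1000

# Net winnings: $500 prize minus the $1 ticket on a win, or -$1 on a loss.
ev = 499 * p_win + (-1) * p_lose
print(p_win, p_lose, ev, float(ev))  # 1/1000 999/1000 -1/2 -0.5
```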
A player decides to play a new game of chance at a local casino. In the game, the player pays $5 and selects a card from a standard 52-card deck. If the player selects an ace, the player wins $25. If the player selects a jack, the player wins $10. If the player selects a 10, the player breaks even. If the player selects any other card, the player loses the game. What is the expected value of this game?
Solution
Recall the following probabilities for selecting one card from a standard 52-card deck.
- P(ace) = 4/52 = 0.077
- P(jack) = 4/52 = 0.077
- P(10) = 4/52 = 0.077
- P(Not an ace or jack or 10) = 1 - 12/52 = 1 - 0.231 = 0.769
Construct a table with rows for wins and losses. Also, subtract the $5 cost of playing from each of the winnings.
Table \(\PageIndex{7}\): Finding expected value.
| Win or Lose | x | P(x) | xP(x) |
|---|---|---|---|
| Select an ace | $20 | 0.077 | $1.54 |
| Select a jack | $5 | 0.077 | $0.385 |
| Select a 10 | $0 | 0.077 | $0 |
| Lose the game by selecting a non-winning card. | -$5 | 0.769 | -$3.845 |
The expected value is the sum of the last column: -$1.92. Thus, the game favors the house, and the player loses an average of $1.92 each time he or she plays the game.
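Because the probabilities in the table were rounded to 0.077 and 0.769, it can be reassuring to compute the expected value exactly. This Python sketch builds a 52-card deck (the rank labels are illustrative), assigns each card its net winnings from the table, and averages over all equally likely cards. It gives exactly -25/13 dollars, about -$1.92, matching the table.

```python
from fractions import Fraction

COST = 5  # dollars paid to play

# Net winnings (prize minus the $5 cost) for each winning rank.
# A 10 breaks even, so its net is 0; every other rank loses the $5.
net_by_rank = {"A": 25 - COST, "J": 10 - COST, "10": 0}

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [rank for rank in ranks for _suit in range(4)]  # 52 cards, 4 of each rank

# Each card is equally likely, so the expected value is the average net winnings.
net = [net_by_rank.get(rank, -COST) for rank in deck]
ev = Fraction(sum(net), len(deck))
print(ev, round(float(ev), 2))  # -25/13 -1.92
```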
Authors
"5.1: Basics of Probability Distributions" by Toros Berberyan, Tracy Nguyen, and Alfie Swan is licensed under CC BY-SA 4.0
Attributions
"5.1: Basics of Probability Distributions" by Kathryn Kozak is licensed under CC BY-SA 4.0


