3.2: Conditional Probability
There can be rich relationships between two or more variables that are useful to understand. For example, a car insurance company will consider information about a person's driving history to assess the risk that they will be responsible for an accident. These types of relationships are the realm of conditional probabilities.
Exploring probabilities with a contingency table
The photo_classify data set represents a sample of 1822 photos from a photo-sharing website. Data scientists have been working to improve a classifier for whether a photo is about fashion or not, and these 1822 photos represent a test for their classifier. Each photo gets two classifications: the first, called mach_learn, is the prediction from a machine learning (ML) system, taking the value pred_fashion or pred_not. Each of these 1822 photos has also been classified carefully by a team of people, which we take to be the source of truth; this variable is called truth and takes values fashion and not. Figure 3.11 summarizes the results.
|  | truth: fashion | truth: not | Total |
|---|---|---|---|
| mach_learn: pred_fashion | 197 | 22 | 219 |
| mach_learn: pred_not | 112 | 1491 | 1603 |
| Total | 309 | 1513 | 1822 |

Figure 3.11: Contingency table summarizing the photo_classify data set.

If a photo is actually about fashion, what is the chance the ML classifier correctly identified the photo as being about fashion?
Solution
We can estimate this probability using the data. Of the 309 fashion photos, the ML algorithm correctly classified 197 of the photos:
\[P(\text{mach\_learn is pred\_fashion given truth is fashion}) = \frac{197}{309} = 0.638\]
We sample a photo from the data set and learn the ML algorithm predicted this photo was not about fashion. What is the probability that it was incorrect and the photo is about fashion?
Solution
If the ML classifier suggests a photo is not about fashion, then it comes from the second row in the data set. Of these 1603 photos, 112 were actually about fashion:
\[P(\text{truth is fashion given mach\_learn is pred\_not}) = \frac{112}{1603} = 0.070\]
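These count-based calculations are easy to check numerically. The short sketch below is hypothetical (it is not part of the original text): it stores the counts from Figure 3.11 keyed by the two classifications and reproduces both conditional probabilities above.

```python
# Counts from Figure 3.11, keyed by (mach_learn, truth).
counts = {
    ("pred_fashion", "fashion"): 197,
    ("pred_fashion", "not"): 22,
    ("pred_not", "fashion"): 112,
    ("pred_not", "not"): 1491,
}

# P(mach_learn is pred_fashion | truth is fashion): restrict attention
# to the 309 fashion photos, then take the fraction flagged by the ML system.
n_fashion = sum(n for (ml, truth), n in counts.items() if truth == "fashion")
print(counts[("pred_fashion", "fashion")] / n_fashion)  # 197/309 = 0.638

# P(truth is fashion | mach_learn is pred_not): restrict attention to the
# 1603 photos predicted "not", then take the fraction that were misses.
n_pred_not = sum(n for (ml, truth), n in counts.items() if ml == "pred_not")
print(counts[("pred_not", "fashion")] / n_pred_not)     # 112/1603 = 0.070
```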
Marginal and joint probabilities
Figure 3.11 includes row and column totals for each variable separately in the photo_classify data set. These totals represent marginal probabilities for the sample, which are the probabilities based on a single variable without regard to any other variables. For instance, a probability based solely on the mach_learn variable is a marginal probability:
\[P(\text{mach\_learn is pred\_fashion}) = \frac{219}{1822} = 0.120\]
A probability of outcomes for two or more variables or processes is called a joint probability:
\[P(\text{mach\_learn is pred\_fashion and truth is fashion}) = \frac{197}{1822} = 0.108\]
It is common to substitute a comma for “and” in a joint probability, although using either the word “and” or a comma is acceptable:
\[P(\text{mach\_learn is pred\_fashion, truth is fashion})\]
means the same thing as
\[P(\text{mach\_learn is pred\_fashion and truth is fashion})\]
If a probability is based on a single variable, it is a marginal probability. The probability of outcomes for two or more variables or processes is called a joint probability.
We use table proportions to summarize joint probabilities for the photo_classify sample. These proportions are computed by dividing each count in Figure 3.11 by the table’s total, 1822, to obtain the proportions in Figure 3.13. The joint probability distribution of the mach_learn and truth variables is shown in Figure 3.14.
|  | truth: fashion | truth: not | Total |
|---|---|---|---|
| mach_learn: pred_fashion | 0.1081 | 0.0121 | 0.1202 |
| mach_learn: pred_not | 0.0615 | 0.8183 | 0.8798 |
| Total | 0.1696 | 0.8304 | 1.00 |

Figure 3.13: Table proportions for the photo_classify data set.
| Joint outcome | Probability |
|---|---|
| mach_learn is pred_fashion and truth is fashion | 0.1081 |
| mach_learn is pred_fashion and truth is not | 0.0121 |
| mach_learn is pred_not and truth is fashion | 0.0615 |
| mach_learn is pred_not and truth is not | 0.8183 |
| Total | 1.0000 |

Figure 3.14: Joint probability distribution for the photo_classify data set.
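The proportions in Figures 3.13 and 3.14 come from dividing each count in Figure 3.11 by the table total, 1822. A minimal sketch of that computation (hypothetical code, reusing the counts from Figure 3.11):

```python
# Counts from Figure 3.11, keyed by (mach_learn, truth).
counts = {
    ("pred_fashion", "fashion"): 197,
    ("pred_fashion", "not"): 22,
    ("pred_not", "fashion"): 112,
    ("pred_not", "not"): 1491,
}
total = sum(counts.values())  # 1822

# Joint probabilities: divide every cell by the table total.
joint = {cell: n / total for cell, n in counts.items()}
for cell, p in joint.items():
    print(cell, round(p, 4))  # 0.1081, 0.0121, 0.0615, 0.8183
```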
Verify Figure 3.14 represents a probability distribution: events are disjoint, all probabilities are non-negative, and the probabilities sum to 1.
- Answer
- The four outcome combinations are disjoint, all probabilities are indeed non-negative, and the sum of the probabilities is \(0.1081 + 0.0121 + 0.0615 + 0.8183 = 1.00\).
We can compute marginal probabilities using joint probabilities in simple cases. For example, the probability a randomly selected photo from the data set is about fashion is found by summing the outcomes where truth takes value fashion:
\[\begin{aligned} P(\text{truth is fashion}) &= P(\text{mach\_learn is pred\_fashion and truth is fashion}) \\ &\qquad + P(\text{mach\_learn is pred\_not and truth is fashion}) \\ &= 0.1081 + 0.0615 = 0.1696\end{aligned}\]
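The same computation in code: summing the joint probabilities over every outcome of mach_learn gives the marginal probability for truth. (A hypothetical sketch using the Figure 3.14 values.)

```python
# Joint probabilities from Figure 3.14, keyed by (mach_learn, truth).
joint = {
    ("pred_fashion", "fashion"): 0.1081,
    ("pred_fashion", "not"): 0.0121,
    ("pred_not", "fashion"): 0.0615,
    ("pred_not", "not"): 0.8183,
}

# Marginal P(truth is fashion): sum over all mach_learn outcomes.
p_truth_fashion = sum(p for (ml, truth), p in joint.items() if truth == "fashion")
print(p_truth_fashion)  # 0.1081 + 0.0615 = 0.1696
```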
Defining conditional probability
The ML classifier predicts whether a photo is about fashion, even if it is not perfect. We would like to better understand how to use information from a variable like mach_learn to improve our probability estimation of a second variable, which in this example is truth.
The probability that a random photo from the data set is about fashion is about 0.17. If we knew the machine learning classifier predicted the photo was about fashion, could we get a better estimate of the probability the photo is actually about fashion? Absolutely. To do so, we limit our view to only those 219 cases where the ML classifier predicted that the photo was about fashion and look at the fraction where the photo was actually about fashion:
\[P(\text{truth is fashion given mach\_learn is pred\_fashion}) = \frac{197}{219} = 0.900\]
We call this a conditional probability because we computed the probability under a condition: the ML classifier prediction said the photo was about fashion.
There are two parts to a conditional probability, the outcome of interest and the condition. It is useful to think of the condition as information we know to be true, and this information usually can be described as a known outcome or event. We generally separate the text inside our probability notation into the outcome of interest and the condition with a vertical bar:
\[P(\underbrace{\text{truth is fashion}}_{\text{outcome of interest}}\ |\ \underbrace{\text{mach\_learn is pred\_fashion}}_{\text{condition}}) = \frac{197}{219} = 0.900\]
The vertical bar “\(|\)” is read as given.
In the last equation, we computed the probability a photo was about fashion based on the condition that the ML algorithm predicted it was about fashion as a fraction:
\[P(\text{truth is fashion}\ |\ \text{mach\_learn is pred\_fashion}) = \frac{\text{\# cases where truth is fashion and mach\_learn is pred\_fashion}}{\text{\# cases where mach\_learn is pred\_fashion}} = \frac{197}{219} = 0.900\]
We considered only those cases that met the condition, mach_learn is pred_fashion, and then we computed the ratio of those cases that satisfied our outcome of interest, photo was actually about fashion.
Frequently, marginal and joint probabilities are provided instead of count data. For example, disease rates are commonly listed in percentages rather than in a count format. We would like to be able to compute conditional probabilities even when no counts are available, and we use the last equation as a template to understand this technique.
We considered only those cases that satisfied the condition, where the ML algorithm predicted fashion. Of these cases, the conditional probability was the fraction representing the outcome of interest, that the photo was about fashion. Suppose we were provided only the information in Figure 3.13, i.e. only probability data. Then if we took a sample of 1000 photos, we would anticipate about 12.0% or \(0.120\times 1000 = 120\) would be predicted to be about fashion (mach_learn is pred_fashion). Similarly, we would expect about 10.8% or \(0.108\times 1000 = 108\) to meet both the information criterion and represent our outcome of interest. Then the conditional probability can be computed as
\[P(\text{truth is fashion}\ |\ \text{mach\_learn is pred\_fashion}) = \frac{108}{120} = 0.90\]
Here we are examining exactly the fraction of two probabilities, 0.108 and 0.120, which we can write as
\[\begin{aligned} P(\text{truth is fashion}\ |\ \text{mach\_learn is pred\_fashion}) &= \frac{P(\text{truth is fashion and mach\_learn is pred\_fashion})}{P(\text{mach\_learn is pred\_fashion})} \\ &= \frac{0.108}{0.120} = 0.90\end{aligned}\]
The fraction of these probabilities is an example of the general formula for conditional probability.
Conditional probability
The conditional probability of outcome \(A\) given condition \(B\) is computed as the following:
\[\begin{aligned} P(A | B) = \frac{P(A\text{ and }B)}{P(B)} \end{aligned}\]
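In code, the definition is a single division, guarded against conditioning on an impossible event. This helper is a hypothetical sketch (the function name is ours, not the book's):

```python
def conditional(p_a_and_b: float, p_b: float) -> float:
    """P(A | B) = P(A and B) / P(B); defined only when P(B) > 0."""
    if p_b <= 0:
        raise ValueError("cannot condition on an event with zero probability")
    return p_a_and_b / p_b

# Photo example: P(truth is fashion | mach_learn is pred_fashion).
print(conditional(0.108, 0.120))  # 0.90
```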
(a) Write out the following statement in conditional probability notation: "The probability that the ML prediction was correct, if the photo was about fashion". Here the condition is now based on the photo's status, not the ML algorithm.
(b) Determine the probability from part (a). Figure 3.13 may be helpful.
- Answer
- (a) \(P(\text{mach\_learn is pred\_fashion}\ |\ \text{truth is fashion})\). (b) Using Figure 3.13, \(\frac{0.1081}{0.1696} = 0.638\), which matches the count-based answer from earlier, \(197/309\).
- (a) Determine the probability that the algorithm is incorrect if it is known the photo is about fashion.
- (b) Using the answers from part (a) and part (b) of the Guided Practice above, compute \[P(\text{mach\_learn is pred\_fashion}\ |\ \text{truth is fashion}) + P(\text{mach\_learn is pred\_not}\ |\ \text{truth is fashion})\]
- (c) Provide an intuitive argument to explain why the sum in (b) is 1.
- Answer
- (a) \(P(\text{mach\_learn is pred\_not}\ |\ \text{truth is fashion}) = \frac{0.0615}{0.1696} = 0.362\). (b) \(0.638 + 0.362 = 1.00\). (c) If a photo is about fashion, the classifier must predict either pred_fashion or pred_not. These two outcomes are disjoint and cover all possibilities, so their conditional probabilities must sum to 1.
Smallpox in Boston, 1721
The smallpox data set provides a sample of 6,224 individuals from the year 1721 who were exposed to smallpox in Boston. Doctors at the time believed that inoculation, which involves exposing a person to the disease in a controlled form, could reduce the likelihood of death.

Each case represents one person with two variables: inoculated and result. The inoculated variable takes two levels: yes or no, indicating whether the person was inoculated or not. The result variable has outcomes lived or died. These data are summarized in the contingency and probability tables below.
|  | inoculated: yes | inoculated: no | Total |
|---|---|---|---|
| result: lived | 238 | 5136 | 5374 |
| result: died | 6 | 844 | 850 |
| Total | 244 | 5980 | 6224 |

Contingency table for the smallpox data set.
|  | inoculated: yes | inoculated: no | Total |
|---|---|---|---|
| result: lived | 0.0382 | 0.8252 | 0.8634 |
| result: died | 0.0010 | 0.1356 | 0.1366 |
| Total | 0.0392 | 0.9608 | 1.0000 |

Table proportions for the smallpox data set, computed by dividing each count by the table total, 6224.
Write out, in formal notation, the probability a randomly selected person who was not inoculated died from smallpox, and find this probability.

Determine the probability that an inoculated person died from smallpox. How does this result compare with the result of the previous Guided Practice?

The people of Boston self-selected whether or not to be inoculated. (a) Is this study observational or was this an experiment? (b) Can we infer any causal connection using these data? (c) What are some potential confounding variables that might influence whether someone lived or died and also affect whether that person was inoculated?
General multiplication rule
Section 1.7 introduced the Multiplication Rule for independent processes. Here we provide the General Multiplication Rule for events that might not be independent.
General Multiplication Rule If \(A\) and \(B\) represent two outcomes or events, then
\[\begin{aligned} P(A\text{ and }B) = P(A | B)\times P(B)\end{aligned}\]
It is useful to think of \(A\) as the outcome of interest and \(B\) as the condition.
This General Multiplication Rule is simply a rearrangement of the conditional probability equation.
Consider the smallpox data set. Suppose we are given only two pieces of information: 96.08% of residents were not inoculated, and 85.88% of the residents who were not inoculated ended up surviving. How could we compute the probability that a resident was not inoculated and lived? We will compute our answer using the General Multiplication Rule and then verify it using the probability table above. We want to determine
\[P(\text{result = lived and inoculated = no})\]
and we are given that
\[P(\text{result = lived}\ |\ \text{inoculated = no}) = 0.8588 \qquad\text{and}\qquad P(\text{inoculated = no}) = 0.9608\]
Among the 96.08% of people who were not inoculated, 85.88% survived:
\[P(\text{result = lived and inoculated = no}) = 0.8588 \times 0.9608 = 0.8251\]
This is equivalent to the General Multiplication Rule. We can confirm this probability in the probability table above at the intersection of lived and no (with a small rounding error).
Use \(P(\text{inoculated = yes}) = 0.0392\) and \(P(\text{result = lived}\ |\ \text{inoculated = yes}) = 0.9754\) to determine the probability that a person was both inoculated and lived.
If 97.54% of the inoculated people lived, what proportion of inoculated people must have died?
Sum of conditional probabilities Let \(A_1\), ..., \(A_k\) represent all the disjoint outcomes for a variable or process. Then if \(B\) is an event, possibly for another variable or process, we have:
\[\begin{aligned} P(A_1|B) + \cdots + P(A_k|B) = 1\end{aligned}\]
The rule for complements also holds when an event and its complement are conditioned on the same information:
\[\begin{aligned} P(A | B) = 1 - P(A^c | B)\end{aligned}\]
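Both rules can be checked numerically with the photo_classify proportions from Figure 3.13. The sketch below (hypothetical code) conditions the two disjoint truth outcomes on the same event:

```python
# Probabilities from Figure 3.13.
p_pred_fashion = 0.1202                    # P(mach_learn is pred_fashion)
p_fashion_given = 0.1081 / p_pred_fashion  # P(truth is fashion | pred_fashion)
p_not_given = 0.0121 / p_pred_fashion      # P(truth is not | pred_fashion)

# Disjoint outcomes of truth, conditioned on the same event, sum to 1.
print(p_fashion_given + p_not_given)       # 1.0 (up to floating-point rounding)

# Complement rule under conditioning: P(A | B) = 1 - P(A^c | B).
print(abs(p_fashion_given - (1 - p_not_given)) < 1e-12)  # True
```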
Based on the probabilities computed above, does it appear that inoculation is effective at reducing the risk of death from smallpox?
Independence considerations in conditional probability
If two events are independent, then knowing the outcome of one should provide no information about the other. We can show this is mathematically true using conditional probabilities.
Let \(X\) and \(Y\) represent the outcomes of rolling two dice.
- (a) What is the probability that the first die, \(X\), is 1?
- (b) What is the probability that both \(X\) and \(Y\) are 1?
- (c) Use the formula for conditional probability to compute \(P(Y = 1\ |\ X = 1)\).
- (d) What is \(P(Y=1)\)? Is this different from the answer from part (c)? Explain.
We can show in part (c) of the Guided Practice above that the conditioning information has no influence by using the Multiplication Rule for independent processes:
\[\begin{aligned} P(Y=1\ |\ X=1) &= \frac{P(Y=1\text{ and }X=1)}{P(X=1)} \\ &= \frac{P(Y=1) \times P(X=1)}{P(X=1)} \\ &= P(Y=1)\end{aligned}\]
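A quick simulation illustrates the same conclusion empirically. This hypothetical sketch (standard library only) rolls many pairs of dice and compares the conditional and unconditional estimates; both should land near \(1/6 \approx 0.167\):

```python
import random

random.seed(1)
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(100_000)]

# Unconditional estimate of P(Y = 1).
p_y1 = sum(1 for x, y in rolls if y == 1) / len(rolls)

# Conditional estimate of P(Y = 1 | X = 1): keep only rolls with X = 1.
x1_rolls = [y for x, y in rolls if x == 1]
p_y1_given_x1 = sum(1 for y in x1_rolls if y == 1) / len(x1_rolls)

print(p_y1, p_y1_given_x1)  # both near 1/6 ≈ 0.167
```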
Ron is watching a roulette table in a casino and notices that the last five outcomes were black. He figures that the chance of getting black six times in a row is very small (about \(1/64\)) and puts his paycheck on red. What is wrong with his reasoning?
Tree diagrams
Tree diagrams are a tool to organize outcomes and probabilities around the structure of the data. They are most useful when two or more processes occur in a sequence and each process is conditioned on its predecessors.

The smallpox data fit this description. We see the population as split by inoculated: yes and no. Following this split, survival rates were observed for each group. This structure is reflected in the corresponding tree diagram. The first branch, for inoculated, is called the primary branch, while the other branches are secondary.

Tree diagrams are annotated with marginal and conditional probabilities. The tree diagram for this example splits the smallpox data by inoculated into the yes and no groups with respective marginal probabilities 0.0392 and 0.9608. The secondary branches are conditioned on the first, so we assign conditional probabilities to these branches. For example, the top branch is the probability that result = lived conditioned on the information that inoculated = yes. We may (and usually do) construct joint probabilities at the end of each branch in our tree by multiplying the numbers we come across as we move from left to right. These joint probabilities are computed using the General Multiplication Rule:
\[\begin{aligned} P(\text{inoculated = yes and result = lived}) &= P(\text{inoculated = yes}) \times P(\text{result = lived}\ |\ \text{inoculated = yes}) \\ &= 0.0392 \times 0.9754 = 0.0382\end{aligned}\]
Consider the midterm and final for a statistics class. Suppose 13% of students earned an A on the midterm. Of those students who earned an A on the midterm, 47% received an A on the final, and 11% of the students who earned lower than an A on the midterm received an A on the final. You randomly pick up a final exam and notice the student received an A. What is the probability that this student earned an A on the midterm?

The end-goal is to find \(P(\text{midterm = A}\ |\ \text{final = A})\). To calculate this conditional probability, we need the following probabilities:

\[P(\text{midterm = A and final = A}) \qquad\text{and}\qquad P(\text{final = A})\]
However, this information is not provided, and it is not obvious how to calculate these probabilities. Since we aren’t sure how to proceed, it is useful to organize the information into a tree diagram:
When constructing a tree diagram, variables provided with marginal probabilities are often used to create the tree’s primary branches; in this case, the marginal probabilities are provided for midterm grades. The final grades, which correspond to the conditional probabilities provided, will be shown on the secondary branches.
With the tree diagram constructed, we may compute the required probabilities:
\[\begin{aligned} P(\text{midterm = A and final = A}) &= 0.13 \times 0.47 = 0.0611 \\ P(\text{final = A}) &= P(\text{midterm = A and final = A}) + P(\text{midterm = other and final = A}) \\ &= 0.0611 + 0.87 \times 0.11 = 0.0611 + 0.0957 = 0.1568\end{aligned}\]
The marginal probability, \(P(\text{final = A})\), was calculated by adding up all the joint probabilities on the right side of the tree that correspond to final = A. We may now finally take the ratio of the two probabilities:
\[\begin{aligned} P(\text{midterm = A}\ |\ \text{final = A}) &= \frac{P(\text{midterm = A and final = A})}{P(\text{final = A})} \\ &= \frac{0.0611}{0.1568} = 0.3897\end{aligned}\]
The probability the student also earned an A on the midterm is about 0.39.
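The tree-diagram arithmetic in this example translates directly into a few lines of code. A hypothetical sketch, with the probabilities taken from the example:

```python
# Primary branches: the midterm grade.  Secondary branches: the final
# grade, conditioned on the midterm.  Probabilities from the example.
p_mid_a = 0.13
p_final_a_given_mid_a = 0.47
p_final_a_given_mid_other = 0.11

# Multiply along each branch to get the joint probabilities.
p_a_and_a = p_mid_a * p_final_a_given_mid_a                # 0.0611
p_other_and_a = (1 - p_mid_a) * p_final_a_given_mid_other  # 0.0957

# Marginal P(final = A): sum the branches ending in an A on the final.
p_final_a = p_a_and_a + p_other_and_a                      # 0.1568

# Condition on the final grade to invert the tree.
print(p_a_and_a / p_final_a)  # 0.3897, i.e. about 0.39
```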
After an introductory statistics course, 78% of students can successfully construct tree diagrams. Of those who can construct tree diagrams, 97% passed, while only 57% of those students who could not construct tree diagrams passed. (a) Organize this information into a tree diagram. (b) What is the probability that a randomly selected student passed? (c) Compute the probability a student is able to construct a tree diagram if it is known that she passed.
Bayes’ Theorem
In many instances, we are given a conditional probability of the form
\[\begin{aligned} P(\text{statement about variable 1 } | \text{ statement about variable 2})\end{aligned}\]
but we would really like to know the inverted conditional probability:
\[\begin{aligned} P(\text{statement about variable 2 } | \text{ statement about variable 1})\end{aligned}\]
Tree diagrams can be used to find the second conditional probability when given the first. However, sometimes it is not possible to draw the scenario in a tree diagram. In these cases, we can apply a very useful and general formula: Bayes’ Theorem.
We first take a critical look at an example of inverting conditional probabilities where we still apply a tree diagram.
In Canada, about 0.35% of women over 40 will develop breast cancer in any given year. A common screening test for cancer is the mammogram, but this test is not perfect. In about 11% of patients with breast cancer, the test gives a false negative: it indicates a woman does not have breast cancer when she does have breast cancer. Similarly, the test gives a false positive in 7% of patients who do not have breast cancer: it indicates these patients have breast cancer when they actually do not. If we tested a random woman over 40 for breast cancer using a mammogram and the test came back positive – that is, the test suggested the patient has cancer – what is the probability that the patient actually has breast cancer?
Solution
Notice that we are given sufficient information to quickly compute the probability of testing positive if a woman has breast cancer (\(1.00-0.11=0.89\)). However, we seek the inverted probability of cancer given a positive test result. (Watch out for the non-intuitive medical language: a positive test result suggests the possible presence of cancer in a mammogram screening.) This inverted probability may be broken into two pieces:
\[\begin{aligned} P(\text{has BC } | \text{ mammogram$^+$}) = \frac{P(\text{has BC and mammogram$^+$})}{P(\text{mammogram$^+$})}\end{aligned}\]
where “has BC” is an abbreviation for the patient having breast cancer and “mammogram\(^+\)” means the mammogram screening was positive. We can construct a tree diagram for these probabilities:
The probability the patient has breast cancer and the mammogram is positive is
\[\begin{aligned} P(\text{has BC and mammogram$^+$}) &= P(\text{mammogram$^+$ } | \text{ has BC})P(\text{has BC}) \\ &= 0.89\times 0.0035 = 0.00312\end{aligned}\]
The probability of a positive test result is the sum of the two corresponding scenarios:
\[\begin{aligned} P(\text{mammogram$^+$}) &= P(\text{mammogram$^+$ and has BC}) \\ &\qquad\qquad + P(\text{mammogram$^+$ and no BC})\\ &= P(\text{has BC})P(\text{mammogram$^+$ } | \text{ has BC}) \\ &\qquad\qquad + P(\text{no BC})P(\text{mammogram$^+$ } | \text{ no BC}) \\ &= 0.0035\times 0.89 + 0.9965\times 0.07 = 0.07288\end{aligned}\]
Then if the mammogram screening is positive for a patient, the probability the patient has breast cancer is
\[\begin{aligned} P(\text{has BC } | \text{ mammogram$^+$}) &= \frac{P(\text{has BC and mammogram$^+$})}{P(\text{mammogram$^+$})}\\ &= \frac{0.00312}{0.07288} \approx 0.0428\end{aligned}\]
That is, even if a patient has a positive mammogram screening, there is still only a 4% chance that she has breast cancer.
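The full calculation fits in a few lines. A hypothetical sketch using the rates stated in the example (0.35% prevalence, 11% false negative rate, 7% false positive rate):

```python
p_bc = 0.0035               # P(has BC)
p_pos_given_bc = 1 - 0.11   # P(mammogram+ | has BC): 11% false negatives
p_pos_given_no_bc = 0.07    # P(mammogram+ | no BC): 7% false positives

# Marginal probability of a positive screen: sum both tree branches.
p_pos = p_bc * p_pos_given_bc + (1 - p_bc) * p_pos_given_no_bc

# Posterior probability of breast cancer given a positive mammogram.
print(p_bc * p_pos_given_bc / p_pos)  # 0.00312 / 0.07288, about 0.0428
```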
The example above highlights why doctors often run more tests regardless of a first positive test result. When a medical condition is rare, a single positive test isn't generally definitive.

Consider again the last equation of the example above. Using the tree diagram, we can see that the numerator (the top of the fraction) is equal to the following product:
\[\begin{aligned} P(\text{has BC and mammogram$^+$}) = P(\text{mammogram$^+$ } | \text{ has BC})P(\text{has BC})\end{aligned}\]
The denominator – the probability the screening was positive – is equal to the sum of probabilities for each positive screening scenario:
\[\begin{aligned} P(\text{mammogram$^+$}) &= P(\text{mammogram$^+$ and no BC}) + P(\text{mammogram$^+$ and has BC})\end{aligned}\]
In the example, each of the probabilities on the right side was broken down into a product of a conditional probability and marginal probability using the tree diagram.
\[\begin{aligned} P(\text{mammogram$^+$}) &= P(\text{mammogram$^+$ and no BC}) + P(\text{mammogram$^+$ and has BC}) \\ &= P(\text{mammogram$^+$ } | \text{ no BC})P(\text{no BC}) \\ &\qquad\qquad + P(\text{mammogram$^+$ } | \text{ has BC})P(\text{has BC})\end{aligned}\]
We can see an application of Bayes’ Theorem by substituting the resulting probability expressions into the numerator and denominator of the original conditional probability.
\[\begin{aligned} & P(\text{has BC } | \text{ mammogram$^+$}) \\ & \qquad= \frac{P(\text{mammogram$^+$ } | \text{ has BC})P(\text{has BC})} {P(\text{mammogram$^+$ } | \text{ no BC})P(\text{no BC}) + P(\text{mammogram$^+$ } | \text{ has BC})P(\text{has BC})}\end{aligned}\]
Bayes’ Theorem: inverting probabilities Consider the following conditional probability for variable 1 and variable 2:
\[\begin{aligned} P(\text{outcome $A_1$ of variable 1 } | \text{ outcome $B$ of variable 2})\end{aligned}\]
Bayes’ Theorem states that this conditional probability can be identified as the following fraction:
\[\begin{aligned} \frac{P(B | A_1) P(A_1)} {P(B | A_1) P(A_1) + P(B | A_2) P(A_2) + \cdots + P(B | A_k) P(A_k)}\end{aligned}\]
where \(A_2\), \(A_3\), ..., and \(A_k\) represent all other possible outcomes of the first variable.
Bayes’ Theorem is a generalization of what we have done using tree diagrams. The numerator identifies the probability of getting both \(A_1\) and \(B\). The denominator is the marginal probability of getting \(B\). This bottom component of the fraction appears long and complicated since we have to add up probabilities from all of the different ways to get \(B\). We always completed this step when using tree diagrams. However, we usually did it in a separate step so it didn’t seem as complex.
To apply Bayes’ Theorem correctly, there are two preparatory steps:
- First identify the marginal probabilities of each possible outcome of the first variable: \(P(A_1)\), \(P(A_2)\), ..., \(P(A_k)\).
- Then identify the probability of the outcome \(B\), conditioned on each possible scenario for the first variable: \(P(B | A_1)\), \(P(B | A_2)\), ..., \(P(B | A_k)\).
Once each of these probabilities is identified, they can be applied directly within the formula. Bayes' Theorem tends to be a good option when there are so many scenarios that drawing a tree diagram would be complex.
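The two preparatory steps correspond exactly to the inputs of a small general-purpose function. The sketch below is hypothetical (the function and its names are ours, not the book's); it returns every inverted probability \(P(A_i | B)\) at once, and reproduces the mammogram result from the previous example.

```python
def bayes(priors, likelihoods):
    """Invert conditional probabilities via Bayes' Theorem.

    priors:      P(A_i) for the disjoint outcomes A_i (should sum to 1).
    likelihoods: P(B | A_i) for the same outcomes.
    Returns P(A_i | B) for every outcome.
    """
    # Denominator: the marginal P(B), summed over every scenario A_i.
    p_b = sum(priors[a] * likelihoods[a] for a in priors)
    return {a: priors[a] * likelihoods[a] / p_b for a in priors}

# Mammogram example: A_1 = has BC, A_2 = no BC, B = positive screen.
posterior = bayes(
    priors={"has BC": 0.0035, "no BC": 0.9965},
    likelihoods={"has BC": 0.89, "no BC": 0.07},
)
print(posterior["has BC"])  # about 0.0428
```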
Jose visits campus every Thursday evening. However, some days the parking garage is full, often due to college events. There are academic events on 35% of evenings, sporting events on 20% of evenings, and no events on 45% of evenings. When there is an academic event, the garage fills up about 25% of the time, and it fills up 70% of evenings with sporting events. On evenings when there are no events, it only fills up about 5% of the time. If Jose comes to campus and finds the garage full, what is the probability that there is a sporting event? Use a tree diagram to solve this problem.
Here we solve the same problem presented in the Guided Practice above, except this time we use Bayes' Theorem. The outcome of interest is whether there is a sporting event (call this \(A_1\)), and the condition is that the lot is full (\(B\)). Let \(A_2\) represent an academic event and \(A_3\) represent there being no event on campus. Then the given probabilities can be written as
\[\begin{aligned} &P(A_1) = 0.2 &&P(A_2) = 0.35 &&P(A_3) = 0.45 \\ &P(B | A_1) = 0.7 &&P(B | A_2) = 0.25 &&P(B | A_3) = 0.05\end{aligned}\]
Bayes’ Theorem can be used to compute the probability of a sporting event (\(A_1\)) under the condition that the parking lot is full (\(B\)):
\[\begin{aligned} P(A_1 | B) &= \frac{P(B | A_1) P(A_1)}{P(B | A_1) P(A_1) + P(B | A_2) P(A_2) + P(B | A_3) P(A_3)} \\ &= \frac{(0.7)(0.2)}{(0.7)(0.2) + (0.25)(0.35) + (0.05)(0.45)} \\ &= 0.56 \end{aligned}\]
Based on the information that the garage is full, there is a 56% probability that a sporting event is being held on campus that evening.
Use the information in the previous exercise and example to verify that the probability there is an academic event conditioned on the parking lot being full is 0.35.

In the two Guided Practice problems above, you found that if the parking lot is full, the probability there is a sporting event is 0.56 and the probability there is an academic event is 0.35. Using this information, compute \(P(\text{no event}\ |\ \text{the lot is full})\).
The last several exercises offered a way to update our belief about whether there is a sporting event, academic event, or no event going on at the school based on the information that the parking lot was full. This strategy of updating beliefs using Bayes' Theorem is actually the foundation of an entire section of statistics called Bayesian statistics. While Bayesian statistics is very important and useful, we will not have time to cover much more of it in this book.


