# 3.7: Conditional Probability and Bayes' Rule


In many situations, additional information about the result of a probability experiment is known (or at least assumed to be known) and, given that information, the probability of some other event is desired. For this scenario, we compute what is referred to as *conditional probability*.

For events \(A\) and \(B\), with \(P(B) > 0\), the *conditional probability* of \(A\) given \(B\), denoted \(P(A\ |\ B)\), is given by

\[P(A\ |\ B) = \frac{P(A \cap B)}{P(B)}.\notag\]

In computing a conditional probability we *assume* that we know the outcome of the experiment is in event \(B\) and then, given that additional information, we calculate the probability that the outcome is also in event \(A\). This is useful in practice given that partial information about the outcome of an experiment is often known, as the next example demonstrates.

Continuing in the context of Example 1.2.1, where we considered tossing a fair coin twice, define \(D\) to be the event that at least one tails is recorded:

$$D = \{ht, th, tt\}\notag$$

Let's calculate the conditional probability of \(A\) given \(D\), i.e., the probability that at least one heads is recorded (event \(A\)), assuming that at least one tails is recorded (event \(D\)). Recalling that outcomes in this sample space are equally likely, we apply the definition of conditional probability (Definition 2.1.1) and find

$$P(A\ |\ D) = \frac{P(A\cap D)}{P(D)} = \frac{P(\{ht, th\})}{P(\{ht, th, tt\})}= \frac{(2/4)}{(3/4)} = \frac{2}{3} \approx 0.67.\notag$$
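Because the sample space is so small, this calculation can be checked by direct enumeration. A minimal Python sketch (the event names `A` and `D` mirror the text; `fractions.Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction

# Sample space for two tosses of a fair coin; outcomes are equally likely.
omega = {"hh", "ht", "th", "tt"}
A = {o for o in omega if "h" in o}  # at least one heads
D = {o for o in omega if "t" in o}  # at least one tails

# P(A | D) = P(A ∩ D) / P(D); with equally likely outcomes, each
# probability reduces to (# outcomes in event) / (# outcomes in omega).
p_A_given_D = Fraction(len(A & D), len(D))
print(p_A_given_D)  # 2/3
```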

Note that in Example 1.2.1 we found the *un*conditional probability of \(A\) to be \(P(A) = 0.75\). So, knowing that at least one tails was recorded, i.e., assuming event \(D\) occurred, the conditional probability of \(A\) given \(D\) decreased. This is because, if event \(D\) occurs, then the outcome \(hh\) in \(A\) cannot occur, thereby decreasing the chances that event \(A\) occurs.

Suppose we randomly draw a card from a standard deck of 52 playing cards.

- (a) If we know that the card is a King, what is the probability that the card is a club?
- (b) If we instead know that the card is black, what is the probability that the card is a club?

**Answer**

In order to compute the necessary probabilities, first note that the sample space is given by the set of cards in a standard deck of playing cards. So the number of outcomes in the sample space is 52. Next, note that the outcomes are equally likely, since we are *randomly* drawing the card from the deck.

For part (a), we are looking for the conditional probability that the randomly selected card is a club, given that it is a King. If we let \(C\) denote the event that the card is a club and \(K\) the event that it is a King, then we are looking to compute $$P(C\ |\ K) = \frac{P(C\cap K)}{P(K)}.\label{condproba}$$ To compute these probabilities, we count the number of outcomes in the following events:

$$ \text{# of outcomes in}\ C = \#\ \text{of clubs in standard deck}\ = 13 \notag$$

$$ \text{# of outcomes in}\ K = \#\ \text{of Kings in standard deck}\ = 4 \notag$$

$$ \text{# of outcomes in}\ C\cap K = \#\ \text{of Kings of clubs in standard deck}\ = 1 \notag$$

The probabilities in Equation \ref{condproba} are then given by dividing the counts of outcomes in each event by the total number of outcomes in the sample space (by the boxed Equation 1.3.2 in Section 1.3): \[P(C\ |\ K) = \frac{P(C\cap K)}{P(K)} = \frac{(1/52)}{(4/52)} = \frac{1}{4} = 0.25.\notag\]

For part (b), we are looking for the conditional probability that the randomly selected card is a club, given that it is black. If we let \(B\) denote the event that the card is black, then we are looking to compute $$P(C\ |\ B) = \frac{P(C\cap B)}{P(B)}.\label{condprobb}$$ To compute these probabilities, we count the number of outcomes in the following events:

$$ \text{# of outcomes in}\ B = \#\ \text{of black cards in standard deck}\ = 26 \notag$$

$$ \text{# of outcomes in}\ C\cap B = \#\ \text{of black clubs in standard deck}\ = 13\notag $$

The probabilities in Equation \ref{condprobb} are then given by dividing the counts of outcomes in each event by the total number of outcomes in the sample space: \[P(C\ |\ B) = \frac{P(C\cap B)}{P(B)} = \frac{(13/52)}{(26/52)} = \frac{13}{26} = 0.5.\notag\]
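Both parts can be checked by building the deck explicitly and counting. A sketch in Python (the rank and suit labels are illustrative choices, not from the text):

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely outcomes

C = {card for card in deck if card[1] == "clubs"}              # clubs
K = {card for card in deck if card[0] == "K"}                  # Kings
B = {card for card in deck if card[1] in ("clubs", "spades")}  # black cards

# With equally likely outcomes, P(C | K) = |C ∩ K| / |K|, and similarly
# for P(C | B).
print(Fraction(len(C & K), len(K)))  # 1/4
print(Fraction(len(C & B), len(B)))  # 1/2
```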

**Remark:** The preceding exercise demonstrates the following fact. For sample spaces with equally likely outcomes, conditional probabilities are calculated using

\[\boxed{P(A\ |\ B) = \frac{\text{number of outcomes in}\ A\cap B}{\text{number of outcomes in}\ B}.}\]

In other words, if we know that the outcome of the probability experiment is in the event \(B\), then we restrict our focus to the outcomes in that event that are also in \(A\). We can think of this as event \(B\) taking the place of the sample space, since we know the outcome must lie in that event.

## Properties of Conditional Probability

As with unconditional probability, we also have some useful properties for conditional probabilities. The first property below, referred to as the *Multiplication Law*, is simply a rearrangement of the probabilities used to define conditional probability. The Multiplication Law provides a way for computing the probability of an intersection of events when the conditional probabilities are known.

\(P(A \cap B) = P(A\ |\ B) P(B) = P(B\ |\ A) P(A)\)
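As a quick check, the Multiplication Law recovers the intersection probability from the coin-toss example above:

\[P(A \cap D) = P(A\ |\ D)\, P(D) = \frac{2}{3} \cdot \frac{3}{4} = \frac{1}{2},\notag\]

which agrees with \(P(\{ht, th\}) = 2/4\) computed directly from the sample space.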

The next two properties are useful when a *partition* of the sample space exists, where a partition is a way of dividing up the outcomes in the sample space into non-overlapping sets. A partition is formally defined in the *Law of Total Probability* below. In many cases, when a partition exists, it is easy to compute the conditional probability of an event in the sample space given an event in the partition. The Law of Total Probability then provides a way of using those conditional probabilities of an event, given each event in the partition, to compute the unconditional probability of the event. Following the Law of Total Probability, we state *Bayes' Rule*, which is really just an application of the Multiplication Law. Bayes' Rule is used to calculate what are informally referred to as "reverse conditional probabilities": the conditional probabilities of an event in a partition of the sample space, given any other event.

Suppose events \(B_1, B_2, \ldots, B_n\) satisfy the following:

- \(\Omega = B_1 \cup B_2 \cup \cdots \cup B_n\)
- \(B_i\cap B_j = \varnothing\), for every \(i\neq j\)
- \(P(B_i)>0\), for \(i=1, \ldots, n\)

We say that the events \(B_1, B_2, \ldots, B_n\) **partition** the sample space \(\Omega\). Then for any event \(A\), we can write

\[P(A) = P(A\ |\ B_1) P(B_1) + \cdots + P(A\ |\ B_n) P(B_n).\notag\]

Let \(B_1, B_2, \ldots, B_n\) partition the sample space \(\Omega\) and let \(A\) be an event with \(P(A)> 0\). Then, for \(j=1,\ldots, n\), we have

\[P(B_j\ |\ A) = \frac{P(A\ |\ B_j) P(B_j)}{P(A)}.\notag\]
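These two results translate directly into code. A minimal sketch (the function names are illustrative), where `p_A_given_B[j]` holds \(P(A\ |\ B_j)\) and `p_B[j]` holds \(P(B_j)\) for a partition \(B_1, \ldots, B_n\):

```python
def total_probability(p_A_given_B, p_B):
    """Law of Total Probability: P(A) = sum over j of P(A|B_j) * P(B_j)."""
    return sum(pa * pb for pa, pb in zip(p_A_given_B, p_B))


def bayes(j, p_A_given_B, p_B):
    """Bayes' Rule: P(B_j | A) = P(A|B_j) * P(B_j) / P(A)."""
    return p_A_given_B[j] * p_B[j] / total_probability(p_A_given_B, p_B)
```

For a two-event partition, `bayes(0, [0.9, 0.001], [0.0001, 0.9999])` reproduces the diagnostic-test calculation worked out in the example below.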

A common application of the Law of Total Probability and Bayes' Rule is in the context of medical diagnostic testing.

Consider a test that can diagnose kidney cancer. The test correctly detects when a patient has cancer 90% of the time. Also, if a person does not have cancer, the test correctly indicates so 99.9% of the time. Finally, suppose it is known that 1 in every 10,000 individuals has kidney cancer. We find the probability that a patient has kidney cancer, given that the test indicates she does.

First, note that we are finding a conditional probability. If we let \(A\) denote the event that the patient tests positive for cancer, and we let \(B_1\) denote the event that the patient actually has cancer, then we want

$$P(B_1\ |\ A).\notag$$

If we let \(B_2 = B_1^c\), then we have a partition of all patients (which is the sample space) given by \(B_1\) and \(B_2\).

In the first paragraph of this example, we are given the following probabilities:

$$\textcolor{BurntOrange}{\text{test correctly detects cancer 90% of time:}}\quad \textcolor{BurntOrange}{P(A\ |\ B_1) = 0.9} \notag$$

$$\textcolor{goldenrod}{\text{test correctly detects no cancer 99.9% of time:}}\quad \textcolor{goldenrod}{P(A^c\ |\ B_2) = 0.999} \Rightarrow P(A\ |\ B_2) = 1-P(A^c\ |\ B_2) = 0.001 \notag$$

$$\textcolor{red}{\text{1 in every 10,000 individuals has cancer:}}\quad \textcolor{red}{P(B_1) = 0.0001} \Rightarrow P(B_2) = 1 - P(B_1) = 0.9999 \notag$$

Since we have a partition of the sample space, we apply the Law of Total Probability to find \(P(A)\):

$$P(A) = \textcolor{BurntOrange}{P(A\ |\ B_1)} \textcolor{red}{P(B_1)} + P(A\ |\ B_2) P(B_2) = (\textcolor{BurntOrange}{0.9})(\textcolor{red}{0.0001}) + (0.001)(0.9999) = 0.0010899\notag$$

Next, we apply Bayes' Rule to find the desired conditional probability:

$$P(B_1\ |\ A) = \frac{P(A\ |\ B_1) P(B_1)}{P(A)} = \frac{(0.9)(0.0001)}{0.0010899} \approx 0.08\notag$$

This implies that only about 8% of patients who test positive under this particular test actually have kidney cancer. The result may seem surprising given the test's high accuracy rates, but it reflects the rarity of the disease: because so few patients actually have cancer, the small false-positive rate applied to the large cancer-free population accounts for most of the positive results.
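The arithmetic in this example can be reproduced in a few lines of Python (variable names are illustrative):

```python
p_B1 = 0.0001             # prevalence: 1 in 10,000 has kidney cancer
p_B2 = 1 - p_B1           # P(B2) = 0.9999
p_A_given_B1 = 0.9        # test detects cancer 90% of the time
p_A_given_B2 = 1 - 0.999  # false-positive rate: 0.001

# Law of Total Probability
p_A = p_A_given_B1 * p_B1 + p_A_given_B2 * p_B2

# Bayes' Rule
p_B1_given_A = p_A_given_B1 * p_B1 / p_A

# prints P(A) ≈ 0.0010899 and P(B1 | A) ≈ 0.083
print(f"P(A) = {p_A:.7f}, P(B1 | A) = {p_B1_given_A:.3f}")
```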