7.4: Probability


You and a friend want to see a movie, but each of you wants to see a different one. To decide on the movie, you and your friend are going to flip a coin. This is not an uncommon method for making such a choice, because of the instinctive idea that a coin flip is a fair way to decide between two options. If we did not believe this, we would not consider such a manner of choosing to be a fair way to make the decision, or at the very least you would have a clear preference for which side of the coin corresponds to your movie and which to your friend's. The idea of fairness, meaning that you and your friend are using the coin to evenly, but randomly, choose between the two movies, implies that you implicitly believe that neither you nor your friend has an advantage in the outcome of the movie choice.

    On the other hand, there are cases where we inherently feel that some outcomes are very likely while other outcomes are not. On a given day you might feel the need to take an umbrella with you to work or school, even though it is not raining when you leave. This may be because you either have listened to a weather report or looked at the skies and decided rain was likely enough that taking the umbrella was warranted. However, on most days you probably leave the umbrella at home, deciding it is likely you will not need it. On those days you have decided that the likelihood of raining is so low that you will take a chance by not taking your umbrella.

Consider rolling a standard six-sided die once. In most cases we would consider such a die to be fair if no side of the die is any more or less likely to be observed than any other side. If we are comfortable with such an assertion, we can conclude that we would roll an even number (the faces with two, four, or six dots) just as often as we would roll an odd number (the faces with one, three, or five dots). Hence, rolling a die and observing whether the result is even or odd is equivalent to flipping a coin and observing whether heads is flipped. However, if we consider just rolling a six, then this outcome happens less often than rolling one of the other five faces.

    Statisticians, mathematicians, and scientists use a mathematical measure called probability to talk about how likely it is for a random outcome to occur.

    Definition: Probability

    The probability that an outcome from a random experiment occurs is a number between 0 and 1, inclusive, that reflects the proportion of time the outcome would occur in a very large number of replications of the experiment under the same conditions for each replication.

    You will also encounter another measure of how often random events occur, called chance. Chance is the same measure as probability but expressed as a percentage instead of a proportion.

    Definition: Chance

    The chance that an outcome from a random experiment occurs is a percentage between 0% and 100%, inclusive, that reflects the percentage of time the outcome would occur in a very large number of replications of the experiment under the same conditions for each replication.

Hence, if we have determined that an outcome has a probability equal to \(p\), then the chance of the same outcome is equal to \(p \times 100\%\).

This is an informal and intuitive definition of probability and chance that is based on a particular mathematical interpretation of probabilities known as frequentism. This is not the formal mathematical definition used by statisticians and mathematicians, but it is a convenient definition that allows for easy intuitive interpretation. Indeed, the mathematical definition of probability took a little over 250 years to develop, from the initial investigations by Blaise Pascal (Figure \(\PageIndex{1}\)) and Pierre Fermat (Figure \(\PageIndex{2}\)) in the 1650s to the pioneering mathematical work of Andrey Kolmogorov (Figure \(\PageIndex{3}\)) in the early twentieth century.

    Figure \(\PageIndex{1}\): French mathematician Blaise Pascal (1623–1662) who, along with Pierre Fermat, developed the first widely known mathematical approach to probability theory (public domain image).
    Figure \(\PageIndex{2}\): French mathematician Pierre Fermat (1607–1665) who, along with Blaise Pascal, developed the first mathematical approach to probability theory (public domain image).
    Figure \(\PageIndex{3}\): Russian mathematician Andrey Kolmogorov (1903–1987), who developed the first complete and coherent mathematical theory of probability in the early twentieth century (photograph by Konrad Jacobs. This image is licensed under the Creative Commons Attribution-Share Alike 2.0 Germany license).

The purpose here is not for you to become experts at computing probabilities for the outcomes of complex experiments. However, we will consider one simple case where probabilities are easy to compute for the sake of illustration and interpretation. Suppose an experiment has a fixed number \(n\) of possible outcomes. If we can assume that all the possible outcomes are equally likely to occur, then the probability that any one particular outcome will occur is equal to \(1/n\). More generally, the probability that one of \(m\) specified outcomes occurs, out of a set of \(n\) equally likely outcomes, is \(m/n\) with \(m\leq n\). This is the method for computing probabilities that was used by Pascal and Fermat in the mid-seventeenth century. Today this method is often called the classical method for computing probabilities.
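The classical method can be sketched in a few lines of Python. This is an illustrative helper, not part of the original text; using `Fraction` keeps the answers as exact ratios like \(1/6\) rather than rounded decimals.

```python
from fractions import Fraction

def classical_probability(favorable, total):
    """Classical method: m favorable outcomes out of n equally likely outcomes."""
    if not 0 <= favorable <= total:
        raise ValueError("need 0 <= m <= n")
    return Fraction(favorable, total)

# One particular face of a fair six-sided die:
print(classical_probability(1, 6))  # 1/6
# An even number (three favorable faces out of six):
print(classical_probability(3, 6))  # 1/2
```

Note that `Fraction(3, 6)` reduces automatically to \(1/2\), matching the hand calculation.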

Consider the simple example of a single coin flip. There are two possible outcomes for a coin flip: heads and tails. We will consider the simplified case where strange outcomes, such as the coin landing on its edge, are excluded. If we assume that the coin is fair and is flipped using a fair method, then we are equivalently assuming that heads and tails are equally likely to occur, and the classical method then states that the probability of flipping heads is \(1/2\) and the probability of flipping tails is \(1/2\). If you would rather use chance, then the chance that heads is flipped is 50% and the chance that tails is flipped is also 50%. The definition of probability then implies that if we flipped such a coin a very large number of times, then the proportion of times we would observe heads would be very close to \(1/2\), as would the proportion of times we observed tails. Probability theory tells us that the proportion will probably not be exactly equal to \(1/2\), but will tend to get closer to \(1/2\) as the number of times we flip the coin increases. From an intuitive viewpoint this means that when you see a probability near \(1/2\), or a chance near 50%, the behavior is somewhat like observing heads in a coin flip.

    Now consider a standard six-sided die (Figure \(\PageIndex{4}\)), where there are six possible outcomes so that \(n=6\). If we assume that the die is fair, and that it is rolled in a fair way, then each outcome should be equally likely with a probability equal to \(1/6\), about 0.1667 as a decimal, or about a 17% chance. The definition of probability then implies that if we rolled such a die a very large number of times, the proportion of times we would observe 1 would be close to 0.1667. Because you are probably very familiar with rolling a die, you can use this experience to help you think about probabilities. From an intuitive viewpoint this would mean that when you see a probability near 0.17, or a chance near 17%, the behavior is somewhat like rolling a 1 with a fair die.
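The frequentist idea that the observed proportion settles near the probability can be checked by simulation. The sketch below (an illustration, not part of the original text) rolls a simulated fair die and reports the fraction of rolls showing a 1; a fixed seed makes the run reproducible.

```python
import random

def proportion_of_ones(num_rolls, seed=12345):
    """Simulate num_rolls fair six-sided die rolls; return the fraction showing 1."""
    rng = random.Random(seed)  # seeded generator for reproducibility
    ones = sum(rng.randint(1, 6) == 1 for _ in range(num_rolls))
    return ones / num_rolls

# The proportion drifts toward 1/6 (about 0.1667) as the number of rolls grows.
for n in (100, 10_000, 1_000_000):
    print(n, proportion_of_ones(n))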

    Figure \(\PageIndex{4}\): A standard six-sided die (public domain photograph created by Alan M. Polansky).

    If we were interested in other rolls, the classical method is equally effective in computing those probabilities as well. For example, if we want to compute the probability of rolling either a one or a two, then \(m=2\), \(n=6\), and the probability is \(2/6=1/3\). The probability of rolling an even number, either a 2, 4, or 6, is \(3/6=1/2\).

As anyone who has played various role-playing games knows, a die can have a number of faces other than the usual six. Figure \(\PageIndex{5}\) shows a die with twenty sides. If such a die is fair and is rolled in a fair way, the probability of observing a 1 is \(1/20=0.05\), or a chance of 5%. The definition of probability then implies that if we rolled such a die a very large number of times, the proportion of times we would observe 1 would be close to 0.05. From an intuitive viewpoint this would mean that when you see a probability near 0.05, or a chance near 5%, the behavior is somewhat like rolling a 1 with a fair die that has twenty sides. As a last example we can consider a specialized die that has 100 sides (Figure \(\PageIndex{6}\)). If such a die is fair and is rolled in a fair way, the probability of observing a 1 is \(1/100=0.01\), or a chance of 1%. The definition of probability then implies that if we rolled such a die a very large number of times, the proportion of times we would observe 1 would be close to 0.01. From an intuitive viewpoint this would mean that when you see a probability near 0.01, or a chance near 1%, the behavior is somewhat like rolling a 1 with a fair die that has one hundred sides.

    Figure \(\PageIndex{5}\): A twenty-sided die (public domain photograph created by Alan M. Polansky)
    Figure \(\PageIndex{6}\): A one hundred-sided die (public domain photograph created by Alan M. Polansky)

Conceptually we can use these examples to try to intuitively understand how likely outcomes are in other applications. If a research study reports a probability equal to some number \(p\), we can intuitively visualize a fair die with roughly \(1/p\) sides; the outcome that the study is referring to would happen about as often as rolling such a die and observing a 1. Note that \(1/p\) will often not be a whole number, but you can usually round it up or down to a whole number without affecting things too much. For example, a research study may report that the probability that no women would be hired from a pool of applicants, assuming there was no gender discrimination in the application evaluation process, is 0.005. Setting \(1/p=1/0.005=200\), it follows that the outcome would occur about as often as rolling a fair die with 200 sides in a fair way and observing a 1. Similarly, the probability of winning the jackpot in a popular high-stakes lottery is 0.000000003422. Setting \(p=0.000000003422\), it follows that

    \[ \frac{1}{p}=\frac{1}{0.000000003422}=292,226,767, \nonumber \]

    so that the outcome would occur about as often as rolling a fair die with 292,226,767 sides in a fair way and observing a 1.
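This "equivalent die" conversion is simple enough to automate. The helper below is an illustrative sketch (the function name is ours, not from the text): it rounds \(1/p\) to the nearest whole number of sides.

```python
def equivalent_die_sides(p):
    """Round 1/p to the nearest whole number of sides for the 'equivalent die'."""
    if not 0 < p <= 1:
        raise ValueError("p must be a probability in (0, 1]")
    return round(1 / p)

print(equivalent_die_sides(0.005))           # 200 sides, the hiring example
print(equivalent_die_sides(0.000000003422))  # roughly 292 million sides, the lottery
```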

    Beyond games of chance, probability plays a major role in research studies. We will consider many of these types of studies, though we will usually not go into specific details about how the probabilities were computed. However, a few examples will help you understand the types of calculations that researchers must use to get the probabilities that they compute.

    A typical, but greatly simplified, argument in an employment discrimination case can be considered using the following example. Suppose that seven equally qualified individuals apply for a job at a large local company. Three of the applicants are Hispanic, two are African American, and two are white. The two white individuals are hired from the pool of applicants. Was there discrimination in the hiring process?

    One method for approaching this question is to consider it as a problem in probability theory. If the applicants are equally qualified and the hiring process is completely fair with respect to race, it can be argued that the two individuals selected for employment are selected at random with respect to race. That is, this problem can be thought of as putting all the candidates' races on slips of paper and drawing two of them at random, such as is done in a raffle, where the drawing is done in such a way that each slip of paper is equally likely to be drawn. All the possible draws, without regard to the order in which the papers are drawn, are listed in Table 7.1. In the table the three Hispanic candidates are labeled as H1, H2, and H3, the two African American candidates are labeled as A1 and A2, and the two white candidates are labeled as W1 and W2. The mathematical theory of probability can be used to show that if the selection of each slip of paper is fair on each draw, then each of these twenty-one possibilities will have equal probability of \(1/21=0.0476\), or about 0.05 for simplicity.

    Table 7.1 The 21 possible choices of two individuals from a hiring pool of seven individuals, three of whom are Hispanic (H1, H2, and H3), two of whom are African American (A1 and A2), and two of whom are white (W1 and W2).

H1, H2   H1, H3   H1, A1   H1, A2   H1, W1   H1, W2

H2, H3   H2, A1   H2, A2   H2, W1   H2, W2

H3, A1   H3, A2   H3, W1   H3, W2

A1, A2   A1, W1   A1, W2

A2, W1   A2, W2

W1, W2

In observing the possible selections listed in Table 7.1, there is only one which corresponds to both white candidates being selected. Therefore, we can argue that if the selection of the candidates is completely fair with respect to race, then the probability that both white candidates would be selected at random is about \(1/21\). Now we can interpret the probability in terms of the methodology outlined above. If the selection process was fair, then selecting both white candidates would happen about as often as rolling a 1 with a fair twenty-one-sided die. We can either believe that the process was unfair, and the two white candidates were selected out of racial bias, or we can believe that a rare event occurred that would only happen about 1 in 21 times.
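The enumeration in Table 7.1 can be reproduced with Python's `itertools.combinations`. This is a quick check of the counting argument, not part of the original text:

```python
from itertools import combinations
from fractions import Fraction

# The seven applicants, labeled as in Table 7.1.
applicants = ["H1", "H2", "H3", "A1", "A2", "W1", "W2"]

# All unordered pairs of hires from the pool of seven applicants.
pairs = list(combinations(applicants, 2))
both_white = [pair for pair in pairs if set(pair) == {"W1", "W2"}]

print(len(pairs))                             # 21 possible selections
print(Fraction(len(both_white), len(pairs)))  # 1/21
```

Each of the 21 pairs is equally likely under fair selection, so the probability of the single all-white pair is \(1/21\), matching the table count.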

In the final analysis we do not know for sure which has happened, but can we weigh this outcome as evidence that discrimination has been proven beyond a reasonable doubt? That is, is the roughly 5% chance that such a selection could happen under a fair process small enough to rule out fairness, or does it leave reasonable doubt? This is something that everyone can only answer for themselves, but traditionally, in statistical calculations, this probability would be low enough to conclude that there must have been at least some bias in favor of the white candidates.

Suppose that there had been twenty-five candidates, of whom only two were white. We could construct a table like the one shown in Table 7.1. For that table there would be 300 possible selections, only one of which corresponds to both white candidates being selected. Therefore, we can argue that if the selection of the candidates is completely fair with respect to race, then the probability that both white candidates would be selected at random is \(1/300\approx 0.0033\). Now we can interpret the probability in terms of the methodology outlined above. If the selection process was fair, then selecting both white candidates would happen about as often as rolling a 1 with a fair three-hundred-sided die. Thus, we can either believe that the process was unfair, and the two white candidates were selected out of racial bias, or we can believe that a rare event occurred that would only happen about 1 in 300 times.
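For the larger pool the table would be tedious to write out, but the count of possible selections is just "25 choose 2", which Python's `math.comb` computes directly (an illustrative check, not part of the original text):

```python
from math import comb

total = comb(25, 2)          # ways to choose 2 hires from 25 applicants
print(total)                 # 300
print(round(1 / total, 4))   # 0.0033, the probability of the one all-white pair
```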

This type of evidence is usually not the only type of evidence presented in discrimination litigation, but it can provide a powerful tool to aid in describing how unusual some hiring outcomes can be. Similar calculations can even be used to aid human resource departments in determining if there is bias in their advertising and recruitment techniques. For example, say 10% of a recruitment population is known to have a certain characteristic, and we have cultivated a pool of 100 applicants. We would expect about 10 individuals in the pool, which is 10% of the hiring pool, to have that characteristic. If only four people in the hiring pool have that characteristic, then we have some evidence that the cultivation process may be biased. We can get more concrete information by computing the probability that we would get four or fewer applicants with that characteristic. That probability turns out to be about 0.0237. Hence, the probability of us getting four or fewer individuals with this characteristic is about the same as rolling a 1 with a 42-sided die, based on the calculation \(1/p\approx42.17\). Once again, it is up to the individual to determine whether this probability is small enough to conclude that the recruitment process is biased.
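The 0.0237 figure comes from a binomial calculation: with 100 independent applicants, each having a 10% chance of the characteristic, it is the probability of seeing 4 or fewer such applicants. The text does not show the computation, but it can be sketched by summing binomial terms directly:

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

prob = binomial_cdf(4, 100, 0.10)  # 4 or fewer out of 100, each with chance 10%
print(round(prob, 4))              # 0.0237
print(round(1 / prob, 2))          # about 42 sides on the "equivalent die"
```

This assumes the applicants can be treated as independent draws from the recruitment population, which is the standard modeling assumption behind the binomial distribution.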

This type of argument provides a preview of the process of formal statistical inference that will be covered later in the book. In that process a hypothesis is proposed, and then data are gathered to look for evidence against the truth of the hypothesis. The evidence is evaluated by computing the probability, under the assumption that the hypothesis is true, of observing evidence against the hypothesis at least as strong as what was actually found. If this probability is small, then there is an indication that the hypothesis is false. In the example above the hypothesis is that the recruitment process is not biased. The probability 0.0237 is the probability of observing four or fewer applicants with the characteristic if the process were in fact unbiased. Because this probability is relatively small, there is an indication of a considerable amount of evidence against the hypothesis that the recruitment process is not biased, and therefore we might be convinced that the process is biased.


    This page titled 7.4: Probability is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by .
