4.2: Geometric Distribution
How long should we expect to flip a coin until it turns up heads? Or how many times should we expect to roll a die until we get a 1? These questions can be answered using the geometric distribution. We first formalize each trial – such as a single coin flip or die toss – using the Bernoulli distribution, and then we combine these with our tools from probability (Chapter 3) to construct the geometric distribution.
Bernoulli Distribution
Many health insurance plans in the United States have a deductible, where the insured individual is responsible for costs up to the deductible, and then the costs above the deductible are shared between the individual and insurance company for the remainder of the year.
Suppose a health insurance company found that 70% of the people they insure stay below their deductible in any given year. Each of these people can be thought of as a trial. We label a person a success if her healthcare costs do not exceed the deductible. We label a person a failure if she does exceed her deductible in the year. Because 70% of the individuals will not hit their deductible, we denote the probability of a success as \(p = 0.7\). The probability of a failure is sometimes denoted with \(q = 1 - p\), which would be 0.3 for the insurance example.
When an individual trial only has two possible outcomes, often labeled as success or failure, it is called a Bernoulli random variable. We chose to label a person who does not hit her deductible as a “success” and all others as “failures”. However, we could just as easily have reversed these labels. The mathematical framework we will build does not depend on which outcome is labeled a success and which a failure, as long as we are consistent.
Bernoulli random variables are often denoted as 1 for a success and 0 for a failure. In addition to being convenient when entering data, this labeling is also mathematically handy. Suppose we observe ten trials:

\[1 \quad 1 \quad 1 \quad 0 \quad 1 \quad 0 \quad 0 \quad 1 \quad 1 \quad 0\]
Then the sample proportion, \(\hat{p}\), is the sample mean of these observations:
\[\begin{aligned} \hat{p} = \dfrac{\text{# of successes}}{\text{# of trials}} = \dfrac{1+1+1+0+1+0+0+1+1+0}{10} = 0.6\end{aligned}\]
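This calculation can be checked with a short script; a minimal sketch in Python, using the same ten 0/1 outcomes summed above:

```python
# Ten Bernoulli trials recorded as 1 (success) and 0 (failure),
# matching the observations summed in the text.
trials = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]

# The sample proportion p-hat is just the sample mean of the 0/1 data.
p_hat = sum(trials) / len(trials)
print(p_hat)  # 0.6
```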
This mathematical inquiry of Bernoulli random variables can be extended even further. Because 0 and 1 are numerical outcomes, we can define the mean and standard deviation of a Bernoulli random variable. (See Exercises 4.15 and 4.16.)
If \(X\) is a random variable that takes value 1 with probability of success \(p\) and 0 with probability \(1-p\), then \(X\) is a Bernoulli random variable with mean and standard deviation
\[\begin{aligned} \mu &= p &\sigma&= \sqrt{p(1-p)} \end{aligned}\]
In general, it is useful to think about a Bernoulli random variable as a random process with only two outcomes: a success or failure. Then we build our mathematical framework using the numerical labels 1 and 0 for successes and failures, respectively.
Geometric Distribution
The geometric distribution is used to describe how many trials it takes to observe a success. Let’s first look at an example.
Suppose we are working at the insurance company and need to find a case where the person did not exceed her (or his) deductible as a case study. If the probability a person will not exceed her deductible is 0.7 and we are drawing people at random, what are the chances that the first person will not have exceeded her deductible, i.e. be a success? The second person? The third? What if we pull \(n - 1\) cases before we find the first success, i.e. the first success is the \(n^{th}\) person? (If the first success is the fifth person, then we say \(n = 5\).)
Solution
The probability of stopping after the first person is just the chance the first person will not hit her (or his) deductible: 0.7. The probability the second person is the first to not hit her deductible is
\[\begin{aligned} &P(\text{second person is the first to not hit deductible}) \\[4pt] &\quad = P(\text{the first person will hit it, the second won't}) = (0.3)(0.7) = 0.21 \end{aligned}\]
Likewise, the probability it will be the third case is \((0.3)(0.3)(0.7) = 0.063\).
If the first success is on the \(n^{th}\) person, then there are \(n-1\) failures and finally 1 success, which corresponds to the probability \((0.3)^{n-1}(0.7)\). This is the same as \((1-0.7)^{n-1}(0.7)\).
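The pattern above can be captured in a small function; a sketch in Python, assuming \(p = 0.7\) as in the example:

```python
def geometric_pmf(n, p):
    """Probability the first success occurs on trial n:
    (n - 1) failures followed by one success."""
    return (1 - p) ** (n - 1) * p

p = 0.7
for n in [1, 2, 3]:
    print(n, round(geometric_pmf(n, p), 3))
# 1 0.7
# 2 0.21
# 3 0.063
```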
Example \(\PageIndex{1}\) illustrates what is called the geometric distribution, which describes the waiting time until a success for independent and identically distributed (iid) Bernoulli random variables. In this case, the independence aspect just means the individuals in the example don’t affect each other, and identical means they each have the same probability of success.
The geometric distribution from Example \(\PageIndex{1}\) is shown in Figure 4.8. In general, the probabilities for a geometric distribution decrease exponentially fast.
While this text will not derive the formulas for the mean (expected) number of trials needed to find the first success or the standard deviation or variance of this distribution, we present general formulas for each.
If the probability of a success in one trial is \(p\) and the probability of a failure is \(1-p\), then the probability of finding the first success in the \(n^{th}\) trial is given by
\[\begin{aligned} (1-p)^{n-1}p \end{aligned}\]
The mean (i.e. expected value), variance, and standard deviation of this wait time are given by
\[\begin{aligned} \mu &= \frac{1}{p} &\sigma^2 &=\frac{1-p}{p^2} &\sigma &= \sqrt{\frac{1-p}{p^2}} \end{aligned}\]
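These formulas are straightforward to evaluate; a minimal sketch in Python, shown here for \(p = 0.7\):

```python
import math

def geometric_summary(p):
    """Mean, variance, and standard deviation of the wait time
    until the first success, for success probability p."""
    mu = 1 / p
    var = (1 - p) / p ** 2
    return mu, var, math.sqrt(var)

mu, var, sd = geometric_summary(0.7)
print(round(mu, 2), round(var, 2), round(sd, 2))  # 1.43 0.61 0.78
```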
It is no accident that we use the symbol \(\mu\) for both the mean and expected value. The mean and the expected value are one and the same.
It takes, on average, \(1/p\) trials to get a success under the geometric distribution. This mathematical result is consistent with what we would expect intuitively. If the probability of a success is high (e.g. 0.8), then we don’t usually wait very long for a success: \(1/0.8 = 1.25\) trials on average. If the probability of a success is low (e.g. 0.1), then we would expect to view many trials before we see a success: \(1/0.1 = 10\) trials.
The probability that a particular case would not exceed their deductible is said to be 0.7. If we were to examine cases until we found one where the person did not exceed her deductible, how many cases should we expect to check?
- Answer
-
We would expect to check about \(1/0.7 \approx 1.43\) cases to find the first success.
What is the chance that we would find the first success within the first 3 cases?
Solution
This is the chance it is the first (\(n=1\)), second (\(n=2\)), or third (\(n=3\)) case is the first success, which are three disjoint outcomes. Because the individuals in the sample are randomly sampled from a large population, they are independent. We compute the probability of each case and add the separate results:
\[\begin{aligned} &P(n=1, 2, \text{ or }3) \\[4pt] & \quad = P(n=1)+P(n=2)+P(n=3) \\[4pt] & \quad = (0.3)^{1-1}(0.7) + (0.3)^{2-1}(0.7) + (0.3)^{3-1}(0.7) \\[4pt] & \quad = 0.973 \end{aligned}\]
There is a probability of 0.973 that we would find a successful case within 3 cases.
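The three-term sum can be verified directly; a minimal check in Python:

```python
p = 0.7
# Add the disjoint probabilities that the first success
# lands on trial 1, 2, or 3.
prob = sum((1 - p) ** (n - 1) * p for n in [1, 2, 3])
print(round(prob, 3))  # 0.973
```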
Determine a more clever way to solve Example \(\PageIndex{2}\). Show that you get the same result.
- Answer
-
First find the probability of the complement: \(P(\text{no success in first 3 trials}) = 0.3^3 = 0.027\). Next, compute one minus this probability: \(1 - P(\text{no success in 3 trials}) = 1 - 0.027 = 0.973\).
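The complement shortcut fits in one line; a quick check in Python:

```python
p = 0.7
# P(at least one success in 3 trials) = 1 - P(three failures in a row)
prob = 1 - (1 - p) ** 3
print(round(prob, 3))  # 0.973
```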
Suppose a car insurer has determined that 88% of its drivers will not exceed their deductible in a given year. If someone at the company were to randomly draw driver files until they found one that had not exceeded their deductible, what is the expected number of drivers the insurance employee must check? What is the standard deviation of the number of driver files that must be drawn?
Solution
In this example, a success is again when someone will not exceed the insurance deductible, which has probability \(p = 0.88\). The expected number of people to be checked is \(1 / p = 1 / 0.88 = 1.14\) and the standard deviation is \(\sqrt{(1-p)/p^2} = 0.39\).
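These two quantities follow directly from the geometric formulas; a quick check in Python with \(p = 0.88\) as in the example:

```python
import math

p = 0.88  # probability a driver stays below the deductible
mu = 1 / p                          # expected number of files to check
sd = math.sqrt((1 - p) / p ** 2)    # standard deviation of that count
print(round(mu, 2), round(sd, 2))   # 1.14 0.39
```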
Using the results from Example \(\PageIndex{3}\), \(\mu = 1.14\) and \(\sigma = 0.39\), would it be appropriate to use the normal model to find what proportion of experiments would end in 3 or fewer trials?
- Answer
-
No. The geometric distribution is always right skewed and can never be well-approximated by the normal model.
The independence assumption is crucial to the geometric distribution’s accurate description of a scenario. Mathematically, we can see that to construct the probability of the success on the \(n^{th}\) trial, we had to use the Multiplication Rule for Independent Processes. It is no simple task to generalize the geometric model for dependent trials.


