5.5: The Geometric Distribution


There are three main characteristics of a geometric experiment.

There are one or more Bernoulli trials with all failures except the last one, which is a success. In other words, you keep repeating what you are doing until the first success. Then you stop. For example, you throw a dart at a bullseye until you hit the bullseye. The first time you hit the bullseye is a "success" so you stop throwing the dart. It might take six tries until you hit the bullseye. You can think of the trials as failure, failure, failure, failure, failure, success, STOP.
In theory, the number of trials could go on forever. There must be at least one trial.
The probability, \(p\), of a success and the probability, \(q\), of a failure are the same for each trial. \(p + q = 1\) and \(q = 1 - p\). For example, the probability of rolling a three when you throw one fair die is \(\dfrac{1}{6}\). This is true no matter how many times you roll the die. Suppose you want to know the probability of getting the first three on the fifth roll. On rolls one through four, you do not get a face with a three. The probability for each of those rolls is \(q = \dfrac{5}{6}\), the probability of a failure. The probability of getting the first three on the fifth roll is \(\left(\dfrac{5}{6}\right)\left(\dfrac{5}{6}\right)\left(\dfrac{5}{6}\right)\left(\dfrac{5}{6}\right)\left(\dfrac{1}{6}\right) = 0.0804\).
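The die calculation follows a general pattern: \(x - 1\) failures followed by one success, so the probability is \(q^{x-1}p\). The following is a minimal Python sketch of that pattern. The helper name `geometric_pmf` is an assumption made for illustration and does not come from the text.

```python
from fractions import Fraction

def geometric_pmf(p, x):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    q = 1 - p                # probability of failure on any single trial
    return q ** (x - 1) * p  # x - 1 failures, then one success

# First three on the fifth roll of a fair die: p = 1/6
exact = geometric_pmf(Fraction(1, 6), 5)
print(exact)                   # 625/7776
print(round(float(exact), 4))  # 0.0804
```

Using `Fraction` keeps the die probability exact; passing an ordinary decimal such as `0.57` works the same way.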

\(X =\) the number of independent trials until the first success.

Example \(\PageIndex{1}\)

You play a game of chance that you can either win or lose (there are no other possibilities) until you lose. Your probability of losing is \(p = 0.57\). What is the probability that it takes five games until you lose? Let \(X =\) the number of games you play until you lose (includes the losing game). Then \(X\) takes on the values 1, 2, 3, ... (could go on indefinitely). The probability question is \(P(x = 5)\).

Example \(\PageIndex{2}\)

A safety engineer feels that 35% of all industrial accidents in her plant are caused by failure of employees to follow instructions. She decides to look at the accident reports (selected randomly and replaced in the pile after reading) until she finds one that shows an accident caused by failure of employees to follow instructions. On average, how many reports would the safety engineer expect to look at until she finds a report showing an accident caused by employee failure to follow instructions? What is the probability that the safety engineer will have to examine at least three reports until she finds a report showing an accident caused by employee failure to follow instructions?

Let \(X\) = the number of accident reports the safety engineer must examine until she finds a report showing an accident caused by employee failure to follow instructions. \(X\) takes on the values 1, 2, 3, .... The first question asks you to find the expected value or the mean. The second question asks you to find \(P(x \geq 3)\). ("At least" translates to a "greater than or equal to" symbol).
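Both questions can be answered directly from \(p = 0.35\). The sketch below assumes the mean formula \(\mu = \dfrac{1}{p}\) and uses the fact that "at least three reports" happens exactly when the first two reports are both failures, so \(P(x \geq 3) = q^{2}\). The variable names are illustrative only.

```python
p = 0.35   # probability a report shows failure to follow instructions
q = 1 - p  # probability a report does not

mean_reports = 1 / p  # expected number of reports examined

# X >= 3 happens exactly when the first two reports are both failures.
prob_at_least_3 = q ** 2

print(round(mean_reports, 2))     # 2.86
print(round(prob_at_least_3, 4))  # 0.4225
```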

Example \(\PageIndex{3}\)

Suppose that you are looking for a student at your college who lives within five miles of you. You know that 55% of the 25,000 students do live within five miles of you. You randomly contact students from the college until one says he or she lives within five miles of you. What is the probability that you need to contact four people?

This is a geometric problem because you may have a number of failures before you have the one success you desire. Also, the probability of a success stays the same each time you ask a student if he or she lives within five miles of you. There is no definite number of trials (number of times you ask a student).

Let \(X =\) the number of ____________ you must ask ____________ one says yes.
What values does \(X\) take on?
What are \(p\) and \(q\)?
The probability question is \(P(\)_______\()\).

Solution

Let \(X =\) the number of students you must ask until one says yes.
1, 2, 3, …, (total number of students)
\(p = 0.55; q = 0.45\)
\(P(x = 4)\)
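As a rough check on the setup in the solution, \(P(x = 4)\) can be evaluated as three failures followed by one success. This is a sketch that assumes each contact is independent with the same \(p = 0.55\), which is reasonable here because the pool of 25,000 students is large compared with the handful of students contacted.

```python
p = 0.55   # probability a student lives within five miles
q = 1 - p  # probability a student does not

# Three students say no, then the fourth says yes.
prob_fourth = q ** 3 * p

print(round(prob_fourth, 4))  # 0.0501
```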

Notation for the Geometric: \(G =\) Geometric Probability Distribution Function

\(X \sim G(p)\)

Read this as "\(X\) is a random variable with a geometric distribution." The parameter is \(p\); \(p =\) the probability of a success for each trial.

Example \(\PageIndex{4}\)

Assume that the probability of a defective computer component is 0.02. Components are randomly selected. Find the probability that the first defect is found in the seventh component tested. How many components do you expect to test until one is found to be defective?

Let \(X\) = the number of computer components tested until the first defect is found.

\(X\) takes on the values 1, 2, 3, ... where \(p = 0.02\). \(X \sim G(0.02)\)

Find \(P(x = 7)\). \(P(x = 7) = 0.0177\).

To find the probability that \(x = 7\),

Press 2nd, DISTR
Scroll down and select geometpdf(
Press ENTER
Enter 0.02, 7); press ENTER to see the result: \(P(x = 7) = 0.0177\)

To find the probability that \(x \leq 7\), follow the same instructions EXCEPT select E: geometcdf as the distribution function.

The probability that the seventh component is the first defect is 0.0177.
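If a calculator is not at hand, the same two quantities can be computed in a few lines of Python. The functions below are a sketch: they borrow the calculator's names `geometpdf` and `geometcdf` and its argument order for readability, but they are ordinary user-defined helpers, not a library API.

```python
def geometpdf(p, x):
    """P(X = x): the first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p

def geometcdf(p, x):
    """P(X <= x): the first success occurs on or before trial x."""
    return 1 - (1 - p) ** x

print(round(geometpdf(0.02, 7), 4))  # 0.0177
print(round(geometcdf(0.02, 7), 4))  # 0.1319
```

The cumulative version uses the complement: the first success comes after trial \(x\) only if all of the first \(x\) trials are failures, which has probability \(q^{x}\).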

The graph of \(X \sim G(0.02)\) is:

Figure \(\PageIndex{1}\): Graph of the geometric probability distribution \(X \sim G(0.02)\). The bars peak at the left and decrease with each successive bar to the right. The x-axis counts the number of computer components tested until the defect is found. The y-axis is scaled from 0 to 0.02 in increments of 0.005.

The y-axis contains the probability of \(x\), where \(X =\) the number of computer components tested.

The number of components that you would expect to test until you find the first defective one is the mean, \(\mu = 50\).

The formula for the mean is

\[\mu = \dfrac{1}{p} = \dfrac{1}{0.02} = 50\]

The formula for the variance is

\[\sigma^{2} = \left(\dfrac{1}{p}\right)\left(\dfrac{1}{p} - 1 \right) = \left(\dfrac{1}{0.02}\right)\left(\dfrac{1}{0.02} - 1 \right) = 2{,}450\]

The standard deviation is

\[\sigma = \sqrt{\left(\dfrac{1}{p}\right)\left(\dfrac{1}{p} - 1\right)} = \sqrt{\left(\dfrac{1}{0.02}\right)\left(\dfrac{1}{0.02} - 1\right)} = 49.5\]
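The closed-form values can be checked numerically by summing over the distribution itself. The sketch below evaluates the formulas for \(p = 0.02\) and compares them with a truncated sum of \(x \cdot P(X = x)\). The cutoff of 5,000 terms is an arbitrary choice, large enough that the remaining tail is negligible.

```python
import math

p = 0.02
q = 1 - p

# Closed-form values from the formulas for the mean, variance, and standard deviation.
mean = 1 / p
variance = (1 / p) * (1 / p - 1)
std_dev = math.sqrt(variance)

# Numerical check: sum over enough values of x that the tail is negligible.
xs = range(1, 5001)
pmf = [q ** (x - 1) * p for x in xs]
mean_check = sum(x * w for x, w in zip(xs, pmf))
var_check = sum((x - mean_check) ** 2 * w for x, w in zip(xs, pmf))

print(round(mean, 4), round(variance, 4), round(std_dev, 1))  # 50.0 2450.0 49.5
print(round(mean_check, 4), round(var_check, 2))              # 50.0 2450.0
```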

Example \(\PageIndex{5}\)

The lifetime risk of developing pancreatic cancer is about one in 78 (1.28%). Let \(X =\) the number of people you ask until one says he or she has pancreatic cancer. Then \(X\) is a discrete random variable with a geometric distribution: \(X \sim G\left(\dfrac{1}{78}\right)\) or \(X \sim G(0.0128)\).

What is the probability that you ask ten people until one says he or she has pancreatic cancer (that is, the tenth person is the first to say yes)?
What is the probability that you must ask 20 people?
Find the (i) mean and (ii) standard deviation of \(X\).

Answer

\(P(x = 10) = \text{geometpdf}(0.0128, 10) = 0.0114\)
\(P(x = 20) = \text{geometpdf}(0.0128, 20) = 0.0100\)
(i) Mean \(= \mu = \dfrac{1}{p} = \dfrac{1}{0.0128} \approx 78\)
(ii) Standard deviation \(= \sigma = \sqrt{\dfrac{1-p}{p^{2}}} = \sqrt{\dfrac{1-0.0128}{0.0128^{2}}} \approx 77.6234\)
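The answers use the rounded value \(p = 0.0128\), while the example also states \(p = \dfrac{1}{78}\). The sketch below computes the same quantities for both values so the effect of rounding is visible. The function name `geometric_summary` is a hypothetical helper introduced only for this illustration.

```python
import math

def geometric_summary(p, xs):
    """Return P(X = x) for each x in xs, plus the mean and standard deviation."""
    q = 1 - p
    pmf = {x: q ** (x - 1) * p for x in xs}
    mean = 1 / p
    sd = math.sqrt(q) / p  # same as sqrt((1 - p) / p**2)
    return pmf, mean, sd

for label, p in [("p = 0.0128", 0.0128), ("p = 1/78", 1 / 78)]:
    pmf, mean, sd = geometric_summary(p, [10, 20])
    print(label, round(pmf[10], 4), round(pmf[20], 4), round(mean, 3), round(sd, 4))
```

With \(p = 0.0128\) the mean is 78.125, which rounds to 78; with \(p = \dfrac{1}{78}\) the mean is exactly 78 and the standard deviation is slightly smaller.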

References

“Millennials: A Portrait of Generation Next,” PewResearchCenter. Available online at www.pewsocialtrends.org/files...-to-change.pdf (accessed May 15, 2013).
“Millennials: Confident. Connected. Open to Change.” Executive Summary by PewResearch Social & Demographic Trends, 2013. Available online at http://www.pewsocialtrends.org/2010/...pen-to-change/ (accessed May 15, 2013).
“Prevalence of HIV, total (% of populations ages 15-49),” The World Bank, 2013. Available online at http://data.worldbank.org/indicator/...last&sort=desc (accessed May 15, 2013).
Pryor, John H., Linda DeAngelo, Laura Palucki Blake, Sylvia Hurtado, Serge Tran. The American Freshman: National Norms Fall 2011. Los Angeles: Cooperative Institutional Research Program at the Higher Education Research Institute at UCLA, 2011. Also available online at http://heri.ucla.edu/PDFs/pubs/TFS/N...eshman2011.pdf (accessed May 15, 2013).
“Summary of the National Risk and Vulnerability Assessment 2007/8: A profile of Afghanistan,” The European Union and ICON-Institute. Available online at ec.europa.eu/europeaid/where/...summary_en.pdf (accessed May 15, 2013).
“The World FactBook,” Central Intelligence Agency. Available online at www.cia.gov/library/publicat...k/geos/af.html (accessed May 15, 2013).
“UNICEF reports on Female Literacy Centers in Afghanistan established to teach women and girls basic resading [sic] and writing skills,” UNICEF Television. Video available online at http://www.unicefusa.org/assets/vide...y-centers.html (accessed May 15, 2013).

Review

There are three characteristics of a geometric experiment:

There are one or more Bernoulli trials with all failures except the last one, which is a success.
In theory, the number of trials could go on forever. There must be at least one trial.
The probability, \(p\), of a success and the probability, \(q\), of a failure are the same for each trial.

In a geometric experiment, define the discrete random variable \(X\) as the number of independent trials until the first success. We say that \(X\) has a geometric distribution and write \(X \sim G(p)\) where \(p\) is the probability of success in a single trial. The mean of the geometric distribution \(X \sim G(p)\) is \(\mu = \dfrac{1}{p}\) and the standard deviation is \(\sigma = \sqrt{\dfrac{1-p}{p^{2}}} = \sqrt{\dfrac{1}{p}\left(\dfrac{1}{p} - 1\right)}\).
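A geometric experiment can also be simulated: repeat a Bernoulli trial until the first success and record how many trials it took. The sketch below does this with \(p = 0.35\), the value from the safety engineer example. The seed and the number of repetitions are arbitrary choices made so the run is repeatable, and the function name is illustrative.

```python
import random

def trials_until_first_success(p, rng):
    """Run Bernoulli(p) trials and return how many were needed for the first success."""
    count = 1
    while rng.random() >= p:  # failure, which happens with probability 1 - p
        count += 1
    return count

rng = random.Random(2024)  # fixed seed (arbitrary) so the run is repeatable
p = 0.35
samples = [trials_until_first_success(p, rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
share_at_least_3 = sum(x >= 3 for x in samples) / len(samples)

print(sample_mean)       # should be close to 1 / p, about 2.857
print(share_at_least_3)  # should be close to 0.65 ** 2 = 0.4225
```

The simulated values will not match the theoretical ones exactly, but with this many repetitions they should agree to about two decimal places.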

Contributors and Attributions

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/30189442-699...b91b9de@18.114.

Formula Review

\(X \sim G(p)\) means that the discrete random variable \(X\) has a geometric probability distribution with probability of success in a single trial \(p\).

\(X =\) the number of independent trials until the first success

\(X\) takes on the values \(x = 1, 2, 3, \dotsc\)

\(p =\) the probability of a success for any trial

\(q =\) the probability of a failure for any trial

\(p + q = 1\)

\(q = 1 - p\)

The mean is \(\mu = \dfrac{1}{p}\).

The standard deviation is \(\sigma = \sqrt{\dfrac{1-p}{p^{2}}} = \sqrt{\dfrac{1}{p}\left(\dfrac{1}{p} - 1\right)}\).

Use the following information to answer the next six exercises: The Higher Education Research Institute at UCLA collected data from 203,967 incoming first-time, full-time freshmen from 270 four-year colleges and universities in the U.S. 71.3% of those students replied that, yes, they believe that same-sex couples should have the right to legal marital status. Suppose that you randomly select freshmen from the study until you find one who replies “yes.” You are interested in the number of freshmen you must ask.

Footnotes

¹ “Prevalence of HIV, total (% of populations ages 15-49),” The World Bank, 2013. Available online at http://data.worldbank.org/indicator/...pi_data_value- last&sort=desc (accessed May 15, 2013).

Glossary

Geometric Distribution: a discrete random variable (RV) that arises from Bernoulli trials; the trials are repeated until the first success. The geometric variable \(X\) is defined as the number of trials until the first success. Notation: \(X \sim G(p)\). The mean is \(\mu = \dfrac{1}{p}\) and the standard deviation is \(\sigma = \sqrt{\dfrac{1}{p}\left(\dfrac{1}{p} - 1\right)}\). The probability that the first success occurs on trial \(x\), that is, after exactly \(x - 1\) failures, is given by the formula: \(P(X = x) = p(1 - p)^{x-1}\).

Geometric Experiment

a statistical experiment with the following properties:

There are one or more Bernoulli trials with all failures except the last one, which is a success.
In theory, the number of trials could go on forever. There must be at least one trial.
The probability, \(p\), of a success and the probability, \(q\), of a failure do not change from trial to trial.
