5.6: The Hypergeometric Distribution

    The hypergeometric distribution arises when one samples without replacement from a finite population, which makes the trials dependent on each other. There are five characteristics of a hypergeometric experiment.

    Characteristics of a hypergeometric experiment

    1. You take samples from two groups.
    2. You are concerned with a group of interest, called the first group.
    3. You sample without replacement from the combined groups. For example, you want to choose a softball team from a combined group of 11 men and 13 women. The team consists of ten players.
    4. Each pick is not independent, since sampling is without replacement. In the softball example, the probability of picking a woman first is \(\frac{13}{24}\). The probability of picking a man second is \(\frac{11}{23}\) if a woman was picked first. It is \(\frac{10}{23}\) if a man was picked first. The probability of the second pick depends on what happened in the first pick.
    5. You are not dealing with Bernoulli Trials.
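
    The dependence described in item 4 can be checked with exact fractions. The following is a minimal sketch using the softball numbers above; the variable names are illustrative and do not come from the original text.

```python
from fractions import Fraction

# Softball example: 11 men and 13 women in the combined group.
men, women = 11, 13
total = men + women

# First pick: all 24 people are still available.
p_woman_first = Fraction(women, total)

# Second pick: only 23 people remain, and the number of men among them
# depends on who was removed by the first pick.
p_man_second_given_woman_first = Fraction(men, total - 1)
p_man_second_given_man_first = Fraction(men - 1, total - 1)

print(p_woman_first, p_man_second_given_woman_first, p_man_second_given_man_first)
# prints: 13/24 11/23 10/23
```

    Because the two conditional probabilities differ, the second pick is not independent of the first.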

    The outcomes of a hypergeometric experiment fit a hypergeometric probability distribution. The random variable \(X\) = the number of items from the group of interest.

    Example \(\PageIndex{1}\)

    A candy dish contains 100 jelly beans and 80 gumdrops. Fifty candies are picked at random. What is the probability that 35 of the 50 are gumdrops? The two groups are jelly beans and gumdrops. Since the probability question asks for the probability of picking gumdrops, the group of interest (first group) is gumdrops. The size of the group of interest (first group) is 80. The size of the second group is 100. The size of the sample is 50 (jelly beans or gumdrops). Let \(X =\) the number of gumdrops in the sample of 50. \(X\) takes on the values \(x = 0, 1, 2, ..., 50\). What is the probability statement written mathematically?

    Answer

    \(P(x = 35)\)
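
    The example asks only for the probability statement, but the value can also be computed. The sketch below assumes the standard counting formula for the hypergeometric probability (ways to choose the gumdrops times ways to choose the jelly beans, divided by ways to choose the whole sample); the helper name `hypergeom_pmf` is hypothetical.

```python
from math import comb

def hypergeom_pmf(x, r, b, n):
    """P(X = x) when n items are drawn without replacement from r + b items,
    where r is the size of the group of interest and b the size of the other group."""
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

# 80 gumdrops (group of interest), 100 jelly beans, sample of 50.
p_35_gumdrops = hypergeom_pmf(35, r=80, b=100, n=50)
print(f"P(x = 35) = {p_35_gumdrops:.3e}")
```

    The result is very small, which is plausible because 35 gumdrops is far above the number one would expect in a sample of 50 from this dish.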

    Example \(\PageIndex{2}\)

    Suppose a shipment of 100 DVD players is known to have ten defective players. An inspector randomly chooses 12 for inspection. He is interested in determining the probability that, among the 12 players, at most two are defective. The two groups are the 90 non-defective DVD players and the 10 defective DVD players. The group of interest (first group) is the defective group because the probability question asks for the probability of at most two defective DVD players. The size of the sample is 12 DVD players. (They may be non-defective or defective.) Let \(X =\) the number of defective DVD players in the sample of 12. \(X\) takes on the values \(0, 1, 2, \dotsc, 10\). \(X\) may not take on the values 11 or 12. The sample size is 12, but there are only 10 defective DVD players. Write the probability statement mathematically.

    Answer

    \(P(x \leq 2)\)
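
    A cumulative statement such as \(P(x \leq 2)\) is a sum of individual probabilities. As a rough sketch, assuming the standard counting formula for each term (the helper name `hypergeom_pmf` is hypothetical):

```python
from math import comb

def hypergeom_pmf(x, r, b, n):
    """P(X = x) for a sample of n drawn without replacement from r + b items."""
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

# 10 defective players (group of interest), 90 non-defective, sample of 12.
r, b, n = 10, 90, 12

# "At most two" means x = 0, 1, or 2, so add those three probabilities.
p_at_most_two = sum(hypergeom_pmf(x, r, b, n) for x in range(0, 3))
print(f"P(x <= 2) = {p_at_most_two:.4f}")
```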

    Example \(\PageIndex{3}\)

    You are president of an on-campus special events organization. You need a committee of seven students to plan a special birthday party for the president of the college. Your organization consists of 18 women and 15 men. You are interested in the number of men on your committee. If the members of the committee are randomly selected, what is the probability that your committee has more than four men?

    This is a hypergeometric problem because you are choosing your committee from two groups (men and women).

    1. Are you choosing with or without replacement?
    2. What is the group of interest?
    3. How many are in the group of interest?
    4. How many are in the other group?
    5. Let \(X =\) _________ on the committee. What values does \(X\) take on?
    6. The probability question is \(P(\)_______\()\).

    Solution

    1. without
    2. the men
    3. 15 men
    4. 18 women
    5. Let \(X =\) the number of men on the committee. \(x = 0, 1, 2, \dotsc, 7\).
    6. \(P(x > 4)\)
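
    The probability in item 6 can be evaluated by adding the probabilities for \(x = 5, 6, 7\). This is a sketch under the assumption that each term uses the standard counting formula; `hypergeom_pmf` is a hypothetical helper name.

```python
from math import comb

def hypergeom_pmf(x, r, b, n):
    """P(X = x) for a sample of n drawn without replacement from r + b items."""
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

# 15 men (group of interest), 18 women, committee of 7.
r, b, n = 15, 18, 7

# "More than four men" means x = 5, 6, or 7.
p_more_than_four = sum(hypergeom_pmf(x, r, b, n) for x in range(5, n + 1))
print(f"P(x > 4) = {p_more_than_four:.4f}")
```

    With these numbers the probability comes out to about 0.13.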

    Notation for the Hypergeometric: \(H =\) Hypergeometric Probability Distribution Function

    \[X \sim H(r, b, n)\]

    Read this as "\(X\) is a random variable with a hypergeometric distribution." The parameters are \(r, b\), and \(n\); \(r =\) the size of the group of interest (first group), \(b =\) the size of the second group, \(n =\) the size of the chosen sample.
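
    The section does not state the probability formula behind this notation. For reference, the standard form, written with the parameters \(r\), \(b\), and \(n\) defined above, is

    \[P(X = x) = \frac{\binom{r}{x}\binom{b}{n-x}}{\binom{r+b}{n}}\]

    where \(\binom{a}{k}\) is the number of ways to choose \(k\) items from \(a\). The numerator counts the samples that contain exactly \(x\) items from the group of interest and \(n - x\) items from the second group; the denominator counts all possible samples of size \(n\).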

    Example \(\PageIndex{4}\)

    A school site committee is to be chosen randomly from six men and five women. If the committee consists of four members chosen randomly, what is the probability that two of them are men? How many men do you expect to be on the committee?

    Let \(X\) = the number of men on the committee of four. The men are the group of interest (first group).

    \(X\) takes on the values \(0, 1, 2, 3, 4\), where \(r = 6, b = 5\), and \(n = 4\). \(X \sim H(6, 5, 4)\)

    Find \(P(x = 2)\). \(P(x = 2) = 0.4545\) (calculator or computer)

    Currently, the TI-83+ and TI-84 do not have hypergeometric probability functions. There are a number of computer packages, including Microsoft Excel, that do.

    The probability that there are two men on the committee is about 0.45.

    The graph of \(X \sim H(6, 5, 4)\) is:

    This graph shows a hypergeometric probability distribution. It has five bars in a roughly bell-shaped pattern that peaks at \(x = 2\). The x-axis shows values from 0 to 4 in increments of 1, representing the number of men on the four-person committee. The y-axis ranges from 0 to 0.5 in increments of 0.1.
    Figure \(\PageIndex{1}\).

    The y-axis contains the probability of \(X\), where \(X =\) the number of men on the committee.
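
    The bar heights in the graph can be reproduced numerically. The sketch below assumes the standard counting formula for each probability; `hypergeom_pmf` and `bar_heights` are hypothetical names.

```python
from math import comb

def hypergeom_pmf(x, r, b, n):
    """P(X = x) for a sample of n drawn without replacement from r + b items."""
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

# X ~ H(6, 5, 4): six men, five women, committee of four.
r, b, n = 6, 5, 4

bar_heights = {x: hypergeom_pmf(x, r, b, n) for x in range(n + 1)}
for x, p in bar_heights.items():
    print(f"x = {x}: P(x) = {p:.4f}")
# prints:
# x = 0: P(x) = 0.0152
# x = 1: P(x) = 0.1818
# x = 2: P(x) = 0.4545
# x = 3: P(x) = 0.3030
# x = 4: P(x) = 0.0455
```

    The value at \(x = 2\) matches the 0.4545 quoted in the example, and the five probabilities add to 1.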

    You would expect \(\mu = 2.18\) (about two) men on the committee.

    The formula for the mean is

    \[\mu = \frac{nr}{r+b} = \frac{(4)(6)}{6+5} \approx 2.18\]
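
    As a check on the formula, the mean can also be computed directly as the probability-weighted average of the possible values. This sketch assumes the standard counting formula for the probabilities; the names are illustrative.

```python
from math import comb

def hypergeom_pmf(x, r, b, n):
    """P(X = x) for a sample of n drawn without replacement from r + b items."""
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

r, b, n = 6, 5, 4

# Shortcut formula for the mean.
mean_from_formula = n * r / (r + b)

# Definition of the mean: sum of x * P(x) over all possible x.
mean_from_definition = sum(x * hypergeom_pmf(x, r, b, n) for x in range(n + 1))

print(round(mean_from_formula, 2), round(mean_from_definition, 2))
# prints: 2.18 2.18
```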

    Summary

    A hypergeometric experiment is a statistical experiment with the following properties:

    • You take samples from two groups.
    • You are concerned with a group of interest, called the first group.
    • You sample without replacement from the combined groups.
    • Each pick is not independent, since sampling is without replacement.
    • You are not dealing with Bernoulli Trials.

    The outcomes of a hypergeometric experiment fit a hypergeometric probability distribution. The random variable \(X\) = the number of items from the group of interest. The distribution of \(X\) is denoted \(X \sim H(r, b, n)\), where \(r =\) the size of the group of interest (first group), \(b =\) the size of the second group, and \(n =\) the size of the chosen sample. It follows that \(n \leq r + b\). The mean of \(X\) is \(\mu = \frac{nr}{r+b}\) and the standard deviation is \(\sigma = \sqrt{\frac{rbn(r+b-n)}{(r+b)^{2}(r+b-1)}}\).
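
    To illustrate the standard deviation formula, the sketch below evaluates it for \(X \sim H(6, 5, 4)\) and compares it with the value obtained directly from the probabilities. It assumes the standard counting formula for those probabilities; the names are illustrative.

```python
from math import comb, sqrt

def hypergeom_pmf(x, r, b, n):
    """P(X = x) for a sample of n drawn without replacement from r + b items."""
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

r, b, n = 6, 5, 4
total = r + b

# Shortcut formula for the standard deviation.
sigma_from_formula = sqrt(r * b * n * (total - n) / (total**2 * (total - 1)))

# Definition: square root of the probability-weighted squared deviations.
mu = n * r / total
variance = sum((x - mu) ** 2 * hypergeom_pmf(x, r, b, n) for x in range(n + 1))
sigma_from_definition = sqrt(variance)

print(f"{sigma_from_formula:.4f} {sigma_from_definition:.4f}")
```

    Both approaches give a standard deviation of roughly 0.83 men.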

    Formula Review

    \(X \sim H(r, b, n)\) means that the discrete random variable \(X\) has a hypergeometric probability distribution with \(r =\) the size of the group of interest (first group), \(b =\) the size of the second group, and \(n =\) the size of the chosen sample.

    \(X\) = the number of items from the group of interest that are in the chosen sample, and \(X\) may take on the values \(x = 0, 1, \dotsc,\) up to the smaller of the sample size \(n\) and the size of the group of interest \(r\). (The minimum value for \(X\) is larger than zero when the sample size exceeds the size of the second group, that is, when \(n > b\).)
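
    The smallest and largest possible values of \(X\) follow from simple counting: the second group can fill at most \(b\) places in the sample, and the group of interest can supply at most \(r\) items. A small sketch, where `hypergeom_support` is a hypothetical helper name:

```python
def hypergeom_support(r, b, n):
    """Return (smallest, largest) possible value of X for X ~ H(r, b, n)."""
    if n > r + b:
        raise ValueError("sample size n cannot exceed r + b")
    smallest = max(0, n - b)  # places the second group cannot fill
    largest = min(n, r)       # limited by sample size and by group size
    return smallest, largest

# DVD example: 10 defective, 90 non-defective, sample of 12.
print(hypergeom_support(10, 90, 12))  # prints: (0, 10)

# 16 business majors, 7 non-business majors, sample of 9.
print(hypergeom_support(16, 7, 9))    # prints: (2, 9)
```

    In the second case at least two business majors must be chosen, because only seven of the nine places can go to non-business majors.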

    \(n \leq r + b\)

    The mean of \(X\) is given by the formula \(\mu = \frac{nr}{r+b}\) and the standard deviation is \(\sigma = \sqrt{\frac{rbn(r+b-n)}{(r+b)^{2}(r+b-1)}}\).

    Use the following information to answer the next five exercises: Suppose that a group of statistics students is divided into two groups: business majors and non-business majors. There are 16 business majors in the group and seven non-business majors in the group. A random sample of nine students is taken. We are interested in the number of business majors in the sample.

    Glossary

    Hypergeometric Experiment
    a statistical experiment with the following properties:
    1. You take samples from two groups.
    2. You are concerned with a group of interest, called the first group.
    3. You sample without replacement from the combined groups.
    4. Each pick is not independent, since sampling is without replacement.
    5. You are not dealing with Bernoulli Trials.
    Hypergeometric Probability
    a discrete random variable (RV) that is characterized by:
    1. A fixed number of trials.
    2. The probability of success is not the same from trial to trial.
    We sample from two groups of items when we are interested in only one group. \(X\) is defined as the number of successes out of the total number of items chosen. Notation: \(X \sim H(r, b, n)\), where \(r =\) the number of items in the group of interest, \(b =\) the number of items in the group not of interest, and \(n =\) the number of items chosen.

    Contributors and Attributions

    • Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution 4.0 License. Download for free at http://cnx.org/contents/30189442-699...b91b9de@18.114.


    This page titled 5.6: The Hypergeometric Distribution is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.