5.3: Discrete Distributions - Bernoulli and Binomial
The Sock Drawer Scenario
You’re late to class and grab a sock, just one, from a messy drawer. The drawer has a mix of black and white socks, but you don’t know the exact count.
"Is this sock black?"
That’s a yes/no question. If you define “black sock = success”, then pulling a sock is a simple Bernoulli trial.
There are just two outcomes:
- Success (it's black), with some probability \( p \)
- Failure (not black), with probability \( 1 - p \)
Whether you realize it or not, this super simple setup — one random trial, two possible outcomes — is the starting point for some of the most powerful models in statistics. And it all starts with a sock.
Let’s break down what the Bernoulli distribution really is...
Definition: Bernoulli Distribution
The Bernoulli Distribution, named after Jacob Bernoulli, is the simplest kind of discrete probability distribution. It models the outcomes of a single experiment (called a Bernoulli trial) that has only two possible outcomes:
- Success, with probability \( p \)
- Failure, with probability \( 1 - p \)
Note that success and failure do not have to carry any specific meanings. We often use terminology such as "yes, true, success" versus "no, false, failure", but it could also be the difference between heads and tails of a coin flip or any other binary outcome. We may also use the term 'success' to denote something with a negative connotation, such as developing side effects from a medication, or an extreme weather event successfully materializing.
If we define a random variable \( X \) such that:
- \( X = 1 \) if the outcome is a success
- \( X = 0 \) if the outcome is a failure
Then we say:
\( X \sim \text{Bernoulli}(p) \)
For the Bernoulli distribution, we can find a formula for the probability mass function (PMF). This also illustrates why we use two different terms: the distribution describes how probability is spread across all possible outcomes, while the PMF gives the probability of one specific outcome. The PMF of the Bernoulli distribution can be summarized in the following way:
\[\begin{aligned}P(X = 1) &= p\\ P(X = 0) &= 1 - p\end{aligned}\]
For more complicated distributions we will need an algebraic expression. For the Bernoulli, it is given by:
\( P(X = x) = p^x (1 - p)^{1 - x} \) for \( x = 0 \) or \( x = 1 \)
Where:
- \( p \) = probability of success (a number between 0 and 1)
- \( X \) takes on only two values: 0 or 1
If \(X = 1\), then \(P(X = 1) = p^1\cdot(1-p)^{1-1} = p\), and if \(X = 0\), then \(P(X = 0) = p^0\cdot (1-p)^{1-0} = 1-p\).
The expected value (mean) of a Bernoulli random variable is:
\( \mu = E(X) = p \)
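The PMF formula above can be checked with a few lines of standard-library Python. This is a minimal sketch: the function name `bernoulli_pmf` is our own, not from any library.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Probability that a Bernoulli(p) variable equals x (0 or 1),
    using the formula p^x * (1 - p)^(1 - x)."""
    return p**x * (1 - p)**(1 - x)

# With p = 0.25 (e.g. guessing one of four answers):
print(bernoulli_pmf(1, 0.25))  # 0.25  (success)
print(bernoulli_pmf(0, 0.25))  # 0.75  (failure)
```

Plugging in \(x = 1\) or \(x = 0\) reproduces the two cases worked out above, and the expected value is simply \(p\).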
Example: Guessing a Multiple Choice Question
Suppose that you take a guess on a multiple choice test question with 4 answers. There is a 1 in 4 chance of guessing the correct answer. This can be considered a Bernoulli trial with \(p = 0.25\). Furthermore, in this scenario, \(X\) also represents the points awarded for this question: you get 1 point if you guess correctly and 0 if you do not.
| Outcome | Random Variable \(X\) | \(P(X)\) |
|---|---|---|
| Guess incorrect | 0 | 0.75 |
| Guess correct | 1 | 0.25 |
There is not much more to investigate with the Bernoulli distribution on its own. Instead we move on to the binomial distribution, which describes the results of a series of Bernoulli trials.
The Binomial Distribution
Definition: Binomial Distribution
The Binomial Distribution is a discrete probability distribution that models the number of successes in a fixed number of repeated, independent Bernoulli trials, where each trial has only two possible outcomes: success or failure.
You can use the Binomial distribution when:
- The number of trials, \( n \), is fixed.
- Each trial is independent of the others (the results of one trial do not affect subsequent trials).
- Each trial results in either success or failure (a Bernoulli trial).
- The probability of success, \( p \), is the same for every trial.
We sometimes call such trials IID: independent and identically distributed.
If a random variable \( X \) counts the number of successes, we write:
\( X \sim \text{Binomial}(n, p) \)
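One way to see this definition in action is to simulate it: a Binomial\((n, p)\) draw is just the sum of \(n\) independent Bernoulli\((p)\) trials. The sketch below uses only the Python standard library; the function name `binomial_draw` is our own.

```python
import random

def binomial_draw(n: int, p: float, rng: random.Random) -> int:
    """One Binomial(n, p) observation: count successes in n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(1)  # fixed seed so the simulation is repeatable
draws = [binomial_draw(5, 0.25, rng) for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 5 * 0.25 = 1.25
```

Averaging many simulated draws lands near \(n \cdot p\), the expected value given below.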
The probability of getting exactly \( x \) successes in \( n \) trials is given by the Binomial probability formula:
\( P(X = x) = {n \choose x} p^x (1 - p)^{n - x} \)
Where:
- \( n \) = number of trials
- \( x \) = number of successes
- \( p \) = probability of success on a single trial
- \( {n \choose x} \) = number of ways to choose \( x \) successes from \( n \) trials (combination from chapter 4)
Given the complexity of the PMF, the expected value of a Binomial distribution is surprisingly simple! It is given by the formula:
\[\mu = n\cdot p\]
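The binomial PMF can be evaluated directly with the standard library's `math.comb` for the \({n \choose x}\) term. This is a sketch, and `binom_pmf` is our own helper name, not a library function.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Same answer Excel's =BINOM.DIST(2, 5, 0.25, FALSE) would give:
print(round(binom_pmf(2, 5, 0.25), 5))  # 0.26367
```

Summing `binom_pmf(x, n, p)` over all \(x\) from 0 to \(n\) gives 1, as any probability distribution must.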
Example: Multiple Choice Test
Let's revisit a multiple choice test, this time with 5 questions each with 4 possible choices. Since there is a 1 in 4 chance of guessing each question correct, we have:
- \(p = 0.25\)
- \(n = 5\)
- \(X\): number of correctly guessed answers out of 5.
We can use the PMF for the Binomial distribution to calculate the probability of getting each possible number of correct questions on the test. The following values were calculated in Excel using the formula =BINOM.DIST(x, n, p, FALSE):
| Number of Correct Answers (X) | Probability \( P(X) \) |
|---|---|
| 0 | \( P(X=0) = {5 \choose 0}(0.25)^0(0.75)^5 = 1 \cdot 1 \cdot 0.2373 = \mathbf{0.23730} \) |
| 1 | \( P(X=1) = {5 \choose 1}(0.25)^1(0.75)^4 = 5 \cdot 0.25 \cdot 0.3164 = \mathbf{0.39551} \) |
| 2 | \( P(X=2) = {5 \choose 2}(0.25)^2(0.75)^3 = 10 \cdot 0.0625 \cdot 0.4219 = \mathbf{0.26367} \) |
| 3 | \( P(X=3) = {5 \choose 3}(0.25)^3(0.75)^2 = 10 \cdot 0.0156 \cdot 0.5625 = \mathbf{0.08789} \) |
| 4 | \( P(X=4) = {5 \choose 4}(0.25)^4(0.75)^1 = 5 \cdot 0.0039 \cdot 0.75 = \mathbf{0.01465} \) |
| 5 | \( P(X=5) = {5 \choose 5}(0.25)^5(0.75)^0 = 1 \cdot 0.00098 \cdot 1 = \mathbf{0.00098} \) |
This is a probability distribution for this scenario! Feel free to double check that all the probability values sum to 1. We can answer some questions using this distribution:
- The most likely number of correct answers to obtain from guessing is 1, with a probability of 0.3955
- There is only a 0.00098 chance of getting them all correct! To put this into perspective, this is a little less than a 1 in 1000 chance.
- If a passing grade is getting at least 4 out of 5 correct, then there is a 0.01465 + 0.00098 = 0.01563 chance of passing by guessing.
- The expected value is \(\mu = 5\cdot 0.25 = 1.25\). It is impossible to get this exact value on a 5-question test; rather, it is the average number of correct answers that many people guessing would end up with. Comparing with the distribution above, note that 1.25 falls roughly in the middle of the distribution.
- Finally, note that this distribution is skewed right!
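The whole table, and the follow-up questions above, can be reproduced with a short standard-library Python sketch (the helper name `binom_pmf` is our own, not a library function):

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.25
probs = {x: binom_pmf(x, n, p) for x in range(n + 1)}

for x, pr in probs.items():
    print(x, round(pr, 5))          # reproduces the table rows
print(sum(probs.values()))          # 1, up to floating-point rounding
print(probs[4] + probs[5])          # P(passing by guessing) = P(X >= 4)
print(n * p)                        # expected value, 1.25
```

The dictionary makes the "at least 4 correct" question a one-line sum rather than a separate Excel formula per cell.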

Example: Manufacturing
A common use of statistics is in quality control for engineering and manufacturing. During manufacturing, a combination of sloppy tolerances, human error, environmental conditions, and luck all contribute to the potential for faulty products. Companies perform a significant number of calculations to determine how likely manufacturing errors are to occur and then factor that into the price and warranty for their products. When error rates are particularly high, especially if the defects can lead to injury, companies may be forced to recall products. A recall is better for the public than letting a defect go unfixed, but companies do their best to minimize the chance of defects in order to avoid costly recalls, warranties, and returns.
Let's consider the manufacture of a cheap LED light bulb, where it is calculated that 1 in every 10,000 bulbs is defective; a defective bulb is labeled a failure. We have:
- \(P(\text{Success}) = 0.9999\)
- \(P(\text{Failure}) = 0.0001\)
Suppose that we have a batch of 100 bulbs shipping out. What is the probability that we have 1 defective bulb in the batch (which means 99 successes)? We use the binomial PMF, calculated in Excel with =BINOM.DIST(99, 100, 0.9999, FALSE):
\[ P(X = 99) = {100 \choose 99}(0.9999)^{99}\cdot (0.0001)^1 \approx 0.0099 \]
This is a small chance, but still big enough to be worrisome!
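The same number drops out of a standard-library sketch, with success defined as a working bulb (`binom_pmf` is our own helper name):

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# P(exactly 99 working bulbs out of 100), i.e. exactly 1 defective:
print(round(binom_pmf(99, 100, 0.9999), 4))  # 0.0099
```

Note that "exactly 99 successes" and "exactly 1 failure" are the same event, so `binom_pmf(1, 100, 0.0001)` gives the identical answer.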


