5.2: Characteristics of the Normal Distribution and The Empirical Rule
In the last lesson, we learned about continuous random variables and their probability distributions. A continuous random variable is special because it can take on any of infinitely many possible values, which cannot be listed in order or counted. Continuous probability distributions can take on a variety of shapes, such as uniform, bi- or multi-modal, or bell-shaped, as long as the total area under the curve is 1.
A bell-shaped curve is symmetric, has a single mode at its center, and has two thin tails. It represents situations where the random variable is more likely to take on values close to its average and less likely to take on extreme values. One distribution that we will use often is the normal distribution. For the normal distribution, there is a formula that gives the precise area (probability) for any range of values of the continuous random variable. Many situations warrant the use of the normal distribution.
For example, the heights of randomly selected men have a distribution that is approximately normal. The mean of this population is \(\mu=69\) inches, and the population standard deviation is \(\sigma=3\) inches.
Images are created with the graphing calculator, used with permission from Desmos Studio PBC.
Some symbols to note:
- \(\mu\) (“mu”) represents a population mean.
- \(\bar{x}\) (“x-bar”) represents a sample mean.
- \(\sigma\) (“sigma”) represents a population standard deviation.
- \(s\) represents a sample standard deviation.
Characteristics of the Normal Distribution
A normal distribution with mean \(\mu\) and standard deviation \(\sigma\) has the following characteristics:
- The mean, median, and mode are equal. 50% of all values are below the mean and 50% are above it.
- The normal curve is bell-shaped and symmetric about its mean \(\mu\).
- The total area under the normal curve is 1.
- Normal curves extend endlessly in both directions. However, for values more than 4 standard deviations above or below the mean, the curve is so close to zero that the area there is negligible.
- The normal curve changes its curvature from a “cup” shape to a “cap” shape, or vice versa, at one standard deviation below the mean and at one standard deviation above it. These points are called inflection points.
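The characteristics listed above can also be checked numerically. The sketch below is a minimal illustration that assumes Python with only the standard `math` module. The names `normal_pdf`, `normal_cdf`, and `second_derivative` are hypothetical helpers chosen here. It uses the standard formula for the normal curve, \(f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}\), and it approximates the total area and the concavity numerically rather than computing them exactly.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x, using the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def second_derivative(x, mu, sigma, dx=1e-3):
    """Central-difference estimate of the curve's second derivative (its concavity)."""
    return (normal_pdf(x + dx, mu, sigma) - 2 * normal_pdf(x, mu, sigma)
            + normal_pdf(x - dx, mu, sigma)) / dx ** 2

mu, sigma = 69, 3  # adult male heights from the example above (inches)

# Total area: midpoint-rule integration over mu +/- 10 sigma (the rest is negligible).
n = 100_000
a, b = mu - 10 * sigma, mu + 10 * sigma
h = (b - a) / n
total_area = h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n))

# Mean = median: exactly half of the area lies to the left of mu.
area_below_mean = normal_cdf(mu, mu, sigma)

# Symmetry: equal heights one standard deviation below and above the mean.
symmetric = math.isclose(normal_pdf(mu - sigma, mu, sigma), normal_pdf(mu + sigma, mu, sigma))

# Area more than 4 standard deviations from the mean, both tails together.
beyond_4_sd = 2 * (1 - normal_cdf(mu + 4 * sigma, mu, sigma))

# Inflection points: concavity flips sign just before and just after mu - sigma and mu + sigma.
cup_to_cap = second_derivative(mu - sigma - 0.1, mu, sigma) > 0 > second_derivative(mu - sigma + 0.1, mu, sigma)
cap_to_cup = second_derivative(mu + sigma - 0.1, mu, sigma) < 0 < second_derivative(mu + sigma + 0.1, mu, sigma)

print(f"total area ~ {total_area:.6f}")
print(f"area below the mean = {area_below_mean}")
print(f"symmetric: {symmetric}")
print(f"area beyond 4 sd ~ {beyond_4_sd:.6f}")
print(f"inflection at mu - sigma: {cup_to_cap}, at mu + sigma: {cap_to_cup}")
```

The area beyond 4 standard deviations comes out to well under one hundredth of a percent. That is why the curve can be treated as negligible past that point.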
A normal distribution is determined by its mean \(\mu\) and its standard deviation \(\sigma\). All normal distributions have the same basic shape, but one distribution might have a mean of 69 whereas another has a mean of 90. On the graph, a change in the mean appears as a horizontal translation (shift) of the curve. Similarly, one normal distribution might have a standard deviation of 10 whereas another might have a standard deviation of 2. Standard deviation is a measure of spread, so the higher the standard deviation, the more spread out the data are and the flatter the curve. The lower the standard deviation, the less spread there is, resulting in a narrow curve with a tall peak in the middle. On the graph, a change in the standard deviation appears as a horizontal stretch or shrink, with a matching vertical shrink or stretch so that the total area stays 1. Explore these characteristics using this Desmos graph. Click the play button to the left of \(m\) to see how changing the mean changes the graph of the normal distribution, and click pause to stop the animation. Click the play button next to \(s\) to see how changing the standard deviation changes the graph of the normal distribution.
- Match the following normal distributions with the appropriate means and standard deviations. Explain your reasoning for your choices.
- \(\mu=12, \sigma=7\)
- \(\mu=3, \sigma=3\)
- \(\mu=3, \sigma=2\)
- \(\mu=12, \sigma=3\)
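The effects of \(\mu\) and \(\sigma\) described earlier in this section can also be seen in numbers. The following is a small sketch that assumes Python's standard `math` module. `normal_pdf` is a hypothetical helper name, and it implements the standard normal-curve formula. The means 69 and 90 and the standard deviations 10 and 2 are the example values from the text.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Changing the mean from 69 to 90 (same sigma) slides the curve 21 units to the right:
# the new curve at x + 21 has exactly the height the old curve had at x.
shift = 90 - 69
is_translation = all(
    math.isclose(normal_pdf(x, 69, 10), normal_pdf(x + shift, 90, 10))
    for x in range(30, 110)
)

# Changing sigma changes the peak height, which is 1 / (sigma * sqrt(2 * pi)):
# sigma = 10 gives a low, wide curve; sigma = 2 gives a tall, narrow one.
peak_sd_10 = normal_pdf(0, 0, 10)
peak_sd_2 = normal_pdf(0, 0, 2)

# Same shape: at one standard deviation from the mean, each curve is at the same
# fraction, exp(-1/2), of its own peak, whatever sigma is.
same_shape = math.isclose(normal_pdf(10, 0, 10) / peak_sd_10, normal_pdf(2, 0, 2) / peak_sd_2)

print(f"mu = 69 -> 90 is a horizontal shift: {is_translation}")
print(f"peak height, sigma = 10: {peak_sd_10:.4f}")
print(f"peak height, sigma = 2:  {peak_sd_2:.4f}")
print(f"same shape after rescaling: {same_shape}")
```

Because the peak height is proportional to \(1/\sigma\), the curve with \(\sigma=2\) is exactly five times as tall at its center as the curve with \(\sigma=10\).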
The Empirical Rule
All normal distributions have the same proportions of area when distance from the mean is measured in standard deviations. Using the normal distribution as the model, about 68% of values lie within one standard deviation of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three standard deviations. These approximations are collectively known as the Empirical Rule.
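The Empirical Rule percentages can be reproduced from the normal curve's exact areas. The following is a minimal sketch assuming Python's standard `math.erf`. The helpers `normal_cdf` and `area_within` are hypothetical names. The second population (\(\mu=500\), \(\sigma=100\)) is invented purely to show that the answer depends only on the number of standard deviations.

```python
import math

def normal_cdf(x, mu, sigma):
    """Area under the normal curve to the left of x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def area_within(k, mu, sigma):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

# Heights (mu = 69, sigma = 3) and a hypothetical population (mu = 500, sigma = 100)
# give the same areas, because only the number of standard deviations k matters.
for k in (1, 2, 3):
    print(k, round(area_within(k, 69, 3), 4), round(area_within(k, 500, 100), 4))
```

The exact areas are about 0.6827, 0.9545, and 0.9973, which the Empirical Rule rounds to 68%, 95%, and 99.7%.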
The graph below shows a normal distribution with mean \(\mu\) at its center. About 68% of the area under the curve lies between \(\mu-1\sigma\) (one standard deviation below the mean) and \(\mu+1\sigma\) (one standard deviation above the mean).
In other words, this tells us that about 68% of all values in a normal population lie within one standard deviation of the mean.
For the region between \(\mu-2\sigma\) (two standard deviations below the mean) and \(\mu+2\sigma\) (two standard deviations above the mean), the area under the curve is about 95%.
In other words, this tells us that about 95% of all values in a normal population lie within two standard deviations of the mean.
For the region between \(\mu-3\sigma\) (three standard deviations below the mean) and \(\mu+3\sigma\) (three standard deviations above the mean), the area under the curve is about 99.7%.
In other words, this tells us that about 99.7% of all values in a normal population lie within three standard deviations of the mean. The remaining 0.3% lie more than three standard deviations from the mean, split equally between the two tails: about 0.15% above \(\mu+3\sigma\) and about 0.15% below \(\mu-3\sigma\).
The Empirical Rule tells us that only about 5% of values in a normal distribution are more than two standard deviations from the mean. We will therefore consider any value that is more than two standard deviations from the mean to be unusual.
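The "unusual" convention can be written as a one-line test. The following is a sketch assuming Python's standard `math` module. `is_unusual` is a hypothetical helper, and the exam-score population (\(\mu=70\), \(\sigma=8\)) is invented for illustration. The sketch also computes the exact share of a normal population lying beyond two standard deviations.

```python
import math

def is_unusual(x, mu, sigma):
    """True if x is more than two standard deviations from the mean mu
    (the convention adopted in this section)."""
    return abs(x - mu) > 2 * sigma

# Share of a normal population more than 2 sd from the mean (both tails together).
beyond_2_sd = math.erfc(2 / math.sqrt(2))
print(f"beyond 2 sd: {beyond_2_sd:.4f}")

# Hypothetical exam scores with mu = 70 and sigma = 8: the cutoffs are 54 and 86.
for score in (50, 75, 86, 90):
    print(score, is_unusual(score, 70, 8))
```

A score of exactly 86 is two standard deviations above the mean, so it is not flagged, because the convention requires *more than* two standard deviations. The exact tail share, about 4.55%, is what the Empirical Rule rounds to "about 5%".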
Computing Probabilities Using the Empirical Rule
- Recall that adult male heights are approximately normal. The average adult male height is 69 inches and the standard deviation is 3 inches.
- Use this information and what you know about the normal distribution to label the tick marks on the horizontal axis, noting that the distance between each tick mark is 1 standard deviation.
- Remembering that the total area under the probability curve is always 1, find the area of each of the eight regions above.
- Suppose that \(x\) represents the height of a randomly selected adult man. We want to know the probability that he is between 72 and 78 inches tall. In probability notation, this is written as \(P(72<x<78)\). Use the completed graph above to compute this probability.
- Find \(P(x \geq 75)\)
- Find \(P(x<66)\)
- Find \(P(60 \leq x \leq 69)\)
- Find \(P(66<x<75)\)
- Do we have enough information to find the proportion of adult men who are shorter than 64 inches tall? Explain.
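The region-by-region method used in these exercises can be sketched in code. The sketch below is a minimal version in plain Python. The function names `to_sd` and `prob_between` are hypothetical. The areas come from the Empirical Rule's rounded 68%, 95%, and 99.7%, not from exact normal areas. The example interval \(63<x<72\) was chosen so that it is not one of the exercises above.

```python
# Empirical Rule percentages (approximations, not exact normal areas).
WITHIN_1, WITHIN_2, WITHIN_3 = 0.68, 0.95, 0.997

center = WITHIN_1 / 2               # between the mean and 1 sd (each side)
second = (WITHIN_2 - WITHIN_1) / 2  # between 1 and 2 sd (each side)
third = (WITHIN_3 - WITHIN_2) / 2   # between 2 and 3 sd (each side)
outer = (1 - WITHIN_3) / 2          # beyond 3 sd (each side)

# The eight regions from left to right; region i spans (i - 4, i - 3) in sd units,
# with the outermost regions running off to the ends of the curve.
regions = [outer, third, second, center, center, second, third, outer]

def to_sd(x, mu, sigma):
    """Convert a value on a tick mark to its whole number of standard deviations from mu."""
    k = (x - mu) / sigma
    if k != round(k):
        raise ValueError("x must fall on a tick mark (a whole number of sd from the mean)")
    return int(round(k))

def prob_between(lo_sd, hi_sd):
    """Approximate P(lo < x < hi) for tick marks lo_sd < hi_sd in -3..3; pass -4 or 4
    to mean 'all the way to the end of the curve'."""
    return sum(regions[i] for i in range(lo_sd + 4, hi_sd + 4))

# Hypothetical interval: P(63 < x < 72) for adult male heights (mu = 69, sigma = 3).
p = prob_between(to_sd(63, 69, 3), to_sd(72, 69, 3))
print(f"P(63 < x < 72) ~ {p:.3f}")
```

Here 63 is two standard deviations below the mean and 72 is one above. The sketch therefore adds the three regions between them: 13.5% + 34% + 34% = 81.5%.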
Z-scores
Whether a value is unusual depends on how many standard deviations it is from the mean. In other words, we measure distance from the mean using the standard deviation as the ruler. We define a Z-score to be the number of standard deviations a value is from the mean. We compute the Z-score for a given value of \(x\) using the following formula:
\[Z=\frac{x-\mu}{\sigma}\nonumber\]
- When Z is negative, x is below the mean.
- When Z is positive, x is above the mean.
- When Z is zero, x is equal to the mean.
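The formula and the sign rules above translate directly into code. The following is a minimal sketch in plain Python. `z_score` and `describe` are hypothetical helper names, and the population with \(\mu=50\) and \(\sigma=4\) is invented for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean: Z = (x - mu) / sigma."""
    return (x - mu) / sigma

def describe(z):
    """Read off the position of x relative to the mean from the sign of Z."""
    if z < 0:
        return "below the mean"
    if z > 0:
        return "above the mean"
    return "equal to the mean"

# Hypothetical population with mu = 50 and sigma = 4.
for x in (42, 50, 56):
    z = z_score(x, 50, 4)
    print(f"x = {x}: Z = {z:.2f}, {describe(z)}")
```

In this hypothetical population, 42 is two standard deviations below the mean (\(Z=-2\)), and 56 is one and a half standard deviations above it (\(Z=1.5\)).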
- Label the horizontal axis with Z-scores below each x on the distribution of adult male heights.
- We have said that a value in a normal distribution is unusual if it is more than two standard deviations from the mean.
- What range of heights belongs to unusually tall men?
- What Z-scores correspond to these heights?
- What range of heights belongs to unusually short men?
- What Z-scores correspond to these heights?
Z-scores provide a standardized way to measure values. For example, when a value has a Z-score of 0.5, we know it is half of a standard deviation above the mean, even if we do not know the values of the mean and standard deviation. Z-scores can also be used to compare values from two different populations. Each value's distance from its own population's mean is measured in units of that population's standard deviation, so the two values are put on a common scale.
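Comparing values across populations comes down to comparing their Z-scores. The following is a small sketch in plain Python. The two tests and their means and standard deviations are hypothetical, chosen only to illustrate the comparison, and `z_score` is a helper name introduced here.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from its own population's mean."""
    return (x - mu) / sigma

# Hypothetical scores on two tests graded on different scales.
z_first = z_score(80, 70, 5)   # test 1: mean 70, standard deviation 5
z_second = z_score(30, 25, 4)  # test 2: mean 25, standard deviation 4

if z_first > z_second:
    stronger = "first"
elif z_second > z_first:
    stronger = "second"
else:
    stronger = "tie"

print(f"Z for the first score:  {z_first:.2f}")
print(f"Z for the second score: {z_second:.2f}")
print(f"relatively stronger score: {stronger}")
```

The first score is 2 standard deviations above its mean, while the second is only 1.25 above its mean. Relative to its own population, the first score is therefore the stronger one, even though the raw numbers are on different scales.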
- A company is hiring a candidate for a high-level research position in a chemistry lab. Only one position is available. The hiring committee narrows the choice down to two outstanding candidates. Both are highly qualified for the job, and each earned a doctorate in chemistry from a different institution. Each candidate was required to take a qualifying exam to earn their degree, and the two universities graded their qualifying exams on different scales. Candidate A earned 95 points on their qualifying exam, where the average score was 73 and the standard deviation was 9 points. Candidate B earned 23 points on their qualifying exam, where the average score was 18 and the standard deviation was 2.5 points. Using this information and Z-scores, which candidate should be recommended for the position? Justify your answer.