4.5: Common Continuous Probability Distributions
Learning Objectives
- Sketch graphs of continuous random variable distributions based on a given description (geometric, normal, \(t,\) and \(\chi^{2}\))
- Use basic geometry to determine probability measures in certain continuous random variable distributions
- Know key properties of other common continuous random variable distributions (normal, \(t,\) and \(\chi^{2}\))
- Relate various regions of a continuous random variable's distribution with each other, specifically, relate any region to left-tail region(s)
Review and Preview
In Section \(4.4,\) we established the connection between area measures of regions under the probability density function of a continuous random variable and the probability of certain outcome intervals for that variable; namely, we showed that the area of the region over an interval is the probability value. We examined a few examples of probability distributions. In this section, we name and explore key properties of some of the most commonly used probability density functions in statistical work. We will end by examining how certain regions within our distributions can directly relate to other regions within our distributions using some basic geometric reasoning.
Distributions: Shapes from Basic Geometry
Our last section examined exercises involving two random variables with different distributions. As shown below in Figure \(\PageIndex{1},\) one is rectangular, and the other is triangular.
Figure \(\PageIndex{1}\): Two distributions with shapes from basic geometry
These two distribution shapes arise in fields such as finance, ecology, business, and education. The rectangular-shaped distribution is a uniform probability distribution, with a meaning similar to that of the uniform distribution for discrete random variables (such as the roll of a fair die). Rather than each possible outcome having the same probability, every possible outcome has the same probability density. As a result, any two intervals of possible outcomes with the same width have equal probability. For example, in the uniform distribution in Figure \(\PageIndex{1},\) an outcome between \(2\) and \(3\) is just as likely as an outcome between \(7\) and \(8\); in fact, every interval of equal length contained in \([2,12]\) is equally likely. Naturally, there is not just one uniform distribution; all uniform distributions on continuous variables form a family with some common properties. One such property is that all uniform distributions are symmetric. Using the idea of a "balance point for the center of mass," the mean of the uniform distribution above is \(\mu = 7,\) the midpoint of the interval \([2,12].\) The median is also \(7,\) since \(50\%\) of the rectangle's area lies below that \(x\)-value and \(50\%\) lies above. In general, the mean and median of any uniform distribution equal the midpoint of its interval.
The triangular-shaped distribution on the right is commonly called a triangular probability distribution. Although the shape of the given triangular distribution in Figure \(\PageIndex{1}\) above is symmetric, this cannot be said of all triangular distributions. Again, we have a family of triangular distributions when probability density function curves form a triangular shape. With this symmetric triangular probability distribution in Figure \(\PageIndex{1},\) we can see the "balance point for the center of mass" to be at \(\mu=0.50\) on the horizontal scale and the median to also be at \(0.50.\) In some triangular distributions, determining the mean and the median is not as easy without more advanced skills in mathematics. Although we will not always determine these key statistical measures for every distribution, it is important to realize that these, and many of our summary statistics discussed in Chapter \(2,\) exist in probability distributions.
There are many distributions with shapes from basic geometry, such as semi-circles or trapezoids. But as discussed in the last section, the total area under the curve must always total \(1 = 100\%\) and the density function heights must always be non-negative \((f(x) \ge 0).\) If we can find the area of regions under the density functions over an interval, we can interpret those areas as probability values.
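As a rough illustration of the two requirements above (non-negative heights and a total area of \(1\)), the following Python sketch checks a candidate density numerically on a grid, using a midpoint-rule estimate of the area. The function name `looks_like_valid_pdf`, the grid size, and the tolerance are illustrative choices, not part of the text; a grid check can miss a negative value that falls between grid points, so it is a sanity check rather than a proof. The semicircle example relies on the fact that a half-disk of radius \(r\) has area \(\frac{\pi r^2}{2},\) so choosing \(r=\sqrt{2/\pi}\) gives an area of exactly \(1.\)

```python
import math

def looks_like_valid_pdf(f, lo, hi, n=100_000, tol=1e-3):
    """Numerically check that f(x) >= 0 on a grid over [lo, hi] and that the
    area under f over [lo, hi] is close to 1 (midpoint rule)."""
    width = (hi - lo) / n
    area = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * width  # midpoint of the i-th sub-interval
        y = f(x)
        if y < 0:
            return False  # a negative density value is never allowed
        area += y * width
    return abs(area - 1.0) < tol

# Semicircle density: radius sqrt(2/pi) makes the half-disk's area exactly 1.
R = math.sqrt(2 / math.pi)

def semicircle(x):
    return math.sqrt(R * R - x * x) if abs(x) <= R else 0.0

# A bell-like curve shifted down so that part of it dips below the axis.
def dips_below_axis(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi) - 0.05

print(looks_like_valid_pdf(semicircle, -R, R))        # True
print(looks_like_valid_pdf(dips_below_axis, -5, 5))   # False
```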
Suppose the daily growth in height of wheat plants during a particular stage of development, treated as a random variable, is believed to be uniformly distributed between \(\frac{1}{2}=0.5\) and \(\frac{5}{4} = 1.25 \) inches. Answer the following questions about this variable in the context of wheat growth.
- Sketch a graph of the probability distribution on the wheat's growth height, including appropriate labeling of both axes.
- Answer
Based on the information given, and choosing to work in decimal representation of values, we build the probability distribution graph by placing a horizontal line segment above our random variable's horizontal axis over the interval \([0.5, 1.25]\) and horizontal line segments on the \(x\)-axis for all other real numbers of the scale. Knowing that the total area under the distribution curve must equal \(1=100\%,\) and that our non-zero probability interval has a width of \(1.25 - 0.5\) \(=0.75,\) the height of the rectangular portion of the uniform distribution must satisfy\[\nonumber \text{height}= \frac{\text{area}}{\text{base}}=\frac{1}{0.75}=\frac{4}{3}\approx 1.3333.\]With full labeling of the axes, we produce the following graph of the probability distribution:
Figure \(\PageIndex{2}\): Probability distribution for the random variable daily wheat growth
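The height computation above can be written as a small density function. This is a minimal sketch: `uniform_pdf` is a hypothetical helper name, and Python's `fractions.Fraction` is used only so that the height \(\frac{4}{3}\) appears exactly rather than as a rounded decimal.

```python
from fractions import Fraction

def uniform_pdf(x, a, b):
    """Density of a continuous uniform distribution on [a, b]:
    constant height 1/(b - a) inside the interval, 0 outside."""
    height = 1 / (b - a)
    return height if a <= x <= b else 0

# Wheat growth example: uniform on [0.5, 1.25] inches, kept exact with Fractions.
a, b = Fraction(1, 2), Fraction(5, 4)
print(uniform_pdf(Fraction(1), a, b))   # 4/3
print(uniform_pdf(Fraction(2), a, b))   # 0
```

The same function also works with ordinary floats, for example `uniform_pdf(0.6, 0.5, 1.25)`, at the cost of a rounded height.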
- What is the distribution's expected value (mean) and median?
- Answer
As discussed above, in a uniform distribution both the expected value and the median equal the midpoint of the interval \([0.5,1.25]\): \(\frac{0.5 + 1.25}{2}\) \(=0.875\) inches. When known within a context, units should be included where appropriate to bring meaning to reported values.
- Find the probability that a randomly selected wheat plant will grow at least \(1.0\) inch in a given day.
- Answer
To find \(P \left(x \ge 1.0 \right),\) we shade the region above the interval from \(1.0\) to \(1.25\) in our graphic and find its area.
Figure \(\PageIndex{3}\): Finding \(P \left(x \ge 1.0 \right)\)
This shaded region within the uniform distribution is a rectangle with a base of \(1.25 - 1.0 = 0.25\) inches and a height (density value) of \(\frac{4}{3}\) \(\approx 1.3333,\) so \(P \left(x \ge 1.0 \right)\) \( = 0.25 \cdot \frac{4}{3}\) \( = \frac{1}{3} \approx 0.3333.\) There is a \(33.33\%\) probability that a randomly selected wheat plant will grow at least \(1\) inch in a given day.
- What proportion of wheat plants are expected to grow between \(0.6\) and \(0.75\) inches in a given day? In the uniform distribution, find \(P \left(0.6 \le x \le 0.75 \right).\)
- Answer
To find \(P \left(0.6 \le x \le 0.75 \right),\) we shade the region above the interval from \(0.6\) to \(0.75\) in our graphic and find its area.
Figure \(\PageIndex{4}\): Finding \(P \left(0.6 \le x \le 0.75 \right)\)
This shaded region is a rectangle with a base of \(0.75 - 0.6\) \(= 0.15\) inches and, again, a height (density value) of \(\frac{4}{3}\) \(\approx 1.3333,\) so \(P \left(0.6 \le x \le 0.75 \right)\) \( = 0.15 \cdot \frac{4}{3}\) \( = 0.2.\) This means \(20\%\) of wheat plants are expected to grow between \(0.6\) and \(0.75\) inches in a given day.
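Both probabilities above are rectangle areas: (width of the interval inside \([a,b]\)) \(\times\) (height). The hypothetical helper below sketches that idea for any uniform distribution on \([a,b],\) clipping the requested interval to \([a,b]\) first. As an extra check on the earlier median claim, the area from \(0.5\) up to the midpoint \(0.875 = \frac{7}{8}\) comes out to exactly one half.

```python
from fractions import Fraction

def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniform on [a, b]: the area of the rectangle
    over the part of [lo, hi] that overlaps [a, b]."""
    height = 1 / (b - a)
    base = max(0, min(hi, b) - max(lo, a))  # overlap width (0 if no overlap)
    return base * height

a, b = Fraction(1, 2), Fraction(5, 4)
print(uniform_prob(Fraction(1), b, a, b))                    # 1/3
print(uniform_prob(Fraction(3, 5), Fraction(3, 4), a, b))    # 1/5
# The midpoint 0.875 = 7/8 splits the area in half, confirming the median.
print(uniform_prob(a, Fraction(7, 8), a, b))                 # 1/2
```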
An oil company has data showing that an old oil field in central Kansas produces between \(1000\) and \(2500\) barrels of oil every day. Their data indicates the production distribution is triangular, with the most common daily production at \(1500\) barrels. Answer the following questions about this oil field:
- Sketch a graph of the probability distribution for the oil field's production, including appropriate labeling of both axes.
- Answer
Based on the information given and choosing to work in decimal representation of values, we build the probability distribution graph by placing a triangular shape above our random variable's horizontal axis over the interval \([1000, 2500]\) with the peak of the triangle occurring at \(1500.\) Also, we have horizontal line segments on the \(x\)-axis for all other real numbers on the scale. We recall that triangle area is found by \( \text{area} = \frac{\text{base} \cdot \text{height}}{2},\) so with basic algebraic manipulation we have \( \text{height} = \frac{\text{area} \cdot 2}{\text{base}}.\) Knowing that the total area under the distribution curve must equal \(1=100\%\) and that our non-zero probability interval has a width of \(2500 - 1000\) \(=1500,\) the highest point of the triangle-shaped distribution must satisfy\[\nonumber \text{height}= \frac{1 \cdot 2}{1500}=\frac{2}{1500}=\frac{1}{750}\approx 0.001333.\]With full labeling of the axes, we produce the following graph of the probability distribution.
Figure \(\PageIndex{5}\): Probability distribution for daily oil field production
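The piecewise-linear shape of the triangular distribution can be sketched as a density function. This assumes the mode \(c\) lies strictly between the endpoints \(a\) and \(b\) (true in the oil-field example); `triangular_pdf` is an illustrative name. The values at \(2000\) and \(1750\) barrels match densities used later in this example.

```python
from fractions import Fraction

def triangular_pdf(x, a, c, b):
    """Density of a triangular distribution on [a, b] with peak (mode) at c,
    assuming a < c < b. The peak height 2/(b - a) makes the area equal 1."""
    peak = Fraction(2) / (b - a)
    if a <= x <= c:
        return peak * (x - a) / (c - a)   # rising side
    if c < x <= b:
        return peak * (b - x) / (b - c)   # falling side
    return Fraction(0)

# Oil field example: support [1000, 2500] barrels, mode at 1500 barrels.
print(triangular_pdf(1500, 1000, 1500, 2500))   # 1/750
print(triangular_pdf(2000, 1000, 1500, 2500))   # 1/1500
print(triangular_pdf(1750, 1000, 1500, 2500))   # 1/1000
```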
- Find the probability that a randomly selected day will result in less than \(1500\) barrels of production.
- Answer
To find \(P \left(x < 1500 \right),\) we shade the region above the interval from \(1000\) to \(1500\) in our graphic and find its area.
Figure \(\PageIndex{6}\): Finding \(P \left(x < 1500 \right)\)
The shaded region is a simple triangle with a base of \(500\) barrels and a height (density value) of \(\frac{1}{750}\) \(\approx 0.001333,\) so \(P \left(x < 1500 \right)\) \( = \frac{500 \cdot \frac{1}{750}}{2}\) \( = \frac{1}{3} \approx 0.3333.\) There is a \(33.33\%\) probability of randomly selecting a day on which this oil field produces less than \(1500\) barrels.
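Working with the rounded height \(0.001333\) introduces a small error, which is why rounded computations are written with \(\approx\). The short Python sketch below contrasts exact fractions with the rounded decimal for the same triangle area; the variable names are illustrative only.

```python
from fractions import Fraction

base = 1500 - 1000               # width of the shaded triangle, in barrels
exact_height = Fraction(1, 750)  # peak density found earlier
rounded_height = 0.001333        # the rounded decimal version of 1/750

exact_area = Fraction(1, 2) * base * exact_height
rounded_area = 0.5 * base * rounded_height

print(exact_area)     # 1/3
print(rounded_area)   # approximately 0.33325, slightly below 1/3
```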
- Find the probability that a randomly selected day will result in less than \(2000\) barrels of production.
- Answer
To find \(P \left( x < 2000 \right),\) we shade the region above the interval from \(1000\) to \(2000\) in our graphic and find its area.
Figure \(\PageIndex{7}\): Finding \(P \left( x < 2000 \right)\)
The shaded region is not a triangle, but we might notice that the region is triangular from \(1000\) to \(1500.\) The rest of the region between \(1500\) and \(2000\) is a trapezoid. We can find the area of those two regions and add them together to get the total area.
An easier approach uses the complement rule for probabilities, \(P(x < 2000)\) \(= 1 - P(x \ge 2000),\) together with the observation that the unshaded (white) region associated with \(P(x \ge 2000)\) is a simple right triangle. In that triangle, the base length is \(2500 - 2000\) \( = 500\) barrels. The height, however, is a bit more challenging to determine. This can be done in multiple ways (such as using the slope of the line together with the graph's scale). Let us develop the linear function for the line on the right side of the probability distribution so that we can produce other density values if needed in future work. This also demonstrates how knowing the density function's mathematical formula can be helpful.
Using the point-slope approach, we note that the slope can be determined from the two points \((1500, \frac{1}{750})\) and \((2500,0).\)\[\nonumber \text{slope}=\frac{\frac{1}{750} - 0}{1500-2500} = -\frac{1}{750{,}000}\]Using the point-slope form of a line with the point \((2500,0),\) we have the linear function\[ \nonumber y=-\frac{1}{750{,}000}(x - 2500). \]At \(x = 2000,\) the density value is\[ \nonumber y=-\frac{1}{750{,}000}(2000 - 2500) =\frac{1}{1500}\approx 0.0006667.\]Now that we know the height of the white triangular region in our graphic, we can compute that triangle's area and apply the complement rule:\[\begin{align*} P(x < 2000) &= 1 - P(x \ge 2000) \\&=1 - \frac{500 \cdot \frac{1}{1500}}{2} \\&=1-\frac{1}{6} = \frac{5}{6} \approx 0.8333 \end{align*}\]There is an \(83.33\%\) probability of randomly selecting a day on which this oil field produces less than \(2000\) barrels.
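The point-slope construction and the complement rule can be sketched in Python as follows. The `line_through` helper is hypothetical; it builds the line from two points exactly as in the text, anchored at \((2500, 0),\) and exact fractions reproduce the density \(\frac{1}{1500}\) and the probability \(\frac{5}{6}.\)

```python
from fractions import Fraction

def line_through(p, q):
    """Return a function for the straight line through points p and q,
    written in point-slope form anchored at q."""
    (x1, y1), (x2, y2) = p, q
    slope = Fraction(y2 - y1) / (x2 - x1)
    return lambda x: slope * (x - x2) + y2

# Right side of the oil-field triangle: from (1500, 1/750) down to (2500, 0).
right_side = line_through((1500, Fraction(1, 750)), (2500, 0))

h = right_side(2000)                                   # density at 2000 barrels
p_at_least_2000 = Fraction(1, 2) * (2500 - 2000) * h  # white right triangle
p_less_2000 = 1 - p_at_least_2000                      # complement rule

print(h)             # 1/1500
print(p_less_2000)   # 5/6
```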
- What proportion of days will the oil field be expected to produce between \(1750\) and \(2000\) barrels? That is, find \(P \left(1750 \le x \le 2000 \right)\) in the triangular distribution.
- Answer
To find \(P \left(1750 \le x \le 2000 \right),\) we shade the region above the interval from \(1750\) to \(2000\) in our triangular distribution and notice that it is a trapezoid.
Figure \(\PageIndex{8}\): Finding \(P \left(1750 \le x \le 2000 \right)\)
To find the area of the trapezoid, we need the density values associated with productions of \(1750\) barrels and \(2000\) barrels. From our work in part \(3\) above, the density associated with \(2000\) barrels is \(\frac{1}{1500}\approx 0.0006667,\) and, using our linear function, the density associated with \(1750\) barrels is \(-\frac{1}{750{,}000}(1750-2500)\) \( =-\frac{1}{750{,}000} \cdot (-750)\) \(=\frac{1}{1000}\) \(=0.001.\) These density values are the lengths of the parallel sides of the trapezoid, and the width of the trapezoid is the interval width \(2000-1750\) \( = 250.\) Our shaded region has trapezoidal area:\[\begin{align*}\text{area of trapezoid} &= \frac{\frac{1}{1000} + \frac{1}{1500}}{2} \cdot 250 \\&= \frac{5}{24} \approx 0.2083 = 20.83\%. \end{align*}\]We can conclude that on about \(20.83\%\) of days, the oil field is expected to produce between \(1750\) and \(2000\) barrels.
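The trapezoid computation can be checked the same way. This sketch reuses the right-side line \(y=-\frac{1}{750{,}000}(x-2500)\) from the previous part and, as an independent cross-check (not a step in the text), computes the same probability as the difference of two right-tail triangles, \(P(x \ge 1750) - P(x \ge 2000).\) Function names are illustrative.

```python
from fractions import Fraction

def right_side(x):
    """Density on the falling side of the oil-field triangle,
    y = -(1/750000)(x - 2500), valid for 1500 <= x <= 2500."""
    return Fraction(-1, 750_000) * (x - 2500)

left, right = 1750, 2000
h_left, h_right = right_side(left), right_side(right)   # parallel sides
area = (h_left + h_right) / 2 * (right - left)           # trapezoid area

def tail(x):
    """P(X >= x) for 1500 <= x <= 2500: a right triangle from x to 2500."""
    return Fraction(1, 2) * (2500 - x) * right_side(x)

print(h_left, h_right)            # 1/1000 1/1500
print(area)                       # 5/24
print(tail(1750) - tail(2000))    # 5/24, the same probability
```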
The probability distribution for gauging measurement uncertainties (error size when taking measurements of objects) is sometimes modeled by a trapezoidal-shaped distribution. Suppose the following graph represents such a distribution.
Figure \(\PageIndex{9}\): Probability distribution for measurement bias
- Find the probability that a randomly selected value in this distribution is positive.
- Answer
To find the probability that a randomly selected value is positive, we shade the area under the probability density curve on the interval from \(0\) to \(4\) and notice that the region is a trapezoid.
Figure \(\PageIndex{10}\): Finding \(P(x>0)\)
The height of the trapezoid (the distance between its parallel sides) is \(0.20,\) and the base lengths are \(4\) and \(2.\) The shaded area is therefore \(\frac{4+2}{2}\cdot0.20\) \(=3\cdot0.20\) \(=0.60,\) so the probability that a randomly selected value is positive is \(60\%.\)
- Find the probability that a randomly selected value in this distribution is at least \(2;\) that is, find \(P(x\ge2).\)
- Answer
To find the probability that a randomly selected value is at least \(2,\) we shade the area under the probability density curve on the interval from \(2\) to \(4,\) and notice the region is a triangle.
Figure \(\PageIndex{11}\): Finding \(P(x\ge2)\)
The height of the triangle is \(0.20,\) and the base is \(2.\) The shaded area is therefore \(\frac{1}{2}\cdot2\cdot0.20\) \(=0.20,\) so the probability that a randomly selected value is at least \(2\) is \(20\%.\)
- Determine \(P(-1<x<1.5).\)
- Answer
To find \(P(-1<x<1.5),\) we shade the area under the probability density curve on the interval from \(-1\) to \(1.5,\) and notice the region is a rectangle.
Figure \(\PageIndex{12}\): Finding \(P(-1<x<1.5)\)
The height of the rectangle is \(0.20,\) and the base is \(1.5-(-1)\) \(=2.5.\) The shaded area is therefore \(2.5\cdot 0.20\) \(=0.50,\) so \(P(-1<x<1.5)=50\%.\)
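All three answers above come from basic area formulas applied to dimensions read off the figure. The sketch below simply encodes those formulas; the dimensions are the ones stated in the answers, and the full shape of the density in Figure \(\PageIndex{9}\) is not reproduced here. Function names are illustrative.

```python
def rectangle_area(base, height):
    return base * height

def triangle_area(base, height):
    return base * height / 2

def trapezoid_area(base1, base2, height):
    # "height" is the distance between the parallel sides; here the parallel
    # sides are horizontal, so that distance is the density value 0.20.
    return (base1 + base2) / 2 * height

p_positive = trapezoid_area(4, 2, 0.20)        # P(x > 0)
p_at_least_2 = triangle_area(2, 0.20)          # P(x >= 2)
p_middle = rectangle_area(1.5 - (-1), 0.20)    # P(-1 < x < 1.5)

for label, p in [("P(x > 0)", p_positive),
                 ("P(x >= 2)", p_at_least_2),
                 ("P(-1 < x < 1.5)", p_middle)]:
    print(f"{label} = {p:.2f}")
```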
We continue to see how knowledge of geometric figures and creative geometric thinking on regions can help analyze the probability distributions of continuous random variables. Some distributions of continuous random variables have a more exciting and challenging distribution shape than those above. Let us examine some of those next.
Normal Distributions
One of the most commonly used probability distributions for continuous random variables is the normal distribution, mentioned in Section \(2.7.\) A normal probability distribution, also called a Gaussian probability distribution, has a bell-shaped, perfectly symmetric probability density curve centered above its mean. Its two changes of concavity (called inflection points) occur exactly one standard deviation to the left and to the right of the mean on the horizontal scale. As shown in Figure \(\PageIndex{13}\) below, a normal distribution's placement on a horizontal scale is determined entirely by its mean \(\mu\) and standard deviation \(\sigma.\)
Figure \(\PageIndex{13}\): General Normal Distribution
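The inflection-point property can be checked numerically. This Python sketch assumes example parameters \(\mu=10\) and \(\sigma=2\) (chosen only for illustration) and uses the standard normal density formula \(f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}\) together with a finite-difference estimate of the second derivative. The sign of that estimate flips as \(x\) crosses \(\mu-\sigma\) and \(\mu+\sigma,\) which is exactly what an inflection point means.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

mu, sigma = 10, 2   # illustrative parameters
f = lambda x: normal_pdf(x, mu, sigma)

# Concavity changes sign as x crosses mu - sigma and mu + sigma.
for point in (mu - sigma, mu + sigma):
    before = second_derivative(f, point - 0.1)
    after = second_derivative(f, point + 0.1)
    print(point, before * after < 0)   # True at both points
```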
As is true of all valid probability distributions on continuous variables, the total area under the curve is equal to \(1=100\%.\) Usually, we do not label the probability density axis (the vertical axis), but we always scale the horizontal axis with our continuous random variable of interest. The term "normal" is used because this distribution has surprised statisticians and others with how often it is found in the analysis of random events and as the shape in many continuous variable distributions from data-based histograms.
Naturally, since different normally distributed random variables have different means and standard deviations, there is a whole family of distributions called "normal probability distributions." Figure \(\PageIndex{14}\) below shows four different normal distributions. Notice how each is controlled by its mean and standard deviation: the mean locates the center of the bell-shaped curve on a given horizontal scale, and the standard deviation controls the spread (width) of the curve on the same scale. Notice also that the curve is taller when the standard deviation is smaller and shorter when the standard deviation is larger. This makes sense: the area under the curve must remain \(1 = 100\%,\) so a narrower curve must be taller and a wider curve must be shorter. When sketching a normal distribution by hand, we usually draw the bell-shaped curve first, add a horizontal axis below it, and then scale that axis so that the mean sits directly below the peak and the values one standard deviation away sit directly below the inflection points.
Figure \(\PageIndex{14}\): Four Different Normal Distributions
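The trade-off between height and spread can be made concrete: the peak of a normal density is \(\frac{1}{\sigma\sqrt{2\pi}},\) so doubling \(\sigma\) halves the peak height while the area stays \(1.\) The sketch below, using a few arbitrary standard deviations, prints each peak height alongside a midpoint-rule estimate of the total area over \(\mu \pm 8\sigma\) (which captures essentially all of it). Function names are illustrative.

```python
import math

def normal_peak_height(sigma):
    """Height of a normal density at its mean: 1 / (sigma * sqrt(2*pi))."""
    return 1 / (sigma * math.sqrt(2 * math.pi))

def approx_area(mu, sigma, n=20_000):
    """Midpoint-rule area under the normal density over mu +/- 8 sigma."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = (lo + (i + 0.5) * width - mu) / sigma
        total += math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi)) * width
    return total

for sigma in (0.5, 1, 2, 4):   # arbitrary example standard deviations
    print(sigma, round(normal_peak_height(sigma), 4), round(approx_area(0, sigma), 4))
```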
As mentioned in Section \(2.7,\) there is one special normal probability distribution called the standard normal or \(z\)-distribution; this refers to a specific normal distribution that has \(\mu = 0\) and \(\sigma = 1,\) producing the normal distribution curve shown in Figure \(\PageIndex{15}\) below.
Figure \(\PageIndex{15}\): The standard normal distribution
Any normally distributed random variable \(x\) can be converted to the standard normal distribution using the \(z\)-score, or standardization, computation\[z=\frac{x - \mu}{\sigma} \nonumber \]as discussed in Section \(2.7.\) This is illustrated in Figure \(\PageIndex{16},\) in which a normal distribution with \(\mu = 10\) and \(\sigma = 2\) has its raw \(x\)-axis rescaled to the standard normal distribution scale.
Figure \(\PageIndex{16}\): Standardized scaling on a non-standard normal distribution
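The standardization formula, and the same formula solved for \(x,\) can be sketched as two small functions. Using \(\mu=10\) and \(\sigma=2\) from Figure \(\PageIndex{16},\) the loop prints the raw axis values from three standard deviations below the mean to three above, each with its \(z\)-score (\(-3\) through \(3\)). Function names are illustrative.

```python
def to_z(x, mu, sigma):
    """Standardize a raw value: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Undo the standardization: x = mu + z * sigma."""
    return mu + z * sigma

mu, sigma = 10, 2
for x in (4, 6, 8, 10, 12, 14, 16):
    print(x, to_z(x, mu, sigma))   # raw axis value and its z-score
```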
We can convert from a \(z\)-score back to a raw-scale value in a normal distribution by solving this equation for \(x:\)\[\nonumber x=\mu + z\cdot\sigma.\]Let us now turn our focus to a quick review of the Empirical \(68\)-\(95\)-\(99.7\) Rule from Section \(2.7.\) The Empirical Rule gives approximate probability (area) measures. As a reminder, we repeat the diagram in Figure \(\PageIndex{17}\) below:
Figure \(\PageIndex{17}\): Empirical Rule on Normal Distribution
Text Exercise \(2.7.7\) had us working with this diagram to find the percentage of observations (i.e., the probability of occurrence) in normal distributions. We will not repeat that collection of exercises here, but leave any review, as necessary, to you. It is common to use these approximate area values when working with normal distributions, provided the values (outcomes) of interest lie a whole number of standard deviations (one, two, or three) from the mean.
As the Empirical Rule was based on specific intervals tied to integer multiples of standard deviations from the mean, the rule is a bit limiting for other general interval choices in which we might be interested. For example, suppose the weights of thirty-year-old men in Chicago are normally distributed with \(\mu\) \(= 190\) lbs. and \(\sigma\) \( = 11.2\) lbs. By our Empirical Rule, the probability of randomly selecting a thirty-year-old man in Chicago weighing between \(190 -2\cdot 11.2 = 167.6\) and \(190+2\cdot 11.2= 212.4\) lbs. is approximately \(95\%\) (or stated equivalently, about \(95\%\) of thirty-year-old men in Chicago weigh between \(167.6\) and \(212.4\) lbs.) But what if we were interested in the interval of weights between \(175\) and \(200\) lbs, as shown in Figure \(\PageIndex{18}\) below?
Figure \(\PageIndex{18}\): Normal distribution of the weights of thirty-year-old Chicago men
Our Empirical Rule does not apply to such varied intervals. However, if we could determine the area of this shaded region in this normal probability distribution, we could find the probability measure. This idea extends to any desired interval(s) we want to analyze. However, as the shaded region is not a basic geometric shape, as in our earlier work in this section, we cannot call on our knowledge of basic geometric formulas to find the area measures. We need other methods for handling such shaped regions. With this in mind, we will introduce technology-based methods in Section \(4.6\) for finding areas of any region(s) we want within any normal probability distribution. For now, we try some exercises to ensure we can sketch a described normal distribution or interpret key features of a given normal distribution graph.
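As a preview only, the sketch below finds normal areas with Python's `math.erf`, using the standard identity \(P(X\le x)=\frac{1}{2}\left[1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right].\) This is one possible technology-based approach and is not necessarily the method Section \(4.6\) uses. It reproduces the interval from \(167.6\) to \(212.4\) lbs., shows that the exact area within two standard deviations is about \(0.9545\) (the Empirical Rule's "\(95\%\)"), and estimates the shaded region from \(175\) to \(200\) lbs. Function names are illustrative.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def normal_prob(lo, hi, mu, sigma):
    """Area under the normal curve between lo and hi."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

mu, sigma = 190, 11.2   # weights of thirty-year-old Chicago men (lbs.)

lo2, hi2 = mu - 2 * sigma, mu + 2 * sigma
print(round(lo2, 1), round(hi2, 1))                 # 167.6 212.4
print(round(normal_prob(lo2, hi2, mu, sigma), 4))   # 0.9545
print(round(normal_prob(175, 200, mu, sigma), 4))   # the shaded region in the figure
```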
- Create graphs of normal distributions that meet the given descriptions. Include labeling of the horizontal axis with appropriate scaling (in standard deviation units) and axis titles:
a. A soft drink bottler has data suggesting that the amount of drink actually placed in the cans by a specific bottling machine is normally distributed with \(\mu=12.1\) ounces and \(\sigma=0.5\) ounces (on average, the machine slightly overfills relative to its design specifications).
b. The average consumption of electricity by electric four-door passenger vehicles is believed to be normally distributed with \(\mu = 0.346\) kWh per mile and \(\sigma = 0.022\) kWh per mile, where kWh stands for kilowatt hour.
c. A tire company is about to begin large-scale manufacturing of a new tire made of newly developed materials. In testing the tire's tread life, the research team found that tread life in miles follows a normal distribution with \(\mu = 72{,}000\) miles and \(\sigma = 7{,}000\) miles.
- Answer
- Given the key parameters \(\mu=12.1\) ounces and \(\sigma=0.5\) ounces, we produce the following sketch. We align the axis scale with this information by placing the value \(12.1\) directly below the peak of the normal distribution's PDF curve and placing \(11.6\) and \(12.6\) (one standard deviation of \(0.5\) ounces on either side) directly below the inflection points, which incorporates the key spread measure. Keeping this spacing consistent, we then extend the scale farther left and right in standard deviation units.
Figure \(\PageIndex{19}\): Normal distribution with \(\mu=12.1\) ounces and \(\sigma=0.5\) ounces
- Given the key parameters \(\mu = 0.346\) kWh per mile and \(\sigma = 0.022\) kWh per mile, we produce the scaled normal distribution figure using the same approach as in part a.
- Since \(\mu = 72,000\) miles and \(\sigma = 7,000\) miles, we produce the following sketch:
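The scaling step used in each sketch (axis labels at \(\mu\), \(\mu\pm\sigma,\) \(\mu\pm 2\sigma,\) and \(\mu\pm 3\sigma\)) is simple arithmetic. The hypothetical helper below lists those tick values for the three distributions in this exercise; rounding to ten decimal places only hides floating-point noise in the decimal cases.

```python
def axis_ticks(mu, sigma, k=3):
    """Axis labels at mu + j*sigma for j = -k, ..., k (standard deviation units)."""
    return [round(mu + j * sigma, 10) for j in range(-k, k + 1)]

examples = {
    "a. fill amount (oz)": (12.1, 0.5),
    "b. energy use (kWh per mile)": (0.346, 0.022),
    "c. tread life (miles)": (72_000, 7_000),
}
for label, (mu, sigma) in examples.items():
    print(label, axis_ticks(mu, sigma))
```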
- For each of these normal distributions, give the mean \(\mu\) and the standard deviation \(\sigma\) of each graph.
Figure \(\PageIndex{22}\): Normal distributions with various means and standard deviations (graphs I, II, and III)
- Answer
The location of the mean for each is the scale value directly below the highest point of the normal distribution. The standard deviation for each is the distance in the horizontal scale between the high point and either of the inflection points of the normal distribution. This produces the following results for each of the graphs.
- From the graphic of Distribution I, we see that \(\mu=-15 \) since the high point of the probability density curve is directly above that horizontal scale value. Also, since the labeled inflection points occur at a horizontal distance of \( | (-15) - (-12) |\) \(= | (-15) - (-18) |\) \(=3, \) then \(\sigma =3 .\)
- From the graphic of Distribution II, we see that \(\mu=0.50 \) since the high point of the probability density curve is directly above that horizontal scale value. Also, since the labeled inflection points occur at a horizontal distance of \( | 0.50 - 0.53 |\) \(= | 0.50 - 0.47 |\) \(=0.03, \) then \(\sigma =0.03 .\)
- From the graphic of Distribution III, we see that \(\mu=252.3 \) since the high point of the probability density curve is directly above that horizontal scale value. Also, since the labeled inflection points occur at a horizontal distance of \( | 252.3 - 275.8 |\) \(= | 252.3 - 228.8 |\) \(=23.5, \) then \(\sigma =23.5.\)
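The reading procedure described above can be sketched as a small function; `params_from_graph` is an illustrative name, and the symmetry check reflects the fact that both inflection points of a normal curve sit the same distance from the peak. Because of floating-point arithmetic, some printed standard deviations may show tiny trailing digits.

```python
def params_from_graph(peak_x, left_inflection, right_inflection, tol=1e-9):
    """Read mu and sigma from a sketched normal curve: mu lies under the peak,
    and sigma is the peak-to-inflection distance (equal on both sides)."""
    left_gap = peak_x - left_inflection
    right_gap = right_inflection - peak_x
    if abs(left_gap - right_gap) > tol:
        raise ValueError("inflection points are not symmetric about the peak")
    return peak_x, right_gap

print(params_from_graph(-15, -18, -12))         # Distribution I
print(params_from_graph(0.50, 0.47, 0.53))      # Distribution II
print(params_from_graph(252.3, 228.8, 275.8))   # Distribution III
```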
- Which of these is the standard normal distribution, and which are not?
Figure \(\PageIndex{23}\): Various probability distributions (graphs I, II, and III)
- Answer
Distribution II is the standard normal distribution: it is a bell-shaped, symmetric distribution with its high point located above the horizontal axis value \(0\) (implying \(\mu=0\)) and inflection points \(1\) unit away on the horizontal axis scale (implying a standard deviation of \(\sigma=1\)).
Distribution I is symmetric about \(0;\) however, its shape is triangular, not bell-shaped. So Distribution I is not a normal distribution.
Distribution III is a bell-shaped, symmetric distribution with its high point located above the horizontal axis value \(0;\) however, its inflection points lie \(3\) units from the center on the horizontal axis scale, implying a standard deviation of \(\sigma=3.\) So although Distribution III is a normal distribution, it is not the standard normal distribution.
- Explain why each of the graphs below is not representative of a normal distribution.
Figure \(\PageIndex{24}\): Various graphs to be considered (graphs I, II, and III)
- Answer
Distribution I is not symmetric (though it is somewhat bell-shaped). Most would consider Distribution I skewed right, so it is not a normal distribution.
Distribution II is a nice symmetrical bell-shape, but is not a probability density function since some of the function values are below the horizontal axis. PDF values can only be non-negative. So Distribution II is not a normal distribution as all normal distributions are represented by valid PDFs.
Distribution III is neither symmetric nor bell-shaped, and hence it is not a normal distribution.
As we continue through the course, we will often find ourselves working with normal probability distributions to answer questions. But there are a few other distribution curves that we will find ourselves working with as well. We briefly examine two more such families of distributions next.
Other Distribution Families
Although normal distributions are arguably the most frequently examined and applied distribution in introductory statistics, we examine other families of probability distributions with important statistical applications. First, we will briefly discuss the Student's \(t\)-distributions.
These probability distributions are often simply called \(t\)-distributions. As shown in Figure \(\PageIndex{25},\) they initially look much like the family of normal distributions: they are also a symmetric, bell-shaped family of distributions, and the total area under each density curve is \(1=100\%.\)
Figure \(\PageIndex{25}\): A Student's \(t\)-distribution with \(d.f.=10\)
All \(t\)-distributions are symmetric about \(0\) (with a mean of \(\mu = 0\) whenever \(d.f. > 1\)), but their inflection points do not occur at one standard deviation from the mean as they do in normal distributions. The \(t\)-distributions have thicker tails than the standard normal distribution, meaning a \(t\)-distribution is more likely than a normal distribution to produce an outcome far from the center. While normal distributions are defined by a given \(\mu\) and \(\sigma,\) the spread of a \(t\)-distribution is controlled by a value called its degrees of freedom, \(d.f.\) In future sections, this value will be related to sample size, and the origin of the name will become clearer. As the degree-of-freedom value increases, the related \(t\)-distributions become more and more like the standard normal distribution. Figure \(\PageIndex{26}\) shows sketches of several \(t\)-distributions along with the standard normal distribution for comparison.
Figure \(\PageIndex{26}\): Several \(t\)-distributions and the standard normal distribution
These \(t\)-distributions will become very important in our future work. For now, it is enough to understand that they form another family of probability distributions and that probabilities in these distributions are found as area measures of specified regions under the density curve.
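To make the "thicker tails" claim concrete, here is a minimal Python sketch (standard library only) that compares \(P(|T| > 2)\) for several \(t\)-distributions with \(P(|Z| > 2)\) for the standard normal distribution. The density constant, the Simpson's-rule integration, the cutoff \(2,\) and the listed \(d.f.\) values are our own illustrative choices, not the course's technology.

```python
import math

def t_pdf(x, df):
    """Student's t density; the constant uses log-gamma to avoid overflow."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_two_tail(c, df):
    """P(|T| > c) = 1 - area from -c to c, which by symmetry is 2 * area from 0 to c."""
    return 1 - 2 * simpson(lambda x: t_pdf(x, df), 0, c)

def normal_two_tail(c):
    """P(|Z| > c) for the standard normal distribution."""
    return math.erfc(c / math.sqrt(2))

for df in (1, 5, 10, 30, 100):
    print(f"d.f. = {df:>3}: P(|T| > 2) = {t_two_tail(2, df):.4f}")
print(f"standard normal: P(|Z| > 2) = {normal_two_tail(2):.4f}")
```

The \(t\) tail areas should shrink as \(d.f.\) grows while staying above the standard normal tail area of about \(0.0455.\)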
- Optional \(t\)-distribution discussion for the mathematically inclined
The thicker tails of \(t\)-distributions can be explained by how their density functions are built: a \(t\)-density is a negative power of \(1+x^2/d,\) so it decays only polynomially as \(x\) moves away from \(0,\) whereas normal curves are defined using an exponential function that decays much faster. For example, the probability density function of a \(t\)-distribution with \(1\) degree of freedom is the rational function\[\nonumber f(x)= \frac{1}{\pi(1+x^2)}.\] In general, if one has \(d\) degrees of freedom, the probability density function is given by \[\nonumber f(x)=\frac{c_d}{(1+x^2/d)^{(d+1)/2}}, \]where \(c_d\) is a suitable constant that makes the total area equal to \(1.\)
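For readers following the optional formula, the constant has the standard closed form \(c_d = \Gamma\left(\tfrac{d+1}{2}\right) / \left(\sqrt{d\pi}\,\Gamma\left(\tfrac{d}{2}\right)\right).\) The short Python sketch below (function names are our own) evaluates it with log-gamma, compares the \(d=1\) case with \(1/(\pi(1+x^2)),\) and prints the peak height \(f(0)=c_d\) as \(d\) grows, next to the standard normal peak \(1/\sqrt{2\pi} \approx 0.3989.\)

```python
import math

def c_d(d):
    """Normalizing constant c_d = Gamma((d+1)/2) / (sqrt(d*pi) * Gamma(d/2)).
    Log-gamma keeps the intermediate values from overflowing for large d."""
    return math.exp(math.lgamma((d + 1) / 2) - math.lgamma(d / 2)) / math.sqrt(d * math.pi)

def t_pdf(x, d):
    """The general t density from the text: c_d / (1 + x^2/d)^((d+1)/2)."""
    return c_d(d) / (1 + x * x / d) ** ((d + 1) / 2)

# With d = 1 the general formula should match 1 / (pi * (1 + x^2)).
for x in (0.0, 1.0, 2.5):
    print(x, t_pdf(x, 1), 1 / (math.pi * (1 + x * x)))

# Peak height f(0) = c_d for increasing d, then the standard normal peak.
for d in (1, 2, 5, 30, 1000):
    print(d, round(t_pdf(0.0, d), 4))
print("normal", round(1 / math.sqrt(2 * math.pi), 4))
```

The rising peak heights are another view of the same fact: as \(d\) grows, the \(t\) curve moves area out of its tails and toward the standard normal shape.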
A third family of probability distributions common in beginning statistics is the chi-squared distributions (written with the Greek letter chi as \(\chi^{2}\)-distributions). The Greek letter \(\chi\) is pronounced \(k\bar{i},\) like the first two letters of the word kite. This family of distributions is not symmetrical; instead, it is positively skewed and has density only over outcome values larger than \(0;\) see Figure \(\PageIndex{27}\) below.
Figure \(\PageIndex{27}\): A \(\chi^{2}\)-distribution with \(d.f.=5\)
Similar to the \(t\)-distributions, \(\chi^{2}\)-distributions are controlled by a degree-of-freedom value. Figure \(\PageIndex{28}\) shows sketches of several \(\chi^{2}\)-distributions. Notice in the figure that as the degree-of-freedom value increases, the distribution approaches a bell-shaped curve. Two interesting properties of these distributions are that, for \(d.f. \ge 2,\) the high point occurs two units to the left of the degree-of-freedom value (at \(d.f. - 2\)), and that the expected value equals the degree-of-freedom value.
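Both properties can be checked numerically. This Python sketch uses the standard \(\chi^{2}\) density \(f(x) = x^{d/2-1}e^{-x/2}/\left(2^{d/2}\Gamma(d/2)\right)\) for \(x>0\) (a formula this section does not develop), locates the high point on a fine grid, and approximates the expected value with a midpoint-rule integral. The grid step, the integration cutoff, and the \(d.f.\) values are illustrative assumptions; the mode rule \(d.f. - 2\) is only expected for \(d.f. \ge 2.\)

```python
import math

def chi2_pdf(x, df):
    """Chi-squared density x^(df/2 - 1) e^(-x/2) / (2^(df/2) Gamma(df/2)), zero for x <= 0."""
    if x <= 0:
        return 0.0
    log_f = (df / 2 - 1) * math.log(x) - x / 2 - (df / 2) * math.log(2) - math.lgamma(df / 2)
    return math.exp(log_f)

def mode_on_grid(df, upper, step=0.001):
    """Horizontal value of the curve's high point, found by scanning a fine grid on (0, upper)."""
    xs = [i * step for i in range(1, int(upper / step))]
    return max(xs, key=lambda x: chi2_pdf(x, df))

def mean_by_integration(df, upper, n=20000):
    """Approximate E[X], the integral of x * f(x), with the midpoint rule on (0, upper)."""
    h = upper / n
    return h * sum((i + 0.5) * h * chi2_pdf((i + 0.5) * h, df) for i in range(n))

for df in (3, 5, 10, 18):
    upper = df + 60  # far enough right that the ignored tail area is negligible
    print(df, round(mode_on_grid(df, upper), 2), round(mean_by_integration(df, upper), 3))
```

Each printed row should show a high point near \(d.f. - 2\) and a mean near \(d.f.,\) which is the reasoning used when estimating a degree-of-freedom value from the location of a curve's peak.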
Figure \(\PageIndex{28}\): Several \(\chi^2\)-distributions
As with the \(t\)-distributions, we will not formally delve into the mathematical formulas of the \(\chi^{2}\) density functions. It will be sufficient for us to understand the shape of these distributions and, with future work, to recognize when they are to be used.
- Which of the following cannot possibly be a \(t\)-distribution? Explain.
I.
II.
III.
Figure \(\PageIndex{29}\): Various probability distributions
- Answer
Distribution I cannot be a \(t\)-distribution since it is a skewed distribution. All \(t\)-distributions are bell-shaped and symmetric about the horizontal scale value of \(0.\)
Distribution II, although bell-shaped and symmetrical, cannot be a \(t\)-distribution since the curve is symmetric about the horizontal scale value of \(12\) instead of \(0.\)
Distribution III might be a \(t\)-distribution since the curve is bell-shaped and symmetric about the horizontal scale value of \(0.\) We do note that other probability density curves, such as a normal curve centered at \(0,\) can have a very similar shape and position on the horizontal axis. To know for sure, we would need more information, such as several probability density values.
- The graph below gives three \(t\)-distributions and the standard normal distribution. Which of the \(t\)-distributions has the largest degree-of-freedom value? Explain.
Figure \(\PageIndex{30}\): Various \(t\)-distributions plotted with the standard normal distribution
- Answer
As the degree-of-freedom value of a \(t\)-distribution increases, the distribution comes closer and closer to the standard normal distribution. Hence Distribution III must have the largest degree-of-freedom value. We also note that Distribution I must have the smallest degree-of-freedom value, since its curve is wider and shorter than the rest of the distributions shown.
- Which of the following are possible \(\chi^{2}\)-distributions? Estimate the degree of freedom value for any that appear to be possible \(\chi^{2}\)-distributions.
I.
II.
III.
Figure \(\PageIndex{31}\): Various probability distributions
- Answer
Distribution I can possibly be a \(\chi^{2}\)-distribution since the graph is positively skewed, with density only over non-negative horizontal scale values. Because the high point on the curve occurs at a horizontal scale value of about \(3,\) the degree-of-freedom value is approximately \(3 + 2 = 5.\)
Distribution II cannot be a \(\chi^{2}\)-distribution since the graph is skewed negatively instead of positively.
Distribution III can possibly be a \(\chi^{2}\)-distribution since the graph is positively skewed, with density only over non-negative horizontal scale values. Because the high point on the curve occurs at a horizontal scale value of about \(16,\) the degree-of-freedom value is approximately \(16 + 2 = 18.\)
- What would be the most likely shape (uniform, normal, or skewed right similar to \(\chi^{2}\)) for each of the random variables described below?
a. Ages of coins in circulation
b. Birth weights of babies born in Hays from \(2020\) to \(2023\)
c. Position of one tire valve (in degrees) on vehicle wheel when the vehicle stops at various times in the day
d. IQ scores of all senior class students in the United States
e. Income of adult Kansas residents
- Answer
- Ages of coins in circulation would likely have a positively skewed distribution, since there are many more young coins and very few old coins.
- Birth weights of babies born in Hays during those years are likely to be approximately normally distributed: there will be a few light babies and a few heavy babies, but most babies will weigh close to the average.
- The position of the tire valve on a vehicle wheel (as measured in degrees) is likely to be uniformly distributed. One position is just as likely as another in some random-length trip with the vehicle.
- IQ scores are likely to be normally distributed: there will be a few high-IQ individuals and a few low-IQ individuals, but most senior class students in the United States will have IQ scores close to the average.
- Incomes are likely to have a positively skewed distribution. Incomes cannot be negative (in the usual meaning of income), and the distribution rises to a peak before trailing off toward the few people with very high incomes.
There are many other families of distributions in statistics; however, our current list will be sufficient for an introductory course. We now return to the issue of finding area measures of regions in the probability distributions of continuous random variables.
Geometric Connections on Related Regions in Continuous Distributions
Before moving on to a new section, we explore how basic geometric reasoning about the areas of regions leads to a general approach that works for all families of continuous probability distributions. In much of our future work with these distributions, we need to tie any interval-designated region of interest to left regions: regions that cover the left tail of the distribution. Left regions correspond to intervals described by a less-than inequality, either \(x < a\) or \(x \le a,\) where \(a\) is an outcome value of the random variable.
If our region of interest is already a left region, we use it as given. For example, suppose we are working with a specific \(t\)-distribution in which we need to find \(P(t < 1.2).\) Creating a sketch of our region as shown in Figure \(\PageIndex{32},\) we notice that our region completely covers the left tail of the distribution. In the next section, we will see that our computational technology produces area measures of left-tail regions only.
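As a rough illustration of what a left-tail area computation involves (not the technology introduced in the next section), this Python sketch finds \(P(t < 1.2)\) by numerical integration. The text does not give the degrees of freedom, so \(d.f. = 10\) is a hypothetical choice. Because every \(t\)-distribution is symmetric about \(0,\) exactly half the area lies to the left of \(0,\) and the sketch adds the area between \(0\) and \(b\) computed with Simpson's rule.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_left_area(b, df, n=2000):
    """P(t < b): the area left of 0 is 0.5 by symmetry, plus the signed area
    between 0 and b from composite Simpson's rule (n must be even)."""
    h = b / n  # negative when b < 0, so the area is subtracted automatically
    total = t_pdf(0.0, df) + t_pdf(b, df)
    total += sum((4 if i % 2 else 2) * t_pdf(i * h, df) for i in range(1, n))
    return 0.5 + total * h / 3

print(round(t_left_area(1.2, 10), 4))  # hypothetical d.f. = 10
```

For \(d.f. = 1\) and \(d.f. = 2\) the \(t\) left-area function has simple closed forms, which makes it easy to sanity-check a sketch like this one.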
Figure \(\PageIndex{32}\): Left-tailed region in a probability distribution
But what if our region of interest is a right-tail region? For example, suppose we are working with some normal probability distribution in which we desire to find \(P(x \ge 25)\). Creating a sketch of our region as shown in Figure \(\PageIndex{33},\) we notice that our region covers the right tail of the distribution and not the left.
Figure \(\PageIndex{33}\): Right-tailed region in a probability distribution
Using geometric reasoning and the complement property of probabilities, we notice that the white region under the curve is the complement of the shaded region. The complement probability \(P(x < 25)\) is a left-tailed region, so we can find \(P(x \ge 25)\) as \(1.0000 - P(x < 25).\) Using this reasoning, we can relate any right-tailed region to a left-tailed region by applying the complement concept.
Some regions are neither left- nor right-tailed regions. For example, suppose we are working with a specific \(\chi^{2}\)-distribution and need to find the probability \(P(4 \le \chi^{2} \le 8).\) Creating a sketch of our region in Figure \(\PageIndex{34},\) we see that our region is neither a left- nor a right-tailed region. We have what is called a central or between region.
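A minimal sketch of the complement rule, assuming a hypothetical normal distribution with \(\mu = 20\) and \(\sigma = 4\) (the text does not give parameters for Figure 33). The left-tail area comes from the normal CDF written with Python's math.erf; the right-tail area is also integrated directly with Simpson's rule so the two routes can be compared.

```python
import math

MU, SIGMA = 20.0, 4.0  # hypothetical parameters chosen only for illustration

def normal_pdf(x):
    """Normal density with mean MU and standard deviation SIGMA."""
    return math.exp(-0.5 * ((x - MU) / SIGMA) ** 2) / (SIGMA * math.sqrt(2 * math.pi))

def left_area(a):
    """P(x < a): area of the left-tail region, via the normal CDF written with erf."""
    return 0.5 * (1 + math.erf((a - MU) / (SIGMA * math.sqrt(2))))

def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3

right_by_complement = 1.0 - left_area(25)  # P(x >= 25) = 1 - P(x < 25)
right_by_integration = simpson(normal_pdf, 25, MU + 15 * SIGMA)  # shaded area, cut off far right
print(round(right_by_complement, 4), round(right_by_integration, 4))
```

Both numbers should agree (about \(0.1056\) under these assumed parameters), showing that the shaded right-tail area can be recovered from a single left-tail area.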
Figure \(\PageIndex{34}\): Between/Central region in a probability distribution
Using basic geometric reasoning, we can relate this region of interest to left-tailed areas. Notice that the left region associated with the inequality \(\chi^{2} \le 8\) covers our region of interest but also the undesired left-tailed region associated with the inequality \(\chi^{2} < 4.\) If we remove the left region associated with \(\chi^{2} < 4\) from the larger left region associated with \(\chi^{2} \le 8,\) then all that remains is the original region of interest between \(4\) and \(8.\) This is illustrated in the related diagram below.
Figure \(\PageIndex{35}\): Visualization of a central region as the difference of two left regions
In terms of probability notation, we have\[ \nonumber P(4 \le \chi^{2} \le 8)=P(\chi^{2} \le 8)-P(\chi^{2} < 4). \]
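The difference-of-left-regions idea can be sketched in Python. Figure 34 does not state its degrees of freedom, so we assume a hypothetical \(d.f. = 4.\) For an even \(d.f.,\) the \(\chi^{2}\) left-region area has the closed form \(P(\chi^{2} \le x) = 1 - e^{-x/2}\sum_{i=0}^{d.f./2-1} \frac{(x/2)^i}{i!},\) which avoids numerical integration; for \(d.f. = 4\) the between-region area works out to \(3e^{-2} - 5e^{-4} \approx 0.3144.\)

```python
import math

def chi2_cdf_even(x, df):
    """Left-region area P(chi2 <= x) for a positive EVEN df, using the closed form
    1 - exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return 1 - math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

df = 4  # hypothetical choice; the figure does not state its degrees of freedom
# P(4 <= chi2 <= 8) = P(chi2 <= 8) - P(chi2 < 4); for a continuous variable,
# P(chi2 < 4) equals P(chi2 <= 4), so the same left-area function serves both.
between = chi2_cdf_even(8, df) - chi2_cdf_even(4, df)
print(round(between, 4))
```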
All of the above is summarized in Figure \(\PageIndex{36}\) (we note that although the figure contains only bell-shaped distributions, the shape of the specific continuous probability distribution does not change the general geometric reasoning).
Figure \(\PageIndex{36}\): Transforming any region into a related left-tail region
For any interval of interest, a right-region area can be found using the complement concept: the total area of \(1\) minus the area of the complementary left region. A between-region area can be found as the left-region area up to the right boundary of the interval minus the left-region area up to the left boundary of the interval. In symbols:\[ \nonumber \begin{align*} &P(x > b)= 1.0000 - P(x \le b) \\ &P(a < x < b)=P(x<b) - P(x \le a). \end{align*}\]Once we fully grasp how to relate any region of interest in a continuous variable's probability distribution to left-region measures only, we should be able to reason similarly if only right-region area measures are available. In this chapter's next section, we will introduce technology that computes only left-region area measures.
Describe how the area of the shaded region in each of the given probability distributions can be expressed in terms of left-tail region(s). If the shaded region is already a left-tailed region, say so.
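The two rules above depend only on having a left-region area function, so they can be written once and reused for any continuous distribution. In this Python sketch the helper names are hypothetical; the left-area functions shown are for a uniform distribution on \([0, 360]\) degrees (an illustrative model of the tire-valve position) and for the standard normal distribution.

```python
import math
from typing import Callable

LeftArea = Callable[[float], float]  # maps b to the left-region area P(x <= b)

def right_region(left: LeftArea, b: float) -> float:
    """P(x > b) = 1 - P(x <= b)."""
    return 1.0 - left(b)

def between_region(left: LeftArea, a: float, b: float) -> float:
    """P(a < x < b) = P(x < b) - P(x <= a); for a continuous x, < and <= give the same area."""
    return left(b) - left(a)

def uniform_left(b: float, low: float = 0.0, high: float = 360.0) -> float:
    """Left-region area for a uniform distribution on [low, high]."""
    return min(max((b - low) / (high - low), 0.0), 1.0)

def std_normal_left(b: float) -> float:
    """Left-region area for the standard normal distribution."""
    return 0.5 * (1 + math.erf(b / math.sqrt(2)))

print(right_region(uniform_left, 270))          # valve in the last quarter turn
print(between_region(uniform_left, 90, 180))    # valve in the second quarter turn
print(round(between_region(std_normal_left, -1, 1), 4))
```

Passing the left-area function in as a parameter mirrors the point of Figure 36: the shape of the distribution changes only the left-area function, not the geometric reasoning.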
- \(P(x \ge 12) \) in the distribution
Figure \(\PageIndex{37}\): Probability distribution with shaded region
- Answer
Since this is a right-tailed region, we use the complement approach with a left-tailed region:\[ \nonumber P(x\ge12)= 1.0000 - P(x<12). \]
- \(P(x < 0.45)\) in the distribution
Figure \(\PageIndex{38}\): Probability distribution with shaded region
- Answer
Since this is already a left-tailed region, no change is needed:\[ \nonumber P(x<0.45) = P(x<0.45).\]
- \(P(-3 < x \le 6)\) in the distribution
Figure \(\PageIndex{39}\): Probability distribution with shaded region
- Answer
Since this is a central region, we use subtraction between two left-tailed regions:\[ \nonumber P(-3<x\le6)= P(x\le6) - P(x\le-3). \]
- \(P(x >16)\) in the distribution
Figure \(\PageIndex{40}\): Probability distribution with shaded region
- Answer
Since this is a right-tailed region, we use the complement approach:\[ \nonumber P(x>16)= 1.0000 - P(x\le16). \]
- \(P(6 < x < 16)\) in the distribution
Figure \(\PageIndex{41}\): Probability distribution with shaded region
- Answer
Since this is a central region, we use subtraction between two left-tailed regions:\[ \nonumber P(6<x<16)= P(x<16) - P(x\le16). \]
Summary
This section introduced several different families of continuous random variable probability distributions. We used geometric area formulas to determine the probability of outcomes for some random variables. We also looked into other notable families of continuous random variable probability distributions, such as the normal, \(t\)-, and \(\chi^{2}\)-distributions. Finally, we examined geometric region relationships on these distributions and how any interval region can be expressed using only left-tail region(s).
In the next section, we focus on using these area relationships with technology-based cumulative distribution functions in normal distributions. These special area accumulation functions will provide accurate area measures of regions in these distributions.