3.5: Formulas for Chapter 3
Measures of Central Tendency
Mean of a Population
The mean of a population is the average of all the values in an entire population. It is found by adding all the population data values together and dividing by the total number of values in the population. This measure is used to describe the central value of a complete data set.
\( \mu = \dfrac{\sum x}{N} \)
Where:
- \(\mu\) is the population mean
- \(\sum x\) is the sum of all values in the population
- \(N\) is the number of values in the population
Mean of a Sample
The mean of a sample is the average of the values in a sample taken from a population. It is found by adding all the sample data values together and dividing by the number of values in the sample.
\( \bar{x} = \dfrac{\sum x}{n} \)
Where:
- \(\bar{x}\) is the sample mean
- \(\sum x\) is the sum of all data values in the sample
- \(n\) is the number of values in the sample
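As a minimal sketch of the sample mean formula, the Python snippet below adds the values and divides by their count. The data values are hypothetical and chosen only for illustration. The same arithmetic applies to the population mean, with \(N\) in place of \(n\).

```python
import statistics

# Hypothetical sample of six observations (not from the text).
data = [4, 8, 15, 16, 23, 42]

n = len(data)          # number of values in the sample
x_bar = sum(data) / n  # sum of x divided by n

print(x_bar)                  # 18.0
print(statistics.mean(data))  # the standard library gives the same mean
```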
Mean for Grouped Data
The mean of grouped data is an estimate of the average for data that has been organized into frequency groups or classes. It is calculated by multiplying each class midpoint by its frequency, adding those products, and dividing by the total frequency.
\( \bar{x} = \dfrac{\sum f \cdot X_M}{n} \)
Where:
- \( \bar{x} \) is the estimated mean
- \( f \) is the frequency of each class
- \( X_M \) is the midpoint of each class
- \( \sum f \cdot X_M \) is the sum of the products of frequencies and midpoints
- \(n = \sum f \) is the total number of data values
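The grouped mean can be sketched in Python as follows. The midpoints and frequencies are a hypothetical frequency table, and the variable names are illustrative only.

```python
# Hypothetical frequency table: one midpoint and one frequency per class.
midpoints = [5, 15, 25]   # X_M for each class
frequencies = [2, 5, 3]   # f for each class

n = sum(frequencies)  # total number of data values, n = sum of f

# Multiply each midpoint by its frequency and add the products.
sum_f_xm = sum(f * xm for f, xm in zip(frequencies, midpoints))

x_bar = sum_f_xm / n
print(x_bar)  # 16.0
```

Because every value in a class is treated as if it sat at the class midpoint, the result is an estimate of the mean of the underlying raw data.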
Weighted Mean
The mean for weighted data is a type of average where each data value is assigned a weight based on its importance or frequency. It is calculated by multiplying each value by its weight, adding those products, and then dividing by the total of the weights.
\( \bar{x} = \dfrac{\sum w \cdot x}{\sum w} \)
Where:
- \( \bar{x} \) is the weighted mean
- \( x \) is each data value
- \( w \) is the weight of each value
- \( \sum w \cdot x \) is the sum of the weighted values
- \( \sum w \) is the sum of the weights
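A short Python sketch of the weighted mean follows. The scenario (grade points weighted by credit hours) and all numbers are assumptions made for illustration.

```python
# Hypothetical example: grade points weighted by credit hours.
values = [4.0, 3.0, 2.0]  # x: each data value
weights = [3, 4, 3]       # w: the weight of each value

# Multiply each value by its weight and add the products.
sum_wx = sum(w * x for w, x in zip(weights, values))
sum_w = sum(weights)

weighted_mean = sum_wx / sum_w
print(weighted_mean)  # 3.0
```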
Median for Sample Data
The median, denoted as MD, is the middle value in a data set when the numbers are arranged in order from smallest to largest. To find it, first sort the data. If there is an odd number of values, the median is the value at the midpoint. If there is an even number of values, the median is the average of the two numbers at the midpoint. The median is useful because it represents the center of the data and is not strongly affected by extreme values.
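The procedure above can be sketched in Python. The function name `median` and the sample lists are illustrative, and the sketch assumes the data set is not empty.

```python
def median(values):
    """Return the median using the sort-then-pick-the-middle procedure."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)  # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))            # 5
print(median([7, 1, 5, 3]))         # 4.0
print(median([1, 3, 5, 7, 1000]))   # 5, the extreme value 1000 has no effect here
```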
Midrange for Sample Data
The midrange, denoted as MR, is the value halfway between the smallest and largest numbers in a data set. It is found by adding the minimum and maximum values and dividing the result by 2. The midrange gives a quick sense of the central location of the data and is easy to calculate.
\( MR = \dfrac{\text{Minimum value} + \text{Maximum value}}{2} \)
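As a quick illustration of the midrange formula, here is a Python sketch with hypothetical data values.

```python
# Hypothetical data set.
data = [3, 9, 14, 20]

# Add the minimum and maximum values and divide by 2.
midrange = (min(data) + max(data)) / 2
print(midrange)  # 11.5
```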
Mode for Sample Data
The mode is the value that appears most often in a data set. It shows which item occurs the most and is useful for identifying the most common or popular value. A data set can have one mode, more than one, or none at all if no number repeats.
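The three cases described above (one mode, more than one mode, no mode) can be sketched in Python. The helper name `modes` is hypothetical, and returning an empty list for "no mode" is a choice made for this sketch.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of modes, or an empty list if no value repeats."""
    counts = Counter(values)  # how many times each value appears
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value appears exactly once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```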
Measures of Variation
Range of Sample Data
The range is a simple measure of variation that shows the difference between the highest and lowest values in a data set. It gives a quick sense of how spread out the data is, but does not account for how values are distributed between the extremes.
\( \text{Range} = \text{Maximum value} - \text{Minimum value} \)
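The Python sketch below computes the range for two hypothetical data sets. They have the same range even though their values are distributed differently between the extremes, which illustrates the limitation noted above.

```python
def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

# Two hypothetical data sets with the same minimum and maximum.
clustered = [1, 5, 5, 5, 9]  # most values sit at the center
spread = [1, 2, 5, 8, 9]     # values are more evenly spread out

print(data_range(clustered))  # 8
print(data_range(spread))     # 8
```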
Variance for Sample Data: Traditional Formula
The sample variance for raw data is a measure of how much the values in a sample differ from the sample mean. It is approximately the average of the squared differences between each value and the mean: the sum of the squared differences is divided by \(n - 1\) rather than \(n\). It is used to understand the variability within a sample.
\( s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} \)
Where:
- \( s^2 \) is the sample variance
- \( x \) is each data value
- \( \bar{x} \) is the sample mean
- \( n \) is the number of data values
- \( \sum (x - \bar{x})^2 \) is the sum of squared differences from the mean
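A minimal Python sketch of the traditional formula follows, using hypothetical data. Each step mirrors one part of the formula.

```python
import statistics

# Hypothetical sample.
data = [2, 4, 6, 8, 10]

n = len(data)
x_bar = sum(data) / n  # sample mean

# Sum of squared differences from the mean.
sum_sq_diff = sum((x - x_bar) ** 2 for x in data)

s2 = sum_sq_diff / (n - 1)  # divide by n - 1 for a sample
print(s2)  # 10.0

# The standard library's sample variance agrees.
print(statistics.variance(data) == s2)  # True
```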
Variance for Sample Data: Shortcut Formula
The sample variance using the shortcut formula is an alternative method to calculate variance without first computing the mean. It is useful when working with raw data, especially large data sets, as it simplifies calculations by using the sum of the values and the sum of the squared values.
\( s^2 = \dfrac{n \sum x^2 - (\sum x)^2}{n(n - 1)} \)
Where:
- \( s^2 \) is the sample variance
- \( x \) is each data value
- \( n \) is the number of data values
- \( \sum x \) is the sum of all data values
- \( \sum x^2 \) is the sum of the squares of the data values
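The shortcut formula can be sketched in Python with hypothetical data. Only two running totals are needed: the sum of the values and the sum of their squares.

```python
# Hypothetical sample.
data = [2, 4, 6, 8, 10]

n = len(data)
sum_x = sum(data)                  # sum of x
sum_x2 = sum(x ** 2 for x in data) # sum of x squared

s2 = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
print(s2)  # 10.0
```

For this data set, the traditional formula gives the same value: the mean is 6, the squared differences sum to 40, and 40 / 4 = 10.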
Variance for Grouped Data
The grouped variance is an estimate of the variance for data that has been organized into frequency groups or classes. It measures how spread out the data is by using the class midpoints and their frequencies to approximate the average squared deviation from the mean.
\( s^2 = \dfrac{n \sum f \cdot (X_M)^2 - (\sum f \cdot X_M)^2}{n(n - 1)} \)
Where:
- \( s^2 \) is the sample variance
- \( f \) is the frequency of each class
- \( X_M \) is the midpoint of each class
- \( \sum f \cdot X_M \) is the sum of the products of frequency and midpoint
- \( \sum f \cdot (X_M)^2 \) is the sum of the products of frequency and the square of the midpoint
- \( n \) is the total number of data values
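Here is a Python sketch of the grouped variance formula. The frequency table is hypothetical, and the variable names are illustrative.

```python
import math

# Hypothetical frequency table: one midpoint and one frequency per class.
midpoints = [5, 15, 25]   # X_M for each class
frequencies = [2, 5, 3]   # f for each class

n = sum(frequencies)
sum_f_xm = sum(f * xm for f, xm in zip(frequencies, midpoints))
sum_f_xm2 = sum(f * xm ** 2 for f, xm in zip(frequencies, midpoints))

s2 = (n * sum_f_xm2 - sum_f_xm ** 2) / (n * (n - 1))
print(round(s2, 2))  # 54.44

s = math.sqrt(s2)  # grouped standard deviation
```

The result matches the sample variance of a data set in which each midpoint is repeated as many times as its frequency, which is the approximation the grouped formula makes.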
Standard Deviation for Sample and Grouped Data
The standard deviation is a measure of how spread out the values in a data set are around the mean. It indicates the typical distance of a data point from the mean and is useful for understanding the consistency or variability of the data. It is calculated by taking the square root of the variance. This applies whether the variance is from a population, a sample, or grouped data.
\(\text{Standard Deviation} = \sqrt{\text{Variance}}\)
Where:
- \( \sigma = \sqrt{\sigma^2} \) for population standard deviation
- \( s = \sqrt{s^2} \) for sample or grouped sample standard deviation
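As a short sketch, the Python snippet below takes the square root of a sample variance. The data are hypothetical, and `statistics.variance` and `statistics.stdev` are the standard library's sample versions (they divide by \(n - 1\)).

```python
import math
import statistics

# Hypothetical sample.
data = [2, 4, 6, 8, 10]

s2 = statistics.variance(data)  # sample variance
s = math.sqrt(s2)               # standard deviation = square root of variance

print(round(s, 4))  # 3.1623
```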
Standard Deviation of the Population
The population standard deviation is a measure of how spread out the values in an entire population are from the population mean. It indicates the typical distance of a data point from the mean and is used to understand variability in a full data set.
\( \sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}} \)
Where:
- \( \sigma \) is the population standard deviation
- \( x \) is each data value
- \( \mu \) is the population mean
- \( N \) is the number of values in the population
- \( \sum (x - \mu)^2 \) is the sum of squared differences from the mean
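The population formula can be sketched in Python as follows. The data are hypothetical and are treated as a complete population, so the sum of squared differences is divided by \(N\) rather than \(N - 1\).

```python
import math
import statistics

# Hypothetical data treated as an entire population.
population = [2, 4, 4, 4, 5, 5, 7, 9]

N = len(population)
mu = sum(population) / N  # population mean

# Sum of squared differences from the population mean.
sum_sq_diff = sum((x - mu) ** 2 for x in population)

sigma = math.sqrt(sum_sq_diff / N)
print(sigma)  # 2.0

# The standard library's population standard deviation agrees.
print(statistics.pstdev(population))  # 2.0
```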
Measures of Position
Z-Score
A z-score tells you how many standard deviations a data value is from the mean. It helps compare values from different data sets or identify how unusual a value is within a distribution. A positive z-score means the value is above the mean, while a negative z-score means it is below the mean.
\( z = \dfrac{x - \bar{x}}{s} \)
Where:
- \( z \) is the z-score
- \( x \) is the data value
- \( \bar{x} \) is the sample mean
- \( s \) is the sample standard deviation
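The Python sketch below uses z-scores to compare values from two different data sets. The exam scores, means, and standard deviations are hypothetical.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Hypothetical exam results from two different classes.
z_a = z_score(85, mean=70, std_dev=10)  # score of 85 on exam A
z_b = z_score(90, mean=80, std_dev=20)  # score of 90 on exam B

print(z_a)  # 1.5
print(z_b)  # 0.5
```

Although 90 is the higher raw score, the score of 85 has the larger z-score, so it is further above its own class mean in relative terms.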
Percentiles
Percentiles are values that divide an ordered data set into 100 equal parts, showing the relative standing of a value within the data. Each percentile indicates the percentage of data values at or below it. For example, the 70th percentile is the value at or below which 70% of the data values fall.
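As a rough sketch, the Python function below finds the percentage of data values that are less than or equal to a given value, following the reading in the example above. Textbooks and software use several slightly different percentile conventions, so this is one possible interpretation and not a definitive formula. The function name and data are hypothetical.

```python
def percent_at_or_below(values, x):
    """Percentage of data values that are less than or equal to x."""
    count = sum(1 for v in values if v <= x)
    return 100 * count / len(values)

# Hypothetical data set of ten scores.
scores = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(percent_at_or_below(scores, 70))  # 70.0
```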
Quartiles
Quartiles divide a data set into four equal parts after the values have been arranged in order from least to greatest. The three quartiles mark the boundaries of these parts. The first quartile, or Q1, is the value that separates the lowest 25% of the data from the rest. The second quartile, or Q2, is the median, which splits the data in half. The third quartile, or Q3, marks the point where 75% of the data falls below it. To find the quartiles, you first sort the data set. Then, find the median (Q2), and use the lower half of the data to find Q1 and the upper half to find Q3.
Interquartile Range
The interquartile range (IQR) is a measure of statistical spread that shows the range of the middle 50% of a data set. It is calculated by subtracting the first quartile (Q1) from the third quartile (Q3). The IQR helps identify the spread of the central portion of the data and is useful for detecting outliers.
\( IQR = Q_3 - Q_1 \)
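The quartile procedure and the IQR formula can be sketched together in Python. This sketch assumes that when the number of values is odd, the median itself is left out of both halves. That is one common convention, and other textbooks and software may give slightly different quartiles. The function name and data are hypothetical, and the data set is assumed to hold at least two values.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)  # arrange from least to greatest
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value (assumed convention).
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    q1 = statistics.median(lower)
    q2 = statistics.median(ordered)
    q3 = statistics.median(upper)
    return q1, q2, q3

# Hypothetical data set.
data = [1, 3, 5, 7, 9, 11, 13, 15]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1  # IQR = Q3 - Q1

print(q1, q2, q3)  # 4.0 8.0 12.0
print(iqr)         # 8.0
```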
Authors
"3.5: Formulas for Chapter 3" by Toros Berberyan, Tracy Nguyen, and Alfie Swan is licensed under CC BY 4.0


