Statistics LibreTexts

3.5: Formulas for Chapter 3



    Measures of Central Tendency

    Mean of a Population

    The mean of a population is the average of all the values in an entire population. It is found by adding all the population data values together and dividing by the total number of values in the population. This measure is used to describe the central value of a complete data set.

    \( \mu = \dfrac{\sum x}{N} \)

    Where:

    • \(\mu\) is the population mean
    • \(\sum x\) is the sum of all values in the population
    • \(N\) is the number of values in the population

    Mean of a Sample

    The sample mean is found by adding all the data values in the sample and dividing by the number of values in the sample.

    \( \bar{x} = \dfrac{\sum x}{n} \)

    Where:

    • \(\bar{x}\) = the sample mean
    • \(\sum x\) = the sum of all data values in the sample
    • \(n\) = the number of values in the sample
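    As a small illustration, the same arithmetic gives \(\mu\) when the list holds an entire population and \(\bar{x}\) when it holds a sample; only the interpretation and the symbol for the count (\(N\) or \(n\)) change. The function name `mean` and the data values are hypothetical, chosen only for this sketch.

```python
def mean(values):
    """Return sum(x) / count: mu for a population, x-bar for a sample."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

sample = [4, 8, 6, 5, 3, 7]   # hypothetical sample data
print(mean(sample))           # 33 / 6 = 5.5
```

    Python's built-in `statistics.mean` performs the same calculation and can be used to check hand computations.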

    Mean for Grouped Data

    The mean of grouped data is an estimate of the average for data that has been organized into frequency groups or classes. It is calculated by multiplying each class midpoint by its frequency, adding those products, and dividing by the total frequency.

    \( \bar{x} = \dfrac{\sum f \cdot X_M}{n} \)

    Where:

    • \( \bar{x} \) is the estimated mean
    • \( f \) is the frequency of each class
    • \( X_M \) is the midpoint of each class
    • \( \sum f \cdot X_M \) is the sum of the products of frequencies and midpoints
    • \(n = \sum f \) is the total number of data values
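    The grouped-mean formula can be sketched as follows, assuming the frequency table is stored as a list of (midpoint, frequency) pairs; that representation, the function name, and the table values are all hypothetical. The result is only an estimate because every value in a class is treated as if it sat at the class midpoint.

```python
def grouped_mean(classes):
    """Estimate the mean from (midpoint, frequency) pairs: sum(f * X_M) / sum(f)."""
    n = sum(f for _, f in classes)              # n = sum of the frequencies
    if n == 0:
        raise ValueError("total frequency must be positive")
    total = sum(f * xm for xm, f in classes)    # sum of f * X_M
    return total / n

# Hypothetical frequency table: (class midpoint X_M, frequency f)
table = [(10, 3), (20, 5), (30, 2)]
print(grouped_mean(table))   # (30 + 100 + 60) / 10 = 19.0
```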

    Weighted Mean

    The mean for weighted data is a type of average where each data value is assigned a weight based on its importance or frequency. It is calculated by multiplying each value by its weight, adding those products, and then dividing by the total of the weights.

    \( \bar{x} = \dfrac{\sum w \cdot x}{\sum w} \)

    Where:

    • \( \bar{x} \) is the weighted mean
    • \( x \) is each data value
    • \( w \) is the weight of each value
    • \( \sum w \cdot x \) is the sum of the weighted values
    • \( \sum w \) is the sum of the weights
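    A minimal sketch of the weighted mean, using a hypothetical example in which three course scores are weighted by credit hours (the function name and all numbers are assumptions for illustration):

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

scores = [80, 90, 70]     # hypothetical scores
credits = [2, 3, 5]       # hypothetical weights (credit hours)
print(weighted_mean(scores, credits))   # (160 + 270 + 350) / 10 = 78.0
```

    Integer weights are used here so the arithmetic is exact; percentage weights such as 0.2, 0.3, and 0.5 work the same way.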

    Median for Sample Data

    The median, denoted as MD, is the middle value in a data set when the numbers are arranged in order from smallest to largest. To find it, first sort the data. If there is an odd number of values, the median is the single middle value. If there is an even number of values, the median is the average of the two middle values. The median is useful because it represents the center of the data and is not strongly affected by extreme values.
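    The sorting-and-middle-value procedure described above can be sketched like this (the function name and the data are hypothetical). The second call shows that replacing a moderate value with an extreme one (100) leaves the median unchanged in spirit: it is still determined by the two middle values.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median requires at least one value")
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:                            # odd n: single middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2    # even n: average the two middle values

print(median([7, 1, 5]))        # sorted [1, 5, 7] -> 5
print(median([7, 1, 5, 100]))   # sorted [1, 5, 7, 100] -> (5 + 7) / 2 = 6.0
```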

    Midrange for Sample Data

    The midrange, denoted as MR, is the value halfway between the smallest and largest numbers in a data set. It is found by adding the minimum and maximum values and dividing the result by 2. The midrange gives a quick sense of the central location of the data and is easy to calculate.

    \( MR = \dfrac{\text{Minimum value} + \text{Maximum value}}{2} \)
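    The midrange formula translates directly into code; the function name and the data below are hypothetical.

```python
def midrange(values):
    """Return (minimum value + maximum value) / 2."""
    if not values:
        raise ValueError("midrange requires at least one value")
    return (min(values) + max(values)) / 2

print(midrange([2, 3, 4, 11]))   # (2 + 11) / 2 = 6.5
```

    Because only the two extreme values enter the calculation, changing the largest value (11 here) shifts the midrange directly, whereas the median of the same data would not move.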

    Mode for Sample Data

    The mode is the value that appears most often in a data set. It shows which item occurs the most and is useful for identifying the most common or popular value. A data set can have one mode, more than one, or none at all if no number repeats.
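    A sketch that follows the definition above, returning one mode, several modes, or none; the function name and data are hypothetical. It uses `collections.Counter` from the Python standard library to tally how often each value occurs.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often; return [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:              # no value repeats, so the data set has no mode
        return []
    return [x for x, c in counts.items() if c == highest]

print(modes([1, 2, 2, 3]))      # [2]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (two modes)
print(modes([1, 2, 3]))         # []      (no mode)
```

    This differs deliberately from Python's `statistics.multimode`, which returns every value when none repeats; the version here follows the "no mode" convention stated in the text.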

    Measures of Variation

    Range of Sample Data

    The range is a simple measure of variation that shows the difference between the highest and lowest values in a data set. It gives a quick sense of how spread out the data is, but does not account for how values are distributed between the extremes.

    \( \text{Range} = \text{Maximum value} - \text{Minimum value} \)
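    The sketch below computes the range for two hypothetical data sets that share the same minimum and maximum, illustrating the limitation noted above: the range is identical even though the values are distributed very differently. The function is named `data_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Return maximum value - minimum value."""
    return max(values) - min(values)

a = [10, 10, 10, 10, 50]   # hypothetical: values bunched at one end
b = [10, 20, 30, 40, 50]   # hypothetical: values spread evenly
print(data_range(a), data_range(b))   # 40 40
```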

    Variance for Sample Data: Traditional Formula

    The sample variance for raw data measures how much the values in a sample differ from the sample mean. It is found by summing the squared differences between each value and the mean and dividing by \( n - 1 \) rather than \( n \), and it is used to understand the variability within a sample.

    \( s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} \)

    Where:

    • \( s^2 \) is the sample variance
    • \( x \) is each data value
    • \( \bar{x} \) is the sample mean
    • \( n \) is the number of data values
    • \( \sum (x - \bar{x})^2 \) is the sum of squared differences from the mean
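    The traditional formula can be followed step by step: compute the mean, sum the squared deviations, and divide by \( n - 1 \). The function name and data set are hypothetical.

```python
def sample_variance(values):
    """Traditional formula: sum((x - x_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return squared_deviations / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical sample, mean = 5
print(sample_variance(data))       # 32 / 7, about 4.571
```

    Python's `statistics.variance` also divides by \( n - 1 \) and should agree with this result.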

    Variance for Sample Data: Shortcut Formula

    The sample variance using the shortcut formula is an alternative method to calculate variance without first computing the mean. It is useful when working with raw data, especially large data sets, as it simplifies calculations by using the sum of the values and the sum of the squared values.

    \( s^2 = \dfrac{n \sum x^2 - (\sum x)^2}{n(n - 1)} \)

    Where:

    • \( s^2 \) is the sample variance
    • \( x \) is each data value
    • \( n \) is the number of data values
    • \( \sum x \) is the sum of all data values
    • \( \sum x^2 \) is the sum of the squares of the data values
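    A sketch of the shortcut formula using only \( \sum x \) and \( \sum x^2 \); the function name and data are hypothetical. For this data set it reproduces the value the traditional formula gives, 32/7.

```python
def sample_variance_shortcut(values):
    """Shortcut formula: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical sample
print(sample_variance_shortcut(data))    # (8 * 232 - 40**2) / 56 = 256 / 56, about 4.571
```

    With whole-number data the numerator is computed exactly. With very large or many-digit decimal values, subtracting two nearly equal large quantities can lose precision on a computer, so software often prefers the traditional formula.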

    Variance for Grouped Data

    The grouped variance is an estimate of the variance for data that has been organized into frequency groups or classes. It measures how spread out the data is by using the class midpoints and their frequencies to approximate the average squared deviation from the mean.

    \( s^2 = \dfrac{n \sum f \cdot (X_M)^2 - (\sum f \cdot X_M)^2}{n(n - 1)} \)

    Where:

    • \( s^2 \) is the sample variance
    • \( f \) is the frequency of each class
    • \( X_M \) is the midpoint of each class
    • \( \sum f \cdot X_M \) is the sum of the products of frequency and midpoint
    • \( \sum f \cdot (X_M)^2 \) is the sum of the products of frequency and the square of the midpoint
    • \( n = \sum f \) is the total number of data values
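    The grouped variance formula can be sketched with the same hypothetical (midpoint, frequency) representation; the function name and table are assumptions for illustration.

```python
def grouped_variance(classes):
    """Estimate s^2 from (midpoint, frequency) pairs:
    (n * sum(f * X_M^2) - (sum(f * X_M))^2) / (n * (n - 1)), where n = sum(f)."""
    n = sum(f for _, f in classes)
    if n < 2:
        raise ValueError("total frequency must be at least 2")
    sum_fx = sum(f * xm for xm, f in classes)          # sum of f * X_M
    sum_fx2 = sum(f * xm ** 2 for xm, f in classes)    # sum of f * (X_M)^2
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

table = [(10, 3), (20, 5), (30, 2)]   # hypothetical (X_M, f) pairs
print(grouped_variance(table))        # (10 * 4100 - 190**2) / 90 = 4900 / 90, about 54.44
```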

    Standard Deviation for Sample and Grouped Data

    The standard deviation is a measure of how spread out the values in a data set are around the mean. It describes roughly how far a typical data value lies from the mean and is useful for understanding the consistency or variability of the data. It is calculated by taking the square root of the variance, whether the variance is from a population, a sample, or grouped data.

    \(\text{Standard Deviation} = \sqrt{\text{Variance}}\)

    Where:

    • \( \sigma = \sqrt{\sigma^2} \) for population standard deviation
    • \( s = \sqrt{s^2} \) for sample or grouped sample standard deviation

    Standard Deviation of the Population

    The population standard deviation is a measure of how spread out the values in an entire population are around the population mean. It describes roughly how far a typical data value lies from the mean and is used to understand variability in a full data set.

    \( \sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}} \)

    Where:

    • \( \sigma \) is the population standard deviation
    • \( x \) is each data value
    • \( \mu \) is the population mean
    • \( N \) is the number of values in the population
    • \( \sum (x - \mu)^2 \) is the sum of squared differences from the mean
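    To contrast the two denominators, the sketch below treats the same hypothetical list first as a complete population (divide by \( N \)) and then as a sample (divide by \( n - 1 \)), taking the square root of each variance. The function names are hypothetical.

```python
import math

def population_sd(values):
    """sigma = sqrt(sum((x - mu)^2) / N)."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((x - mu) ** 2 for x in values) / N)

def sample_sd(values):
    """s = sqrt(sum((x - x_bar)^2) / (n - 1))."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, sum of squared deviations = 32
print(population_sd(data))        # sqrt(32 / 8) = 2.0
print(sample_sd(data))            # sqrt(32 / 7), about 2.138
```

    Python's `statistics.pstdev` and `statistics.stdev` use the \( N \) and \( n - 1 \) denominators respectively.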

    Measures of Position

    Z-Score

    A z-score tells you how many standard deviations a data value is from the mean. It helps compare values from different data sets or identify how unusual a value is within a distribution. A positive z-score means the value is above the mean, while a negative z-score means it is below the mean.

    \( z = \dfrac{x - \bar{x}}{s} \)

    Where:

    • \( z \) is the z-score
    • \( x \) is the data value
    • \( \bar{x} \) is the sample mean
    • \( s \) is the sample standard deviation
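    The comparison use described above can be sketched with two hypothetical test scores from different tests (all numbers and the function name are assumptions). The score of 65 is relatively higher because it lies more standard deviations above its mean.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical: 65 on a test with mean 50 and s = 10,
# versus 30 on a test with mean 25 and s = 5
print(z_score(65, 50, 10))   # 1.5
print(z_score(30, 25, 5))    # 1.0
```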

    Percentiles

    Percentiles are values that divide a data set into 100 equal parts, showing the relative standing of a value within the data. Each percentile indicates the percentage of data values that fall at or below it. For example, the 70th percentile is the value at or below which 70% of the data values fall.
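    One way to compute the percentile rank of a value is to count the data values at or below it, matching the example above. Textbooks and software use several slightly different conventions (some count only values strictly below, or add a half-count), so this sketch is one assumption among several; the function name and scores are hypothetical.

```python
def percentile_rank(values, x):
    """Percentage of data values less than or equal to x ('at or below' convention)."""
    if not values:
        raise ValueError("need at least one value")
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

scores = [55, 60, 62, 70, 71, 75, 80, 84, 90, 95]   # hypothetical, n = 10
print(percentile_rank(scores, 80))   # 7 of 10 values are <= 80 -> 70.0
```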

    Quartiles

    Quartiles divide a data set into four equal parts after the values have been arranged in order from least to greatest. The three quartiles mark the boundaries of these parts. The first quartile, or Q1, is the value that separates the lowest 25% of the data from the rest. The second quartile, or Q2, is the median, which splits the data in half. The third quartile, or Q3, marks the point where 75% of the data falls below it. To find the quartiles, you first sort the data set. Then, find the median (Q2), and use the lower half of the data to find Q1 and the upper half to find Q3.
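    The median-of-halves procedure above can be sketched as follows. The text does not say what to do with the middle value when \( n \) is odd; this sketch assumes it is left out of both halves, which is one common textbook convention. Other software (for example `statistics.quantiles` or NumPy's percentile functions) may use different methods and give slightly different quartiles. Function names and data are hypothetical.

```python
def median_sorted(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) by the median-of-halves method.
    Assumption: when n is odd, the middle value is excluded from both halves."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]               # lower half of the sorted data
    upper = data[half + (n % 2):]     # upper half, skipping the middle value if n is odd
    return median_sorted(lower), median_sorted(data), median_sorted(upper)

print(quartiles([1, 3, 5, 7, 9, 11, 13, 15]))   # (4.0, 8.0, 12.0)
print(quartiles([1, 3, 5, 7, 9, 11, 13]))       # (3, 7, 11)
```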

    Interquartile Range

    The interquartile range (IQR) is a measure of statistical spread that shows the range of the middle 50% of a data set. It is calculated by subtracting the first quartile (Q1) from the third quartile (Q3). The IQR helps identify the spread of the central portion of the data and is useful for detecting outliers.

    \( IQR = Q_3 - Q_1 \)
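    A sketch of the IQR and its use in outlier detection. It assumes the widely used convention of flagging values more than \( 1.5 \times IQR \) below \( Q_1 \) or above \( Q_3 \); that multiplier is a convention, not part of the formula above. The hypothetical data set 1, 3, 5, 7, 9, 11, 13, 15, 40 has \( Q_1 = 4 \) and \( Q_3 = 14 \) by the median-of-halves method (middle value excluded); the function name is also hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence); k = 1.5 is a common convention."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr

data = [1, 3, 5, 7, 9, 11, 13, 15, 40]   # hypothetical data with Q1 = 4, Q3 = 14
iqr, low, high = iqr_fences(4, 14)
print(iqr, low, high)                              # 10 -11.0 29.0
print([x for x in data if x < low or x > high])    # [40]
```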


    This page titled 3.5: Formulas for Chapter 3 is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Toros Berberyan, Tracy Nguyen, and Alfie Swan.
