Statistics LibreTexts

10.7: Formulas for Chapter 10


    Correlation Coefficient

    The correlation coefficient, commonly denoted as \(r\), is a statistical measure that describes the strength and direction of a linear relationship between two quantitative variables. Its value ranges from -1 to 1, where values close to 1 indicate a strong positive linear relationship, values close to -1 indicate a strong negative linear relationship, and values near 0 suggest little to no linear correlation.

    \(r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{ \left( n\sum x^2 - (\sum x)^2 \right)\left( n\sum y^2 - (\sum y)^2 \right) }}\)

    Where:

    • \( r \) is the correlation coefficient
    • \( n \) is the number of paired data values
    • \( \sum xy \) is the sum of the products of each x and y value
    • \( \sum x \) is the sum of all x values
    • \( \sum y \) is the sum of all y values
    • \( \sum x^2 \) is the sum of the squares of all x values
    • \( \sum y^2 \) is the sum of the squares of all y values
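
    As an illustration (not part of the original text), the computational formula above can be sketched in Python; the function name `correlation_coefficient` is my own choice for this example.

```python
import math

def correlation_coefficient(xs, ys):
    """Compute r from the raw sums, following the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# A perfectly linear increasing relationship yields r = 1
print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

    Note that the denominator is zero when all x-values (or all y-values) are identical, in which case \(r\) is undefined.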

    Test Statistic for the t-Test of Correlation

    The t-test for the correlation coefficient is used to evaluate whether the observed sample correlation reflects a true linear relationship in the population. It tests the null hypothesis that the population correlation \(\rho = 0\), indicating no linear relationship. The test statistic is \(r\) multiplied by the square root of the ratio of the degrees of freedom, \(n - 2\), to the unexplained variation, \(1 - r^2\).

    \(t = r \cdot \sqrt{\dfrac{n - 2}{1 - r^2}}\)

    • \( t \) is the test statistic for the correlation
    • \( r \) is the sample correlation coefficient
    • \( n \) is the number of paired data points
    • \( n - 2 \) is the degrees of freedom for the t-distribution
    • \( 1 - r^2 \) is the unexplained variation (proportion of variability not explained by the linear relationship)
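
    A minimal Python sketch of this test statistic (an illustration, not part of the original text; the function name is hypothetical):

```python
import math

def correlation_t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Example: r = 0.6 with n = 27 paired observations
print(correlation_t_statistic(0.6, 27))  # approximately 3.75
```

    The resulting value is compared against a t-distribution with \(n - 2\) degrees of freedom to decide whether to reject the null hypothesis.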

    Line of Regression and Regression Coefficients

    The line of regression, also called the least squares regression line, is a linear equation that best fits a set of data by minimizing the sum of the squared differences between the observed values and the predicted values. It is used to describe the relationship between an independent variable \(x\) and a dependent variable \(y\). The equation of the regression line is displayed below.

    \(y' = a + bx\)

    Where \(y'\) is the predicted value of \(y\) for a given \(x\), \(a\) is the y-intercept, and \(b\) is the slope of the line.

    The formulas for calculating the regression coefficients are provided below.

    \(b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}\)

    \(a = \dfrac{\sum y \cdot \sum x^2 - \sum x \cdot \sum xy}{n \sum x^2 - (\sum x)^2}\)

    Where:

    • \( y' \) is the predicted value of the dependent variable
    • \( a \) is the y-intercept of the regression line
    • \( b \) is the slope of the regression line
    • \( x \) is the independent variable
    • \( n \) is the number of data points
    • \( \sum xy \) is the sum of the products of each x and y value
    • \( \sum x \) is the sum of the x values
    • \( \sum y \) is the sum of the y values
    • \( \sum x^2 \) is the sum of the squares of the x values
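
    The two coefficient formulas share a denominator, which the following Python sketch exploits (an illustration, not part of the original text; the function name is my own):

```python
def regression_coefficients(xs, ys):
    """Least-squares intercept a and slope b from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2  # shared denominator of a and b
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return a, b

# Data generated by y = 1 + 2x recovers a = 1, b = 2
print(regression_coefficients([1, 2, 3, 4], [3, 5, 7, 9]))
```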

    Coefficient of Determination

    The coefficient of determination, denoted as \(r^2\), represents the proportion of the variance in the dependent variable that is predictable from the independent variable. It provides a measure of how well the regression line fits the data. A value of \(r^2\) closer to 1 indicates that a greater proportion of variation in the dependent variable is explained by the model, while a value closer to 0 suggests that the model explains very little of the variation. It is found by squaring the correlation coefficient.

    \(r^2 = (r)^2\)

    Where:

    • \( r^2 \) is the coefficient of determination
    • \( r \) is the correlation coefficient
    • \( r^2 \) represents the proportion of variation in the dependent variable explained by the linear model
    • The value of \( r^2 \) ranges from 0 to 1
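
    Computationally this is a single squaring step; a brief Python illustration (not part of the original text):

```python
def coefficient_of_determination(r):
    """Proportion of variation in y explained by the linear model."""
    return r ** 2

# r = 0.9 means about 81% of the variation is explained
print(coefficient_of_determination(0.9))
```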

    Standard Error of the Estimate

    The standard error of the estimate, denoted as \(S_{\text{est}}\), measures the typical distance that the actual data points fall from the predicted values on the regression line. It indicates how well the regression model predicts the dependent variable. A smaller standard error means the model provides a better fit to the data.

    \(S_{\text{est}} = \sqrt{ \dfrac{ \sum y^2 - a \sum y - b \sum xy }{n - 2} }\)

    Where:

    • \( S_{\text{est}} \) is the standard error of the estimate
    • \( \sum y^2 \) is the sum of the squares of the observed \( y \)-values
    • \( \sum y \) is the sum of the observed \( y \)-values
    • \( \sum xy \) is the sum of the products of the corresponding \( x \) and \( y \) values
    • \( a \) is the y-intercept of the regression line
    • \( b \) is the slope of the regression line
    • \( n \) is the number of data points
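
    A Python sketch of this formula, given coefficients \(a\) and \(b\) computed beforehand (an illustration, not part of the original text; the function name is my own):

```python
import math

def standard_error_of_estimate(xs, ys, a, b):
    """S_est from the raw sums, given regression coefficients a and b."""
    n = len(xs)
    sum_y = sum(ys)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# A perfect linear fit (y = 1 + 2x) leaves no residual error
print(standard_error_of_estimate([1, 2, 3, 4], [3, 5, 7, 9], 1, 2))  # 0.0
```

    Because the formula divides by \(n - 2\), it requires at least three data points.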

    This page titled 10.7: Formulas for Chapter 10 is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Toros Berberyan, Tracy Nguyen, and Alfie Swan.