13: Linear Regression and Correlation
- 13.0: Introduction to Linear Regression and Correlation
- This page emphasizes the importance of correlation and regression analysis in understanding relationships between numeric variables, such as education, experience, and income. It highlights the significance of predictive models in fields like economics and political science for data-driven decision-making.
- 13.1: The Correlation Coefficient r
- This page explains univariate, bivariate, and multivariate data types, with a focus on bivariate data analysis using time series, cross-section, and panel data. It defines the correlation coefficient, which measures the strength and direction of linear relationships between two variables, ranging from -1 to 1. It clarifies that correlation does not imply causation and emphasizes the use of software for complex analyses along with the importance of visualizing relationships through scatter plots.
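As a concrete illustration of the definition above, the sample correlation coefficient can be computed directly from its formula. The sketch below uses made-up data values chosen for illustration:

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r for paired data x, y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear, increasing relationship gives r = 1:
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

A scatter plot of the same pairs, as the section recommends, would show all points falling exactly on a rising line.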
- 13.2: Testing the Significance of the Correlation Coefficient
- This page discusses the correlation coefficient \(r\), which measures the strength of the linear relationship between two variables. It distinguishes the sample correlation coefficient \(r\) from the population correlation coefficient \(\rho\) and describes hypothesis testing to assess the significance of \(\rho\). The null hypothesis asserts no significant linear relationship; the alternative suggests otherwise.
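The test described above uses the statistic \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) with \(n-2\) degrees of freedom; a minimal sketch (the sample values \(r = 0.8\), \(n = 12\) are assumptions for illustration):

```python
import math

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# e.g. a sample correlation of r = 0.8 computed from n = 12 pairs:
t = t_statistic(0.8, 12)  # large |t| is evidence against H0: rho = 0
```

The computed \(t\) would then be compared against a critical value from the \(t\)-distribution with \(n - 2 = 10\) degrees of freedom.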
- 13.3: Linear Equations
- This page explains linear regression for two variables, featuring the equation \(y=a+bx\), where \(a\) is the y-intercept and \(b\) is the slope. The independent variable \(x\) affects the dependent variable \(y\), with the slope reflecting the line's steepness and the y-intercept showing the value of \(y\) when \(x=0\). Examples illustrate the quantitative expression of relationships, such as costs based on hours worked and earnings per tutoring session.
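The tutoring example can be written as a one-line function; the dollar figures below are hypothetical, not taken from the section:

```python
def linear(a, b, x):
    """Evaluate y = a + b*x: y-intercept a, slope b."""
    return a + b * x

# Hypothetical earnings: a $20 flat fee plus $15 per tutoring session.
# The intercept (20) is y when x = 0; the slope (15) is the change
# in y for each one-unit increase in x.
earnings = linear(20, 15, 4)  # 20 + 15*4 = 80
```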
- 13.4: The Regression Equation
- This page covers regression analysis, focusing on the estimation of variable dependence through linear models. It details the ordinary least squares (OLS) method, including residuals and the significance of error variance for hypothesis testing. Key concepts like multicollinearity, which complicates the isolation of independent variable effects, are discussed, along with the importance of the multiple correlation coefficient \(R^2\).
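For the simple one-predictor case, the OLS estimates and \(R^2\) follow directly from the least-squares formulas; a minimal sketch with made-up data lying exactly on the line \(y = 1 + 2x\):

```python
def ols_fit(x, y):
    """Least-squares estimates for y = a + b*x, plus R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope estimate
    a = my - b * mx                    # intercept estimate
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot           # share of variation explained
    return a, b, r2

a, b, r2 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])  # data on y = 1 + 2x
```

With all residuals zero, \(R^2 = 1\): the model explains all of the variation in \(y\).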
- 13.5: Interpretation of Regression Coefficients: Elasticity and Logarithmic Transformation
- This page discusses Ordinary Least Squares (OLS) regression and the concept of elasticity, which measures responsiveness to market changes like price or income. It outlines various estimations, including four cases: 1) the impact of unit changes in \(X\) on \(Y\); 2) a semi-log approach for percentage changes; 3) assessing unit changes in \(Y\) from percentage changes in \(X\); and 4) a log-log case for direct elasticity estimates.
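In the log-log case (case 4 above), the slope of the regression of \(\ln y\) on \(\ln x\) is the elasticity directly; a sketch with made-up data following \(y = x^2\), so the elasticity is exactly 2:

```python
import math

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Log-log case: regress ln(y) on ln(x); the slope is the elasticity.
x = [1, 2, 4, 8]
y = [1, 4, 16, 64]          # y = x^2, so elasticity should be 2
elasticity = slope([math.log(v) for v in x],
                   [math.log(v) for v in y])
```

A one-percent change in \(x\) is then associated with a two-percent change in \(y\).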
- 13.6: Predicting with a Regression Equation
- This page discusses the importance of estimated regression equations for predicting the impact of independent variables on a dependent variable, essential for policy-making. The Gauss-Markov theorem ensures unbiased point estimates with minimum variance. It distinguishes between confidence intervals for estimating mean impacts across experiments and prediction intervals for single outcomes, noting that their reliability decreases for predictions outside the data range.
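The narrowing reliability mentioned above is visible in the prediction-interval formula: the half-width grows with the distance of the new \(x_0\) from \(\bar{x}\). A sketch, taking the critical \(t\) value and the standard error of the estimate as given inputs:

```python
import math

def prediction_halfwidth(x, x0, t_crit, s_e):
    """Half-width of a prediction interval for a single new y at x0.

    t_crit: critical t value for the chosen confidence level (n - 2 df);
    s_e: standard error of the estimate from the fitted regression.
    """
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)

# The interval is narrowest at x0 = mean(x) and widens as x0 moves away,
# which is why extrapolation beyond the data range is unreliable.
```

A confidence interval for the mean response uses the same formula without the leading 1 under the square root, which is why it is always narrower than the prediction interval.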
- 13.7: How to Use Microsoft Excel® for Regression Analysis
- This page details the development of regression analysis, highlighting its integration with Microsoft Excel for practical application. It explains how to use the Analysis ToolPak for data setup and regression execution, using a demand curve for roses as an example. Key outputs, including R-square and hypothesis testing, are discussed to assess variable relationships and model validity.
- 13.8: Linear Regression - Income and Assets (Worksheet)
- A statistics worksheet: the student will calculate a 90% confidence interval from the given data and examine the relationship between the confidence level and the percentage of constructed intervals that contain the population mean.
- 13.9: Key Terms
- This page defines key statistical terms related to linear modeling, including the symbols for the y-intercept (\(a\)) and slope (\(b\)). It explains bivariate and multivariate models, the coefficient of determination (\(R^2\)), which assesses the explanatory power of the independent variables, and the correlation coefficient (\(r\)), which reflects the strength of a linear relationship. It also covers residuals, predicted values, and error calculations.
- 13.10: Chapter Review
- This page discusses linear equations and regression analysis, detailing how a linear equation \(y = mx + b\) represents the relationship between variables through its slope and y-intercept. Regression analysis models these relationships under an assumption of linearity, while nonlinear relationships can be approximated through transformations (e.g., double logarithmic or quadratic). The text highlights the applicability and significance of regression techniques for understanding data.
- 13.11: Practice
- This page explores the correlation coefficient (r) in statistics, its calculation, interpretation, and implications for hypothesizing about correlations and regressions. It emphasizes the difference between correlation and causation, and the significance of sample characteristics and data scatter on regression accuracy.
- 13.12: Solutions
- This page explains correlation coefficients and regression analysis, detailing their interpretations and the concept that correlation does not imply causation. It introduces the coefficient of determination and provides examples of t-tests in regression. The text highlights the importance of controlling for other variables when interpreting relationships and discusses various statistical principles and calculations related to these concepts.
Curated and edited by Kristin Kuter | Saint Mary's College, Notre Dame, IN