A scatter plot should be checked for outliers. An outlier is a point that seems out of place when compared with the other points. Some of these points can affect the equation of the regression line.
Should linear regression be used with this data set?
A regression analysis for the data set was run on Excel.
If we test for a significant correlation:
\(H_{0}: \rho = 0\)
\(H_{1}: \rho \neq 0\)
The correlation is \(r = 0.844\) and the p-value is 0.002, which is less than \(\alpha\) = 0.05, so we would reject \(H_{0}\) and conclude there is a significant relationship between \(x\) and \(y\).
However, if look at the scatterplot in Figure 12-18, with the regression equation we can clearly see that the point \((8,8)\) is an outlier. The outlier is pulling the slope up towards the point \((8,8)\).
If we were to take out the outlier point \((8,8)\) and run the regression analysis again on the modified data set we get the following Excel output.
See Figure 12-19: note the correlation is now 0 and the p-value is 1, so there is no relationship at all between \(x\) and \(y\).
This type of outlier is called a leverage point. Leverage points are positioned far away from the main cluster of data points on the \(x\)-axis.
There is another type of outlier called an influential point. Influential points are positioned far away from the main cluster of data points on the \(y\)-axis. There is an option in most software packages to get the “standardized” residuals. Standardized residuals are z-scores of the residuals. Any standardized residual that is not between \(-2\) and \(2\) may be an outlier. If it is not between \(-3\) and \(3\) then the point is an outlier. When this happens, the points are called influential points or influential observations.
Use technology to compute the standardized residuals. Should linear regression be used with this data set?
A regression analysis for the given data set was run on Excel, producing the following results:
The point \((2, 10)\) shown in Figure 12-20 is pulling the left side of the line up and away from the points that form a line. This influential point changes the \(y\)-intercept and slope.