8.1: Minimizing Error using Derivatives

In calculus, the derivative is a measure of the slope of a function of \( x \), written \( f(x) \), at each given value of \( x \). For the function \( f(x) \), the derivative is denoted \( f'(x) \), pronounced “f prime x”. Because the formula for \( \sum \epsilon^2 \) is known and can be treated as a function, its derivative tells us how the sum of the squared error changes across the possible values of \( \hat{\alpha} \) and \( \hat{\beta} \). For that reason, we need to find the derivative of \( \sum \epsilon^2 \) with respect to changes in \( \hat{\alpha} \) and \( \hat{\beta} \). That, in turn, will permit us to “derive” the values of \( \hat{\alpha} \) and \( \hat{\beta} \) that result in the lowest possible \( \sum \epsilon^2 \).
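To make the idea of "error as a function" concrete, the sketch below writes the sum of squared errors as an R function of two candidate coefficients. It assumes the usual bivariate linear model, in which each error is \( y - (\hat{\alpha} + \hat{\beta} x) \). The data values and the names `sse`, `a`, and `b` are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data (not from the text)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Sum of squared errors for a candidate intercept a and slope b,
# assuming each error is y - (a + b * x)
sse <- function(a, b) sum((y - (a + b * x))^2)

# Different candidate values give different totals of squared error
sse(0, 1)
sse(2, 0.5)

# The least-squares coefficients reported by lm() give the smallest total
fit <- coef(lm(y ~ x))
sse(fit[1], fit[2])
```

Trying other values of `a` and `b` by hand should never produce a smaller total than the last line; the derivatives developed in this section explain why.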

Look, we understand that this all sounds complicated. But it’s not all that complicated. In this chapter, we will walk through all the steps so you’ll see that it's really rather simple and, well, elegant. You will see that differential calculus (the kind of calculus that is concerned with rates of change) is built on a set of clearly defined rules for finding the derivative of any function \( f(x) \). It’s like solving a puzzle. The next section outlines these rules, so we can start solving puzzles.

8.1.1 Rules of Derivation

Derivative Rules

Power Rule
Constant Rule
A Constant Times a Function
Differentiating a Sum
Product Rule
Quotient Rule
Chain Rule

The following sections provide examples of the application of each rule.

Rule 1: The Power Rule

Example:
\[ \begin{aligned} f(x) &= x^6 \\ f'(x) &= 6 \cdot x^{6-1} \\ &= 6x^5 \end{aligned} \]

A second example can be plotted in R. The function is \( f(x) = x^2 \) and therefore, using the power rule, the derivative is \( f'(x) = 2x \).

x <- c(-5:5)
x

##  [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

y <- x^2
y

##  [1] 25 16  9  4  1  0  1  4  9 16 25

plot(x,y, type="o", pch=19)

Figure \(\PageIndex{1}\): Calculating Slopes for \( (x, y) \) Pairs
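The slopes that the power rule predicts for \( f(x) = x^2 \) can be checked numerically. This is a minimal sketch that assumes a central-difference approximation with an arbitrarily chosen small step `h`. The names `f`, `f_prime`, and `numeric_slope` are illustrative and not from the text.

```r
# f(x) = x^2 and its derivative from the power rule
f <- function(x) x^2
f_prime <- function(x) 2 * x

# Central-difference approximation of the slope at each x
# (h is a small step size chosen only for illustration)
h <- 1e-5
x <- -5:5
numeric_slope <- (f(x + h) - f(x - h)) / (2 * h)

# The two slope columns should agree up to rounding error
cbind(x, power_rule = f_prime(x), numeric_slope)
```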

Rule 2: The Constant Rule

Example:
\[ \begin{aligned} f(x) &= 346 \\ f'(x) &= 0 \end{aligned} \]

Rule 3: A Constant Times a Function

Example:
\[ \begin{aligned} f(x) &= 5x^2 \\ f'(x) &= 5 \cdot 2x^{2-1} \\ &= 10x \end{aligned} \]

Rule 4: Differentiating a Sum

Example:
\[ \begin{aligned} f(x) &= 4x^2 + 32x \\ f'(x) &= (4x^2)' + (32x)' \\ &= 4 \cdot 2x^{2-1} + 32 \\ &= 8x + 32 \end{aligned} \]
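Hand-derived results like this one can be cross-checked in R. The sketch below uses `D()`, the symbolic differentiation function in R's `stats` package (attached by default), on the sum-rule example. The variable names `f_expr` and `f_prime_expr` are illustrative assumptions.

```r
# Symbolic derivative of f(x) = 4x^2 + 32x with respect to x
f_expr <- expression(4 * x^2 + 32 * x)
f_prime_expr <- D(f_expr, "x")
f_prime_expr

# Evaluate the symbolic derivative at a few points and
# compare it with the hand-derived result 8x + 32
x <- -2:2
eval(f_prime_expr)
8 * x + 32
```

`D()` does not simplify its output, so the printed expression may look different from \( 8x + 32 \) even though the evaluated values match.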

Rule 5: The Product Rule

Example:
\[ \begin{aligned} f(x) &= x^3 (x - 5) \\ f'(x) &= (x^3)'(x - 5) + (x^3)(x - 5)' \\ &= 3x^2 (x - 5) + (x^3) \cdot 1 \\ &= 3x^3 - 15x^2 + x^3 \\ &= 4x^3 - 15x^2 \end{aligned} \]
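The structure of the product rule can also be spelled out in R. In this sketch the two factors are written as separate functions `u` and `v` (names chosen here for illustration), and the rule \( u'v + uv' \) is compared with the simplified answer from the example.

```r
# Factors of f(x) = x^3 * (x - 5) and their derivatives
u <- function(x) x^3
v <- function(x) x - 5
u_prime <- function(x) 3 * x^2          # power rule
v_prime <- function(x) rep(1, length(x)) # derivative of x - 5 is 1

# Product rule: f'(x) = u'(x) * v(x) + u(x) * v'(x)
product_rule <- function(x) u_prime(x) * v(x) + u(x) * v_prime(x)

# Simplified result from the worked example
simplified <- function(x) 4 * x^3 - 15 * x^2

# Both forms should give the same slopes
x <- -3:3
all.equal(product_rule(x), simplified(x))
```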

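In a second example, the product rule is applied to the function \( y = f(x) = x^2 - 6x + 5 \), which can be written as the product \( (x - 1)(x - 5) \). The derivative of this function is \( f'(x) = (x - 1)'(x - 5) + (x - 1)(x - 5)' = (x - 5) + (x - 1) = 2x - 6 \). This function can be plotted in R.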

x <- c(-1:7)
x

## [1] -1  0  1  2  3  4  5  6  7

y <- x^2-6*x+5
y

## [1] 12  5  0 -3 -4 -3  0  5 12

plot(x,y, type="o", pch=19)
abline(h=0,v=0)

Figure \(\PageIndex{2}\): Plot of Function \( y = f(x) = x^2 - 6x + 5 \)

We can also use the derivative and R to calculate the slope for each value of \( x \).

b <- 2*x-6
b

## [1] -8 -6 -4 -2  0  2  4  6  8

The values of \( x \), which are shown in Figure \(\PageIndex{2}\), range from -1 to +7 and return derivatives (slopes at a point) ranging from -8 to +8.

Rule 6: The Quotient Rule

Example:
\[ \begin{aligned} f(x) &= \frac{x}{x^2 + 5} \\ f'(x) &= \frac{(x^2 + 5)(x)' - (x^2 + 5)'(x)}{(x^2 + 5)^2} \\ &= \frac{(x^2 + 5) - (2x)(x)}{(x^2 + 5)^2} \\ &= \frac{-x^2 + 5}{(x^2 + 5)^2} \end{aligned} \]
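Quotient-rule algebra is easy to get wrong, so a quick check is worthwhile. The sketch below compares the final expression from the example with the unsimplified derivative that R's `D()` returns. The grid of `x` values and the name `by_hand` are arbitrary choices made here.

```r
# Result of the quotient rule from the worked example
by_hand <- function(x) (-x^2 + 5) / (x^2 + 5)^2

# Symbolic derivative of x / (x^2 + 5) computed by R
d_expr <- D(expression(x / (x^2 + 5)), "x")

# Evaluate both on a grid of x values and compare
x <- seq(-3, 3, by = 0.5)
all.equal(eval(d_expr), by_hand(x))
```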

Rule 7: The Chain Rule

Example:
\[ \begin{aligned} f(x) &= (7x^2 - 2x + 13)^5 \\ f'(x) &= 5(7x^2 - 2x + 13)^4 \cdot (7x^2 - 2x + 13)' \\ &= 5(7x^2 - 2x + 13)^4 \cdot (14x - 2) \end{aligned} \]
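The chain rule treats the function as an outer function applied to an inner one. The sketch below makes that decomposition explicit, with the outer function \( u^5 \) and the inner function \( 7x^2 - 2x + 13 \), and then compares the result with R's symbolic derivative. The function names and the test points are assumptions for illustration.

```r
# Inner function and its derivative
inner <- function(x) 7 * x^2 - 2 * x + 13
inner_prime <- function(x) 14 * x - 2

# Derivative of the outer function u^5 with respect to u
outer_prime <- function(u) 5 * u^4

# Chain rule: outer'(inner(x)) * inner'(x)
chain_rule <- function(x) outer_prime(inner(x)) * inner_prime(x)

# Compare with the symbolic derivative of the full expression
d_expr <- D(expression((7 * x^2 - 2 * x + 13)^5), "x")
x <- c(-1, 0, 0.5, 2)
all.equal(chain_rule(x), eval(d_expr))
```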

8.1.2 Critical Points

Our goal is to use derivatives to find the values of \( \hat{\alpha} \) and \( \hat{\beta} \) that minimize the sum of the squared error. To do this we need to find the minimum of a function. The minimum is the smallest value that a function takes, whereas the maximum is the largest value. To find minima and maxima, the critical points are key. A critical point is where the derivative of the function is equal to \( 0 \), or \( f'(x) = 0 \). Note that this is equivalent to the slope being equal to \( 0 \).

Example: Finding the Critical Points

To find the critical point for the function

\( y = f(x) = x^2 - 4x + 5 \):

First find the derivative: \( f'(x) = 2x - 4 \)
Set the derivative equal to \( 0 \): \( f'(x) = 2x - 4 = 0 \)
Solve for \( x \): \( x = 2 \)
Substitute \( 2 \) for \( x \) into the function and solve for \( y \): \( y = 2^2 - 4(2) + 5 = 1 \)
Thus, the critical point (there’s only one in this case) of the function is \( (2, 1) \)
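The same steps can be sketched in R. Instead of solving \( f'(x) = 0 \) by hand, this version uses base R's `uniroot()` to find where the derivative crosses zero. The search interval from -5 to 5 is an assumption chosen so that the derivative changes sign inside it.

```r
# The function and its derivative from the example
f <- function(x) x^2 - 4 * x + 5
f_prime <- function(x) 2 * x - 4

# Find where f'(x) = 0 on an interval where f' changes sign
root <- uniroot(f_prime, interval = c(-5, 5))
x_crit <- root$root

# Substitute the critical x back into f to get y
y_crit <- f(x_crit)
c(x = x_crit, y = y_crit)
```

Because `uniroot()` is a numerical solver, the result is approximately \( (2, 1) \) rather than guaranteed to be exact.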

Once a critical point is identified, the next step is to determine whether that point is a minimum or a maximum. The most straightforward way to do this is to compute the \( (x, y) \) coordinates around it and plot them. This can be done in R, as we will show using the function \( y = f(x) = x^2 - 4x + 5 \). The plot is shown in Figure \(\PageIndex{3}\).

x <- c(-5:5)
x

##  [1] -5 -4 -3 -2 -1  0  1  2  3  4  5

y <- x^2-4*x+5
y

##  [1] 50 37 26 17 10  5  2  1  2  5 10

plot(x,y, type="o", pch=19)

Figure \(\PageIndex{3}\): Identification of Critical Points

As can be seen, the critical point \( (2, 1) \) is a minimum.
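What the plot shows can also be confirmed from the numbers. This sketch, using the same range of \( x \) as the plot, looks up the smallest computed \( y \) and checks that the slope is negative just to the left of the critical point and positive just to the right. The comparison points 1 and 3 are arbitrary choices.

```r
# The function and its derivative from the example
f <- function(x) x^2 - 4 * x + 5
f_prime <- function(x) 2 * x - 4

x <- -5:5
y <- f(x)

# The x value with the smallest y among the plotted points
x[which.min(y)]
min(y)

# Sign of the slope on either side of x = 2:
# negative on the left and positive on the right indicates a minimum
sign(f_prime(c(1, 3)))
```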

8.1.3 Partial Derivation

When an equation includes two variables, one can take a partial derivative with respect to only one variable, while the other variable is simply treated as a constant. This is particularly useful in our case because the function \( \sum \epsilon^2 \) has two variables: \( \hat{\alpha} \) and \( \hat{\beta} \).

Let’s take an example. For the function \( y = f(x, z) = x^3 + 4xz - 5z^2 \), we first take the derivative with respect to \( x \), holding \( z \) constant.

\[ \frac{\partial y}{\partial x} = \frac{\partial f(x, z)}{\partial x} = 3x^2 + 4z \]

Next we take the derivative with respect to \( z \), holding \( x \) constant.

\[ \frac{\partial y}{\partial z} = \frac{\partial f(x, z)}{\partial z} = 4x - 10z \]
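Both partial derivatives can be reproduced with R's `D()` function by naming the variable to differentiate with respect to; the other variable is then treated as a constant. The evaluation point \( x = 2, z = 3 \) is an arbitrary choice for illustration.

```r
# f(x, z) = x^3 + 4xz - 5z^2 as an R expression
f_expr <- expression(x^3 + 4 * x * z - 5 * z^2)

# Partial derivatives with respect to x and to z
df_dx <- D(f_expr, "x")
df_dz <- D(f_expr, "z")

# Evaluate at an arbitrary point and compare with the hand-derived forms
x <- 2
z <- 3
c(symbolic = eval(df_dx), by_hand = 3 * x^2 + 4 * z)
c(symbolic = eval(df_dz), by_hand = 4 * x - 10 * z)
```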
