8.1: Minimizing Error using Derivatives

    In calculus, the derivative measures the slope of a function of \(x\), written \(f(x)\), at each given value of \(x\). For the function \(f(x)\), the derivative is denoted \(f'(x)\), pronounced “f prime x”. Because the formula for \(\sum \epsilon^2\) is known and can be treated as a function, the derivative of that function lets us calculate how the sum of the squared errors changes over each possible value of \(\hat{\alpha}\) and \(\hat{\beta}\). For that reason, we need to find the derivative of \(\sum \epsilon^2\) with respect to changes in \(\hat{\alpha}\) and \(\hat{\beta}\). That, in turn, will permit us to “derive” the values of \(\hat{\alpha}\) and \(\hat{\beta}\) that result in the lowest possible \(\sum \epsilon^2\).

    Look, we understand that this all sounds complicated. But it’s not all that complicated. In this chapter, we will walk through all the steps so you’ll see that it's really rather simple and, well, elegant. You will see that differential calculus (the kind of calculus that is concerned with rates of change) is built on a set of clearly defined rules for finding the derivative of any function \(f(x)\). It’s like solving a puzzle. The next section outlines these rules, so we can start solving puzzles.

    8.1.1 Rules of Derivation

    Derivative Rules

    1. Power Rule
    2. Constant Rule
    3. A Constant Times a Function
    4. Differentiating a Sum
    5. Product Rule
    6. Quotient Rule
    7. Chain Rule

    The following sections provide examples of the application of each rule.

    Rule 1: The Power Rule

    Example:
    \[
    \begin{aligned}
    f(x) &= x^6 \\
    f'(x) &= 6 * x^{6-1} = 6x^5
    \end{aligned}
    \]

    A second example can be plotted in R. The function is \(f(x) = x^2\) and therefore, using the power rule, the derivative is \(f'(x) = 2x\).

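    # Evaluate f(x) = x^2 at the integers from -5 to 5
    x <- -5:5
    x
    ##  [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
    y <- x^2
    y
    ##  [1] 25 16  9  4  1  0  1  4  9 16 25
    # Plot the (x, y) pairs as solid points joined by lines
    plot(x, y, type = "o", pch = 19)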
    Figure \(\PageIndex{1}\): Calculating Slopes for \((x, y)\) Pairs
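
    The power-rule result can also be checked in R. What follows is a minimal sketch, assuming base R's D() function for symbolic differentiation; the names f_expr, f_prime, and slopes are illustrative and do not come from the original text.

```r
# Differentiate f(x) = x^2 symbolically with base R's D()
f_expr <- expression(x^2)
f_prime <- D(f_expr, "x")
f_prime  # an unevaluated expression equivalent to 2x

# Evaluate the derivative at the same x values used in the plot
x <- -5:5
slopes <- eval(f_prime, list(x = x))
slopes   # the slope at each x, which should equal 2 * x
```

    Each slope is twice the corresponding value of x, so the curve falls to the left of 0, is flat at 0, and rises to the right of 0.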

    Rule 2: The Constant Rule

    Example:
    \[
    \begin{aligned}
    f(x) &= 346 \\
    f'(x) &= 0
    \end{aligned}
    \]

    Rule 3: A Constant Times a Function

    Example:
    \[
    \begin{aligned}
    f(x) &= 5x^2 \\
    f'(x) &= 5 * 2x^{2-1} = 10x
    \end{aligned}
    \]

    Rule 4: Differentiating a Sum

    Example:
    \[
    \begin{aligned}
    f(x) &= 4x^2 + 32x \\
    f'(x) &= (4x^2)' + (32x)' \\
          &= 4 * 2x^{2-1} + 32 \\
          &= 8x + 32
    \end{aligned}
    \]
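
    The examples for Rules 2 through 4 can be checked the same way. This is a sketch under the assumption that base R's D() is an acceptable stand-in for working the rules by hand; the variable names are hypothetical.

```r
# Test points at which the derivatives are compared
x <- -3:3

# Rule 2: the derivative of a constant is 0
d_const <- D(expression(346), "x")
# Rule 3: a constant times a function
d_scaled <- D(expression(5 * x^2), "x")
# Rule 4: the derivative of a sum is the sum of the derivatives
d_sum <- D(expression(4 * x^2 + 32 * x), "x")

d_const                              # 0
all.equal(eval(d_scaled), 10 * x)    # agrees with the hand result 10x
all.equal(eval(d_sum), 8 * x + 32)   # agrees with the hand result 8x + 32
```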

    Rule 5: The Product Rule

    Example:
    \[
    \begin{aligned}
    f(x) &= x^3(x - 5) \\
    f'(x) &= (x^3)'(x - 5) + (x^3)(x - 5)' \\
          &= 3x^2(x - 5) + (x^3) * 1 \\
          &= 3x^3 - 15x^2 + x^3 \\
          &= 4x^3 - 15x^2
    \end{aligned}
    \]
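
    As a quick check of the product-rule example, the sketch below compares the derivative that R computes for \(x^3(x - 5)\) with the simplified result \(4x^3 - 15x^2\). The helper f_prime_hand and the test points are assumptions made for illustration.

```r
# Derivative of x^3 * (x - 5) as computed by base R's D()
f_prime_auto <- D(expression(x^3 * (x - 5)), "x")

# Simplified derivative obtained by hand with the product rule
f_prime_hand <- function(x) 4 * x^3 - 15 * x^2

# Compare the two at a handful of points
x <- -2:6
auto_vals <- eval(f_prime_auto, list(x = x))
hand_vals <- f_prime_hand(x)
all.equal(auto_vals, hand_vals)
```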

    In a second example, the product rule is applied to the function \(y = f(x) = x^2 - 6x + 5\), which can be written as the product \((x - 1)(x - 5)\). The derivative of this function is \(f'(x) = (1)(x - 5) + (x - 1)(1) = 2x - 6\). This function can be plotted in R.

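    # Evaluate f(x) = x^2 - 6x + 5 at the integers from -1 to 7
    x <- -1:7
    x
    ## [1] -1  0  1  2  3  4  5  6  7
    y <- x^2 - 6*x + 5
    y
    ## [1] 12  5  0 -3 -4 -3  0  5 12
    # Plot the curve, then draw the horizontal and vertical axes through 0
    plot(x, y, type = "o", pch = 19)
    abline(h = 0, v = 0)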
    Figure \(\PageIndex{2}\): Plot of Function \(y = f(x) = x^2 - 6x + 5\)

    We can also use the derivative and R to calculate the slope for each value of \(x\).

    # Slope at each x, from the derivative f'(x) = 2x - 6
    b <- 2*x - 6
    b
    ## [1] -8 -6 -4 -2  0  2  4  6  8

    The values for \(x\), which are shown in Figure \(\PageIndex{2}\), range from -1 to +7 and return derivatives (slopes at a point) ranging from -8 to +8.
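
    One way to see that these derivative values really are slopes is to approximate each slope numerically. The sketch below uses a central difference with a small step h; the step size and the names numeric_slope and analytic_slope are assumptions, not part of the original text.

```r
# The function plotted above and the x values used there
f <- function(x) x^2 - 6 * x + 5
x <- -1:7

# Central difference: rise over run across a small interval around each x
h <- 1e-4
numeric_slope <- (f(x + h) - f(x - h)) / (2 * h)

# Slope from the derivative f'(x) = 2x - 6
analytic_slope <- 2 * x - 6

round(numeric_slope, 6)
all.equal(numeric_slope, analytic_slope)
```

    The numerical and analytic slopes agree up to floating-point error, running from -8 at x = -1 to +8 at x = 7.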

    Rule 6: The Quotient Rule

    Example:
    \[
    \begin{aligned}
    f(x) &= \frac{x}{x^2 + 5} \\
    f'(x) &= \frac{(x^2 + 5)(x)' - (x^2 + 5)'(x)}{(x^2 + 5)^2} \\
          &= \frac{(x^2 + 5) - (2x)(x)}{(x^2 + 5)^2} \\
          &= \frac{-x^2 + 5}{(x^2 + 5)^2}
    \end{aligned}
    \]
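
    The quotient-rule result can be sanity-checked numerically. This is a rough sketch that compares the hand-derived formula with a central-difference approximation; the grid of x values and the step h are arbitrary choices made for illustration.

```r
# The function and its hand-derived derivative from the quotient rule
f <- function(x) x / (x^2 + 5)
f_prime <- function(x) (-x^2 + 5) / (x^2 + 5)^2

# Approximate the slope numerically on a grid of x values
x <- seq(-3, 3, by = 0.5)
h <- 1e-5
numeric_slope <- (f(x + h) - f(x - h)) / (2 * h)

# Largest disagreement between the two; it should be very close to 0
max_gap <- max(abs(numeric_slope - f_prime(x)))
max_gap
```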

    Rule 7: The Chain Rule

    Example:
    \[
    \begin{aligned}
    f(x) &= (7x^2 - 2x + 13)^5 \\
    f'(x) &= 5(7x^2 - 2x + 13)^4 * (7x^2 - 2x + 13)' \\
          &= 5(7x^2 - 2x + 13)^4 * (14x - 2)
    \end{aligned}
    \]
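
    The chain rule multiplies the derivative of the outer function by the derivative of the inner function. The sketch below spells out those two pieces for the example and compares the product with what base R's D() returns. The function names inner, inner_prime, outer_prime, and chain are hypothetical.

```r
# Inner function g(x) and its derivative g'(x)
inner <- function(x) 7 * x^2 - 2 * x + 13
inner_prime <- function(x) 14 * x - 2

# Derivative of the outer function u^5 with respect to u
outer_prime <- function(u) 5 * u^4

# Chain rule: outer derivative evaluated at g(x), times g'(x)
chain <- function(x) outer_prime(inner(x)) * inner_prime(x)

# Compare with the derivative computed by D()
d_auto <- D(expression((7 * x^2 - 2 * x + 13)^5), "x")
x <- c(-1, 0, 1, 2)
all.equal(chain(x), eval(d_auto, list(x = x)))
```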

    8.1.2 Critical Points

    Our goal is to use derivatives to find the values of \(\hat{\alpha}\) and \(\hat{\beta}\) that minimize the sum of the squared error. To do this we need to find the minimum of a function. The minimum is the smallest value that a function takes, whereas the maximum is the largest value. To find minima and maxima, the critical points are key. A critical point is a point where the derivative of the function is equal to \(0\), that is, \(f'(x) = 0\). Note that this is equivalent to the slope being equal to \(0\).

    Example: Finding the Critical Points

    To find the critical point for the function

    \(y = f(x) = x^2 - 4x + 5\):

    • First find the derivative: \(f'(x) = 2x - 4\)
    • Set the derivative equal to \(0\): \(f'(x) = 2x - 4 = 0\)
    • Solve for \(x\): \(x = 2\)
    • Substitute \(2\) for \(x\) in the function and solve for \(y\): \(y = 2^2 - 4(2) + 5 = 1\)
    • Thus, the critical point (there’s only one in this case) of the function is \((2, 1)\)
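
    The steps above can also be sketched in R. The example below assumes base R's uniroot() as the solver for \(f'(x) = 0\); the search interval from -5 to 5 and the tolerance are arbitrary choices, and the variable names are illustrative.

```r
# The function and its derivative from the example
f <- function(x) x^2 - 4 * x + 5
f_prime <- function(x) 2 * x - 4

# Find the x at which the derivative equals 0
root <- uniroot(f_prime, interval = c(-5, 5), tol = 1e-9)$root
root     # approximately 2

# Substitute back into the function to get the y coordinate
f(root)  # approximately 1
```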

    Once a critical point is identified, the next step is to determine whether that point is a minimum or a maximum. The most straightforward way to do this is to identify the \(x, y\) coordinates and plot them. This can be done in R, as we will show using the function \(y = f(x) = x^2 - 4x + 5\). The plot is shown in Figure \(\PageIndex{3}\).

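    # Evaluate f(x) = x^2 - 4x + 5 at the integers from -5 to 5
    x <- -5:5
    x
    ##  [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
    y <- x^2 - 4*x + 5
    y
    ##  [1] 50 37 26 17 10  5  2  1  2  5 10
    # Plot the curve; the lowest point marks the critical point
    plot(x, y, type = "o", pch = 19)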
    Figure \(\PageIndex{3}\): Identification of Critical Points

    As can be seen, the critical point \((2, 1)\) is a minimum.
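
    The plot can be backed up with a numerical check. The sketch below compares the function value at the critical point with nearby values, then asks base R's optimize() to search for the minimum directly. The neighbouring points 1.9 and 2.1 and the search interval are assumptions chosen for illustration.

```r
f <- function(x) x^2 - 4 * x + 5

# The function is higher just to the left and right of x = 2
f(c(1.9, 2, 2.1))
lower_than_neighbours <- all(f(c(1.9, 2.1)) > f(2))
lower_than_neighbours

# optimize() searches the interval for the x that minimizes f
opt <- optimize(f, interval = c(-5, 5))
opt$minimum    # location of the minimum, approximately 2
opt$objective  # value of f at the minimum, approximately 1
```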

    8.1.3 Partial Derivation

    When an equation includes two variables, one can take a partial derivative with respect to only one variable, while the other variable is simply treated as a constant. This is particularly useful in our case because the function \(\sum \epsilon^2\) has two variables: \(\hat{\alpha}\) and \(\hat{\beta}\).

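    Let’s take an example. For the function \(y = f(x, z) = x^3 + 4xz - 5z^2\), we first take the derivative with respect to \(x\), holding \(z\) constant.

    \[\frac{\partial y}{\partial x} = \frac{\partial f(x, z)}{\partial x} = 3x^2 + 4z\]

    Next we take the derivative with respect to \(z\), holding \(x\) constant.

    \[\frac{\partial y}{\partial z} = \frac{\partial f(x, z)}{\partial z} = 4x - 10z\]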
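
    Partial derivatives can be checked in R in the same spirit. This is a minimal sketch, assuming base R's D() and an arbitrary evaluation point \((x, z) = (2, 3)\); the names f_expr, df_dx, df_dz, and pt are hypothetical.

```r
# The two-variable function from the example
f_expr <- expression(x^3 + 4 * x * z - 5 * z^2)

# Differentiate with respect to one variable, treating the other as a constant
df_dx <- D(f_expr, "x")
df_dz <- D(f_expr, "z")

# Evaluate both partial derivatives at the point x = 2, z = 3
pt <- list(x = 2, z = 3)
dx_val <- eval(df_dx, pt)  # 3 * 2^2 + 4 * 3 = 24
dz_val <- eval(df_dz, pt)  # 4 * 2 - 10 * 3 = -22
c(dx_val, dz_val)
```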


    This page titled 8.1: Minimizing Error using Derivatives is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Jenkins-Smith et al. (University of Oklahoma Libraries) via source content that was edited to the style and standards of the LibreTexts platform.