
15.5: Measuring Accuracy


Naturally, the next question concerns goodness of fit: How good is the model? This question can be answered in many ways, using many related accuracy measures.

Recall that in linear regression, we used \(R^2\) to help us determine how well the model fit the data: an \(R^2\) value close to 1.00 indicated a good fit, while a value close to 0.00 indicated a poor fit. The \(R^2\) value, a PRE measure, was calculated using a ratio of the variability explained by the model to the original variability in the data. It was not the only PRE measure we covered; many others exist. Similar processes can be used in this context to create a pseudo-\(R^2\) measure.

    Note

    This measure is a pseudo-\(R^2\) measure primarily because it shares some of the characteristics of the true \(R^2\) measure, namely that it measures the decrease in prediction variability due to the model. It is a PRE measure.

    The reason it is not called the \(R^2\) measure is only because that name is taken elsewhere.

    Accuracy Rate

    Let us define the accuracy rate to be the number of correct predictions divided by the total number of predictions. This makes inherent sense as a measure of goodness of fit since it reads as the proportion of correct predictions.

There is no native accuracy function in R (this should raise a red flag 🚩). However, the KnoxStats package provides one. The accuracy function takes four parameters: data (the data variable), y (the binary dependent variable), model (the model you fit to the data), and t (the threshold). The optional parameter rate tells the function whether to return the accuracy rate (the default) or the number of accurate predictions (rate=FALSE).

    Thus, to determine the accuracy of this model for this data using the usual threshold value of \(\tau=0.500\), we would use

    accuracy(data=coin, y=coin$head, model=m1, t=0.500)
    

    The result of this command is 0.710, which agrees with our by-hand calculations. Thus, we conclude that this model correctly predicts 71% of the time for this data.
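
For comparison, this number can also be computed without KnoxStats. The following is a minimal sketch in base R, assuming the model m1 and the data frame coin defined in the full code at the end of this page, and assuming head is coded 0/1; whether the boundary case uses > or >= is a convention of the accuracy function, so the result may differ slightly at the boundary.

p    = predict(m1, type="response")   # predicted probability of a Head for each flip
pred = as.numeric(p > 0.500)          # classify using the threshold tau = 0.500
mean(pred == coin$head)               # proportion of correct predictions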

    Relative Accuracy

    Of course, having an accuracy rate of 0.710 does not tell us the entire story. Just as the \(R^2\) was based on a ratio of the model variance to the data (null) variance, a better accuracy number would be the accuracy of the model relative to the accuracy of the null model. The accuracy of the null model refers to merely selecting the modal category as our prediction. In this example, the modal category is Tails, as there were 61 Tails in the data. Thus, the accuracy of merely selecting the modal category is \(61 \div 100 = 0.610\). So, the relative accuracy is

    \begin{equation}
    A_R = \frac{0.710}{0.610} = 1.164
    \end{equation}

    Thus, the model does a 16.4% better job of prediction than does just predicting "Tail" all of the time.

    There actually is a proportional reduction in error (PRE) measurement associated with the relative accuracy. Recall that the \(R^2\) value was valuable because it measured the proportion of error explained by the model. For binary dependent variable regression, we can calculate something similar.

    \[ \mathrm{PRE} = 1 - \frac{ \text{error with model}}{ \text{error without model}} \]

    Here, we can see that a pseudo-\(R^2\) measure for this data and this model (and this threshold) is

    \begin{equation}
    1 - \frac{1-0.710}{1-0.610} \approx 0.2564
    \end{equation}

Thus, we can state that this model (and this threshold) reduced the error by 25.64%. Note that this measure has one drawback: while it can never be greater than 1.0, it can be less than zero. However, it will only be less than zero when your model is worse than no model at all.
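
These two calculations are easy to reproduce. The following is a minimal sketch in base R, using the accuracy of 0.710 computed above; the variable names are my own.

baseAcc = max(table(coin$head)) / nrow(coin)   # null model: always predict the modal category (0.610)
relAcc  = 0.710 / baseAcc                      # relative accuracy, approximately 1.164
pre     = 1 - (1 - 0.710)/(1 - baseAcc)        # pseudo-R^2 (PRE), approximately 0.2564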

    Note

There are many different ways of calculating pseudo-\(R^2\) measures. Each of the measures is based on a different definition of "error" or of "variability," just as the \(R^2\) and the adjusted \(R^2\) are based on different definitions of variability. Researchers do not agree on much about pseudo-\(R^2\) measures, except that they are of little use in vacuo and are best interpreted in concert with other measures.

    This is why I am offering it here, alongside many other measures of fit. Getting to know your model results is just as important as getting to know your data.

    Maximum Accuracy

    In each of the above measures, we assumed our threshold was \(\tau = 0.500\). In some cases, this is a logical threshold. In some cases, it is chosen arbitrarily. If we treat \(\tau\) as a parameter, we may be able to get a better prediction model.

The plan is straightforward: calculate the accuracy for various values of the threshold. The threshold that gives us the best accuracy will be our optimal threshold. Doing this by hand is prohibitive; using a script to loop through all threshold values is much easier:

a = numeric()                               # accuracy at each threshold
for(i in 1:100) {
  t = i/100                                 # thresholds 0.01, 0.02, ..., 1.00
  a[i] = accuracy(coin, coin$head, m1, t=t)
}
    

    Graphing these results gives the following figure.

    The model accuracy against various thresholds.
    Figure \(\PageIndex{1}\): A plot of the accuracy of the model against various thresholds. The horizontal line corresponds to the accuracy of selecting the modal category (the base accuracy). The vertical line corresponds to the threshold \(\tau=0.50\). The circled point represents the maximal threshold, \(\tau=0.44\) and accuracy \(=0.74\). The light blue envelope consists of a 95% confidence interval for coin accuracy, based on Monte Carlo simulation.

Figure \(\PageIndex{1}\) above is a plot of the calculated accuracy for various thresholds. Note that the "optimal" threshold is not \(\tau=0.50\), but \(\tau=0.44\), and the maximal accuracy is \(0.74\) for that threshold. With that being said, however, there is little difference in accuracy between this optimal threshold (\(\tau=0.44, A=0.74\)) and the traditional threshold (\(\tau=0.50, A=0.71\)).

    Note

    Recall that the standard deviation for (variability of) a binomial random variable is \(\sigma_x = \sqrt{n\pi(1-\pi)}\). This takes on a maximum value at \(\pi=0.500\)… the success probability for a fair coin. This means that we are least sure of our answer nearest \(\pi = 0.500\).

    The blue envelope of the figure above contains 95% of the calculated accuracies based on the true population; that is, 95% of the accuracy curves are contained in that envelope. It is very wide. It supports the contention that accuracy (relative or otherwise) matters little in the estimation of an optimal threshold \(\tau\).

    By the way, since this is all based on generated data, we know the true threshold for a fair coin: \(\tau=0.500\). Binomial random variables contain small amounts of information.

    Finally, here is the script to estimate the confidence bounds above using Monte Carlo simulation:

B = 1e4
acc = matrix(NA, ncol=100, nrow=B)             # one row of accuracies per resample

for( j in 1:B ) {
  thisSample = sample(100, replace=TRUE)       # resample the 100 rows with replacement
  thisData   = coin[thisSample, ]
  mod = glm(head~trial, family=binomial, data=thisData)

  a = numeric()
  for(tau in 1:100) {                          # accuracy at thresholds 0.01, ..., 1.00
    a[tau] = accuracy(data=thisData, y=thisData$head, model=mod, t=tau/100)
  }
  acc[j, ] = a
}
    

    From reading through this script, you should be able to tell what the variables head and trial represent. You should also be able to explain the purpose of each line. In fact, you should be able to use this code as the basis of future accuracy investigations.

    The ROC Curve

There are other, more specific, types of errors that are more useful and generalizable. If we look back to the coin logistic figure, we see that the threshold line (horizontal) and the corresponding trial line (vertical) divide the dataset into four parts. The lower-left quadrant contains those Tails that are correctly predicted by the model and the threshold value to be Tails. The upper-right quadrant contains those Heads correctly predicted to be Heads. The lower-right quadrant contains Tails incorrectly predicted to be Heads. The upper-left quadrant contains Heads incorrectly predicted to be Tails. These four outcomes are also referred to, respectively, as

    • True Negatives
    • True Positives
    • False Positives
    • False Negatives

    For our coin flipping example (and with \(\tau=0.500\)), we can write out a confusion matrix to show all four of these, both in magnitude and in rates:

    \begin{equation*}
    \left[ \begin{array}{cc} TP=22 & FP=12 \\[1em] FN=17 & TN=49 \\ \end{array} \right] \Longleftrightarrow \left[ \begin{array}{cc} TPR=\frac{22}{17+22} = 0.5641 & FPR=\frac{12}{49+12} = 0.1967 \\[1em] FNR=\frac{17}{17+22} = 0.4359 & TNR=\frac{49}{49+12} = 0.8033 \\ \end{array} \right]
    \end{equation*}
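
As a quick check, these rates follow directly from the counts. A minimal sketch in base R, using the values above:

TP = 22; FP = 12; FN = 17; TN = 49

TPR = TP/(TP + FN)   # 22/39 = 0.5641
FNR = FN/(TP + FN)   # 17/39 = 0.4359
FPR = FP/(FP + TN)   # 12/61 = 0.1967
TNR = TN/(FP + TN)   # 49/61 = 0.8033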

    The true negative rate (TNR) is also called specificity, and the true positive rate (TPR) is called the sensitivity. You will come across these two terms in the field of biostatistics and clinical trials because they mirror what physicians and biomedical researchers want out of their diagnostic tests.

    The receiver operating characteristic (ROC) curve is a graphical representation of the true positive rate against the false positive rate (sensitivity against \(1-\mathrm{specificity}\)) as the threshold is changed. Thus, to plot a ROC curve, one would calculate the true positive and the false positive rates for various values of the threshold, then plot the first against the second. The figure below shows the ROC curve for our coin model.

    The ROC curve for the coin flipping model.
    Figure \(\PageIndex{2}\): A receiver operating characteristic curve for the coin flipping model. The diagonal line represents a random model. The thicker line represents our model. The farther the ROC curve is above the random line, the better the model is at distinguishing between the two cases (Head and Tail, here). The area under the ROC curve is a measure of the goodness of the model. Here, \(A^\prime=0.7516\).

    In general, a model whose ROC curve is closer to the left and upper axes is the better model. As such, we can define a single number that tells us how good our model is — the area under the ROC curve (AUC, \(A^\prime\)).

The area under the ROC curve is a useful number in that it equals the probability that the model will rank a randomly chosen positive instance higher than a randomly chosen negative one. In other words, \(A^\prime\) is the probability that the model scores a true Head (success) higher than a true Tail (failure). Calculating the area is straightforward, using simple geometry (a Riemann sum under the curve).
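
The following is a minimal sketch of that calculation in base R, assuming m1 and coin as defined in the full code at the end of this page, and head coded 0/1. It sweeps the threshold, computes the true positive and false positive rates at each value, and approximates the area with the trapezoid rule; the variable names are my own, and the result may differ slightly from \(A^\prime=0.7516\) depending on the grid of thresholds used.

p = predict(m1, type="response")    # predicted probability of a Head
y = coin$head                       # observed outcome (1 = Head, 0 = Tail)

taus = seq(0, 1, by=0.01)
tpr  = sapply(taus, function(t) sum(p > t & y == 1) / sum(y == 1))
fpr  = sapply(taus, function(t) sum(p > t & y == 0) / sum(y == 0))

# Order the points by increasing FPR, then apply the trapezoid rule
o   = order(fpr)
n   = length(taus)
auc = sum( diff(fpr[o]) * (tpr[o][-n] + tpr[o][-1]) / 2 )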

    The ROC curve using the ROC command.
    Figure \(\PageIndex{3}\): The receiver operating characteristic curve for the coin flipping model using the ROC command from the Epi package.
    R Feature

The R package Epi provides functions for ROC curves. To create ROC graphs and to calculate the area under the curve with that package, first load it using library(Epi), then use the command

    ROC(test, stat, plot="ROC")
    

Here, test is the predicted probability of success for each datum from the model (a continuous variable bounded by 0 and 1), stat is the binary dependent variable, and plot="ROC" produces a ROC plot. This graph (see the figure above) is a bit more useful than the simple graph in the previous figure, as it contains some useful statistics, including the AUC and the optimal threshold, \(\tau\), which is the threshold value closest to the upper-left corner.

    Sensitivity and Specificity

    When a binary classifier, such as a logistic regression model with a chosen probability cutoff, makes predictions, its performance is judged by two fundamental and often competing properties: sensitivity and specificity. These metrics move us beyond simple accuracy to understand precisely how a model is succeeding or failing.

    Sensitivity, also called the True Positive Rate or Recall, answers the question: Of all the actual positive cases, what proportion did the model correctly identify? It is calculated as True Positives divided by the sum of True Positives and False Negatives. A test or model with 95% sensitivity is excellent at catching positives; it misses only 5% of them. This is the critical measure when the cost of missing a positive is high, such as failing to diagnose a serious disease or failing to detect a security threat. In contrast, Specificity, the True Negative Rate, answers: Of all the actual negative cases, what proportion did the model correctly rule out? It is calculated as True Negatives divided by the sum of True Negatives and False Positives. A highly specific model, say 98%, is very reliable when it gives a positive prediction because it rarely falsely alarms on negatives.

    Crucially, sensitivity and specificity exist in a trade-off, governed by the decision threshold of the classifier. Setting a very low threshold for declaring a positive will catch nearly all true positives, resulting in high sensitivity, but will also falsely label many negatives as positives, resulting in low specificity. Conversely, a very high threshold will correctly rule out nearly all negatives, achieving high specificity, but will miss many true positives, yielding low sensitivity. This trade-off is formally visualized by the Receiver Operating Characteristic, or ROC, curve, which plots the sensitivity against one minus specificity across all possible thresholds. The choice of the optimal operating point on this curve is not a statistical decision, but a practical and ethical one, determined by the relative costs of false positives and false negatives in the specific application domain.

    Sensitivity, or the True Positive Rate (TPR), is defined as:

    \[
    \text{Sensitivity} = \frac{TP}{TP + FN}
    \]

    where \(TP\) represents True Positives and \(FN\) represents False Negatives.

    Specificity, or the True Negative Rate (TNR), is defined as:

    \[
    \text{Specificity} = \frac{TN}{TN + FP}
    \]

    where \(TN\) represents True Negatives and \(FP\) represents False Positives.

These are implemented in KnoxStats as follows:

    sensitivity(confMatrix)
    
    specificity(confMatrix)

    F and F1 Scores

    In many classification scenarios, a single metric that balances the competing priorities of precision and recall is desirable. The F-score, specifically the F1 score, provides this balance. It is the harmonic mean of precision and recall (sensitivity). Precision answers the question: "Of all instances the model labeled as positive, what proportion were actually positive?" It is calculated as \(\text{Precision} = TP / (TP + FP)\). Recall, as previously defined, is \(\text{Recall} = TP / (TP + FN)\). The harmonic mean, unlike a simple arithmetic mean, is particularly stringent; it yields a low score if either precision or recall is poor. The general formula for the F-score is:

    \[
    F_\beta = (1 + \beta^2) \cdot \frac{\text{Precision} \cdot \text{Recall}}{(\beta^2 \cdot \text{Precision}) + \text{Recall}}
    \]

    The \(\beta\) parameter allows the user to weight the importance of recall relative to precision. A \(\beta > 1\) weights recall more heavily, while a \(\beta < 1\) weights precision more heavily.

    The most common variant is the F1 score, where \(\beta = 1\), giving equal weight to precision and recall. Its formula simplifies to:

    \[
    F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}
    \]

    The F1 score is particularly useful when you have an unbalanced class distribution and a simple measure of accuracy would be misleading. For example, in fraud detection where 99% of transactions are legitimate, a model that simply predicts "not fraud" for every transaction would be 99% accurate but useless. The F1 score, by focusing on the correct identification of the rare positive class (fraud), provides a much more meaningful assessment of the model's practical utility in such contexts.

The F and F1 scores are implemented in KnoxStats as follows:

    Fscore(confMatrix, beta=3)
    
    F1score(confMatrix)
    

    Make sure that your confusion matrices are of the right format:

                       Reality
                       FALSE  TRUE
    Prediction  FALSE     53    26
                TRUE       8    13
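
For reference, here is a minimal base-R sketch of the quantities these functions compute from a matrix in that layout (prediction in rows, reality in columns); the cell extraction reflects my reading of the format above.

confMatrix = matrix(c(53, 8, 26, 13), nrow=2,
                    dimnames=list(Prediction=c("FALSE","TRUE"),
                                  Reality   =c("FALSE","TRUE")))

TN = confMatrix["FALSE","FALSE"];  FN = confMatrix["FALSE","TRUE"]
FP = confMatrix["TRUE" ,"FALSE"];  TP = confMatrix["TRUE" ,"TRUE"]

sens = TP/(TP + FN)              # sensitivity (recall):  13/39
spec = TN/(TN + FP)              # specificity:           53/61
prec = TP/(TP + FP)              # precision:             13/21
f1   = 2*TP/(2*TP + FP + FN)     # F1 score:              26/60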
    

    Error Costs

Note that this optimal value is only optimal if the costs of making each type of error are the same. If the cost of a Type I Error (a false positive) is greater than that of a Type II Error (a false negative), or vice versa, then one should take those costs into consideration when determining the "optimal" threshold \(\tau\).

    Not all errors hurt the same.

For instance, if one is modeling fraudulent credit card transactions, then a false positive occurs when the model flags a transaction as fraudulent (but it isn't). A false negative occurs when the model does not flag a transaction as fraudulent (but it is). A high false positive rate would inconvenience the credit card holder by reducing their ability to use the card. A high false negative rate would inconvenience the credit card bank by forcing it to pay for fraudulent uses of the card.

    Not all errors hurt the same (or even the same people).

    For instance, given the following table, which of the two models should be used?

    Table \(\PageIndex{1}\): Example of different costs requiring us to select a different model
Error Type    Error Cost    Model 1    Model 2
FPR           $50           0.10       0.15
FNR           $10           0.15       0.10

Weighting each error rate by its cost gives an expected cost per prediction: the first model costs \(0.10 \times \$50 + 0.15 \times \$10 = \$6.50\); the second, \(0.15 \times \$50 + 0.10 \times \$10 = \$8.50\). Thus, we should select the first model (or threshold).
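
The same comparison in R, using the rates and costs from the table above (the variable names are my own):

costFP = 50; costFN = 10                  # cost of each error type, in dollars

model1 = 0.10*costFP + 0.15*costFN        # expected cost per prediction: $6.50
model2 = 0.15*costFP + 0.10*costFN        # expected cost per prediction: $8.50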

    Here is the code for the analysis on this page:

    library(KnoxStats)
    library(Epi)
    
    coin = read.csv("http://rfs.kvasaheim.com/data/coinflips.csv")
    attach(coin)
    summary(coin)
    
    
    m1 = glm(head~trial, family=binomial(link="logit"), data=coin)
    summary(m1)
    
    
    # Accuracy
    accuracy(data=coin, y=coin$head, model=m1, t=0.500)
    
    # Relative Accuracy
    accuracy(data=coin, y=coin$head, model=m1, t=0.500)/0.610
    
    # Maximum Accuracy
    a = numeric()
    for(i in 1:100) {
      t = i/100
      a[i] = accuracy(coin, coin$head, m1, t=t)
    }
    max(a)
    which.max(a)
    
    
    # MC Confidence Bounds
    B = 1e3
    acc = matrix(NA, ncol=100, nrow=B)
    for( j in 1:B ) {
      thisSample = sample(100, replace=TRUE)
      thisData = coin[thisSample,]
      mod = glm(head~trial, family=binomial(link="logit"), data=thisData)
      a2 = numeric()
      
      for(tau in 1:100) {
        a2[tau] = accuracy( data=thisData, y=thisData$head, model=mod, t=tau/100 )
      }
      acc[j, ] = a2
    }
    
    
    # The Bounds
    lcl=ucl=numeric()
    for(i in 1:100) {
      lcl[i] =quantile(acc[,i],0.025)
      ucl[i] =quantile(acc[,i],0.975)
    }
    
    
    # The Graphic
    
    par(xaxs="i", yaxs="i")
    par(family="serif", las=1)
    par(cex.lab=1.2, font.lab=2)
    par(mar=c(3,3,0,0)+0.75)
    
    plot.new()
    plot.window( xlim=c(0.05,0.95), ylim=c(0.3,0.82))
    
    title(xlab="Threshold", line=2.5)
    title(ylab="Raw Accuracy", line=2.75)
    
    polygon(x=c(1:100/100,rev(1:100/100)),y=c(ucl,rev(lcl)) , col="AliceBlue", border=4 )
    
    abline(v=0.50, lty=2, col="grey")
    abline(h=0.61, lty=2, col="grey")
    
    lines(1:100/100, a, lwd=2)
    points(1:100/100, a, pch=21, bg="navy")
    
    axis(1, at=0:10/10)
    axis(2, at=0:10/10)
    
    points(which.max(a)/100, max(a), cex=3, lwd=1)
    
    
    # ROC Curve
ROC(test=predict(m1, type="response"), stat=head, data=coin)
    
    
    
    # More Accuracy measures
confMatrix = table(predict(m1, type="response") > which.max(a)/100, head)
    
    sensitivity(confMatrix)
    specificity(confMatrix)
    
    Fscore(confMatrix, beta=3)
    F1score(confMatrix)
    
    

    This page titled 15.5: Measuring Accuracy is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
