# 3.1: K nearest neighbors

Assume we are given a dataset where $$X$$ is a matrix whose rows are the feature vectors of the observations and $$Y$$ is the vector of class labels. We will use this notation throughout this article. $$k$$-nearest neighbors, then, is a method of classification that estimates the conditional distribution of $$Y$$ given $$X$$ and classifies an observation to the class with the highest probability. Given a positive integer $$k$$, $$k$$-nearest neighbors looks at the $$k$$ observations closest to a test observation $$x_0$$ and estimates the conditional probability that it belongs to class $$j$$ using the formula

$Pr(Y=j|X=x_0)=\frac{1}{k}\sum_{i\in \mathcal{N}_0}I(y_i=j)$

where $$\mathcal{N}_0$$ is the set of the $$k$$ nearest observations and $$I(y_i=j)$$ is an indicator variable that evaluates to 1 if a given observation $$(x_i,y_i)$$ in $$\mathcal{N}_0$$ is a member of class $$j$$, and 0 otherwise. After estimating these probabilities, $$k$$-nearest neighbors assigns the observation $$x_0$$ to the class for which this probability is greatest. The following plot can be used to illustrate how the algorithm works:

• If we choose $$K = 3$$, then we have two observations in Class B and one observation in Class A. So, we classify the red star to Class B.
• If we choose $$K = 6$$, then we have two observations in Class B but four observations in Class A. So, we classify the red star to Class A.
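To make the estimate above concrete, here is a minimal sketch in Python with NumPy (the data, the function name `knn_predict`, and the choice of Euclidean distance, defined formally in the next section, are illustrative assumptions, not part of the original): it finds the neighbor set $$\mathcal{N}_0$$, counts the classes within it, and returns the majority class.

```python
import numpy as np

def knn_predict(X_train, y_train, x0, k):
    """Estimate Pr(Y = j | X = x0) over the k nearest neighbors
    and return the most probable class."""
    # Euclidean distance from x0 to every training observation
    distances = np.sqrt(((X_train - x0) ** 2).sum(axis=1))
    # Indices of the k nearest observations (the set N_0)
    neighbors = np.argsort(distances)[:k]
    # Class counts among the neighbors; counts / k is the
    # estimated conditional probability for each class
    classes, counts = np.unique(y_train[neighbors], return_counts=True)
    return classes[np.argmax(counts)]

# Toy data: two well-separated classes and a test point x0
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [3.0, 3.0], [3.2, 2.9], [2.8, 3.1]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])
x0 = np.array([1.1, 1.0])

print(knn_predict(X_train, y_train, x0, k=3))  # -> "A"
```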

# 3.2: The Calculation of Distance

Since the k-NN algorithm needs the $$k$$ nearest points, the first step is to calculate the distance between the input data point and every point in our training data. Suppose $$x$$ is a point with coordinates $$(x_1,x_2,...,x_p)$$ and $$y$$ is a point with coordinates $$(y_1,y_2,...,y_p)$$; then the (Euclidean) distance between these two points is: $d(x,y) = \sqrt{\sum_{i=1}^{p}(x_i - y_i)^2}$
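A short sketch of this formula in Python (the function name `euclidean_distance` and the example points are ours, chosen only for illustration):

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two p-dimensional points x and y."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # sqrt(9 + 16 + 0) = 5.0
```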

# 3.3: The Effect of K

As with most statistical learning models, if $$K$$ is small, we use a smaller region of the data to learn, which may cause over-fitting. If $$K$$ is large, we use a larger region of the data to learn, which may cause under-fitting. The following plot shows the effect of $$K$$: as $$K$$ gets larger, the decision boundary appears more linear.
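One way to see this trade-off numerically is the hedged sketch below (it assumes scikit-learn is available; the synthetic `make_moons` data and the particular values of $$K$$ are our own choices, not from the original). A very small $$K$$ tends to fit the training set almost perfectly but generalize worse, while a very large $$K$$ scores poorly on both sets.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Noisy two-class data; hold out a test set to expose over-/under-fitting
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 100):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"K={k:3d}  train={model.score(X_train, y_train):.2f}  "
          f"test={model.score(X_test, y_test):.2f}")
```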