Analysis of the Similarities and Differences between SVM, LDA and QDA (Heng Xu)




Connection between SVM, Covariance Adjusted SVM, LDA and QDA

Here we want to show an interesting connection between SVM, Covariance Adjusted SVM, LDA and QDA: under certain conditions, these methods construct very similar classifiers. First, let us briefly introduce the ideas behind the other three methods.

1. Covariance Adjusted SVM

Covariance Adjusted SVM modifies the standard SVM by taking the pooled variance-covariance matrix into account. The idea is to maximize the margin measured through \(\Sigma^{\frac{1}{2}}\mathbf{w}\) rather than through \(\mathbf{w}\) itself. Here \(\Sigma\) denotes the pooled sample variance-covariance matrix of class 1 and class 2, \(\Sigma = \frac{n_1 S_1 + n_2 S_2}{n}\), where \(S_k\) is the sample variance-covariance matrix of class \(k\), \(n_k\) is the number of observations in class \(k\), and \(n = n_1 + n_2\). We then formulate the model as

\[\begin{aligned} & \underset{\mathbf{w},b,\varepsilon_i}{\text{minimize}} & & \frac{1}{2}\mathbf{w}^T\Sigma\mathbf{w} + C\sum_{i = 1}^{n}\varepsilon_i\\ & \text{s.t.} & & y_i(\mathbf{w}^T x_i+b) \geq 1 - \varepsilon_i,\; \varepsilon_i \geq 0, \quad i = 1, \ldots, n. \end{aligned}\]
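The pooled matrix and its square roots are straightforward to compute. The sketch below is a minimal NumPy version under two assumptions the text does not state: each per-class sample covariance matrix uses divisor \(n_k\) (NumPy's `bias=True`), and the matrix square root is the symmetric one obtained from an eigendecomposition. The helper names `pooled_covariance` and `sym_matrix_power` and the random data are hypothetical.

```python
import numpy as np


def pooled_covariance(X1, X2):
    """Pooled variance-covariance matrix (n1*S1 + n2*S2) / n with n = n1 + n2.

    Assumption: each per-class S_k uses divisor n_k (bias=True), so the result
    equals the total within-class scatter divided by n.
    """
    n1, n2 = len(X1), len(X2)
    S1 = np.cov(X1, rowvar=False, bias=True)
    S2 = np.cov(X2, rowvar=False, bias=True)
    return (n1 * S1 + n2 * S2) / (n1 + n2)


def sym_matrix_power(S, p):
    """Return S**p for a symmetric positive-definite S via S = V diag(lam) V^T."""
    lam, V = np.linalg.eigh(S)
    return (V * lam**p) @ V.T


# Usage on made-up data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(50, 2))
X2 = rng.normal(loc=1.0, size=(60, 2))
Sigma = pooled_covariance(X1, X2)
w = np.array([1.0, -2.0])

# The penalty (1/2) w^T Sigma w equals (1/2) ||Sigma^{1/2} w||^2.
penalty = 0.5 * w @ Sigma @ w
penalty_via_root = 0.5 * np.sum((sym_matrix_power(Sigma, 0.5) @ w) ** 2)
```

The eigendecomposition route is used because it gives the symmetric square root directly and also handles the negative power \(-\frac{1}{2}\) needed below.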

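In fact, the Covariance Adjusted SVM is equivalent to multiplying the training data by the inverse square root \(\Sigma^{-\frac{1}{2}}\) of the pooled variance-covariance matrix and applying the standard SVM to the transformed data. To verify this, note that \(\Sigma^{\frac{1}{2}}\) is symmetric, so the problem can be rewritten as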

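\[\begin{aligned} & \underset{\mathbf{w},b,\varepsilon_i}{\text{minimize}} & & \frac{1}{2}\mathbf{w}^T(\Sigma^{\frac{1}{2}})^T\Sigma^{\frac{1}{2}}\mathbf{w}+C\sum_{i = 1}^{n}\varepsilon_i\\ & \text{s.t.} & & y_i(\mathbf{w}^T\Sigma^{\frac{1}{2}}\Sigma^{-\frac{1}{2}}x_i+b) \geq 1 - \varepsilon_i,\; \varepsilon_i \geq 0, \quad i = 1, \ldots, n. \end{aligned}\]

Substituting \(\mathbf{v} = \Sigma^{\frac{1}{2}}\mathbf{w}\) and \(\tilde{x}_i = \Sigma^{-\frac{1}{2}}x_i\) turns this into the standard soft-margin SVM on the transformed data,

\[\begin{aligned} & \underset{\mathbf{v},b,\varepsilon_i}{\text{minimize}} & & \frac{1}{2}\mathbf{v}^T\mathbf{v}+C\sum_{i = 1}^{n}\varepsilon_i\\ & \text{s.t.} & & y_i(\mathbf{v}^T\tilde{x}_i+b) \geq 1 - \varepsilon_i,\; \varepsilon_i \geq 0, \quad i = 1, \ldots, n, \end{aligned}\]

and a solution \(\mathbf{v}^*\) maps back to \(\mathbf{w}^* = \Sigma^{-\frac{1}{2}}\mathbf{v}^*\).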
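The equivalence can also be checked numerically. The sketch below, assuming NumPy, evaluates both problems at an arbitrary \((\mathbf{w}, b)\), with each slack set to its smallest feasible value \(\max(0, 1 - y_i(\mathbf{w}^T x_i + b))\), and confirms that the objective values and the margins coincide once \(\mathbf{v} = \Sigma^{\frac{1}{2}}\mathbf{w}\) and \(\tilde{x}_i = \Sigma^{-\frac{1}{2}}x_i\). The random data, the stand-in matrix `Sigma`, and the helper `soft_margin_terms` are hypothetical; any symmetric positive-definite matrix works in place of the pooled one.

```python
import numpy as np


def sym_matrix_power(S, p):
    """Return S**p for a symmetric positive-definite S via its eigendecomposition."""
    lam, V = np.linalg.eigh(S)
    return (V * lam**p) @ V.T


def soft_margin_terms(weights, b, X, y, C, metric=None):
    """Objective 0.5 * weights^T M weights + C * sum(slack), and margins y_i (weights^T x_i + b).

    M is the identity (standard SVM) when metric is None, else the given matrix
    (covariance adjusted SVM). Each slack is set to its smallest feasible value.
    """
    M = np.eye(len(weights)) if metric is None else metric
    margins = y * (X @ weights + b)
    slack = np.maximum(0.0, 1.0 - margins)
    return 0.5 * weights @ M @ weights + C * slack.sum(), margins


# Hypothetical data and an arbitrary SPD matrix standing in for the pooled Sigma.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.where(rng.random(40) < 0.5, -1.0, 1.0)
A = rng.normal(size=(2, 2))
Sigma = A @ A.T + 0.5 * np.eye(2)
w, b, C = np.array([0.7, -1.3]), 0.2, 1.0

# Whitened data: row i is (Sigma^{-1/2} x_i)^T because Sigma^{-1/2} is symmetric.
X_tilde = X @ sym_matrix_power(Sigma, -0.5)
v = sym_matrix_power(Sigma, 0.5) @ w

obj_ca, m_ca = soft_margin_terms(w, b, X, y, C, metric=Sigma)
obj_svm, m_svm = soft_margin_terms(v, b, X_tilde, y, C)
print(np.isclose(obj_ca, obj_svm), np.allclose(m_ca, m_svm))  # True True
```

Because the identity holds for every \((\mathbf{w}, b)\), any off-the-shelf linear SVM solver can be reused by fitting it to `X_tilde`.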

In practice, this model often performs at least as well as the standard SVM, and it can sometimes correct improper predictions made by the standard SVM, as the comparison below illustrates.

2. LDA and QDA

Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) are popular traditional classification methods. Both assume that the observations in each class come from a multivariate Gaussian distribution, and both build the classifier from statistical properties of the data: the class means and the variance-covariance matrices. The main difference between them is that LDA assumes all classes share the same variance-covariance matrix, which yields a linear decision boundary (a straight line in two dimensions), whereas QDA allows each class its own variance-covariance matrix, which yields a quadratic decision boundary. So if we observe or estimate that the classes have similar variance-covariance matrices, we use LDA; otherwise we use QDA. The following plot gives the basic idea:
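Both decision rules follow directly from the Gaussian assumption. The sketch below, assuming NumPy, uses the standard discriminant functions: LDA scores \(x^T\Sigma^{-1}\mu_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \log\pi_k\) with the pooled matrix \(\Sigma\), which is linear in \(x\), and QDA scores \(-\frac{1}{2}\log|\Sigma_k| - \frac{1}{2}(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k) + \log\pi_k\), which is quadratic in \(x\). Estimating the priors \(\pi_k\) by class frequencies and the covariances with divisor \(n_k\) are our assumptions, and the helper names are hypothetical; scikit-learn's `LinearDiscriminantAnalysis` and `QuadraticDiscriminantAnalysis` are tested alternatives.

```python
import numpy as np


def fit_gaussian_classes(X, y):
    """Per-class means, covariances (divisor n_k), pooled covariance and priors."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    covs = np.array([np.cov(X[y == k], rowvar=False, bias=True) for k in classes])
    counts = np.array([(y == k).sum() for k in classes])
    priors = counts / counts.sum()
    pooled = np.tensordot(counts, covs, axes=1) / counts.sum()  # sum_k n_k S_k / n
    return classes, means, covs, pooled, priors


def lda_predict(X, classes, means, pooled, priors):
    """Shared covariance: scores are linear in x, so boundaries are straight lines."""
    P = np.linalg.inv(pooled)
    scores = X @ P @ means.T - 0.5 * np.sum(means @ P * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]


def qda_predict(X, classes, means, covs, priors):
    """Class-specific covariance: scores are quadratic in x, so boundaries are conics."""
    scores = []
    for mu, S, pi in zip(means, covs, priors):
        diff = X - mu
        maha = np.sum(diff * np.linalg.solve(S, diff.T).T, axis=1)  # (x-mu)^T S^{-1} (x-mu)
        _, logdet = np.linalg.slogdet(S)
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(pi))
    return classes[np.argmax(np.column_stack(scores), axis=1)]


# Usage on hypothetical data: two classes with equal covariance.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (200, 2))])
y = np.repeat([0, 1], 200)
classes, means, covs, pooled, priors = fit_gaussian_classes(X, y)
lda_labels = lda_predict(X, classes, means, pooled, priors)
qda_labels = qda_predict(X, classes, means, covs, priors)
```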

3. Comparison of SVM, Covariance Adjusted SVM, LDA and QDA

Case 1: 2 dimensions, same variance-covariance matrix, heavily merged
Here we look at two classes generated from multivariate Gaussian distributions with the same variance-covariance matrix. The two ellipses have similar shapes, and the yellow and red classes overlap heavily. We fit LDA (blue solid line), linear SVM (green solid line) and linear Covariance Adjusted SVM (red dashed line) and plot them on the data set. The three lines give essentially the same prediction. In other words, when two classes overlap heavily, LDA (which relies on statistical information), SVM and Covariance Adjusted SVM are likely to give the same prediction.
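How heavily two classes merge can be quantified. For two Gaussian classes with a shared variance-covariance matrix \(\Sigma\) and equal priors, the Bayes error is \(\Phi(-\Delta/2)\), where \(\Delta = \sqrt{(\mu_1-\mu_0)^T\Sigma^{-1}(\mu_1-\mu_0)}\) is the Mahalanobis distance between the means and \(\Phi\) is the standard normal CDF; a small \(\Delta\) means heavy overlap. The sketch below, assuming NumPy, simulates such data and computes these quantities. The helper names and the means and covariance in the usage lines are hypothetical and are not the parameters behind the figures.

```python
import numpy as np
from math import erf, sqrt


def make_two_gaussians(mu0, mu1, cov0, cov1, n_per_class, seed=0):
    """Draw n_per_class points from N(mu0, cov0) (label -1) and N(mu1, cov1) (label +1)."""
    rng = np.random.default_rng(seed)
    X = np.vstack([rng.multivariate_normal(mu0, cov0, size=n_per_class),
                   rng.multivariate_normal(mu1, cov1, size=n_per_class)])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y


def mahalanobis_gap(mu0, mu1, cov):
    """Delta = sqrt((mu1 - mu0)^T cov^{-1} (mu1 - mu0))."""
    d = np.asarray(mu1, dtype=float) - np.asarray(mu0, dtype=float)
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))


def bayes_error_shared_cov(delta):
    """Phi(-delta / 2): Bayes error for equal priors and a shared covariance."""
    return 0.5 * (1.0 + erf(-delta / 2.0 / sqrt(2.0)))


# Hypothetical settings: same covariance, heavily vs. lightly merged means.
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
heavy = mahalanobis_gap([0, 0], [0.5, 0.0], cov)
light = mahalanobis_gap([0, 0], [3.0, 0.0], cov)
X, y = make_two_gaussians([0, 0], [0.5, 0.0], cov, cov, n_per_class=200)
```

Passing different `cov0` and `cov1` to the same generator gives data of the kind used in the later cases, although the Bayes error formula then no longer applies.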
Case 2: 2 dimensions, same variance-covariance matrix, not heavily merged
Here we move the two classes a little further apart and again use LDA (blue solid line), linear SVM (black solid line) and linear Covariance Adjusted SVM (red dashed line). Unlike in the heavily merged case, the three methods now give noticeably different classifiers.
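Whether two linear classifiers give "basically the same prediction" or "really different classifiers" can be made concrete with two numbers. The sketch below, assuming NumPy, reports the fraction of points that two rules \(\operatorname{sign}(\mathbf{w}^T x + b)\) label identically and the angle between their normal vectors. The helper names and the coefficients in the usage lines are hypothetical; in practice they would come from the fitted LDA (whose normal vector is \(\Sigma^{-1}(\mu_1 - \mu_0)\)) and SVM models being compared.

```python
import numpy as np


def linear_agreement(X, w1, b1, w2, b2):
    """Fraction of rows of X that sign(w1^T x + b1) and sign(w2^T x + b2) label the same."""
    return float(np.mean(np.sign(X @ w1 + b1) == np.sign(X @ w2 + b2)))


def normal_angle_deg(w1, w2):
    """Angle in degrees between the normal vectors w1 and w2 of two separating lines."""
    cos = w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


# Hypothetical coefficients for two fitted rules, evaluated on a few points.
X = np.array([[0.5, 1.0], [-1.0, 0.2], [2.0, -0.5], [0.3, 0.1]])
w_lda, b_lda = np.array([1.0, 0.8]), -0.1
w_svm, b_svm = np.array([0.9, 1.1]), -0.2
agreement = linear_agreement(X, w_lda, b_lda, w_svm, b_svm)
angle = normal_angle_deg(w_lda, w_svm)
```

Agreement depends on where the points lie, while the angle compares the boundaries themselves, so reporting both separates "same predictions on this sample" from "same classifier".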
Case 3: 2 dimensions, different variance-covariance matrices, heavily merged
Here the classes have different variance-covariance matrices but still overlap heavily. Since the shapes of the two classes differ, we use QDA (figure 3), SVM with a polynomial kernel of degree 2 (figure 1) and Covariance Adjusted SVM with a polynomial kernel of degree 2 (figure 2). The three methods again give similar classifiers, even though the Covariance Adjusted SVM with a polynomial kernel of degree 2 ignores the yellow class on the left part of the plot.
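One way to see why QDA and a degree-2 polynomial kernel SVM can produce similar boundaries: the kernel \(K(x, z) = (x^T z + c)^2\) is an inner product of explicit quadratic features \(\phi(x)\), and every quadratic function of \(x\), including the QDA log-ratio \(\delta_1(x) - \delta_0(x)\), is linear in \(\phi(x)\). Both methods therefore choose among conic-section boundaries in two dimensions. The offset \(c > 0\) is our assumption, since the text does not give the kernel's parameters. The sketch below, assuming NumPy and 2-D inputs, checks both facts numerically; the helper names and the example quadratic are hypothetical.

```python
import numpy as np


def poly2_kernel(x, z, c=1.0):
    """Degree-2 polynomial kernel (x^T z + c)^2."""
    return (x @ z + c) ** 2


def phi(x, c=1.0):
    """Explicit feature map for 2-D x with phi(x) @ phi(z) == poly2_kernel(x, z, c)."""
    x1, x2 = x
    r = np.sqrt(2.0 * c)
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2, r * x1, r * x2, c])


def quadratic_as_linear(A, b, const, c=1.0):
    """Weights u with u @ phi(x) == x^T A x + b^T x + const (A symmetric 2x2)."""
    r = np.sqrt(2.0 * c)
    return np.array([A[0, 0], A[1, 1], np.sqrt(2.0) * A[0, 1], b[0] / r, b[1] / r, const / c])


rng = np.random.default_rng(0)
x, z = rng.normal(size=2), rng.normal(size=2)
kernel_matches = np.isclose(phi(x) @ phi(z), poly2_kernel(x, z))

# A hypothetical quadratic boundary x^T A x + b^T x + const = 0, e.g. a QDA log-ratio.
A = np.array([[0.5, -0.2], [-0.2, 1.5]])
b = np.array([0.3, -1.0])
const = 0.7
u = quadratic_as_linear(A, b, const)
boundary_matches = np.isclose(u @ phi(x), x @ A @ x + b @ x + const)
```

The offset matters: with \(c = 0\) the features lose the linear and constant terms, so the kernel could no longer represent a general QDA boundary.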
Case 4: 2 dimensions, different variance-covariance matrices, crossing and heavily merged
Here we look at a more extreme case. The yellow and red classes have different variance-covariance matrices and also cross each other, but many points are still mixed together in the middle. Based on the previous cases, it is reasonable that the three methods give similar predictions.
Case 5: 2 dimensions, different variance-covariance matrices, not heavily merged
Here we reduce the number of points that are merged together. As in Case 2, SVM and Covariance Adjusted SVM now give classifiers that differ from QDA.

4. Analysis

We mainly compared two groups of classifiers: (1) linear SVM, linear Covariance Adjusted SVM and LDA; and (2) SVM with a polynomial kernel of degree 2, Covariance Adjusted SVM with a polynomial kernel of degree 2 and QDA. Based on this comparison, we might draw the following conclusion:

When the observations of different classes merge with each other, we might consider LDA and QDA, which use statistical properties of the data to construct the classifiers. However, if there are enough mixed points, SVM and Covariance Adjusted SVM construct classifiers very similar to those of LDA and QDA. If fewer points are mixed, SVM and Covariance Adjusted SVM are more 'ambitious' in searching for the best classifier for the particular data, and their classifiers can differ noticeably from those of LDA and QDA.