# Least squares principle


The least squares principle is a widely used method for estimating the parameters of a statistical model from observed data. Suppose that we have measurements $$Y_1,\ldots,Y_n$$ that are noisy versions of known functions $$f_1(\beta),\ldots,f_n(\beta)$$ of an unknown parameter $$\beta$$. That is, we can write

$Y_i = f_i(\beta) + \varepsilon_i, \qquad i=1,\ldots,n,$

where $$\varepsilon_1,\ldots,\varepsilon_n$$ measure the departure of the observed measurements from the model and are typically referred to as noise. The least squares estimate of $$\beta$$ in this model is then defined as

$\widehat\beta = \arg\min_{\beta} \sum_{i=1}^n(Y_i - f_i(\beta))^2$
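As a small illustration of this definition, consider the special case (an assumption made here only for the example) in which $$\beta$$ is a scalar and $$f_i(\beta) = \beta$$ for every $$i$$, so that each measurement is a noisy observation of the same unknown quantity. Setting the derivative of the sum of squares to zero gives

$\frac{d}{d\beta}\sum_{i=1}^n (Y_i - \beta)^2 = -2\sum_{i=1}^n (Y_i - \beta) = 0 \quad\Longrightarrow\quad \widehat\beta = \frac{1}{n}\sum_{i=1}^n Y_i$

Since the sum of squares is a convex quadratic in $$\beta$$, this stationary point is the minimizer, and the least squares estimate in this case is the sample mean of the measurements.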

The quantity $$f_i(\widehat\beta)$$ is then referred to as the fitted value of $$Y_i$$, and the difference $$Y_i - f_i(\widehat\beta)$$ is referred to as the corresponding residual. Note that $$\widehat\beta$$ need not be unique, and even when it is unique it may not be available in closed form. If each $$f_i$$ is a smooth function of $$\beta$$, one can usually compute $$\widehat\beta$$ with numerical optimization methods that rely on derivatives of the objective function $$\sum_{i=1}^n (Y_i - f_i(\beta))^2$$. If the functions $$f_i(\beta)$$ are linear in $$\beta$$, as is the case in a linear regression problem, then $$\widehat\beta$$ can be obtained in closed form.
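The two situations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names `linear_least_squares` and `gauss_newton_exponential`, the straight-line model $$f_i(\beta_0,\beta_1) = \beta_0 + \beta_1 x_i$$, the exponential model $$f_i(\beta) = e^{-\beta x_i}$$, and the synthetic noise-free data are all hypothetical choices made for this example. Gauss-Newton is used as one possible derivative-based method; the text does not prescribe a particular algorithm.

```python
import math


def linear_least_squares(x, y):
    """Closed-form least squares fit of the line f_i(b0, b1) = b0 + b1 * x_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        # With identical x values the minimizer is not unique.
        raise ValueError("all x values are equal; the estimate is not unique")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def gauss_newton_exponential(x, y, beta_start=1.0, max_iter=50, tol=1e-12):
    """Gauss-Newton iteration for the scalar model f_i(beta) = exp(-beta * x_i).

    No closed form exists here, so beta is updated repeatedly using the
    derivative of each f_i with respect to beta.
    """
    beta = beta_start
    for _ in range(max_iter):
        fitted = [math.exp(-beta * xi) for xi in x]
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        # d f_i / d beta = -x_i * exp(-beta * x_i)
        jac = [-xi * fi for xi, fi in zip(x, fitted)]
        jtj = sum(j * j for j in jac)
        if jtj == 0:
            raise ValueError("derivatives vanish; cannot compute an update")
        # Least squares solution of the linearized problem residuals ~ jac * step
        step = sum(j * r for j, r in zip(jac, residuals)) / jtj
        beta += step
        if abs(step) < tol:
            break
    return beta


# Synthetic, noise-free data generated only for illustration.
x = [0.0, 0.5, 1.0, 1.5, 2.0]

y_line = [1.0 + 2.0 * xi for xi in x]
b0_hat, b1_hat = linear_least_squares(x, y_line)
print("linear fit:", b0_hat, b1_hat)

y_exp = [math.exp(-0.7 * xi) for xi in x]
beta_hat = gauss_newton_exponential(x, y_exp)
fitted_values = [math.exp(-beta_hat * xi) for xi in x]
residuals = [yi - fi for yi, fi in zip(y_exp, fitted_values)]
print("nonlinear fit:", beta_hat)
print("residual sum of squares:", sum(r * r for r in residuals))
```

Because the synthetic data contain no noise, both fits should recover the generating parameters up to floating-point error. With noisy data the residuals would be nonzero, and the nonlinear iteration may need a reasonable starting value to converge.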

## Contributors

• Debashis Paul

This page titled Least squares principle is shared under a not declared license and was authored, remixed, and/or curated by Debashis Paul.