# 12.2: Covariance and the Correlation Coefficient


The mean value \(\mu_X = E[X]\) and the variance \(\sigma_X^2 = E[(X - \mu_X)^2]\) give important information about the distribution for a real random variable \(X\). Can the expectation of an appropriate function of \((X, Y)\) give useful information about the joint distribution? A clue to one possibility is given in the expression

\(\text{Var}[X \pm Y] = \text{Var} [X] + \text{Var} [Y] \pm 2(E[XY] - E[X]E[Y])\)

The expression \(E[XY] - E[X]E[Y]\) vanishes if the pair is independent (and in some other cases). We note also that for \(\mu_X = E[X]\) and \(\mu_Y = E[Y]\)

\(E[(X - \mu_X) (Y - \mu_Y)] = E[XY] - \mu_X \mu_Y\)

To see this, expand the expression \((X - \mu_X)(Y - \mu_Y)\) and use linearity to get

\(E[(X - \mu_X) (Y - \mu_Y)] = E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] = E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y\)

which reduces directly to the desired expression. Now for given \(\omega\), \(X(\omega) - \mu_X\) is the variation of \(X\) from its mean and \(Y(\omega) - \mu_Y\) is the variation of \(Y\) from its mean. For this reason, the following terminology is used.

Definition: Covariance

The quantity \(\text{Cov} [X, Y] = E[(X - \mu_X)(Y - \mu_Y)]\) is called the *covariance* of \(X\) and \(Y\).

If we let \(X' = X - \mu_X\) and \(Y' = Y - \mu_Y\) be the centered random variables, then

\(\text{Cov} [X, Y] = E[X'Y']\)

*Note* that the variance of \(X\) is the covariance of \(X\) with itself.
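The two expressions for the covariance, and the variance identity that motivated it, can be checked on a small example. The following is a minimal sketch; the joint distribution `pmf` and the helper `expect` are hypothetical, chosen only for illustration, and exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction as F

# Hypothetical joint pmf on pairs (t, u); the values are illustrative only.
pmf = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 1): F(1, 4), (2, 1): F(1, 4)}

def expect(g):
    """E[g(X, Y)] for the joint pmf above."""
    return sum(p * g(t, u) for (t, u), p in pmf.items())

mu_x = expect(lambda t, u: t)
mu_y = expect(lambda t, u: u)

# Covariance from the definition and from E[XY] - E[X]E[Y].
cov_centered = expect(lambda t, u: (t - mu_x) * (u - mu_y))
cov_shortcut = expect(lambda t, u: t * u) - mu_x * mu_y

var_x = expect(lambda t, u: (t - mu_x) ** 2)
var_y = expect(lambda t, u: (u - mu_y) ** 2)

# Variances of the sum and the difference, computed directly.
var_sum = expect(lambda t, u: (t + u - (mu_x + mu_y)) ** 2)
var_diff = expect(lambda t, u: (t - u - (mu_x - mu_y)) ** 2)

print(cov_centered, cov_shortcut, var_sum, var_diff)
```

For this distribution both covariance expressions agree, and the directly computed variances of \(X + Y\) and \(X - Y\) match \(\text{Var}[X] + \text{Var}[Y] \pm 2\text{Cov}[X, Y]\).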

If we standardize, with \(X^* = (X - \mu_X)/\sigma_X\) and \(Y^* = (Y - \mu_Y)/\sigma_Y\), we have

Definition: Correlation Coefficient

The *correlation coefficient* \(\rho = \rho [X, Y]\) is the quantity

\(\rho [X,Y] = E[X^* Y^*] = \dfrac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}\)

Thus \(\rho = \text{Cov}[X, Y] / (\sigma_X \sigma_Y)\). We examine these concepts for information on the joint distribution. By Schwarz' inequality (E15), we have

\(\rho^2 = E^2 [X^* Y^*] \le E[(X^*)^2] E[(Y^*)^2] = 1\) with equality iff \(Y^* = cX^*\)

Now equality holds iff

\(1 = c^2 E^2[(X^*)^2] = c^2\) which implies \(c = \pm 1\) and \(\rho = \pm 1\)

We conclude \(-1 \le \rho \le 1\), with \(\rho = \pm 1\) iff \(Y^* = \pm X^*\)
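Since \(Y^* = \pm X^*\) means that \(Y\) is an affine function of \(X\) with positive or negative slope, the bounds can be illustrated numerically. This is a sketch under stated assumptions: the function `rho` and the sample points are hypothetical, and each listed pair is taken to be equally likely.

```python
import math

def rho(pairs):
    """Correlation coefficient for equally likely (t, u) pairs."""
    n = len(pairs)
    mx = sum(t for t, _ in pairs) / n
    my = sum(u for _, u in pairs) / n
    cov = sum((t - mx) * (u - my) for t, u in pairs) / n
    sx = math.sqrt(sum((t - mx) ** 2 for t, _ in pairs) / n)
    sy = math.sqrt(sum((u - my) ** 2 for _, u in pairs) / n)
    return cov / (sx * sy)

ts = [0.0, 1.0, 2.0, 5.0]
rho_up = rho([(t, 3 * t + 2) for t in ts])       # Y = 3X + 2, positive slope
rho_down = rho([(t, -0.5 * t + 1) for t in ts])  # Y = -X/2 + 1, negative slope
rho_mixed = rho([(0, 1), (1, 0), (2, 3), (5, 2)])  # no exact linear relation

print(rho_up, rho_down, rho_mixed)
```

Up to floating-point rounding, the first two values are \(1\) and \(-1\), while the third lies strictly between them.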

**Relationship between \(\rho\) and the joint distribution**

We consider first the distribution for the standardized pair \((X^*, Y^*)\). Since

\(P(X^* \le r, Y^* \le s) = P\left(\dfrac{X - \mu_X}{\sigma_X} \le r, \dfrac{Y - \mu_Y}{\sigma_Y} \le s\right) = P(X \le t = \sigma_X r + \mu_X, Y \le u = \sigma_Y s + \mu_Y)\)

we obtain the results for the distribution for \((X, Y)\) by the mapping

\(t = \sigma_X r + \mu_X\)

\(u = \sigma_Y s + \mu_Y\)

**Joint distribution for the standardized variables** \((X^*, Y^*)\), \((r, s) = (X^*, Y^*)(\omega)\)

\(\rho = 1\) iff \(X^* = Y^*\) iff all probability mass is on the line \(s = r\).

\(\rho = -1\) iff \(X^* = -Y^*\) iff all probability mass is on the line \(s = -r\).

If \(-1 < \rho < 1\), then at least some of the mass must fail to be on these lines.

**Figure 12.2.1**. Distance from point \((r,s)\) to the line \(s = r\).

The \(\rho = \pm 1\) lines for the \((X, Y)\) distribution are:

\(\dfrac{u - \mu_Y}{\sigma_Y} = \pm \dfrac{t - \mu_X}{\sigma_X}\) or \(u = \pm \dfrac{\sigma_Y}{\sigma_X}(t - \mu_X) + \mu_Y\)

Consider \(Z = Y^* - X^*\). Then \(E[\dfrac{1}{2} Z^2] = \dfrac{1}{2} E[(Y^* - X^*)^2]\). Reference to Figure 12.2.1 shows this is the average of the square of the distances of the points \((r, s) = (X^*, Y^*) (\omega)\) from the line \(s = r\) (i.e. the variance about the line \(s = r\)). Similarly for \(W = Y^* + X^*\). \(E[W^2/2]\) is the variance about \(s = -r\). Now

\(\dfrac{1}{2} E[(Y^* \pm X^*)^2] = \dfrac{1}{2}\{E[(Y^*)^2] + E[(X^*)^2] \pm 2E[X^* Y^*]\} = 1 \pm \rho\)

Thus

\(1 - \rho\) is the variance about \(s = r\) (the \(\rho = 1\) line)

\(1 + \rho\) is the variance about \(s = -r\) (the \(\rho = -1\) line)

Now since

\(E[(Y^* - X^*)^2] = E[(Y^* + X^*)^2]\) iff \(\rho = E[X^* Y^*] = 0\)

the condition \(\rho = 0\) is the condition for equality of the two variances.
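The interpretation of \(1 \mp \rho\) as variances about the two lines can be checked numerically. The sketch below assumes a hypothetical set of equally likely points; it standardizes them and uses the fact that the squared distance from \((r, s)\) to the line \(s = r\) is \((s - r)^2/2\), and to the line \(s = -r\) is \((s + r)^2/2\).

```python
import math

# Hypothetical equally likely (t, u) points, chosen only for illustration.
pairs = [(0, 1), (1, 0), (2, 3), (5, 2)]
n = len(pairs)

mx = sum(t for t, _ in pairs) / n
my = sum(u for _, u in pairs) / n
sx = math.sqrt(sum((t - mx) ** 2 for t, _ in pairs) / n)
sy = math.sqrt(sum((u - my) ** 2 for _, u in pairs) / n)

# Standardized points (r, s) = (X*, Y*)(omega).
std = [((t - mx) / sx, (u - my) / sy) for t, u in pairs]

rho = sum(r * s for r, s in std) / n

# Mean squared distance to each line.
var_about_plus = sum((s - r) ** 2 for r, s in std) / (2 * n)   # line s = r
var_about_minus = sum((s + r) ** 2 for r, s in std) / (2 * n)  # line s = -r

print(rho, var_about_plus, var_about_minus)
```

The two printed variances agree with \(1 - \rho\) and \(1 + \rho\) up to rounding.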

**Transformation to the** \((X, Y)\) **plane**

\(t = \sigma_X r + \mu_X\) \(u = \sigma_Y s + \mu_Y\) \(r = \dfrac{t - \mu_X}{\sigma_X}\) \(s = \dfrac{u - \mu_Y}{\sigma_Y}\)

The \(\rho = 1\) line is:

\(\dfrac{u - \mu_Y}{\sigma_Y} = \dfrac{t - \mu_X}{\sigma_X}\) or \(u = \dfrac{\sigma_Y}{\sigma_X} (t - \mu_X) + \mu_Y\)

The \(\rho = -1\) line is:

\(\dfrac{u - \mu_Y}{\sigma_Y} = -\dfrac{t - \mu_X}{\sigma_X}\) or \(u = -\dfrac{\sigma_Y}{\sigma_X} (t - \mu_X) + \mu_Y\)

\(1 - \rho\) is proportional to the variance about the \(\rho = 1\) line and \(1 + \rho\) is proportional to the variance about the \(\rho = -1\) line. \(\rho = 0\) iff the variances about both are the same.

Example 12.2.1: Uncorrelated but not independent

Suppose the joint density for \(\{X, Y\}\) is constant on the unit circle about the origin. By the rectangle test, the pair cannot be independent. By symmetry, the \(\rho = 1\) line is \(u = t\) and the \(\rho = -1\) line is \(u = -t\). By symmetry, also, the variance about each of these lines is the same. Thus \(\rho = 0\), which is true iff \(\text{Cov}[X, Y] = 0\). This fact can be verified by calculation, if desired.
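The calculation can be sketched as follows, reading "unit circle" as the disk \(t^2 + u^2 \le 1\), so that the constant density is \(1/\pi\) (an assumption about the intended region). By symmetry \(E[X] = E[Y] = 0\), and in polar coordinates with radius \(a\) and angle \(\theta\),

\(\text{Cov}[X, Y] = E[XY] = \dfrac{1}{\pi} \int_0^{2\pi} \int_0^1 (a \cos \theta)(a \sin \theta)\ a\ da\ d\theta = \dfrac{1}{4\pi} \int_0^{2\pi} \cos \theta \sin \theta\ d\theta = 0\)

since \(\cos \theta \sin \theta = \dfrac{1}{2} \sin 2\theta\) integrates to zero over a full period.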

Example 12.2.2: Uniform marginal distributions

**Figure 12.2.2**. Uniform marginals but different correlation coefficients.

Consider the three distributions in Figure 12.2.2. In case (a), the distribution is uniform over the square centered at the origin with vertices at (1,1), (-1,1), (-1,-1), (1,-1). In case (b), the distribution is uniform over two squares, in the first and third quadrants with vertices (0,0), (1,0), (1,1), (0,1) and (0,0), (-1,0), (-1,-1), (0,-1). In case (c) the two squares are in the second and fourth quadrants. The marginals are uniform on (-1,1) in each case, so that in each case

\(E[X] = E[Y] = 0\) and \(\text{Var} [X] = \text{Var} [Y] = 1/3\)

This means the \(\rho = 1\) line is \(u = t\) and the \(\rho = -1\) line is \(u = -t\).

a. By symmetry, \(E[XY] = 0\) (in fact the pair is independent) and \(\rho = 0\).

b. For every pair of possible values, the two signs must be the same, so \(E[XY] > 0\) which implies \(\rho > 0\). The actual value may be calculated to give \(\rho = 3/4\). Since \(1 - \rho < 1 + \rho\), the variance about the \(\rho = 1\) line is less than that about the \(\rho = -1\) line. This is evident from the figure.

c. \(E[XY] < 0\) and \(\rho < 0\). Since \(1 + \rho < 1 - \rho\), the variance about the \(\rho = -1\) line is less than that about the \(\rho = 1\) line. Again, examination of the figure confirms this.
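The value quoted in case (b) can be checked directly. As a sketch, note that the two unit squares have total area 2, so the joint density is \(1/2\) on each. Then

\(E[XY] = \dfrac{1}{2} \int_0^1 \int_0^1 tu\ dudt + \dfrac{1}{2} \int_{-1}^0 \int_{-1}^0 tu\ dudt = \dfrac{1}{2} \cdot \dfrac{1}{4} + \dfrac{1}{2} \cdot \dfrac{1}{4} = \dfrac{1}{4}\)

so that

\(\rho = \dfrac{E[XY] - E[X]E[Y]}{\sigma_X \sigma_Y} = \dfrac{1/4}{1/3} = \dfrac{3}{4}\)

In case (c) the product \(tu\) is negative on both squares, and the same computation gives \(E[XY] = -1/4\) and \(\rho = -3/4\).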

Example 12.2.3: A pair of simple random variables

With the aid of m-functions and MATLAB we can easily calculate the covariance and the correlation coefficient. We use the joint distribution for Example 9 in "Variance." In that example calculations show

\(E[XY] - E[X]E[Y] = -0.1633 = \text{Cov} [X,Y]\), \(\sigma_X = 1.8170\) and \(\sigma_Y = 1.9122\)

so that \(\rho = -0.04699\).
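The final arithmetic step can be reproduced from the rounded values quoted above. This short sketch uses only those three numbers; because they are rounded to four decimal places, the result may differ from the quoted \(\rho\) in the last digit.

```python
# Rounded values quoted in the example above.
cov = -0.1633
sigma_x = 1.8170
sigma_y = 1.9122

# rho = Cov[X, Y] / (sigma_X * sigma_Y)
rho = cov / (sigma_x * sigma_y)
print(f"rho = {rho:.5f}")
```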

Example 12.2.4: An absolutely continuous pair

The pair \(\{X, Y\}\) has joint density function \(f_{XY} (t, u) = \dfrac{6}{5} (t + 2u)\) on the triangular region bounded by \(t = 0\), \(u = t\), and \(u = 1\). By the usual integration techniques, we have

\(f_X(t) = \dfrac{6}{5} (1 + t - 2t^2)\), \(0 \le t \le 1\) and \(f_Y (u) = 3u^2\), \(0 \le u \le 1\)

From this we obtain \(E[X] = 2/5\), \(\text{Var} [X] = 3/50\), \(E[Y] = 3/4\), and \(\text{Var} [Y] = 3/80\). To complete the picture we need

\(E[XY] = \dfrac{6}{5} \int_0^1 \int_t^1 (t^2 u + 2tu^2)\ dudt = 8/25\)

Then

\(\text{Cov} [X,Y] = E[XY] - E[X]E[Y] = 2/100\) and \(\rho = \dfrac{\text{Cov}[X,Y]}{\sigma_X \sigma_Y} = \dfrac{4}{30} \sqrt{10} \approx 0.4216\)

APPROXIMATION

```
tuappr
Enter matrix [a b] of X-range endpoints  [0 1]
Enter matrix [c d] of Y-range endpoints  [0 1]
Enter number of X approximation points  200
Enter number of Y approximation points  200
Enter expression for joint density  (6/5)*(t + 2*u).*(u>=t)
Use array operations on X, Y, PX, PY, t, u, and P
EX = total(t.*P)
EX = 0.4012                   % Theoretical = 0.4
EY = total(u.*P)
EY = 0.7496                   % Theoretical = 0.75
VX = total(t.^2.*P) - EX^2
VX = 0.0603                   % Theoretical = 0.06
VY = total(u.^2.*P) - EY^2
VY = 0.0376                   % Theoretical = 0.0375
CV = total(t.*u.*P) - EX*EY
CV = 0.0201                   % Theoretical = 0.02
rho = CV/sqrt(VX*VY)
rho = 0.4212                  % Theoretical = 0.4216
```
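For readers without the m-functions, the same kind of grid approximation can be sketched in plain Python. This is not the `tuappr` procedure itself: the midpoint grid, the normalization of the cell masses, and the helper names `density` and `expect` are assumptions made for this illustration.

```python
n = 200                                   # approximation points per axis
h = 1.0 / n                               # cell width on [0, 1]
mids = [(k + 0.5) * h for k in range(n)]  # cell midpoints (assumed grid)

def density(t, u):
    """Joint density (6/5)(t + 2u) on the region u >= t, zero elsewhere."""
    return 1.2 * (t + 2 * u) if u >= t else 0.0

# Mass assigned to each grid point, normalized so the masses sum to one.
cells = [(t, u, density(t, u) * h * h) for t in mids for u in mids]
total_mass = sum(p for _, _, p in cells)
cells = [(t, u, p / total_mass) for t, u, p in cells]

def expect(g):
    """Approximate E[g(X, Y)] as a weighted sum over the grid."""
    return sum(g(t, u) * p for t, u, p in cells)

EX = expect(lambda t, u: t)
EY = expect(lambda t, u: u)
VX = expect(lambda t, u: t * t) - EX ** 2
VY = expect(lambda t, u: u * u) - EY ** 2
CV = expect(lambda t, u: t * u) - EX * EY
rho = CV / (VX * VY) ** 0.5

print(f"EX={EX:.4f} EY={EY:.4f} VX={VX:.4f} VY={VY:.4f} CV={CV:.4f} rho={rho:.4f}")
```

The results should be close to the theoretical values \(E[X] = 0.4\), \(E[Y] = 0.75\), \(\text{Var}[X] = 0.06\), \(\text{Var}[Y] = 0.0375\), \(\text{Cov}[X, Y] = 0.02\), and \(\rho \approx 0.4216\). A small discrepancy is expected because grid cells along the diagonal \(u = t\) are counted in full.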

**Coefficient of linear correlation**

The parameter \(\rho\) is usually called the correlation coefficient. A more descriptive name would be *coefficient of linear correlation*. The following example shows that all probability mass may be on a curve, so that \(Y = g(X)\) (i.e., the value of \(Y\) is completely determined by the value of \(X\)), yet \(\rho = 0\).

Example 12.2.5: \(Y = g(X)\) but \(\rho = 0\)

Suppose \(X\) ~ uniform (-1, 1), so that \(f_X (t) = 1/2\), \(-1 < t < 1\) and \(E[X] = 0\). Let \(Y = g(X) = \cos X\). Then

\(\text{Cov} [X, Y] = E[XY] = \dfrac{1}{2} \int_{-1}^{1} t \cos t\ dt = 0\)

Thus \(\rho = 0\). Note that \(g\) could be any even function defined on (-1,1). In this case the integrand \(tg(t)\) is odd, so that the value of the integral is zero.
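A numerical check of this example can be sketched with a midpoint rule on \((-1, 1)\). The grid size `n` is an arbitrary choice for this illustration. Note that \(E[Y]\) is not zero, but it does not matter because \(E[X] = 0\).

```python
import math

n = 1000                                   # number of subintervals (arbitrary)
h = 2.0 / n                                # subinterval width on (-1, 1)
mids = [-1.0 + (k + 0.5) * h for k in range(n)]
f = 0.5                                    # density of X ~ uniform(-1, 1)

# Midpoint-rule approximations of E[X], E[Y], and E[XY] with Y = cos X.
ex = sum(t * f * h for t in mids)
ey = sum(math.cos(t) * f * h for t in mids)
exy = sum(t * math.cos(t) * f * h for t in mids)

cov = exy - ex * ey
print(ex, ey, cov)
```

The covariance comes out as zero up to rounding, even though \(Y\) is a function of \(X\); the approximation of \(E[Y]\) is close to \(\sin 1\).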

**Variance and covariance for linear combinations**

We generalize the property **(V4)** on linear combinations. Consider the linear combinations

\(X = \sum_{i = 1}^{n} a_i X_i\) and \(Y = \sum_{j = 1}^{m} b_j Y_j\)

We wish to determine \(\text{Cov} [X, Y]\) and \(\text{Var}[X]\). It is convenient to work with the centered random variables \(X' = X - \mu_X\) and \(Y' = Y - \mu_Y\). Since by linearity of expectation,

\(\mu_X = \sum_{i = 1}^{n} a_i \mu_{X_i}\) and \(\mu_Y = \sum_{j = 1}^{m} b_j \mu_{Y_j}\)

we have

\(X' = \sum_{i = 1}^{n} a_i X_i - \sum_{i = 1}^{n} a_i \mu_{X_i} = \sum_{i = 1}^{n} a_i (X_i - \mu_{X_i}) = \sum_{i = 1}^{n} a_i X_i'\)

and similarly for \(Y'\). By definition

\(\text{Cov} (X, Y) = E[X'Y'] = E[\sum_{i, j} a_i b_j X_i' Y_j'] = \sum_{i,j} a_i b_j E[X_i' Y_j'] = \sum_{i,j} a_i b_j \text{Cov} (X_i, Y_j)\)

In particular

\(\text{Var} (X) = \text{Cov} (X, X) = \sum_{i, j} a_i a_j \text{Cov} (X_i, X_j) = \sum_{i = 1}^{n} a_i^2 \text{Cov} (X_i, X_i) + \sum_{i \ne j} a_ia_j \text{Cov} (X_i, X_j)\)

Using the fact that \(a_ia_j \text{Cov} (X_i, X_j) = a_j a_i \text{Cov} (X_j, X_i)\), we have

\(\text{Var}[X] = \sum_{i = 1}^{n} a_i^2 \text{Var} [X_i] + 2\sum_{i <j} a_i a_j \text{Cov} (X_i, X_j)\)

Note that \(a_i^2\) does not depend upon the sign of \(a_i\). If the \(X_i\) form an independent class, or are otherwise uncorrelated, the expression for variance reduces to

\(\text{Var}[X] = \sum_{i = 1}^{n} a_i^2 \text{Var} [X_i]\)
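The general variance formula can be checked on a small case with \(n = 2\). In this sketch the joint distribution `pmf` for \((X_1, X_2)\), the coefficients `a1` and `a2`, and the helper functions are all hypothetical; exact fractions avoid rounding issues. The chosen pair is correlated, so the covariance term is needed.

```python
from fractions import Fraction as F

# Hypothetical joint pmf for (X1, X2); the values are illustrative only.
pmf = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 1): F(1, 4), (2, 1): F(1, 4)}
a1, a2 = 3, -2  # coefficients of the linear combination X = a1*X1 + a2*X2

def expect(g):
    """E[g(X1, X2)] for the joint pmf above."""
    return sum(p * g(s, t) for (s, t), p in pmf.items())

def cov(g, h):
    """Cov[g(X1, X2), h(X1, X2)] = E[gh] - E[g]E[h]."""
    return expect(lambda s, t: g(s, t) * h(s, t)) - expect(g) * expect(h)

def x1(s, t):
    return s

def x2(s, t):
    return t

def combo(s, t):
    return a1 * s + a2 * t

# Variance computed directly, and from the formula with the covariance term.
direct = cov(combo, combo)
formula = a1 ** 2 * cov(x1, x1) + a2 ** 2 * cov(x2, x2) + 2 * a1 * a2 * cov(x1, x2)

print(direct, formula)
```

Both values agree. Dropping the term \(2 a_1 a_2 \text{Cov}(X_1, X_2)\) would give a different result here, since this pair is not uncorrelated.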