OLS Derivation

Ordinary Least Squares (OLS) is a computationally cheap way to estimate the coefficients of a linear regression model. I wanted to detail the derivation of the solution since it can be confusing for anyone not familiar with matrix calculus.

First, the initial matrix equation is set up below, with X being the matrix of the data's p covariates plus the regression constant. [The constant shows up as a column of ones if you look at the data in the X matrix.] Y is the column matrix of the target variable, β is the column matrix of unknown coefficients, and e is the column matrix of residuals.

$latex \mathbf{Y = X} \boldsymbol{\beta} + \boldsymbol{e} &s=1$
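
To make the notation concrete, here is a minimal numpy sketch (made-up data, not tied to any particular dataset) of how Y, X, and the column of ones for the constant fit together.

```python
import numpy as np

# Made-up example: n = 5 observations, p = 2 covariates.
rng = np.random.default_rng(0)
covariates = rng.normal(size=(5, 2))

# X is the n x (p + 1) design matrix: a column of ones for the
# regression constant followed by the p covariate columns.
X = np.column_stack([np.ones(5), covariates])

# Y is the n x 1 column matrix of the target variable.
Y = rng.normal(size=(5, 1))
```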

Before manipulating the equation, it is important to note that you are not solving for X or Y, but for β, and you do this by minimizing the sum of squared residuals (SSE). So the equation is rearranged to isolate the error term on the left side.

$latex \boldsymbol{e} = \mathbf{Y - X} \boldsymbol{\beta}&s=1$

The SSE can be written as the product of the transposed residual column vector and the residual vector itself. [This is how you obtain the sum of squares of the elements of any column vector.]

$latex \mathrm{SSE} = \boldsymbol{e}'\boldsymbol{e} &s=1$
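
As a tiny illustration (a hypothetical three-element residual vector), e'e really is just the sum of the squared elements:

```python
import numpy as np

# e'e for a small made-up residual vector equals the sum of its squared elements.
e = np.array([[1.0], [-2.0], [3.0]])
sse_matrix = (e.T @ e).item()    # 1x1 matrix product, extracted as a scalar
sse_direct = float(np.sum(e ** 2))
assert np.isclose(sse_matrix, sse_direct)  # both are 14.0
```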

Since one side of the equation has been transposed and multiplied by itself, the other side must be treated the same way, yielding

$latex \boldsymbol{e'e} = (\mathbf{Y - X} \boldsymbol{\beta})'(\mathbf{Y - X} \boldsymbol{\beta})&s=1$

The transpose can be distributed through the quantity on the right side (remembering that transposing a product reverses the order of the factors), and then the right side can be multiplied out.

$latex \boldsymbol{e'e} = (\mathbf{Y'} - \boldsymbol{\beta}'\mathbf{X'})(\mathbf{Y - X} \boldsymbol{\beta})&s=1$

The term Y'Xβ is a 1 × 1 scalar, so it equals its own transpose, β'X'Y. Using this, you can multiply out the right side and simplify it.

$latex \boldsymbol{e'e} = \mathbf{Y'Y} - \mathbf{Y'X}\boldsymbol{\beta} - \boldsymbol{\beta}'\mathbf{X'Y} + \boldsymbol{\beta}'\mathbf{X'X}\boldsymbol{\beta}&s=1$
$latex \boldsymbol{e'e} = \mathbf{Y'Y} - \boldsymbol{\beta}'\mathbf{X'Y} - \boldsymbol{\beta}'\mathbf{X'Y} + \boldsymbol{\beta}'\mathbf{X'X}\boldsymbol{\beta}&s=1$
$latex \boldsymbol{e'e} = \mathbf{Y'Y} - 2\boldsymbol{\beta}'\mathbf{X'Y} + \boldsymbol{\beta}'\mathbf{X'X}\boldsymbol{\beta}&s=1$
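
If the algebra above feels abstract, here is a quick numeric check (made-up X, Y, and β) that the expanded form really equals e'e:

```python
import numpy as np

# Numeric check that e'e = Y'Y - 2*beta'X'Y + beta'X'X*beta for arbitrary values.
rng = np.random.default_rng(1)
n, p = 6, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=(n, 1))
beta = rng.normal(size=(p + 1, 1))

e = Y - X @ beta
lhs = (e.T @ e).item()
rhs = (Y.T @ Y - 2 * beta.T @ X.T @ Y + beta.T @ X.T @ X @ beta).item()
assert np.isclose(lhs, rhs)
```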

To minimize the SSE, take the partial derivative with respect to β. Any term without a β in it goes to zero. Using the transpose rule from before, the middle term differentiates to -2X'Y, just as in single-variable calculus. The last term is a bit trickier: because X'X is symmetric, the quadratic form β'X'Xβ differentiates to +2X'Xβ (the matrix analogue of d(ax²)/dx = 2ax).

$latex \frac{\partial \boldsymbol{e'e}}{\partial \boldsymbol{\beta}} = \frac{\partial \mathbf{Y'Y}}{\partial \boldsymbol{\beta}} - \frac{\partial\, 2\boldsymbol{\beta}'\mathbf{X'Y}}{\partial \boldsymbol{\beta}} + \frac{\partial\, \boldsymbol{\beta}'\mathbf{X'X}\boldsymbol{\beta}}{\partial \boldsymbol{\beta}}&s=1$

$latex \frac{\partial \boldsymbol{e'e}}{\partial \boldsymbol{\beta}} = -2\mathbf{X'Y} + 2\mathbf{X'X}\boldsymbol{\beta}&s=1$
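
The matrix calculus is easy to get wrong, so here is a finite-difference check (again with made-up numpy data) that -2X'Y + 2X'Xβ matches a numerical gradient of the SSE:

```python
import numpy as np

# Check the analytic gradient -2X'Y + 2X'X*beta against central differences.
rng = np.random.default_rng(2)
n, p = 6, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
Y = rng.normal(size=(n, 1))
beta = rng.normal(size=(p + 1, 1))

def sse(b):
    e = Y - X @ b
    return (e.T @ e).item()

analytic = -2 * X.T @ Y + 2 * X.T @ X @ beta

h = 1e-6
numeric = np.zeros_like(beta)
for j in range(beta.shape[0]):
    step = np.zeros_like(beta)
    step[j, 0] = h
    numeric[j, 0] = (sse(beta + step) - sse(beta - step)) / (2 * h)

assert np.allclose(analytic, numeric, atol=1e-4)
```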

To find the minimum (it will be a minimum rather than a maximum as long as X has full column rank, so that X'X is positive definite), the derivative of the SSE is set to zero.

$latex 0 = -2\mathbf{X'Y} + 2\mathbf{X'X}\boldsymbol{\beta}&s=1$

$latex 0 = -\mathbf{X'Y} + \mathbf{X'X}\boldsymbol{\beta}&s=1$

Moving X'Y to the left side gives the normal equations, X'Xβ = X'Y. Multiplying both sides by the inverse of (X'X), which exists as long as X has full column rank…

$latex (\mathbf{X'X})^{-1}\mathbf{X'X}\boldsymbol{\beta} = (\mathbf{X'X})^{-1}\mathbf{X'Y}&s=1$

…yields the solution for β

$latex \boldsymbol{\beta} = (\mathbf{X'X})^{-1}\mathbf{X'Y}&s=1$
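
Putting it all together, here is a short numpy sketch (simulated data with hypothetical coefficient values) that evaluates the closed form and compares it against numpy's built-in least-squares solver:

```python
import numpy as np

# Simulate data from known coefficients, then recover them with the closed form.
rng = np.random.default_rng(3)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([[1.0], [2.0], [-0.5]])
Y = X @ true_beta + rng.normal(scale=0.1, size=(n, 1))

# (X'X)^{-1} X'Y, computed by solving the normal equations X'X beta = X'Y
# rather than forming the inverse explicitly (better numerically).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
print(beta_hat.ravel())  # close to [1.0, 2.0, -0.5]
```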

References:

The Mathematical Derivation of Least Squares. Psychology 8815. Retrieved from: http://isites.harvard.edu/fs/docs/icb.topic515975.files/OLSDerivation.pdf

Chatterjee, S., & Hadi, A. (2012). Regression analysis by example. Hoboken, NJ: John Wiley & Sons, Inc.