Linear Regression Solutions

There are two main approaches to solving linear regression:

Analytical Closed Form Solution (OLS Method)

The closed-form solution for linear regression, also known as the Normal Equation, provides a direct mathematical formula to calculate the optimal parameters without iteration. This approach uses calculus to minimize the cost function by taking its derivative, setting it to zero, and solving for the weights.

The Formula

The closed-form solution is expressed as:

W = (X^T X)^{-1} X^T y

Where:

  • W is the weight vector (the parameters to find)
  • X is the design matrix containing the input features (N samples × D features)
  • y is the target vector containing the output values
  • X^T is the transpose of X

Derivation Steps

The derivation involves minimizing the least-squares loss function.

​The Loss Function is:

L(w) = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2

or

L(w) = \sum_{i=1}^{n}(y_i - \mathbf{x}_i^T\mathbf{w})^2

Where:

  • y_i is the actual observed value
  • ŷ_i is the predicted value from the model
  • w represents the model parameters (weights)
  • n is the number of data points

In vector notation, it can be written as:

L(w) = (Xw - y)^T(Xw - y) = \|Xw - y\|^2
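The sum form and the vector form of the loss are equivalent; a quick NumPy check (with hypothetical toy data and an arbitrary candidate weight vector) illustrates this:

```python
import numpy as np

# Hypothetical toy data: 4 samples, 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.0],
              [4.0, 3.0]])
y = np.array([3.0, 2.5, 4.0, 7.0])
w = np.array([1.0, 1.0])  # arbitrary candidate weights

# Sum-of-squares form: L(w) = sum_i (y_i - x_i^T w)^2
loss_sum = sum((y_i - x_i @ w) ** 2 for x_i, y_i in zip(X, y))

# Vector form: L(w) = ||Xw - y||^2
loss_vec = np.linalg.norm(X @ w - y) ** 2

print(np.isclose(loss_sum, loss_vec))  # the two forms agree
```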

To obtain the closed-form solution, we take the derivative of this loss. For convenience, the loss is often scaled by 1/2, which does not change the minimizer:

L(W) = \frac{1}{2}(XW - y)^T(XW - y)

Taking the derivative with respect to W and setting it to zero gives:

\frac{\partial L}{\partial W} = X^T(XW - y) = 0

Solving for W yields:

X^T X W = X^T y

W = (X^T X)^{-1} X^T y
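The normal equation can be applied directly in NumPy. A minimal sketch, using hypothetical data generated from known weights so the recovery can be checked (note that solving the linear system X^T X W = X^T y with `np.linalg.solve` is numerically preferable to explicitly inverting X^T X):

```python
import numpy as np

# Hypothetical data generated from known weights, so we can verify recovery
rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w  # noiseless targets

# Normal equation: W = (X^T X)^{-1} X^T y,
# implemented as a linear solve rather than an explicit inverse
W = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(W, true_w))  # noiseless data, so the true weights are recovered
```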

Gradient Descent Method

To use gradient descent, we first need a way to measure how “wrong” our current model is. In linear regression, we typically use the Mean Squared Error (MSE):

J(m, b) = \frac{1}{n} \sum_{i=1}^{n} (y_i - (mx_i + b))^2

Where m is the slope and b is the intercept. Our goal is to find the values of m and b that minimize J.

To find the “steepest direction,” we calculate the partial derivatives (the gradient) of the cost function with respect to each parameter.

Partial derivative with respect to m: \frac{\partial J}{\partial m} = \frac{2}{n} \sum_{i=1}^{n} -x_i(y_i - \hat{y}_i)

Partial derivative with respect to b: \frac{\partial J}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} -(y_i - \hat{y}_i)

Once we have the gradient, we update our parameters using the Learning Rate (\alpha). The learning rate determines how big of a "step" we take.

b_{new} = b_{old} - \alpha \cdot \frac{\partial J}{\partial b}

m_{new} = m_{old} - \alpha \cdot \frac{\partial J}{\partial m}
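The gradient and update rules above can be sketched as a short loop. This is a minimal illustration with hypothetical 1-D data drawn from y = 2x + 1, where the chosen learning rate and iteration count are arbitrary:

```python
import numpy as np

# Hypothetical 1-D data from the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

m, b = 0.0, 0.0   # initial guesses
alpha = 0.05      # learning rate (step size)
n = len(x)

for _ in range(5000):
    y_hat = m * x + b
    grad_m = (2.0 / n) * np.sum(-x * (y - y_hat))  # partial J / partial m
    grad_b = (2.0 / n) * np.sum(-(y - y_hat))      # partial J / partial b
    m -= alpha * grad_m                            # update rule for m
    b -= alpha * grad_b                            # update rule for b

print(m, b)  # converges toward m ~ 2, b ~ 1
```

Unlike the closed-form solution, the answer here is approximate and depends on the learning rate: too small and convergence is slow, too large and the updates diverge.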

