There are primarily two approaches to solving linear regression: the analytical closed-form solution and gradient descent.
Analytical Closed Form Solution (OLS Method)
The closed-form solution for linear regression, also known as the Normal Equation, provides a direct mathematical formula to calculate the optimal parameters without iteration. This approach uses calculus to minimize the cost function by taking its derivative, setting it to zero, and solving for the weights.
The Formula
The closed-form solution is expressed as:

$$W = (X^T X)^{-1} X^T y$$

Where:
- $W$ is the weight vector (the parameters to find)
- $X$ is the design matrix containing the input features (N samples × D features)
- $y$ is the target vector containing the output values
- $X^T$ is the transpose of $X$
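As a sketch, the normal equation can be evaluated directly with NumPy; the data below is synthetic and the weights `true_w` are made up for illustration:

```python
import numpy as np

# Synthetic data: N = 100 samples, D = 2 features plus a bias column
rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])  # design matrix with intercept
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w + 0.01 * rng.normal(size=100)  # targets with a little noise

# Normal equation: W = (X^T X)^{-1} X^T y
W = np.linalg.inv(X.T @ X) @ X.T @ y
print(W)  # close to [1, 2, -3]
```

In practice, `np.linalg.solve(X.T @ X, X.T @ y)` or `np.linalg.lstsq` is preferred over forming the explicit inverse, since it is more numerically stable.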
Derivation Steps
The derivation involves minimizing the least-squares loss function.
The Loss Function is:

$$L(W) = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$

or

$$L(W) = \sum_{i=1}^{N} (y_i - X_i W)^2$$

Where:
- $y_i$ is the actual observed value
- $\hat{y}_i$ is the predicted value from the model
- $W$ represents the model parameters (weights)
- $N$ is the number of data points

In vector notation, it can be written as:

$$L(W) = (y - XW)^T (y - XW)$$

The closed-form solution comes from differentiating this loss. Taking the derivative with respect to $W$ and setting it to zero gives:

$$\frac{\partial L}{\partial W} = -2X^T(y - XW) = 0 \quad\Rightarrow\quad X^T X W = X^T y \quad\Rightarrow\quad W = (X^T X)^{-1} X^T y$$
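As a quick sanity check on this first-order condition, the gradient $-2X^T(y - XW)$ of the least-squares loss should vanish at the solution of the normal equation. A small sketch with random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)

# Closed-form solution via the normal equation
W = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of the least-squares loss L(W) = ||y - XW||^2
grad = -2 * X.T @ (y - X @ W)
print(np.abs(grad).max())  # near zero: the first-order condition holds
```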
Gradient Descent Method
To use gradient descent, we first need a way to measure how “wrong” our current model is. In linear regression, we typically use the Mean Squared Error (MSE):

$$MSE(m, b) = \frac{1}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)^2$$

Where $m$ is the slope and $b$ is the intercept. Our goal is to find the values of $m$ and $b$ that minimize the MSE.
To find the “steepest direction,” we calculate the partial derivatives (the gradient) of the cost function with respect to each parameter.
Partial derivative with respect to $m$:

$$\frac{\partial MSE}{\partial m} = -\frac{2}{N} \sum_{i=1}^{N} x_i \bigl(y_i - (m x_i + b)\bigr)$$

Partial derivative with respect to $b$:

$$\frac{\partial MSE}{\partial b} = -\frac{2}{N} \sum_{i=1}^{N} \bigl(y_i - (m x_i + b)\bigr)$$
Once we have the gradient, we update our parameters using the Learning Rate ($\alpha$). The learning rate determines how big of a “step” we take:

$$m := m - \alpha \frac{\partial MSE}{\partial m}, \qquad b := b - \alpha \frac{\partial MSE}{\partial b}$$
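Putting the pieces together, a minimal gradient descent loop for the slope and intercept might look like the sketch below. The data is synthetic, and the learning rate and iteration count are illustrative choices, not tuned values:

```python
import numpy as np

# Synthetic 1-D data roughly following y = 2x + 1
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
y = 2.0 * x + 1.0 + 0.05 * rng.normal(size=200)

m, b = 0.0, 0.0   # initial guesses
alpha = 0.1       # learning rate (illustrative)
N = len(x)

for _ in range(1000):
    residuals = y - (m * x + b)
    grad_m = (-2.0 / N) * np.sum(x * residuals)  # dMSE/dm
    grad_b = (-2.0 / N) * np.sum(residuals)      # dMSE/db
    m -= alpha * grad_m                          # step against the gradient
    b -= alpha * grad_b

print(m, b)  # close to the true slope 2 and intercept 1
```

If the learning rate is too large the updates can overshoot and diverge; if it is too small, many more iterations are needed to converge.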