Gradient descent: L2 norm regularization

$\begingroup$

I've worked out the stochastic gradient descent update for logistic regression to be approximately:

$ w_{t+1} = w_t - \eta((\sigma({w_t}^Tx_i) - y_i)x_i) $

$p(\mathbf{y} = 1 | \mathbf{x}, \mathbf{w}) = \sigma(\mathbf{w}^T\mathbf{x})$, where $\sigma(t) = \frac{1}{1 + e^{-t}}$

However, I keep getting something wrong when adding L2 norm regularization.

From the HW definition of L2 Norm Regularization:

In other words, update $\mathbf{w}_t$ according to $l - \mu \|\mathbf{w}\|^2 $, where $\mathbf{\mu}$ is a constant.

I end up with something like this:

$ w_{t+1} = w_t - \eta((\sigma({w_t}^Tx_i) - y_i)x_i + 2\mu w_t) $

I know this isn't right. Where am I making a mistake?

$\endgroup$

2 Answers

$\begingroup$

Your example doesn't show which cost function you used. If you use the MSE (mean squared error), you get the following.

The MSE with L2 Norm Regularization:

$$ J = \dfrac{1}{2m} \Big[\sum_{i}{(\sigma(w_{t}^Tx_{i}) - y_{i})^2} + \lambda \|w_{t}\|^2\Big] $$

And the update function:

$$ w_{t+1} = w_{t} - \dfrac{\gamma}{m}\Big(\sigma(w_{t}^Tx_{i}) - y_{i}\Big)x_{i} - \dfrac{\lambda}{m} w_{t} $$

And you can simplify to:

$$ w_{t+1} = w_{t}\Big(1 - \dfrac{\lambda}{m}\Big) - \dfrac{\gamma}{m}\Big(\sigma(w_{t}^Tx_{i}) - y_{i}\Big)x_{i} $$

If you use a different cost function, you'll get a different update rule.
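The simplified form can be read as weight decay: each step first shrinks $w_t$ by the factor $(1 - \lambda/m)$ and then applies the data term. Here is a minimal NumPy sketch that checks the factored and unfactored forms agree when the penalty term is subtracted. It assumes a single example per step, and the function names (`step_explicit`, `step_weight_decay`) are hypothetical, chosen only for illustration:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def step_explicit(w, x, y, gamma, lam, m):
    """Data term and penalty term subtracted separately."""
    err = sigmoid(w @ x) - y
    return w - (gamma / m) * err * x - (lam / m) * w

def step_weight_decay(w, x, y, gamma, lam, m):
    """Same update, factored as 'shrink w, then take the data step'."""
    err = sigmoid(w @ x) - y
    return w * (1.0 - lam / m) - (gamma / m) * err * x

# Illustrative values only (not taken from the question).
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.3, -0.7])
a = step_explicit(w, x, y=1.0, gamma=0.1, lam=0.5, m=10)
b = step_weight_decay(w, x, y=1.0, gamma=0.1, lam=0.5, m=10)
print(np.allclose(a, b))
```

With $\lambda = 0$ both functions reduce to the unregularized step, which is a quick sanity check on the sign of the penalty term.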

$\endgroup$

$\begingroup$

It is common to minimize the negative log likelihood (for one example)

$$ l(\mathbf{w}) = - \left\lbrace y \mathbf{w}^T \mathbf{x} - \log (1+\exp(\mathbf{w}^T \mathbf{x})) \right \rbrace $$

where $y\in \{0,1\}$ is the example label.

Adding a regularization term yields the cost function

$$ \phi(\mathbf{w}) = l(\mathbf{w}) + \frac12 \mu \| \mathbf{w} \|^2 $$

The gradient vector is

$$ \mathbf{g}(\mathbf{w} ) = \left[ -y + \sigma(\mathbf{w}^T \mathbf{x}) \right] \mathbf{x} + \mu \mathbf{w} $$

The gradient descent update is then

$$ \mathbf{w}^{(t+1)} = \mathbf{w}^{(t)} - \eta \mathbf{g}(\mathbf{w}^{(t)} ) $$
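As a rough illustration, the cost, gradient, and update above might be coded as follows. This is a sketch under stated assumptions, not a reference implementation: the function names, the default values of `eta` and `mu`, the epoch count, and the toy data are all made up for the example.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def regularized_loss(w, x, y, mu):
    """phi(w) = l(w) + 0.5 * mu * ||w||^2 for one example, y in {0, 1}."""
    z = w @ x
    # np.logaddexp(0, z) evaluates log(1 + exp(z)) without overflow.
    return -(y * z - np.logaddexp(0.0, z)) + 0.5 * mu * (w @ w)

def regularized_grad(w, x, y, mu):
    """g(w) = (sigma(w^T x) - y) x + mu w."""
    return (sigmoid(w @ x) - y) * x + mu * w

def sgd(X, Y, eta=0.1, mu=0.01, epochs=50, seed=0):
    """Stochastic gradient descent: one example per update, shuffled each epoch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(Y)):
            w = w - eta * regularized_grad(w, X[i], Y[i], mu)
    return w

# Toy data (assumed): first column is a constant bias feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
Y = np.array([0.0, 0.0, 1.0, 1.0])
w_hat = sgd(X, Y)
print(w_hat, sigmoid(X @ w_hat))
```

Comparing `regularized_grad` with a finite-difference approximation of `regularized_loss` is a cheap way to catch a wrong sign or a stray factor of 2 on the penalty term.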

$\endgroup$
