
What's the "trick" in log derivative trick?

By John Campbell
$\begingroup$

The following is often referred to as the "log derivative trick".

$$\frac{\nabla_\theta p(X,\theta)}{p(X, \theta)} = \nabla_\theta \log p(X,\theta)$$

It appears, for example, here, here, and in several other places (usually in the context of reinforcement learning).

Is it not just calculus? After all, $\frac{\partial}{\partial x} \log f(x) = \frac{f'(x)}{f(x)}$. Is there anything else going on here?

$\endgroup$

2 Answers

$\begingroup$

It's a "trick" when you use it to calculate $\nabla_\theta p(X,\theta)$ via the (hopefully, sometimes) easier expression $\log p(X,\theta)$. That is, you rewrite the identity as $$ \nabla_\theta p(X,\theta)=p(X,\theta)\,\nabla_\theta\log p(X,\theta), $$ and use it in cases where the right-hand side is easier to evaluate than the left-hand side. This is typically the case when $p$ consists of many products and exponentials, because the logarithm turns products into sums.
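As a small worked illustration (my own choice of example, not something from the question), assume $X = (x_1, \dots, x_n)$ are i.i.d. exponential with rate $\theta$, so that $p$ is a product of $n$ factors. Taking the logarithm first gives

$$ p(X,\theta) = \prod_{i=1}^{n} \theta\, e^{-\theta x_i}, \qquad \log p(X,\theta) = n \log\theta - \theta \sum_{i=1}^{n} x_i, $$

$$ \nabla_\theta \log p(X,\theta) = \frac{n}{\theta} - \sum_{i=1}^{n} x_i \quad\Longrightarrow\quad \nabla_\theta p(X,\theta) = p(X,\theta)\left(\frac{n}{\theta} - \sum_{i=1}^{n} x_i\right). $$

Differentiating $p(X,\theta) = \theta^n e^{-\theta \sum_i x_i}$ directly with the product rule gives the same result, just with more bookkeeping.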

$\endgroup$ $\begingroup$

You are absolutely right, this is "just" calculus. But the real question here is: in what context is this trick used?

Suppose you have an expectation value of the form

$\int d x ~p(x, \theta) f(x)$

with a parametrized probability distribution $p(x, \theta)$. You often want to calculate the derivative of this expectation value with respect to $\theta$, e.g. to maximize or minimize the expectation value.

Assuming the gradient can be moved inside the integral, the derivative takes the form

$\int d x ~\nabla_{\theta} p(x, \theta) f(x)$.

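In practice, such an integral can be very difficult to calculate analytically, so you may want to estimate it via Monte Carlo sampling. The integral above is not yet an expectation value, however, because $\nabla_{\theta} p(x, \theta)$ is not a probability density. To sample, you need to bring it into the form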

$\int dx ~p(x) F(x) \approx \frac{1}{n_{MC}} \sum_{x_i} F(x_i)$,

where the $x_i$ are sampled from $p(x)$. If we now multiply the integrand from before by the factor $p(x, \theta)/p(x, \theta)$ and apply the log trick, we get

$\int d x ~p(x, \theta)~\nabla_{\theta} \log (p(x, \theta)) f(x)$

Now, with $F(x) = \nabla_{\theta} \log (p(x, \theta)) f(x)$, we obtain exactly the Monte Carlo form from before, and we can estimate the integral by sampling the $x_i$ from $p(x, \theta)$.
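To make this concrete, here is a minimal sketch in Python with NumPy. Everything specific in it is an assumption chosen for illustration: $p(x, \theta)$ is a Gaussian with mean $\theta$ and unit variance, $f(x) = x^2$, and the function name `score_function_gradient` is hypothetical. For this choice $\nabla_{\theta} \log p(x, \theta) = x - \theta$, and the exact answer is $\nabla_{\theta}(\theta^2 + 1) = 2\theta$, so the estimate can be checked.

```python
import numpy as np


def score_function_gradient(theta, f, n_samples=200_000, seed=0):
    """Estimate d/dtheta E_{x ~ N(theta, 1)}[f(x)] with the log-derivative trick.

    Assumption for this sketch: p(x, theta) is a Gaussian with mean theta
    and unit variance, so grad_theta log p(x, theta) = x - theta.
    """
    rng = np.random.default_rng(seed)
    # Draw x_i from p(x, theta).
    x = rng.normal(loc=theta, scale=1.0, size=n_samples)
    # Score function: gradient of log p with respect to theta.
    score = x - theta
    # Monte Carlo average of F(x_i) = grad_theta log p(x_i, theta) * f(x_i).
    return np.mean(score * f(x))


theta = 1.5
estimate = score_function_gradient(theta, lambda x: x**2)
exact = 2.0 * theta  # d/dtheta (theta^2 + 1) for f(x) = x^2

print(f"Monte Carlo estimate: {estimate:.3f}, exact value: {exact:.3f}")
```

The estimate only agrees with the exact value up to Monte Carlo noise, and this estimator is known to have fairly high variance, which is why a large number of samples is used here.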

$\endgroup$
