
Why don't people do simulated annealing before gradient descent?

By Andrew Adams
It seems obvious to me to first explore the optimization landscape widely (which is effectively what simulated annealing does) and get a sense of the problem's structure, and only then, after finding which hill to climb, perform gradient descent. Why isn't this done more often?

1 Answer

Take deep learning as an example: the number of parameters (often in the millions) is so large that simulated annealing may take longer than simply running gradient descent from whatever (random) initial state your weights are in. Simulated annealing proposes random moves without using gradient information, so exploring a space with that many dimensions takes a very large number of loss evaluations.

So, in the case of deep learning, it doesn't make (economic) sense to do simulated annealing first.
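For intuition, here is a minimal sketch of the two-stage idea from the question on a one-dimensional toy problem, where the annealing stage is cheap. Everything in it is an assumption made for illustration: the objective `f`, the linear cooling schedule, the step sizes, and the iteration counts are hypothetical choices, not a recommended recipe. Note that the annealing stage spends 2000 function evaluations on a single parameter; the answer's point is that this kind of budget grows badly as the number of parameters grows.

```python
import math
import random


def f(x):
    # Hypothetical toy objective: a quadratic bowl with ripples (many local minima).
    return x * x + 10.0 * math.sin(3.0 * x)


def grad_f(x):
    # Analytic derivative of f.
    return 2.0 * x + 30.0 * math.cos(3.0 * x)


def simulated_annealing(x0, steps=2000, t0=10.0, step_size=1.0, seed=0):
    """Random-walk search with Metropolis acceptance; returns the best point seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    for k in range(steps):
        # Linear cooling schedule (an assumption); the small constant avoids dividing by zero.
        t = t0 * (1.0 - k / steps) + 1e-9
        cand = x + rng.gauss(0.0, step_size)
        fc = f(cand)
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
    return best_x


def gradient_descent(x0, lr=0.01, steps=500):
    """Plain gradient descent; lr is small enough for this f that each step does not increase f."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x0 = 8.0
x_sa = simulated_annealing(x0)   # stage 1: wide exploration
x_gd = gradient_descent(x_sa)    # stage 2: local refinement
print(f"start: f={f(x0):.3f}  after SA: f={f(x_sa):.3f}  after GD: f={f(x_gd):.3f}")
```

In one dimension the exploration stage pays for itself easily. With millions of weights, each random proposal changes the loss in a direction chosen blindly, whereas a gradient step uses the slope in every dimension at once, which is why the first stage tends not to be worth its cost there.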
