Why don't people do simulated annealing before gradient descent?
By Andrew Adams
It seems obvious to me to first explore the optimization landscape widely (this is effectively what simulated annealing does) and get a sense of the problem structure, and only then, after finding which valley to descend into, run gradient descent. Why isn't this done more often?
1 Answer
Take deep learning as an example: the number of parameters (often in the millions) is so large that simulated annealing may take longer than simply running gradient descent from whatever (random) initial state your weights are in.
So, in the case of deep learning, it doesn't make (economic) sense to run simulated annealing first.
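To make the cost argument concrete, here is a minimal toy sketch, not a benchmark. Everything in it is an assumption chosen for illustration: a convex quadratic loss stands in for a network's loss, the function names (`loss`, `gradient_descent`, `simulated_annealing`, `compare`) are hypothetical, and the step size, cooling schedule, and dimensions are arbitrary. It gives both methods the same number of iterations in a 1000-dimensional space. Note that a quadratic has a single minimum, so this only illustrates how slowly random proposals make progress in high dimensions; it says nothing about the exploration benefit the question has in mind.

```python
import math
import random


def loss(w):
    # Toy stand-in for a training loss: a convex bowl with minimum 0 at the origin.
    return sum(x * x for x in w)


def grad(w):
    # Analytic gradient of the toy loss.
    return [2.0 * x for x in w]


def gradient_descent(w0, steps, lr=0.1):
    # Plain gradient descent with a fixed learning rate (assumed value).
    w = list(w0)
    for _ in range(steps):
        w = [x - lr * g for x, g in zip(w, grad(w))]
    return loss(w)


def simulated_annealing(w0, steps, rng, step_size=0.05, t0=1.0, cooling=0.99):
    # Basic simulated annealing: Gaussian proposal on all coordinates,
    # Metropolis acceptance, geometric cooling (all assumed choices).
    w = list(w0)
    current = loss(w)
    temp = t0
    for _ in range(steps):
        candidate = [x + rng.gauss(0.0, step_size) for x in w]
        cand_loss = loss(candidate)
        # Always accept improvements; accept worse moves with
        # probability exp(-(increase in loss) / temperature).
        if cand_loss <= current or rng.random() < math.exp((current - cand_loss) / temp):
            w, current = candidate, cand_loss
        temp *= cooling
    return current


def compare(dim=1000, steps=200, seed=0):
    # Same random start and same iteration budget for both methods.
    rng = random.Random(seed)
    w0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return gradient_descent(w0, steps), simulated_annealing(w0, steps, rng)


if __name__ == "__main__":
    gd_loss, sa_loss = compare()
    print("gradient descent loss:", gd_loss)
    print("simulated annealing loss:", sa_loss)
```

In this setup each gradient step moves every coordinate in a useful direction at once, while a random proposal in 1000 dimensions is nearly orthogonal to the downhill direction, so most of its movement is wasted. With millions of parameters that gap only widens, which is the economic point made above.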