Why don't people do simulated annealing before gradient descent?

It seems obvious to me to first explore the optimization landscape widely (which is effectively what simulated annealing does) and get a sense of the problem structure. Only then, after finding which valley to descend into, perform gradient descent. Why isn't this done more often?

1 Answer

Take deep learning as an example: the number of parameters (often in the millions) is so large that simulated annealing may take longer than simply running gradient descent from whatever (random) initial state your weights are in. Simulated annealing proposes random moves and uses no gradient information, so exploring a space with that many dimensions requires a very large number of loss evaluations.

So, in the case of deep learning, it doesn't make (economic) sense to do simulated annealing first.
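For reference, the two-stage idea from the question can be sketched on a toy 1-D problem. Everything here is an assumption made for illustration: the objective `f`, the linear cooling schedule, the step sizes, and the function names are hypothetical and not taken from any library. In one dimension the annealing stage is cheap; the cost argument above concerns what happens when `x` has millions of components.

```python
import math
import random


def f(x):
    # Toy multimodal objective (hypothetical): a shallow bowl plus ripples.
    return 0.1 * x * x + math.sin(3.0 * x)


def grad_f(x):
    # Analytic derivative of f.
    return 0.2 * x + 3.0 * math.cos(3.0 * x)


def simulated_annealing(x0, steps=2000, t0=2.0, step_size=1.0, seed=0):
    """Stage 1: gradient-free global exploration; returns the best point seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    for k in range(steps):
        # Linear cooling schedule (an assumption; other schedules are common).
        t = t0 * (1.0 - k / steps) + 1e-9
        cand = x + rng.gauss(0.0, step_size)
        fc = f(cand)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
    return best_x


def gradient_descent(x0, lr=0.05, steps=500):
    """Stage 2: local refinement with plain gradient descent."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


if __name__ == "__main__":
    x0 = 8.0
    gd_only = gradient_descent(x0)
    hybrid = gradient_descent(simulated_annealing(x0))
    print("gradient descent only:", gd_only, f(gd_only))
    print("annealing, then gradient descent:", hybrid, f(hybrid))
```

Note that the annealing stage in this sketch spends 2000 evaluations of `f` before gradient descent takes a single step, and that budget would have to grow substantially for the random proposals to explore a high-dimensional space usefully.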
