from Suzuki, Taiji - Mathematics of Deep Learning: A non-stochastic (deterministic) gradient method can take exponential time to escape a saddle point. SGD was adopted not for this reason but for computational efficiency, and it only turned out later that it had unexpectedly been the "right way" of doing things.
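A minimal sketch of the idea, not taken from the source: on the toy objective f(x, y) = x^2 - y^2, which has a saddle point at (0, 0), deterministic gradient descent started exactly at the saddle never moves because the gradient vanishes there, while adding small random noise to each update (as the mini-batch sampling in SGD effectively does) lets the iterate drift off the saddle and escape along the descending direction. The function, step count, and noise scale below are all illustrative choices.

```python
import random

def grad(x, y):
    # Gradient of f(x, y) = x**2 - y**2, which has a saddle point at (0, 0):
    # x**2 curves up (stable direction), -y**2 curves down (escape direction).
    return (2 * x, -2 * y)

def descend(noisy, steps=200, lr=0.1, noise_scale=1e-3, seed=0):
    rng = random.Random(seed)
    x, y = 0.0, 0.0  # start exactly at the saddle point
    for _ in range(steps):
        gx, gy = grad(x, y)
        nx = rng.gauss(0.0, noise_scale) if noisy else 0.0
        ny = rng.gauss(0.0, noise_scale) if noisy else 0.0
        x -= lr * (gx + nx)
        y -= lr * (gy + ny)
    return x, y

# Deterministic updates stay pinned at the saddle (the gradient there is zero),
# while noisy updates pick up a component along y and are amplified each step.
print(descend(noisy=False))  # stays at (0.0, 0.0)
print(descend(noisy=True))   # |y| grows large: the iterate has escaped
```

The x coordinate stays near zero in both cases because x^2 is a stable direction; only the unstable y direction distinguishes the two runs, which mirrors why noise is what matters for escaping saddles.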
This page is auto-translated from /nishio/非確率的勾配法は鞍点から出るのに指数時間かかる using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.