2017-01-11
I had never heard of the optimistic initial value method, so I tried it out. In my problem set-up, it took about 20,000 trials before the average rewards of UCB1 and the optimistic initial value method reversed order.
- reinforcement learning
- In times of uncertainty, be optimistic.
https://www.slideshare.net/nishio/1-70974083 p.33
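For readers who, like me, had not met the method before, here is a minimal sketch of the two strategies being compared. It is not the set-up used in the experiment above. It assumes a Bernoulli multi-armed bandit and implements the optimistic prior as one fictitious observation of `initial_value` per arm, followed by pure greedy selection. Other variants exist; Sutton and Barto, for example, use a constant step size instead. The helper names `bernoulli_arm`, `optimistic_greedy` and `ucb1`, the value `initial_value=1.0`, and the arm probabilities are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

Arm = Callable[[], float]  # hypothetical: an arm is a zero-argument reward function


def bernoulli_arm(p: float, rng: random.Random) -> Arm:
    """Return an arm that pays 1.0 with probability p, else 0.0."""
    return lambda: 1.0 if rng.random() < p else 0.0


def optimistic_greedy(arms: List[Arm], steps: int,
                      initial_value: float = 1.0) -> Tuple[List[int], float]:
    """Greedy play where each arm starts with one fictitious reward of
    `initial_value` (the optimistic prior). Returns (pull counts, total reward)."""
    k = len(arms)
    sums = [initial_value] * k  # the fictitious reward is counted in the sum...
    counts = [1] * k            # ...and in the denominator of the estimate
    pulls = [0] * k
    total = 0.0
    for _ in range(steps):
        # Pure greedy: highest estimated mean wins; max() keeps the lowest index on ties.
        i = max(range(k), key=lambda a: sums[a] / counts[a])
        r = arms[i]()
        sums[i] += r
        counts[i] += 1
        pulls[i] += 1
        total += r
    return pulls, total


def ucb1(arms: List[Arm], steps: int) -> Tuple[List[int], float]:
    """UCB1: play every arm once, then pick argmax of mean + sqrt(2 ln n / n_j),
    where n is the number of plays so far. Returns (pull counts, total reward)."""
    k = len(arms)
    sums = [0.0] * k
    pulls = [0] * k
    total = 0.0
    for t in range(1, steps + 1):
        if t <= k:
            i = t - 1  # initialisation: each arm once
        else:
            n = t - 1  # plays made so far
            i = max(range(k),
                    key=lambda a: sums[a] / pulls[a] + math.sqrt(2 * math.log(n) / pulls[a]))
        r = arms[i]()
        sums[i] += r
        pulls[i] += 1
        total += r
    return pulls, total


if __name__ == "__main__":
    rng = random.Random(0)
    probs = (0.2, 0.5, 0.8)  # hypothetical arm payout probabilities
    arms = [bernoulli_arm(p, rng) for p in probs]
    for name, run in (("optimistic greedy", lambda: optimistic_greedy(arms, 20000)),
                      ("UCB1", lambda: ucb1(arms, 20000))):
        pulls, total = run()
        print(f"{name}: pulls={pulls} average reward={total / 20000:.3f}")
```

Because the greedy player never explores on purpose, all of its exploration comes from the optimistic prior: an arm is abandoned only once its estimate drops below the others'. UCB1 instead keeps an explicit exploration bonus that shrinks as an arm is played, so the two methods can trade places as the number of trials grows.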
This page is auto-translated from /nishio/楽観的初期値法 using DeepL. If something looks interesting but the auto-translated English is not clear enough to understand, feel free to let me know at @nishio_en. I'm very happy to share my thoughts with non-Japanese readers.