from 2023-02-14 Organize the top page leads
An example of how optimizing for short-term rewards makes you weaker
-
@tsukammo: I’m having trouble explaining why life optimization doesn’t work in terms of game tree search.
-
-
@tsukammo: This is what happens with an evaluation function based on direct rewards alone, so common “lifehacks” work by improving the evaluation function, either by adding “curiosity” or by “chopping the task into small steps so that each step has a reward”.
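A minimal sketch of the point above, not taken from the original tweets: the toy tree, the action names, and the bonus value are all made up for illustration. It compares a greedy walk that scores each move by its direct reward alone with a full search of the same tree, and then shows how a small extra reward on the first step (the "chop it into small steps" lifehack) changes what the greedy walk picks.

```python
# Hypothetical toy tree: each node maps an action name to
# (immediate_reward, child_node). A child of None ends the path.
TREE = {
    "play": (3, {"play_more": (1, None)}),
    "study": (0, {"practice": (0, {"ship": (10, None)})}),
}


def greedy_total(node, bonus=None):
    """Walk the tree, always taking the action whose immediate reward
    (plus an optional shaping bonus) is highest. Only the real rewards
    are summed; the bonus affects the choice, not the total."""
    bonus = bonus or {}
    total = 0
    while node:
        action = max(node, key=lambda a: node[a][0] + bonus.get(a, 0))
        reward, node = node[action]
        total += reward
    return total


def best_total(node):
    """Search the whole tree and return the best achievable total reward."""
    if not node:
        return 0
    return max(reward + best_total(child) for reward, child in node.values())


print(greedy_total(TREE))                # direct rewards only
print(best_total(TREE))                  # full lookahead
print(greedy_total(TREE, {"study": 4}))  # assumed bonus for the first small step
```

With these made-up numbers, the greedy walk takes "play" and ends with a total of 4, while the full search finds the "study" path worth 10. Adding a bonus to "study" lets the same greedy walk reach 10, which is one way to read the lifehack: the search stays shallow, and only the evaluation function changes.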
-
Yeah, I know all that. I just don’t do it.
-
This page is auto-translated from /nishio/短期的報酬に最適化すると弱くなる例 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.