An example of how optimizing for short-term rewards can weaken

from 2023-02-14 Organize the top page leads

@tsukammo: I’m having trouble explaining why life optimization doesn’t work in terms of game tree search.
- @tsukammo: This is what happens with an evaluation function based on direct rewards alone, so the common “lifehacks” amount to tuning the evaluation function, either by adding “curiosity” or by “preparing rewards in small, chopped-up steps”.
- Yeah, I know all that. I just don’t.
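
A minimal sketch of the point in the tweets above, assuming a made-up toy tree: the states, actions, rewards, and the names `TREE`, `best_return`, and `run` are all hypothetical and only meant as illustration. An agent that evaluates moves by direct reward alone (a one-move lookahead) keeps choosing "play", while a deeper search, or a small bonus attached to each "study" step, reaches the larger delayed reward.

```python
# Hypothetical toy tree: state -> {action: (direct_reward, next_state)}
TREE = {
    "start":  {"play": (1, "rested"), "study": (0, "novice")},
    "rested": {"play": (1, "end"),    "study": (0, "end")},
    "novice": {"play": (1, "end"),    "study": (0, "expert")},
    "expert": {"work": (10, "end")},
    "end":    {},
}


def best_return(state, depth):
    """Maximum cumulative reward reachable from `state` within `depth` moves."""
    if depth == 0 or not TREE[state]:
        return 0
    return max(r + best_return(nxt, depth - 1) for r, nxt in TREE[state].values())


def run(depth, bonus=None):
    """Play from "start", choosing each move with a `depth`-move lookahead.

    `bonus` maps an action to an extra evaluation score (a hand-made
    "small-step reward"). It only affects the choice, not the real total.
    """
    bonus = bonus or {}
    state, total, path = "start", 0, []
    while TREE[state]:
        action, (reward, nxt) = max(
            TREE[state].items(),
            key=lambda kv: kv[1][0]
            + bonus.get(kv[0], 0)
            + best_return(kv[1][1], depth - 1),
        )
        path.append(action)
        total += reward
        state = nxt
    return total, path


print(run(1))                # direct rewards only
print(run(3))                # deeper search
print(run(1, {"study": 2}))  # direct rewards plus a small-step bonus
```

In this toy setup, a lookahead of two moves still picks "play", because the payoff for studying sits three moves away. The bonus changes only the evaluation function, which is roughly what the “lifehacks” in the tweet do.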
Trade-off between exploitation and exploration

This page is auto-translated from /nishio/短期的報酬に最適化すると弱くなる例 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.
