(2.2.3.1) Exploration-exploitation tradeoff

This problem is known as the ‘exploration-exploitation tradeoff’ in the field of reinforcement learning. If you only ever choose the option that looks best based on your experience so far, you cannot find better options. That is a lack of exploration. (*1)

On the other hand, if you are always looking for better options and choose only options you have not tried, your experience goes unused. That is a lack of exploitation.

Since exploration and exploitation are in a trade-off relationship, you need to do both in a well-balanced manner rather than leaning entirely to one side. So how can we make well-balanced choices?
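To make the two failure modes concrete, here is a minimal sketch using an epsilon-greedy rule on a toy multi-armed bandit. This is only one common illustration and not necessarily the approach this text goes on to describe. The function name `run_bandit`, the reward probabilities, and the step count are all assumptions made up for the example. Uniformly random choice stands in for "only exploring", and always picking the best-looking option stands in for "only exploiting".

```python
import random

def run_bandit(true_means, epsilon, steps=5000, seed=0):
    """Play a toy bandit with an epsilon-greedy rule and return the average reward.

    true_means: hypothetical success probability of each option (unknown to the player).
    epsilon:    fraction of choices made at random (exploration).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each option has been tried
    estimates = [0.0] * n_arms   # average reward observed so far for each option
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try any option, regardless of past experience.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: take the option that looks best so far (ties go to the first).
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average for the chosen option.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps

if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # assumed values, chosen only for illustration
    print("exploit only:", run_bandit(means, epsilon=0.0))
    print("explore only:", run_bandit(means, epsilon=1.0))
    print("balanced    :", run_bandit(means, epsilon=0.1))
```

In this toy setup, the exploit-only player never leaves the first option it tried, and the explore-only player never benefits from what it has learned. The mixed setting tends to do better than both. The value 0.1 is an arbitrary choice here, not a recommendation.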

Footnote *1:

The topic has been discussed in detail in the field of reinforcement learning.
- https://en.wikipedia.org/wiki/Multi-armed_bandit
However, its origin is unclear. The concept is used in a wide range of domains.
- Box, G. E., 1954. The exploration and exploitation of response surfaces: some general considerations and examples. Biometrics, 10(1), pp.16-60.
- March, J.G., 1991. Exploration and exploitation in organizational learning. Organization science, 2(1), pp.71-87.

--- This page is auto-translated from [/nishio/(2.2.3.1) Exploration-exploitation tradeoff](https://scrapbox.io/nishio/(2.2.3.1) Exploration-exploitation tradeoff) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at [@nishio_en](https://twitter.com/nishio_en). I'm very happy to share my thoughts with non-Japanese readers.
