This problem is called the "exploration-exploitation tradeoff" in the field of reinforcement learning. If you only ever choose the option that looks best based on your experience so far, you can never discover better options. That is a lack of exploration. (*1)
On the other hand, if you keep looking for better options and only choose ones you have not tried yet, your experience goes unused. That is a lack of exploitation.
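To make the two failure modes concrete, here is a minimal sketch on a toy two-option problem. Everything in it is an assumption for illustration and not part of the original discussion: the success probabilities `0.4` and `0.8`, the helper names `run`, `pure_exploitation`, and `pure_exploration`, and the convention that an untried option has an observed average of 0.

```python
import random


def run(policy, probs, steps=1000, seed=0):
    """Play `steps` rounds and return (pull counts per option, total reward)."""
    rng = random.Random(seed)
    counts = [0] * len(probs)    # how often each option was chosen
    totals = [0.0] * len(probs)  # summed reward per option
    reward_sum = 0
    for _ in range(steps):
        arm = policy(counts, totals, rng)
        # Hypothetical environment: option `arm` pays 1 with probability probs[arm].
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    return counts, reward_sum


def pure_exploitation(counts, totals, rng):
    """Always pick the option with the best observed average (no exploration)."""
    # Assumption: an option that was never tried has an observed average of 0.
    means = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    # Ties go to the lowest index.
    return max(range(len(means)), key=means.__getitem__)


def pure_exploration(counts, totals, rng):
    """Pick uniformly at random and ignore all past experience (no exploitation)."""
    return rng.randrange(len(counts))


if __name__ == "__main__":
    probs = [0.4, 0.8]  # assumed success rates; option 1 is actually better
    print(run(pure_exploitation, probs))  # counts are [1000, 0]
    print(run(pure_exploration, probs))
```

With these assumptions, the purely exploiting policy never tries the second option at all, so it cannot learn that it is better. The purely exploring policy tries both about equally often, but it keeps choosing the worse option roughly half the time even after the difference is visible in its own records.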
Since exploration and exploitation trade off against each other, you need to do both in a well-balanced manner rather than leaning to one side. So how can we make well-balanced choices?
Footnote *1:
- The tradeoff has been discussed in the most detail in the field of reinforcement learning.
- However, its origin is unclear. The concept is used across a wide range of domains.
- Box, G. E., 1954. The exploration and exploitation of response surfaces: some general considerations and examples. Biometrics, 10(1), pp.16-60.
- March, J.G., 1991. Exploration and exploitation in organizational learning. Organization science, 2(1), pp.71-87.