This problem is called ’exploration-exploitation tradeoff’ in the field of reinforcement learning. You can not find better options if you choose only the option that looks the best from your experiences. It is a lack of exploration. (*1)

On the other hand, if you are looking for better options and only choosing inexperienced options, your experiences are not used. It is a lack of exploitation.

Since exploration and exploitation are in a trade-off relationship, it is necessary to execute both in a well-balanced manner, not on one side. So how can we make the well-balanced choices?


Footnote *1:

en.icon