For many tasks, the result is indeterminate. Let’s take a look at the diagram.
Fig: Which do you prefer, A or B?
A and B are tasks. The horizontal axis is the benefit obtained by it, and the vertical axis is the probability. For example, adding a feature may increase the number of customers, this is the benefit of doing the task. However, we do not know how many customers we get before doing it. Task A may be very large in benefit, but it may be very small. The task is with a large variance. Task B is a task with a small variance. Which do you prefer, A or B?
You may choose B because it has a larger average. I introduce interesting phenomena.
Suppose there are two slot machines C and D here. By inserting a dollar and pull the lever, you get 5 dollars in a certain probability. Which slot machine would you choose? How do you decide it? When I asked some people, there was an answer: “I tried each machine several times and picked it with the best return.”
I conducted a simulation experiment that 600 agents do 1,000 decision makings in an environment with 3 slot machines. As a decision making, each agent chooses one option from for options of three slot machines and the “do not play” option.
Each agent first plays each slot machine three times. After that, based on their past experiences, they choose the option with the highest expectation. In the slot machines, I prepared a machine which returns 5 dollars with a probability of 0.3. The expected return of the machine is 1.5 dollars. So it is the optimal solution to keep playing with the slot machine. Of course, each agent does not know that such a slot machine exists, or which one is the slot machine. Can agents find the slot machine through their experience?
As a result of the experiment, a majority 57% of agents do not find the best slot machine. Most of them chose the “do not play” option. Why does such a thing happen? *1
Even if they win with a probability of 0.3, they lose all three trials with a probability of 0.7 * 0.7 * 0.7 = 0.343. In the unfortunate case, they misunderstand that the expected return of this slot is 0. Let’s call this pessimistic misunderstanding in the sense that it is a misunderstanding in a direction worse than reality.
By choosing the option with the highest expected value based on past experience, once they make a pessimistic misunderstanding, they never choose the option again. Without chances to find that it is misunderstandings, they finally choose the “do not play” option with no stochastic variance.
- (2.2.3.1) Exploration-exploitation tradeoff
- (2.2.3.2) Optimism in face of uncertainty
- (2.2.3.3) Risks and values and priorities
Footnote *1: Result of the agent simulation
--- This page is auto-translated from [/nishio/(2.2.3) We can not compare magnitude when there is an uncertain factor?](https://scrapbox.io/nishio/(2.2.3) We can not compare magnitude when there is an uncertain factor?) using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at [@nishio_en](https://twitter.com/nishio_en). I'm very happy to spread my thought to non-Japanese readers.