Below is a draft for a 10-minute talk at an internal study session.
reinforcement learning
- supervised learning
- Inputs and teacher data (labels)
- In Go, these are game records (kifu).
- Who creates the teacher data?
- People.
- We are not talking about ten or a hundred examples.
- AlphaGo
- 160,000 games
- 28.4 million board positions
- 57.0% move-prediction accuracy
- self-play
- How many games?
- state-value network
- The training data is taken from the results of self-play games.
- Only one board position is taken from each game.
- 30 million positions = 30 million games
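
The "one position per game" rule for the value-network data can be sketched as below. This is a minimal illustration, not AlphaGo's actual pipeline: the function name `sample_one_position_per_game`, the `(positions, outcome)` layout, and the toy games are all assumptions made for this example. The usual motivation, as far as I understand it, is that consecutive positions in one game are nearly identical and share the same final result, so taking many of them would encourage overfitting.

```python
import random


def sample_one_position_per_game(games, rng):
    """Return one (position, outcome) training pair per self-play game.

    `games` is a list of (positions, outcome) tuples, where `positions`
    is the sequence of board states in one game and `outcome` is the
    final result from a fixed player's point of view (+1 win, -1 loss).
    This data layout is a hypothetical one chosen for the sketch.
    """
    samples = []
    for positions, outcome in games:
        if not positions:
            continue  # skip empty games rather than fail
        # Pick a single position uniformly at random from this game.
        samples.append((rng.choice(positions), outcome))
    return samples


# Toy stand-in data: strings instead of real board tensors.
rng = random.Random(0)
games = [
    (["g1_p0", "g1_p1", "g1_p2"], +1),
    (["g2_p0", "g2_p1"], -1),
    (["g3_p0", "g3_p1", "g3_p2", "g3_p3"], +1),
]
dataset = sample_one_position_per_game(games, rng)
print(len(dataset))  # one sample per game
```

With this scheme the dataset size equals the number of games, which is why 30 million training positions require 30 million self-play games.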
This page is auto-translated from /nishio/強化学習 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.