Below is a draft for a 10-minute commentary at an internal study session

reinforcement learning

  • supervised learning
  • Input and teacher data
  • In Go, the teacher data comes from game records.
  • Who’s going to make the teacher data?
  • People.
  • This takes far more than ten or a hundred examples.
  • AlphaGo
    • 160,000 games
    • 28.4 million board positions
    • 57.0% move-prediction accuracy
  • self-play
    • How many times?
  • state-value network
    • Training data is taken from the results of self-play.
    • Only one board position is taken from each game.
    • 30 million positions = 30 million games
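
The "one position per game" rule above can be sketched in a few lines. This is a minimal illustration, not AlphaGo's actual pipeline: the function name `sample_value_examples`, the toy position strings, and the +1/-1 outcome encoding are all assumptions made for the example. A likely reason for the rule is that positions within one game are strongly correlated and share the same final result, so taking many of them from one game adds little independent information.

```python
import random


def sample_value_examples(games, rng):
    """Pick exactly one (position, outcome) training pair from each game.

    games: list of (positions, outcome) tuples, where positions is a
           non-empty list of board positions and outcome is the final
           result of that game (hypothetical encoding: +1 win, -1 loss).
    rng:   a random.Random instance, passed in so runs are reproducible.
    """
    examples = []
    for positions, outcome in games:
        position = rng.choice(positions)  # one position per game
        examples.append((position, outcome))
    return examples


# Toy self-play results: two games with made-up position labels.
games = [
    (["g0_p0", "g0_p1", "g0_p2"], +1),
    (["g1_p0", "g1_p1"], -1),
]
examples = sample_value_examples(games, random.Random(0))
print(len(examples))  # same as the number of games
```

Because each game contributes exactly one example, the number of training examples equals the number of games, which is why 30 million positions require 30 million self-play games.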

This page is auto-translated from /nishio/強化学習 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.