reinforcement learning

List of documents from internal workshops

Below is a draft for a 10-minute commentary at an internal study session

supervised learning
Input and Teacher Data
In Go, this is the game record.
Who’s going to make the teacher data?
People.
Ten or a hundred examples are nowhere near enough.
AlphaGo
- 160,000 games
- 28.4 million board positions
- 57.0% accuracy in predicting the human move
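As a rough illustration of how a game record turns into (input, teacher) pairs, here is a minimal sketch. The function name `record_to_pairs`, the string move encoding, and the four-move record are all made up for illustration; a real system would encode the board as feature planes rather than a move list.

```python
from typing import List, Tuple

Move = str  # hypothetical move encoding such as "D4"


def record_to_pairs(record: List[Move]) -> List[Tuple[Tuple[Move, ...], Move]]:
    """Turn one game record into (input, teacher) pairs.

    The input is the sequence of moves played so far (a stand-in for the
    board position); the teacher label is the move the human played next.
    """
    return [(tuple(record[:i]), record[i]) for i in range(len(record))]


record = ["D4", "Q16", "C16", "Q4"]  # made-up record, not real data
pairs = record_to_pairs(record)
print(len(pairs))  # one training pair per move in the record
print(pairs[2])
```

Every move in a record yields one pair, which is why a collection of games expands into a far larger number of positions.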
self-play
- How many times?
state-value network
- Training data is taken from the results of self-play.
- Only one position is taken from each game.
- 30 million positions = 30 million games
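The "one position per game" rule can be sketched as below. This is a simplified illustration, not AlphaGo's actual pipeline: the names `sample_one_position` and `build_value_dataset`, the move-list representation of a position, and the +1/-1 outcome label are assumptions made for the example.

```python
import random
from typing import List, Tuple

Game = Tuple[List[str], int]  # (moves, final outcome: +1 or -1), hypothetical format


def sample_one_position(moves: List[str], outcome: int,
                        rng: random.Random) -> Tuple[Tuple[str, ...], int]:
    """Pick a single position from a finished game and label it with the result."""
    t = rng.randrange(len(moves))        # index of the last move included
    return tuple(moves[:t + 1]), outcome  # position as the moves played so far


def build_value_dataset(games: List[Game],
                        rng: random.Random) -> List[Tuple[Tuple[str, ...], int]]:
    """One (position, outcome) example per game, so dataset size equals game count."""
    return [sample_one_position(moves, outcome, rng) for moves, outcome in games]


games = [(["D4", "Q16", "C16"], +1), (["Q4", "D16"], -1)]  # made-up self-play games
dataset = build_value_dataset(games, random.Random(0))
print(len(dataset))  # equals the number of games
```

Positions within one game are strongly correlated and share the same outcome label, so taking a single position per game keeps the examples closer to independent. The cost is that the number of examples can never exceed the number of games played.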

This page is auto-translated from /nishio/強化学習 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I’m very happy to share my thoughts with non-Japanese readers.
