• Is the attention mechanism a dictionary object?

  • Consider [dot-product attention]

  • In high-dimensional spaces, the inner product of two random vectors is approximately zero (curse of dimensionality).
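A minimal sketch of this near-orthogonality, using plain Python (the sample count and dimensions are arbitrary choices for illustration): the average absolute cosine between random unit vectors shrinks toward zero as the dimension grows.

```python
import math
import random

random.seed(0)

def random_unit_vector(dim):
    """Sample a direction uniformly by normalizing a Gaussian vector."""
    v = [random.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# The inner product of two random unit vectors concentrates around 0
# with spread on the order of 1/sqrt(dim).
for dim in (2, 10, 1000):
    samples = [abs(dot(random_unit_vector(dim), random_unit_vector(dim)))
               for _ in range(200)]
    print(dim, sum(samples) / len(samples))
```

In 2 dimensions the average is noticeably large, while at dimension 1000 it is close to zero, so a query vector has a meaningful inner product with essentially only the matching key.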

  • Softmax is a soft approximation of argmax, so in this regime attention is practically an argmax (i.e., an exact lookup).
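A small sketch of why softmax behaves like argmax here: when one query-key inner product clearly dominates (as it does for near-orthogonal keys), softmax puts almost all of its weight on that single entry. The score values below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical query·key scores: one key matches, the rest are near zero.
scores = [12.0, 0.3, -0.1, 0.5]
weights = softmax(scores)
print([round(w, 4) for w in weights])  # almost all weight on index 0
```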

  • Even when a Key→Value mapping is hard to acquire by learning, it can be realized by memorization.

  • An actual dictionary object requires exact key matches, whereas attention uses inner-product similarity.

    • Since almost all inner products are near zero, the space is partitioned in a fuzzy way.
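The points above can be combined into one sketch of attention as a fuzzy dictionary: store entries under random high-dimensional keys, then query with a noisy copy of a stored key. Because the keys are nearly orthogonal and softmax is nearly argmax, the weighted average of values returns approximately the stored value. All names, the dimension, and the scale factor are illustrative assumptions, not any particular model's parameters.

```python
import math
import random

random.seed(1)
DIM = 256  # assumed key dimension for illustration

def random_unit_vector(dim):
    v = [random.gauss(0, 1) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    t = sum(exps)
    return [e / t for e in exps]

def attention(query, keys, values, scale=50.0):
    """Soft dictionary lookup: softmax over query-key inner products,
    then a weighted average of the stored values."""
    weights = softmax([scale * dot(query, k) for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# "Store" three entries under random, nearly orthogonal keys.
keys = [random_unit_vector(DIM) for _ in range(3)]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Query with a noisy copy of the second key: the lookup still
# retrieves approximately that entry's value, like a fuzzy dict.
query = [k + 0.1 * random.gauss(0, 1) for k in keys[1]]
print(attention(query, keys, values))  # ≈ [0.0, 1.0]
```

This is the same mechanism the Key-Value Memory Networks paper cited below exploits: the Key→Value mapping is held as explicit memory rather than learned weights.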

Key-Value Memory Networks for Directly Reading Documents


This page is auto-translated from /nishio/注意機構は辞書オブジェクト using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.