- Scaled Dot-Product Attention
- In Japanese: 縮小付き内積注意 (a literal rendering of "dot-product attention with scaling").
- The scaling works in the opposite direction from sharpening the softmax toward argmax: the dot-product scores are scaled down (divided by √d_k) before the softmax, as sketched below.
- In short, it makes soft attention even softer.
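- As a rough sketch of what the shrinking does (assuming NumPy; the function and variable names are illustrative, not from the original note), the standard formulation computes softmax(QKᵀ / √d_k)V, and the division by √d_k flattens the softmax distribution:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Scores are divided by sqrt(d_k) BEFORE the softmax;
    # smaller logits give a flatter (softer) attention distribution.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Demo: the same scores with and without the 1/sqrt(d_k) shrink.
rng = np.random.default_rng(0)
d_k = 64
q = rng.normal(size=(1, d_k))
K = rng.normal(size=(5, d_k))
raw = (q @ K.T).ravel()
print("softmax(raw)   :", np.round(softmax(raw), 3))                 # peakier, near one-hot
print("softmax(scaled):", np.round(softmax(raw / np.sqrt(d_k)), 3))  # softer, more uniform
```

- With unit-variance components, raw dot products have variance on the order of d_k, so without the shrink the softmax saturates toward an argmax-like one-hot distribution; dividing by √d_k keeps it soft.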