Skip-Gram is, in essence, an AutoEncoder whose output is not the same word as the input but the surrounding words (see the sketch after the list below).

  • If you use a word 1-of-K vector as input and reconstruct the same word as output, there is not much meaningful dimensional compression you can do.
  • What would happen if we forced it to compress anyway? My guess is that the words kept distinguishable in the middle (bottleneck) layer would be chosen in order of frequency of occurrence.
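A minimal sketch of this analogy, assuming PyTorch (the vocabulary size, dimensions, and dummy batches below are made up for illustration): both models share the same K → d → K bottleneck architecture and differ only in what the decoder is asked to reconstruct.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000  # K: size of the 1-of-K (one-hot) vocabulary (hypothetical)
EMBED_DIM = 100      # d: the compressed middle layer, d << K (hypothetical)

class BottleneckModel(nn.Module):
    """K -> d -> K: an encoder-decoder with a narrow middle layer."""
    def __init__(self, vocab_size: int, embed_dim: int):
        super().__init__()
        # An Embedding lookup on a word index is equivalent to multiplying
        # a 1-of-K vector by a K x d weight matrix.
        self.encoder = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.Linear(embed_dim, vocab_size)

    def forward(self, word_ids: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(word_ids))  # logits over the vocabulary

model = BottleneckModel(VOCAB_SIZE, EMBED_DIM)
loss_fn = nn.CrossEntropyLoss()

center = torch.randint(0, VOCAB_SIZE, (32,))   # center words (dummy batch)
context = torch.randint(0, VOCAB_SIZE, (32,))  # surrounding words (dummy batch)

# AutoEncoder: the target is the same word that went in.
autoencoder_loss = loss_fn(model(center), center)

# Skip-Gram: the same architecture, but the target is a surrounding word.
skipgram_loss = loss_fn(model(center), context)
```

Because only the target changes, the Skip-Gram variant must place words that appear in similar contexts close together in the middle layer, which is what makes its embeddings useful.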

This page is auto-translated from /nishio/Skip-Gram using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.