- What makes an embedding vector good or bad?
When two words A and B end up close together in the embedding space, the cases break down as follows (a code sketch follows the list):
- Identical: B is a variant spelling of A (e.g., Python and Pyth0n) → replace B with A in the original data
- Synonyms: A and B mean the same thing (but see the next item) → keep the original data, and instead unify the word-to-ID mapping
  - Note: quasi-synonyms (words with a similar meaning that are not interchangeable) are fine as-is
- Antonyms: A and B are antonyms → add one extra axis to the vector for the antonym relation and set it to +1 or -1 as appropriate
- Concatenation: A and B carry a single meaning as the compound "AB" → a new vocabulary entry is needed, and the tokenizer must be adapted to recognize it when reading input
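
Here is a minimal sketch of these four cleanup rules in Python, under my own assumptions: the mapping tables (`IDENTICAL`, `SYNONYMS`, `COMPOUNDS`) and the function names are hypothetical placeholders, to be filled in by inspecting which vectors actually sit close together; none of this code is from the original note.

```python
import numpy as np

# Identical: B is a typo/variant of A -> rewrite B to A in the raw data.
IDENTICAL = {"Pyth0n": "Python"}

# Synonyms: keep the raw data, but send both surface forms to one ID.
SYNONYMS = {"automobile": "car"}

# Concatenation: "A B" carries a single meaning -> merge into one new token.
COMPOUNDS = {("New", "York"): "New_York"}

def normalize_tokens(tokens):
    """Apply identical-word replacement and compound merging to a token list."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:               # two tokens become one vocab entry
            out.append(COMPOUNDS[pair])
            i += 2
        else:
            out.append(IDENTICAL.get(tokens[i], tokens[i]))
            i += 1
    return out

def token_id(word, vocab):
    """Look up a word's ID, routing synonyms to a shared entry."""
    return vocab[SYNONYMS.get(word, word)]

def add_antonym_axis(vectors, positive_ids, negative_ids):
    """Append one axis to (vocab_size, dim) vectors, +1/-1 per antonym pole."""
    axis = np.zeros((vectors.shape[0], 1))
    axis[list(positive_ids)] = +1.0
    axis[list(negative_ids)] = -1.0
    return np.hstack([vectors, axis])

print(normalize_tokens(["I", "love", "Pyth0n", "in", "New", "York"]))
# -> ['I', 'love', 'Python', 'in', 'New_York']
```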
Teaching concatenations grows the vocabulary, while teaching identities shrinks it. This training data itself can be reused.
Since this requires modifying the learning process, and I want to work with the vectors directly, does it mean I ultimately need to build my own word2vec-like system? (One possible approach is sketched below.)
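
As one hedged sketch under my own assumptions, not the author's actual plan: rather than writing a full word2vec replacement, the "A and B are the same" training pairs could be reused as an extra objective that pulls the paired vectors toward each other after ordinary training. The function name and hyperparameters below are invented for illustration.

```python
import numpy as np

def pull_synonyms_together(vectors, synonym_pairs, lr=0.1, steps=100):
    """Reuse (id_a, id_b) sameness pairs as an extra training signal.

    vectors: (vocab_size, dim) array of trained embeddings, updated in place.
    Minimizes sum ||v_a - v_b||^2 over the pairs by plain gradient descent.
    """
    for _ in range(steps):
        for a, b in synonym_pairs:
            diff = vectors[a] - vectors[b]
            vectors[a] -= lr * diff   # gradient of ||v_a - v_b||^2 is 2*diff;
            vectors[b] += lr * diff   # the factor of 2 is folded into lr
    return vectors
```

Because the same pair data works both here and as input to the vocabulary cleanup above, the teacher data only has to be collected once.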
This page is auto-translated from /nishio/埋め込みベクトルの良し悪し using DeepL. If you find something interesting here but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.