• Suppose each label has been embedded in a 200-dimensional space.
  • Suppose a human then places some of those labels on a 2-dimensional plane (using an electronic KJ method tool).
  • Is there a way to put this information to good use?
  • Extending embedding vectors

If you want to merge a 200-dimensional distributed representation with another 200-dimensional distributed representation created by some other method, I guess you could concatenate them into 400 dimensions and then reduce back to 200 dimensions with PCA. But what if you want to add coordinates that humans placed by hand, with the KJ method or something similar? No, I think I don't want those axes to be rewritten.
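A minimal sketch of the concatenate-then-PCA idea, assuming Python with NumPy (the note names no tools). The random matrices are stand-ins for two real embeddings of the same labels, and `pca_reduce` / `merge_embeddings` are hypothetical helper names.

```python
import numpy as np

def pca_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X onto their top-k principal components."""
    Xc = X - X.mean(axis=0)              # PCA works on centered data
    # Rows of Vt are the principal directions, ordered by singular value (descending).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def merge_embeddings(A: np.ndarray, B: np.ndarray, k: int = 200) -> np.ndarray:
    """Concatenate two embeddings of the same labels, then reduce back to k dims."""
    assert A.shape[0] == B.shape[0], "both embeddings must cover the same labels"
    return pca_reduce(np.hstack([A, B]), k)

rng = np.random.default_rng(0)
n_labels = 500                            # hypothetical number of labels
A = rng.normal(size=(n_labels, 200))      # stand-in for the first embedding
B = rng.normal(size=(n_labels, 200))      # stand-in for the other method's embedding
merged = merge_embeddings(A, B)
print(merged.shape)                       # (500, 200)
```

Every output column is a mixture of all 400 input columns, which is exactly why this route would rewrite any axes that a human had placed by hand.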

With the KJ method, though, this may be a different kind of problem, because not every row has hand-added data in the first place.

Since humans do not input positions for every label, is this a missing-value problem? That is, for the labels that have no human-placed 2D vector, infer one from the 200-dimensional embedding vector and the few 2D vectors that humans did input.
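The imputation step in that question could look like the sketch below. It assumes plain least-squares linear regression with an intercept (the note does not fix a method), NumPy arrays, and NaN rows marking labels nobody placed; `impute_2d` is a hypothetical helper name. With only a few human-placed labels and 200 dimensions the fit is underdetermined, in which case `np.linalg.lstsq` returns the minimum-norm solution, and a regularized regression such as ridge would probably be safer in practice.

```python
import numpy as np

def impute_2d(emb: np.ndarray, pos: np.ndarray) -> np.ndarray:
    """Fill missing 2D positions by linear regression on the embedding.

    emb: (n, d) embedding vectors for all labels.
    pos: (n, 2) human-placed positions; rows containing NaN are treated as unplaced.
    """
    known = ~np.isnan(pos).any(axis=1)
    X = np.hstack([emb, np.ones((emb.shape[0], 1))])    # add an intercept column
    # Least-squares fit using only the rows a human placed.
    W, *_ = np.linalg.lstsq(X[known], pos[known], rcond=None)
    out = pos.copy()
    out[~known] = X[~known] @ W                          # human-placed rows stay as-is
    return out

# Toy data in which the "true" positions are exactly linear in the embedding.
rng = np.random.default_rng(1)
emb = rng.normal(size=(1000, 200))
true_pos = emb @ rng.normal(size=(200, 2)) + np.array([3.0, -1.0])
pos = true_pos.copy()
pos[300:] = np.nan                                       # only 300 labels were placed
filled = impute_2d(emb, pos)
print(np.allclose(filled, true_pos))                     # True
```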

Sorting it out: this is a missing-value problem.

Suppose missing-value imputation can infer the 2D vectors without error, by linear regression from the 200-dimensional embedding vectors and the small number of 2D vectors that humans input. Then the two added dimensions are linear combinations of the existing 200, so PCA can reduce the combined 202 dimensions back to 200 without error. So, under that condition, imputing the missing values and then applying PCA reduces the data without any loss.
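The argument above can be checked numerically. This is a minimal sketch in Python with NumPy, assuming random stand-in data in which the 2D positions are exactly linear (plus an intercept) in the 200-dimensional embedding; `discarded_variance` is a hypothetical helper name. It measures how much of the 202-dimensional data PCA throws away when keeping 200 components, first for the exactly linear case and then with noise added to the positions.

```python
import numpy as np

def discarded_variance(X: np.ndarray, k: int) -> float:
    """Squared reconstruction error when PCA keeps only the top-k components."""
    Xc = X - X.mean(axis=0)                      # PCA centers the data first
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    return float(np.sum(s[k:] ** 2))             # energy in the dropped components

rng = np.random.default_rng(2)
emb = rng.normal(size=(1000, 200))               # stand-in 200-dim embedding
W = rng.normal(size=(200, 2))
pos = emb @ W + np.array([3.0, -1.0])            # 2D positions exactly linear in emb

X = np.hstack([emb, pos])                        # 202 dimensions
ratio = discarded_variance(X, 200) / discarded_variance(X, 0)
print(ratio)                                     # round-off level, effectively 0

# If the positions are NOT exactly linear in the embedding, the loss is nonzero.
noisy = np.hstack([emb, pos + rng.normal(size=pos.shape)])
noisy_ratio = discarded_variance(noisy, 200) / discarded_variance(noisy, 0)
print(noisy_ratio)                               # clearly nonzero
```

Centering inside PCA is what lets the intercept term disappear: after centering, the position columns are exact linear combinations of the embedding columns.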


This page is auto-translated from /nishio/埋め込みベクトルの拡張 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.