The naive vector similarity method separates English and Japanese.
For me personally, this is a big problem, so I’m not very satisfied with a simple embedded vector similarity
- But very few people use both English and Japanese, so you don’t have to worry about it.
But when I went to bed and woke up, I felt like, “Why don’t we machine translate every English sentence into Japanese and load it into a vector index?
-
Better than making it into a vector and then thinking “how do I paste together the spaces that are far apart”?
-
In Plurality Japanese translation, the language is now included in the vector index without separating the languages.
-
[/plurality-japanese/Plurality Vector Search](https://scrapbox.io/plurality-japanese/Plurality Vector Search)
-
I feel like I can’t provide user value with that.
- If it is Japanese who use it, the query is also usually in Japanese
- But when I search for idioms, I get Chinese hits.
-
If so, it would be better to load the “Japanese” side of “other language → Japanese” into the search index.
- Then, turn the arrow in the opposite direction and read the original text after the Japanese hits if you want to.
The association of two chunks by machine translation is a kind of “link”
- Hits by vector search are loose links
- Compare with explicit human links in Scrapbox
- Link meaning “same content, different language.”
- Scrapbox link to “Human Associations” link
What to record
- Diary 2024-01-13 ← Diary 2024-01-14 → Diary 2024-01-15 100 days ago Diary 2023-10-06. 1 year ago Diary 2023-01-14.
This page is auto-translated from /nishio/日記2024-01-14 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thought to non-Japanese readers.