Diary 2024-01-14

The naive vector similarity method separates English and Japanese.

For me personally, this is a big problem, so I’m not very satisfied with a simple embedded vector similarity

But very few people use both English and Japanese, so you don’t have to worry about it.

But when I went to bed and woke up, I felt like, “Why don’t we machine translate every English sentence into Japanese and load it into a vector index?

Better than making it into a vector and then thinking “how do I paste together the spaces that are far apart”?
In Plurality Japanese translation, the language is now included in the vector index without separating the languages.
[/plurality-japanese/Plurality Vector Search](https://scrapbox.io/plurality-japanese/Plurality Vector Search)
I feel like I can’t provide user value with that.
- If it is Japanese who use it, the query is also usually in Japanese
- But when I search for idioms, I get Chinese hits.
If so, it would be better to load the “Japanese” side of “other language → Japanese” into the search index.
- Then, turn the arrow in the opposite direction and read the original text after the Japanese hits if you want to.

The association of two chunks by machine translation is a kind of “link”

Hits by vector search are loose links
Compare with explicit human links in Scrapbox
- Link meaning “same content, different language.”
- Scrapbox link to “Human Associations” link

What to record

Diary 2024-01-13 ← Diary 2024-01-14 → Diary 2024-01-15 100 days ago Diary 2023-10-06. 1 year ago Diary 2023-01-14.

This page is auto-translated from /nishio/日記2024-01-14 using DeepL. If you looks something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thought to non-Japanese readers.

🪴 Quartz 4.0

Diary 2024-01-14

Graph View

Backlinks