nishio When data science first gained attention in the corporate world, some companies boasted “We have a lot of data” about piles of garbage data with no metrics or assumptions behind them. In the same way, there will be companies that say, “We have a large amount of chat logs, so let’s use them with an LLM!”

nishio Just as “a list of numbers with no idea of what was measured and under what circumstances” is useless for data analysis, “a list of words with no idea of what was said and under what circumstances” is useless for an LLM.

nishio An LLM cannot understand context unless the contextual information is stored in a form that it can understand. If the conversation and its context (materials and data) are tied together and stored in a machine-readable form, it can still be put to use with LLMs.
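
As a rough illustration of what "conversation and context tied together in a machine-readable form" could look like, here is a minimal Python sketch. The record layout, the field names (`speaker`, `topic`, `attachments`, and so on), and the sample values are all assumptions made up for this example, not a format proposed in the original post.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ChatRecord:
    """One chat message stored together with the context it was said in.

    Every field name here is hypothetical and chosen only for illustration.
    """
    speaker: str
    text: str
    timestamp: str  # ISO 8601 string
    topic: str      # what the conversation is about
    attachments: List[str] = field(default_factory=list)  # IDs of referenced materials/data


def to_json_line(record: ChatRecord) -> str:
    """Serialize a record as one JSON line, keeping the context attached."""
    return json.dumps(asdict(record), ensure_ascii=False)


def to_prompt(record: ChatRecord) -> str:
    """Render a record as text that places the context next to the utterance."""
    refs = ", ".join(record.attachments) or "none"
    return (
        f"[{record.timestamp}] topic: {record.topic}; materials: {refs}\n"
        f"{record.speaker}: {record.text}"
    )


# Sample data (invented for the example).
record = ChatRecord(
    speaker="alice",
    text="The number went up again this week.",
    timestamp="2023-04-01T10:00:00+09:00",
    topic="weekly signup metrics",
    attachments=["reports/signups-2023-w13.csv"],
)

line = to_json_line(record)                # what would be written to storage
restored = ChatRecord(**json.loads(line))  # what would be read back later
print(to_prompt(restored))
```

Without the `topic` and `attachments` fields, "The number went up again this week." is exactly the kind of "list of words" described above: neither a person nor an LLM can tell which number is meant.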

nishio I think companies that had thought carefully, even before the advent of LLMs, about what form information must be stored in to be usable in the future still have a good chance.


This page is auto-translated from /nishio/LLMが理解できる形で文脈情報を保存 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.