nishio First of all, consider the claim that "everyone puts GPT-generated text on the net, future LLMs will mistake it for human-written input and learn from it, and that's why performance will degrade." Its premise does not hold, because LLM-generated text can be identified as such and filtered out. (Watermarking of LLM-generated text)
-
cheedah7427 I think watermarking only means that machine-generated text can be countered to some extent if you want to identify it, not that it is 100% identifiable. And I think contamination will happen eventually: not every LLM has that countermeasure, and in practice we cannot assume we know that a given sentence was generated by an LLM under specific conditions.
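To make the "not 100% identifiable" point concrete, here is the detection statistic from one published scheme (Kirchenbauer et al., 2023). Nothing in this exchange says which scheme any provider actually uses. The sketch assumes that the watermark nudges each token toward a key-dependent "green list" covering a fraction γ of the vocabulary, and that a detector counts the green tokens |s|_G among T scored tokens. The 75% green rate and the z = 4 threshold in the worked numbers are illustrative assumptions, not figures from the source.

```latex
z = \frac{|s|_G - \gamma T}{\sqrt{T\,\gamma\,(1-\gamma)}}

% illustrative: gamma = 0.5, 75% of tokens green
T = 10:\quad  z = \frac{7.5 - 5}{\sqrt{10 \cdot 0.25}} = \frac{2.5}{\sqrt{2.5}} \approx 1.58
T = 200:\quad z = \frac{150 - 100}{\sqrt{200 \cdot 0.25}} = \frac{50}{\sqrt{50}} \approx 7.07
```

With a threshold of z = 4, the 200-token text would be flagged and the 10-token one would not, even though both carry the watermark at the same rate. Under this kind of scheme, detection is a statistical judgment that depends on text length, not a certainty.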
-
cheedah7427 For simplicity, I'm using "LLM" here to mean the entire text-generation system.
-
nishio We were talking in the context of "because everyone writes ChatGPT-generated sentences on the Internet," and in that case OpenAI can identify that a text is ChatGPT-generated and exclude it from the training data. Sure, the wide variety of LLMs in the open camp might degrade each other, because they can't identify each other's output. Maybe we need standardization?
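A minimal sketch of the filtering argument, assuming a green-list watermark in the style of Kirchenbauer et al. (2023). The source does not describe how OpenAI's watermark or filtering works. The keys, the toy vocabulary, the 4-candidate "model", `GAMMA = 0.5`, the 4.0 threshold, and every function name here are hypothetical. The point it illustrates is the one in this exchange: a provider holding its own key can flag and drop its own outputs, while text watermarked under someone else's key, or not at all, looks unremarkable to that detector. This is why a shared standard would be needed for models to recognize each other's output.

```python
import hashlib
import math
import random
from typing import Optional

GAMMA = 0.5  # assumed fraction of tokens that count as "green" at each step
VOCAB = [f"w{i}" for i in range(1000)]  # toy vocabulary standing in for real tokens


def is_green(prev: str, tok: str, key: str, gamma: float = GAMMA) -> bool:
    """Pseudo-randomly mark `tok` green, seeded by a secret key and the previous token."""
    digest = hashlib.sha256(f"{key}|{prev}|{tok}".encode()).digest()
    # Map the first 8 bytes of the hash to [0, 1) and compare with gamma.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_score(tokens: list, key: str, gamma: float = GAMMA) -> float:
    """One-proportion z-test on how many (prev, tok) pairs are green under `key`."""
    t = len(tokens) - 1
    if t <= 0:
        return 0.0
    green = sum(is_green(p, c, key, gamma) for p, c in zip(tokens, tokens[1:]))
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


def generate(n: int, rng: random.Random, key: Optional[str]) -> list:
    """Hypothetical toy 'LLM': picks among 4 random candidates, preferring green ones if keyed."""
    tokens = [rng.choice(VOCAB)]
    for _ in range(n - 1):
        candidates = rng.sample(VOCAB, 4)  # stand-in for the model's top candidates
        if key is not None:
            greens = [c for c in candidates if is_green(tokens[-1], c, key)]
            candidates = greens or candidates  # fall back if no candidate is green
        tokens.append(rng.choice(candidates))
    return tokens


def filter_training_data(docs: list, key: str, threshold: float = 4.0) -> list:
    """Keep only documents that do NOT look watermarked under `key`."""
    return [d for d in docs if z_score(d, key) < threshold]


rng = random.Random(0)
corpus = {
    "ours": generate(200, rng, key="provider-A-key"),       # our own model's output
    "other_llm": generate(200, rng, key="provider-B-key"),  # another model, different key
    "human": generate(200, rng, key=None),                  # unwatermarked stand-in for human text
}
kept = filter_training_data(list(corpus.values()), key="provider-A-key")
for name, doc in corpus.items():
    print(name, round(z_score(doc, "provider-A-key"), 2), doc in kept)
```

Provider A's filter drops only its own output; the other model's text passes through, even though provider B could detect it with its own key.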
-
cheedah7427 I didn't follow the context, my apologies. Standardization would be ideal, but I feel it would be difficult in practice... token generation is intertwined with so many different factors!
-
nishio No, no, I appreciate your point of view as it has broadened my perspective!
-
This page is auto-translated from /nishio/日記2023-11-27 using DeepL. If something looks interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.