nishio First of all, regarding the claim that “everyone puts GPT-generated text on the net, future LLMs will mistake it for human input and learn from it, and that’s why performance will degrade”: the premise is not valid, because such text can be identified as LLM-generated and filtered out. Watermarking of LLM-generated text

  • cheedah7427 I think watermarking only means that generated text can be identified to some extent if you want to identify it, not that it is 100% identifiable. And I think contamination will eventually happen, because not all LLMs have that countermeasure, and we cannot assume that we will know that a given real-world sentence was generated by an LLM under certain conditions.

  • cheedah7427 Here I refer to the entire sentence generation system as an LLM for simplicity.

  • nishio We were talking in the context of “because everyone writes ChatGPT-generated sentences on the Internet,” and in that case OpenAI can identify the text as ChatGPT-generated and exclude it from the training data. Sure, the wide variety of LLMs in the open camp might crush each other because they can’t identify each other’s output. Maybe we need standardization?

  • cheedah7427 I didn’t follow the context, my apologies. Standardization would be best, but I feel that in reality it would be difficult… Token generation is intertwined with too many different factors!

  • nishio No, no, I appreciate your point of view as it has broadened my perspective!

  • Diary 2023-11-26 ← Diary 2023-11-27 → Diary 2023-11-28. 100 days ago: Diary 2023-08-19. 1 year ago: Diary 2022-11-27.
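
The filtering idea discussed in the thread can be illustrated with a toy sketch. It assumes a "green list" style watermark, where the previous token pseudo-randomly marks part of the vocabulary as green and the generator prefers green tokens. This is not OpenAI's or any specific LLM's actual method. Every name here (`is_green`, `toy_generate`, `watermark_z_score`, `filter_corpus`, `GAMMA`, the threshold of 4.0) is hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary that is "green" at each step


def is_green(prev_token: str, token: str, gamma: float = GAMMA) -> bool:
    """Pseudo-randomly colour `token`, seeded by `prev_token` (hypothetical scheme)."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    # Map the first 8 bytes to a number in [0, 1) and compare it with gamma.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def toy_generate(vocab: list[str], length: int, want_green: bool = True) -> list[str]:
    """Toy generator: always pick the first vocab entry with the wanted colour."""
    tokens = [vocab[0]]
    while len(tokens) < length:
        prev = tokens[-1]
        # Fall back to vocab[0] in the unlikely case that no entry matches.
        tokens.append(next((w for w in vocab if is_green(prev, w) == want_green), vocab[0]))
    return tokens


def watermark_z_score(tokens: list[str], gamma: float = GAMMA) -> float:
    """z-score of the green-token count against the 'no watermark' hypothesis."""
    n = len(tokens) - 1  # number of (previous, current) token pairs
    if n < 1:
        return 0.0
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def filter_corpus(docs: list[str], threshold: float = 4.0) -> list[str]:
    """Keep only documents whose z-score stays below the detection threshold."""
    return [d for d in docs if watermark_z_score(d.split()) < threshold]
```

The sketch also shows the limits raised in the thread: the score is a statistical test and not a 100% identification, and it only works when the detector knows the seeding scheme. Text from an LLM that uses a different scheme, or none, would pass through the filter.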


This page is auto-translated from /nishio/日記2023-11-27 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand, feel free to let me know at @nishio_en. I’m very happy to share my thoughts with non-Japanese readers.