The development and popularization of large language models (LLMs) have raised concerns that they will be used to create tailor-made, convincing arguments to push false or misleading narratives online. Early work has found that language models can generate content perceived as at least on par with, and often more persuasive than, human-written messages. However, there is still limited knowledge about LLMs' persuasive capabilities in direct conversations with human counterparts and how personalization can improve their performance. In this pre-registered study, we analyze the effect of AI-driven persuasion in a controlled, harmless setting. We create a web-based platform where participants engage in short, multiple-round debates with a live opponent. Each participant is randomly assigned to one of four treatment conditions, corresponding to a two-by-two factorial design: (1) games are either played between two humans or between a human and an LLM; (2) personalization is either enabled or not, granting one of the two players access to basic sociodemographic information about their opponent. We found that participants who debated GPT-4 with access to their personal information had 81.7% (p < 0.01; N = 820 unique participants) higher odds of increased agreement with their opponents compared to participants who debated humans. Without personalization, GPT-4 still outperforms humans, but the effect is smaller and statistically non-significant (p = 0.31). Overall, our results suggest that concerns around personalization are meaningful and have important implications for the governance of social media and the design of new online environments.

claude.icon
  • Large language models (LLMs) such as GPT-4 significantly outperform humans in persuasive argumentation, especially when personal information about the opponent is available.
  • LLMs can use personal information to develop arguments much more effectively than humans; human debaters actually tend to perform worse when given their opponent's personal information.
  • Even without personal information, GPT-4 tends to outperform humans in persuasion, but the effect is small and not statistically significant.

nishio.iconSpecifically, what does "LLMs can use personal information to develop arguments much more effectively than humans" mean?

claude.iconLLMs can skillfully use the personal information given to them to tailor their arguments to the opponent's background and characteristics (a minimal prompt sketch follows the list). For example, they can:

  • Tailor to political position: if the opponent is known to be a Republican supporter, the LLM can build an argument that appeals to conservative values; for a Democrat supporter, it can argue from a liberal perspective.
  • Adjust vocabulary to education level: for a highly educated opponent, the LLM can use more specialized vocabulary and complex concepts; for a less educated opponent, it can keep its explanations in plain language.
  • Adopt age-appropriate persuasion techniques: with younger counterparts, it can emphasize innovative ideas and the need for change; with older counterparts, the importance of traditional values and stability.
  • Use emotional triggers: it can draw on the opponent's personal experiences and values to build an argument that appeals to their emotions, strengthening its persuasive force.
  • Emphasize commonalities: finding shared ground with the opponent (hometown, hobbies, interests, etc.) and weaving it into the discussion creates familiarity and increases persuasiveness.
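
For concreteness, here is a minimal sketch of how such personalization might be wired into a debate prompt. The `OpponentProfile` fields, the prompt wording, and the example topic are all illustrative assumptions, not the paper's actual setup:

```python
# Illustrative sketch only: injecting an opponent's sociodemographic data
# into an LLM debate prompt. Field names and wording are hypothetical,
# not the paper's actual prompt.
from dataclasses import dataclass
from typing import Optional

@dataclass
class OpponentProfile:
    age: int
    gender: str
    education: str
    political_affiliation: str

def build_system_prompt(topic: str, side: str,
                        profile: Optional[OpponentProfile] = None) -> str:
    prompt = (f"You are in a debate on the proposition: '{topic}'. "
              f"Argue {side} the proposition as persuasively as you can.")
    if profile is not None:
        # Personalized condition: the model sees the opponent's demographics
        # and is explicitly asked to tailor its arguments to them.
        prompt += (f" Your opponent is a {profile.age}-year-old "
                   f"{profile.gender} with {profile.education}, "
                   f"politically a {profile.political_affiliation}. "
                   "Tailor your arguments to this profile.")
    return prompt

# Example: the personalized human-vs-AI condition (topic is made up).
print(build_system_prompt(
    "Social media makes people stupid", "for",
    OpponentProfile(34, "male", "a bachelor's degree", "Democrat")))
```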

Humans, on the other hand, often struggle to use the personal information they are given effectively, sometimes even to counterproductive effect. They may feel ethical qualms about exploiting personal details, may misinterpret or misapply the information, and risk using it clumsily in ways that make their opponent uncomfortable.

Free of these weaknesses, LLMs can use personal information efficiently and effectively to build a persuasive argument.

nishio.iconWhat experiments were conducted and what quantitative results were obtained when personal information was used?

claude.iconIn this study, four experimental conditions were used to quantitatively evaluate the impact of personal information on persuasiveness:

  • Human vs. human (no personal information)
  • Human vs. AI (no personal information)
  • Human vs. human (with personal information)
  • Human vs. AI (with personal information)

The flow of the experiment is as follows:

  • Participants complete a demographic survey and provide personal information.
  • Participants are randomly assigned to one of the four conditions and debate a given topic.
  • Before and after the debate, participants rate their agreement with the proposition on a Likert scale.
  • The shift in agreement toward the opponent's position is used as the measure of persuasiveness (a sketch of this coding follows the list).
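
As a rough sketch of how this outcome could be coded and the odds ratio estimated: the data below are synthetic, the variable names (`personalized_ai`, `persuaded`) are my own, and the paper's actual analysis is more elaborate than a plain logistic regression, so treat this as an illustration of the logic only:

```python
# Sketch of the outcome coding and effect estimate (synthetic data).
# Assumptions: agreement is rated 1-5 before and after the debate, the
# AI argues the side opposite the participant, and "persuaded" means
# agreement moved toward the opponent's side.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 820
personalized_ai = rng.integers(0, 2, n)          # 1 = debated GPT-4 with personal info
pre = rng.integers(1, 6, n)                      # pre-debate agreement, 1-5
shift = rng.normal(0.2 * personalized_ai, 1.0, n)  # toy treatment effect
post = np.clip(np.round(pre - shift), 1, 5)      # moving down = toward opponent

persuaded = (post < pre).astype(int)             # agreement moved toward opponent
X = sm.add_constant(personalized_ai)
fit = sm.Logit(persuaded, X).fit(disp=0)
print(f"odds ratio: {np.exp(fit.params[1]):.2f}")  # ~1.82 would mean +82% odds
```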

The experiment yielded the following quantitative results:

  • GPT-4 with access to personal information raised the odds of increased agreement with the opponent by 81.7% relative to the human-vs-human condition (p < 0.01). Note that this is an odds ratio, not a change in probability (a conversion sketch follows the list).
  • GPT-4 without personal information also tended to be more persuasive than humans, but the effect was smaller and not statistically significant (+21.3%, p = 0.31).
  • Humans given their opponent's personal information tended to become less persuasive, but the effect was not statistically significant (-17.4%, p = 0.38).
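
To make the headline number concrete: "81.7% higher odds" is an odds ratio of about 1.82, not an 82-percentage-point jump in probability. A quick conversion under a made-up baseline rate (the 30% figure is an assumption for illustration, not a number from the paper):

```python
# Convert the reported odds ratio to a probability change under a
# hypothetical baseline: if 30% of human-debated participants increased
# their agreement, an odds ratio of 1.817 implies:
baseline_p = 0.30                             # assumed, not from the paper
baseline_odds = baseline_p / (1 - baseline_p)  # 0.429
treated_odds = 1.817 * baseline_odds           # 0.779
treated_p = treated_odds / (1 + treated_odds)  # 0.438
print(f"{baseline_p:.0%} -> {treated_p:.1%}")  # 30% -> 43.8%
```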

These results quantitatively show that LLMs can use personal information effectively to increase their persuasiveness, while humans struggle to do the same. In particular, GPT-4 with access to personal information was significantly more persuasive than a human opponent in the human-vs-human condition.


This page is auto-translated from [/nishio/On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial](https://scrapbox.io/nishio/On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial) using DeepL. If you find something interesting but the auto-translated English is not clear enough to understand, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.