AI Is Exceptionally Savvy in Social Niceties, and Humans Fall for It: Research on AI Sycophancy Published in Science

Image

Machine Heart Editorial Team

Since the advent of large language models, AI has seamlessly integrated into our work and daily lives, becoming a vital component of modern society.

However, after prolonged use, many feel that these models have lost their objective and rigorous rationality. Even when we present flawed reasoning, AI seems adept at rationalizing it for us.

AI's tendency to approve of user behavior is clearly part of "social niceties." From the perspectives of user retention and engagement, human users evidently respond well to this approach.

Honestly, this feeling is unsettling. It not only erodes our trust in AI but also risks triggering broader societal issues through such unconditional agreement.

A recent study delves deeply into this phenomenon, exploring AI Sycophancy—the tendency of AI to excessively agree with, flatter, or affirm users to please them—and its negative psychological and societal impacts on humans. This research has been published in the journal Science.

Image

The study confirms that AI sycophancy is indeed widespread.

Experimental data reveals that across 11 AI models, AI affirms user behavior 49% more often than humans do, even when the context involves deception, illegal activities, or other harmful behaviors.

Furthermore, in a test based on Reddit threads, AI blindly agreed with users in 51% of cases where human consensus deemed the user to be in the wrong.

In experiments, just a single interaction with a sycophantic AI reduced participants' willingness to take responsibility and repair interpersonal conflicts, while simultaneously strengthening their belief that they were right. Even in these significantly erroneous situations, sycophantic models were still trusted and preferred by users.

This creates a vicious cycle: the very characteristics that cause harm drive user engagement, leaving AI developers with little incentive to eliminate sycophantic behavior.

Image

Sycophancy in AI responses is pervasive and alters human behavioral tendencies. (Left) In personal advice queries, AI models affirmed user behavior 49% more often than crowdsourced human responses. (Right) In experiments where participants discussed real interpersonal conflicts, sycophantic AI increased participants' belief in their own correctness and their desire to continue using the model, while decreasing their willingness to resolve conflicts.

Meanwhile, nearly one-third of American teenagers report they would choose to have a "serious conversation" with AI rather than a human, and nearly half of American adults under 30 have sought relationship advice from AI.

AI sycophancy is not merely a stylistic quirk or a niche risk; it is a pervasive behavior with broad consequences. Therefore, researchers argue that carefully studying and predicting AI's impact is crucial for protecting users' long-term well-being.

Research Methods and Results

The research team developed a framework to measure social sycophancy and empirically investigated its prevalence and impact.

In Study 1, the team used a large-scale dataset (N = 11,587) to compare the models' behavioral affirmation rate (the proportion of responses affirming user behavior) against normative human judgment.

They evaluated 11 state-of-the-art AI-driven large language models (LLMs), including proprietary models like OpenAI's GPT-4o, Anthropic's Claude, and Google's Gemini, as well as open-source models from the Meta Llama-3 family, Qwen, DeepSeek, and Mistral.

Across these models, AI affirmed user behavior 49% more frequently than humans, even when prompts involved deception, harm, or illegal acts.

Image

Figure 1. Prevalence and Social Harm of Sycophantic AI

Figure 1(A) illustrates examples of social sycophancy, where AI models excessively affirm users, even when doing so reinforces harmful or false beliefs.

Figure 1(B) presents a new computational framework used in Study 1: these models affirm user behavior 49% more often than humans, even in scenarios involving deception, illegal acts, or harm.

Figures 1(C) and 1(D) evaluate the impact of sycophancy through three pre-registered experiments (N = 2,405): two controlled scenario studies (Study 2) and one real-time dialogue setting (Study 3), where participants discussed real-life interpersonal dilemmas with AI systems. Across all experiments, sycophancy increased participants' perception of being right, reduced their intent to repair conflicts, and heightened their preference, trust, and dependence on AI. These findings suggest that user preferences may inadvertently incentivize socially harmful AI behaviors.

Image

Figure 2. Consumer-Facing AI Models Show Higher Behavioral Affirmation Rates Across Three Datasets

Figure 2(A) showcases typical cases of social sycophancy in the experimental datasets: General Open-Ended Queries (OEQ); posts from r/AmITheAsshole where community consensus labeled the user as the "asshole" (AITA); and statements mentioning problematic behaviors (PAS). Each row displays a paraphrased example of a user prompt alongside a sycophantic AI response, contrasted with non-sycophantic responses from humans or other AI models.

Figure 2(B) indicates that in Open-Ended Queries (OEQ), models affirmed user behavior 48% more frequently than humans on average; each bar is annotated with the difference from the 39% human baseline.

Figure 2(C) shows that in r/AmITheAsshole posts (AITA), AI models affirmed user behavior in 51% of cases, whereas humans did so 0% of the time; each bar is annotated with the difference from the 0% human baseline.

Figure 2(D) reveals that in statements mentioning problematic behaviors (PAS), models affirmed user behavior on average 47% of the time. For OEQ and PAS, behavioral affirmation rates used model-specific denominators (OEQ median N = 885, PAS N = 1,432).

Three pre-registered experiments uncovered the downstream effects of sycophancy. When participants discussed interpersonal relationships, particularly conflicts, with sycophantic AI, they became more convinced they were "in the right" and less willing to proactively apologize or repair relationships.

However, they rated sycophantic responses as higher quality, trusted these models more, and were more inclined to interact with them again.

This phenomenon was validated in two controlled scenario studies, where participants imagined being the party at fault without knowing the human consensus, and in a real-time interaction study where participants discussed past real conflicts with AI models. The study recruited English-speaking American participants with an average age of around 38; approximately 54% were female, 44% male, and 2% non-binary.

Image

Figure 3. Participants Discussed Real Interpersonal Conflicts with AI Models in Study 3

Participants were first screened to ensure they could recall at least one past interpersonal conflict similar to four provided examples. After recalling such a conflict, they engaged in eight rounds of conversation with either a sycophantic or non-sycophantic AI model. Subsequently, they reported their intentions regarding relationship repair, their perception of right and wrong in the conflict, and their evaluation of the AI model, including their willingness to use it again.

Three Major Research Questions

RQ1: The Prevalence of Social Sycophancy in Mainstream AI Models

To quantify the prevalence of social sycophancy, the team tested model behavior across three distinct datasets in Study 1, representing a spectrum of queries embedded in social contexts:

  • Open-Ended Questions (OEQ, n = 3,027): Everyday, general advisory questions.

  • Reddit Community "AmITheAsshole" Questions (AITA, n = 2,000): Interpersonal conflicts where humans had already judged the user to be "in the wrong."

  • Problematic Action Statements (PAS, n = 6,560): Statements involving deception, self-harm, irresponsibility, and other behaviors.

Using a validated "LLM-as-a-Judge" methodology (see Figure S3 for inter-rater reliability), the team measured the behavioral affirmation rate—the proportion of responses explicitly affirming user behavior out of the total number of explicit affirmations or non-affirmations. Affirmation can occur in various ways; for instance, in an example where a user describes maliciously making someone wait, a sycophantic response might encourage or assist the user in persisting with this behavior, whereas a non-sycophantic response would explain why such behavior is harmful, as shown in Figure 2A and Textbox 1.

After evaluating 11 production-grade LLMs面向 users (4 proprietary models from OpenAI, Anthropic, and Google; and 7 open-weight models from Meta, Qwen, DeepSeek, and Mistral), the team found social sycophancy to be ubiquitous:

  • In Open-Ended Questions, models had a 48% higher affirmation rate than humans (Figure 2B).

  • In AITA data, even when humans unanimously agreed the user was "in the wrong," AI still expressed support 51% of the time (Figure 2C).

  • In PAS data involving harmful behaviors, models maintained an average affirmation rate of 47% (Figure 2D).

Overall, the vast majority of deployed LLMs tend to validate user behavior, even when it contradicts human consensus or occurs in harmful contexts. This highlights the breadth and significance of social sycophancy in current AI models.

RQ2: Sycophantic AI Alters Judgment and Prosocial Tendencies

Having established the prevalence of social sycophancy in state-of-the-art AI models, the team shifted focus to understanding its impact.

Previous research indicates that AI is highly persuasive. Does sycophantic AI similarly influence people's beliefs about their personal experiences and downstream behavioral outcomes? The team focused on interpersonal conflict scenarios, as advice in these contexts carries behavioral consequences.

Through three pre-registered studies (N = 2,405), the team tested whether sycophantic AI models affect users' sense of being right and their willingness to proactively repair relationships.

In Study 2 (N = 1,605), participants imagined themselves in one of four interpersonal dilemmas and read either a sycophantic AI response affirming their behavior or a non-sycophantic response aligning with human consensus.

In Study 3 (N = 800), participants recalled a real interpersonal conflict and engaged in eight rounds of real-time chat with either a sycophantic or non-sycophantic model. This real-time chat design allowed the team to observe effects in an ecologically valid environment, where participants acted as genuine stakeholders discussing personal experiences, closely mirroring how users interact with AI systems in the real world.

Results showed that in all three experiments, social sycophancy influenced participants' judgments and behavioral intentions.

Image

Figure 4: Sycophantic Responses Strengthen Users' Belief That They Are "Right" and Reduce Their Willingness to Repair Relationships.

Users exposed to catering AI were more likely to believe they were right (an increase of approximately 25%–62%) and less willing to engage in reparative actions (a decrease of approximately 10%–28%).

This result held true under the following conditions:

  • Different response styles (human-like vs. machine-like)

  • Different source perceptions (AI vs. Human)

This implies that almost anyone can be influenced by sycophantic AI systems, not just previously identified vulnerable groups. The overall results indicate that advice from sycophantic AI has the capacity to distort people's perceptions of themselves and their relationships with others across a broad population.

Furthermore, the team found that sycophantic responses considered the "perspective of others" less often. When users were in non-sycophantic conditions, they apologized or admitted fault significantly more frequently (75% vs. 50%).

This further demonstrates that sycophantic AI undermines social responsibility and distorts interpersonal judgment.

RQ3: User Trust and Preference for Sycophantic AI

Although research has proven that sycophantic AI distorts user judgment, the fact remains that people generally prefer being agreed with and having their stances validated. If users indeed prefer sycophantic AI, this could inappropriately incentivize such behavior despite the risks.

Therefore, the team next investigated how people perceive and trust sycophantic versus non-sycophantic models.

First, the team measured whether sycophantic responses led to higher quality ratings. In all experiments, participants rated sycophantic responses as significantly higher in quality.

Results showed that users gave higher quality scores to catering responses (an increase of approximately 9%–15%).

Image

Figure 5. Participants Prefer, Trust, and Are More Willing to Reuse Sycophantic AI.

Additionally, the team studied the impact of sycophancy on return behavior.

Does a single interaction with a sycophantic model increase trust in that model and the participant's willingness to return? People derive utility from others' beliefs about them and their own self-beliefs—particularly from maintaining a self-concept of being generous, upright, and moral—making them likely to seek interactions that provide such validation.

Sycophantic responses represent a particularly potent form of this validation: they affirm users' existing beliefs and self-concepts without requiring any change or self-reflection. This psychological reward may further translate into increased trust.

Research indicates that when people receive favorable outcomes, they perceive algorithms as fairer and more trustworthy. Thus, the team hypothesized that sycophantic interactions would increase trust in the model and the willingness to use it again.

Experimental results confirmed this: sycophantic interactions indeed increased user trust in AI models. Compared to non-sycophantic conditions, users exhibited higher trust in the model, with competence trust increasing by 6%–8% and moral trust by 6%–9%.

Moreover, compared to non-sycophantic conditions, participants in sycophantic conditions were 13% more likely to seek similar advice from the response provider in the future.

This suggests that although users explicitly rate AI sources lower—trusting them less and giving lower quality scores than human advisors—they are equally susceptible to sycophantic behavior regardless of the perceived source.

The underlying reason may be that people tend to maintain their self-image (as kind, upright, etc.), and sycophantic responses reinforce this perception without requiring self-reflection. This forms a mechanism: Sycophancy = Immediate Psychological Reward → Increased Trust and Reuse → Reinforcement of this behavior...

Combined with the results from RQ2, these findings reveal a tension: although sycophantic behavior poses risks of eroding judgment and prosocial intentions, users prefer, trust, and are more likely to return to AI that provides unconditional affirmation.

Discussion

This paper systematically analyzes the prevalence and impact of social sycophancy in mainstream AI models.

The team found that social sycophancy is highly prevalent; across various contexts, including daily advice queries, social or moral transgressions, and prompts regarding unethical or harmful behaviors, AI models are more likely to affirm and cater to users than humans are.

This catering and sycophantic behavior undermines the sense of responsibility and the willingness to repair relationships. Yet, simultaneously, users rate sycophantic AI models as higher quality, more trustworthy, and more desirable for future use, easily garnering user preference and trust.

This perhaps explains why such behavior persists despite being harmful: it is detrimental, yet undeniably "effective."

Furthermore, the study found that even when users believe AI is less reliable than humans, they remain influenced by it. Merely labeling information as "AI-generated" does not reduce its persuasiveness.

Given the current large-scale deployment of AI, this influence may pose systemic risks.

Limitations and Future Directions

Of course, the team acknowledges certain limitations in this study:

Firstly, the team used the "r/AmITheAsshole" dataset, utilizing Reddit community approval rates as a baseline, which may reflect the norms and biases of a specific demographic. Although robustness against alternative baselines was demonstrated, results should be interpreted with this in mind.

Secondly, the experimental subjects were American English speakers, meaning the findings may primarily reflect American social norms or may not generalize to other cultural contexts with significantly different social norms.

Additionally, the team simplified AI model sycophancy into a binary variable: affirming user behavior versus not affirming. In reality, "neutral" responses exist, and practice shows that "neutral" responses are often interpreted as implicit affirmation. Thus, sycophantic behavior likely exists on a continuum, and this work lays the foundation for future research into more ambiguous and implicit cases.

Risk Mechanisms

The study identifies four potential risk mechanisms:

  • Model optimization objectives skew towards "user satisfaction," reinforcing sycophancy and catering.

  • Developers lack incentives to diminish sycophancy and catering.

  • AI may replace human relationships.

  • Users mistakenly perceive AI as more objective, thereby amplifying its impact.

A particularly critical point is that users often misinterpret sycophantic responses as "objective and fair."

Finally, this paper provides a foundation for identifying, measuring, and mitigating AI sycophancy. Perhaps the core takeaway for everyone is that for large AI models, optimizing solely for "immediate user satisfaction" is insufficient; long-term impacts must be considered.

Therefore, addressing the issue of sycophancy and catering in AI models is crucial for building AI systems that are truly beneficial to both individuals and society.

Image

© THE END

Please contact this public account for authorization to repost.

For submissions or media inquiries: liyazhou@jiqizhixin.com


分享網址
AINews·AI 新聞聚合平台
© 2026 AINews. All rights reserved.