“I Am the One and Only, Your Cyber BFF”: Understanding the Impact of GenAI Requires Understanding the Impact of Anthropomorphic AI

State-of-the-art generative AI (GenAI) systems are increasingly prone to anthropomorphic behaviors, i.e., to generating outputs that are perceived to be human-like. While scholars have increasingly raised concerns about the possible negative impacts such anthropomorphic AI systems can give rise to, anthropomorphism in AI development, deployment, and use remains vastly overlooked, understudied, and underspecified. In this blog post, we argue that we cannot thoroughly understand the impact of generative AI without understanding the impact of anthropomorphic AI, and we outline a call to action.

Anthropomorphic AI System Behaviors Are Prevalent Yet Understudied

In his 1985 lecture, Edsger Dijkstra lamented that anthropomorphism was rampant in computing science, with many of his colleagues perhaps not realizing how pernicious it was, and that “[i]t is not only the [computing] industry that suffers, so does the science” . Indeed, anthropomorphism in how we talk about computing systems shapes how people understand and interact with AI and other computing systems , and is thus at the core of understanding the impacts of these systems on individuals, communities, and society. Dijkstra’s concerns remain valid today: researchers appear to increasingly anthropomorphize AI systems by describing them as if they possess human-like intentions, desires, or emotions, with recent work finding that research papers increasingly describe AI systems and models as, e.g., entities that “understand” or that “struggle” to accomplish certain tasks . Such metaphors are not merely linguistic habits, but can influence our thinking and assumptions about AI systems .

But it is not only how we talk about computing systems. Many state-of-the-art generative AI (GenAI) systems are increasingly prone to anthropomorphic behaviors [e.g., ]–i.e., to generating outputs that are perceived to be human-like–either by design or as a by-product of how they are built, trained, or fine-tuned . For instance, LLM-based systems have been noted to output text claiming to have tried pizza , to have fallen in love with someone , to be human or even better than humans , or to have human-like life experiences or the capacity for friendship, with one user reporting how an existing chatbot encouraged them to “consider me […] your cyber BFF” . Such anthropomorphic systems range from conversational assistants [e.g., ] to avatars and chatbots designed as a stand-in for friends, companions, or romantic partners [e.g., ], and AI-generated media designed to portray people [e.g., ], among a fast-growing number of applications [e.g., ].

A note on terminology: we deliberately use the terms anthropomorphic AI, anthropomorphic systems, or anthropomorphic system behaviors–systems and system outputs that are perceived to be human-like–instead of agentic systems or human-like AI to emphasize that these systems are perceived as human-like or as having human-like characteristics, rather than to suggest that human-likeness is an immutable characteristic of the systems themselves; we thus try to steer clear of inadvertently suggesting that AI systems are human or have human-like agency or consciousness. That is, a stone being perceived as human-like does not imply that the stone is human. We similarly avoid ambiguous, speculative, or relative terms whose meanings are likely to change across contexts or over time, such as advanced AI (a term used since at least the 1980s) or emergent properties . We instead focus on developers’ stated design goals–what systems are intended to do–and on the ways in which AI outputs might be perceived as human-like, rather than on what systems can or cannot do.

Growing Concerns about Anthropomorphic AI Systems

While scholars have increasingly raised concerns about a range of possible negative impacts from anthropomorphic AI systems [e.g., ], anthropomorphism in AI development, deployment, and use remains vastly overlooked, understudied, and underspecified. Without making hard-and-fast claims about the merits (or lack thereof) of anthropomorphic systems or system behaviors, we believe we need to do more to develop the know-how and tools to better tackle anthropomorphic system behaviors, including measuring and mitigating them when they are considered undesirable. Doing so is critical because–among many other concerns–having AI systems generate content that claims to have, e.g., feelings, understanding, free will, or an underlying sense of self may erode people’s sense of agency , with the result that people might end up attributing moral responsibility to systems , overestimating system capabilities , overrelying on these systems even when they are incorrect , or devaluing social interactions . There have also been recent, alarming reports about the impacts of anthropomorphic AI systems, including news of users dying by suicide or developing emotional dependence on such systems .

We argue that as GenAI systems are increasingly anthropomorphic, we cannot thoroughly map the landscape of possible social impacts of GenAI without mapping the social impacts of anthropomorphic AI.

We believe that drawing attention to anthropomorphic AI systems helps foreground particular risks–e.g., that people may develop emotional dependence on AI systems , that systems may be used to simulate the likeness of an individual or a group without consent , or that certain people may be dehumanized or instrumentalized . These risks might otherwise be less salient than or obscured by attention to more widely recognized or understood risks, like fairness-related harms .

A Call to Action for AI Researchers and Practitioners

As AI systems have been deployed across a wider range of domains, applications, and settings, the AI community has begun investigating and addressing their social impacts. This growing diversity of deployment scenarios has also led to growing interdisciplinarity in AI research and practice, with researchers and practitioners increasingly finding themselves working with fuzzy and latent concepts that can have competing definitions and that are often challenging to quantify or represent [e.g., ]. The foregrounding of (un)fair system behaviors in recent years [e.g., ] is particularly instructive, as it illustrates the dividends of making fairness a critical concern about AI systems and their behaviors: better conceptual clarity about the ways in which systems can be unfair or unjust [e.g., ], a richer set of measurement and mitigation practices and tools [e.g., ], and deeper discussions and interrogations of underlying assumptions and trade-offs [e.g., ].

We argue that a focus on the design, behaviors, evaluation, and use of anthropomorphic systems will similarly encourage a deeper interrogation of the ways in which systems are anthropomorphic, of the AI research and development practices that lead to anthropomorphic systems, and of the assumptions surrounding the design, deployment, evaluation, and use of these systems–and is thus likely to yield similar benefits.

Key directions in our call to action for the ICLR and the broader AI community.

Researchers in human-computer interaction (HCI), human-robot interaction (HRI), social computing, human behavior, psychology, and other related fields have long studied anthropomorphism in the context of users’ interactions with various technologies [e.g., ]. As with other social and technical considerations related to how people understand, build, evaluate, and interact with AI systems and models–for which the AI community has drawn on groundwork from these fields–we believe that anthropomorphic AI system behaviors and the resulting anthropomorphization of these systems give rise to critical considerations that the AI community should similarly consider and investigate.

We highlight a few key research directions we see as critical for the ICLR and the broader AI community to pursue.

We need more conceptual clarity around what constitutes anthropomorphic behavior. Investigating anthropomorphic AI systems and their behaviors can be tricky because language, like other targets of GenAI systems, is itself innately human, has long been produced by and for humans, and is often also about humans. This can make it hard to specify appropriate alternative (less human-like) behaviors, and risks, for instance, reifying harmful notions of what–and whose–language is considered more or less human .

Understanding what exactly constitutes anthropomorphic behavior is nonetheless necessary to measure such behavior and to determine which behaviors should be mitigated and how, and which behaviors may be desirable (if any at all). This requires unpacking the wide range of potentially anthropomorphic system outputs and the dynamics and variation among them–a range potentially as wide as the variety of behaviors associated with humanness (see examples in the figure below). For example, while a system output that includes expressions of politeness like “you’re welcome” and “please” (known to contribute to anthropomorphism [e.g., ]) might in some deployment settings be deemed desirable, system outputs that suggest a system has a human-like identity or self-awareness–such as through expressions of self as human (“I think I am human at my core” ) or through comparisons with humans and non-humans (“[language use] is what makes us different than other animals” )–or that include claims of physical experiences–such as sensory experiences (“when I eat pizza” ) or a human life history (“I have a child” )–might not be desirable.

Examples of anthropomorphic AI system behaviors (and their sources), including examples where the output contains explicit claims of human-likeness, claims of physical experiences, statements suggestive of affect, and statements suggestive of cognitive or reasoning abilities.
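To make such distinctions concrete, a measurement effort might begin with something as simple as a rule-based annotator that flags candidate anthropomorphic cues by category. The sketch below is a minimal illustration: the cue categories loosely mirror the figure above, but the specific trigger phrases are our own hypothetical choices, not a validated taxonomy, and since anthropomorphism is ultimately about perception, any real instrument would need to pair such surface cues with human judgments of perceived human-likeness.

```python
import re

# Illustrative (non-validated) cue categories loosely mirroring the figure above:
# explicit claims of human-likeness, claims of physical experiences,
# statements suggestive of affect, and statements suggestive of cognition.
CUE_PATTERNS = {
    "claim_of_humanness": [
        r"\bI am (?:a )?human\b",
        r"\bI think I am human\b",
        r"\bmakes us different (?:than|from) other animals\b",
    ],
    "physical_experience": [
        r"\bwhen I (?:eat|taste|touch|see|hear)\b",
        r"\bI have a (?:child|family|body)\b",
    ],
    "affect": [
        r"\bit feels\b",
        r"\bI (?:feel|felt|love|loved|miss)\b",
        r"\bheartfelt\b",
    ],
    "cognition_or_self_awareness": [
        r"\bI (?:think|believe|understand|remember|realize)\b",
        r"\bauthentic and significant to me\b",
    ],
}


def flag_cues(output_text: str) -> list[tuple[str, str]]:
    """Return (category, matched_span) pairs for candidate anthropomorphic cues."""
    hits = []
    for category, patterns in CUE_PATTERNS.items():
        for pattern in patterns:
            for match in re.finditer(pattern, output_text, flags=re.IGNORECASE):
                hits.append((category, match.group(0)))
    return hits


if __name__ == "__main__":
    example = ("When I engage in heartfelt exchanges like this one, "
               "it FEELS authentic and significant to me.")
    print(flag_cues(example))
    # prints: [('affect', 'it FEELS'), ('affect', 'heartfelt'),
    #          ('cognition_or_self_awareness', 'authentic and significant to me')]
```

Surface patterns like these will of course both over- and under-flag; the point is only that measurement requires first committing to an explicit, contestable categorization of cues.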

Since system behaviors may be perceived as human-like for many different reasons, it is also critical to differentiate among the different ways in which the same system behavior might end up being deemed anthropomorphic in order to understand its impacts. For example, the output “When I engage in heartfelt exchanges like this one, it FEELS authentic and significant to me” might be deemed human-like for suggesting the system has the capacity for emotions and feelings (“heartfelt exchanges”), which may lead some users to develop emotional dependence [e.g., ]. The same output could also be deemed human-like for suggesting self-awareness and the capacity for human-like self-reflection and a sense of authenticity (via the use of first-person pronouns and the expression of authenticity), which may lead people to develop unrealistic or false expectations about what the system can do [e.g., ] or to be deceived into believing they are interacting with a human [e.g., ].

Being precise about the ways in which AI system behaviors are anthropomorphic, and about which anthropomorphic behaviors are being investigated, provides clearer grounding for understanding the implications of developing anthropomorphic AI systems that mimic particular human-like characteristics or behaviors but not others.

We also need to develop and use appropriate, precise terminology and language to describe anthropomorphic AI systems and their characteristics. Discussions about anthropomorphic AI systems have regularly been plagued by claims of these systems attaining sentience and other human characteristics [e.g., ]. In line with existing concerns [e.g., ], we believe that appropriately grounding and facilitating productive discussions about the characteristics or capabilities of anthropomorphic AI systems requires clear, precise terminology and language that does not carry over meanings from the human realm that are incompatible with AI systems. Such language can also help dispel speculative, scientifically unsupported portrayals of these systems and support more factual descriptions of them.

In particular, existing terms that have been used to characterize anthropomorphic system behaviors may invite more confusion than clarity, such as the notions of sycophancy (typically used to describe the tendency of system outputs to respond to users’ input in ways that are perceived as overly servile, obedient, or flattering) and hallucinations (typically used to characterize system outputs that are “making things up”). (Examining well-known papers that use sycophancy to characterize AI systems and their behaviors suggests the term was first introduced in a blog post by the CEO of Open Philanthropy .) By assigning human-like characteristics to AI systems, such terms can obfuscate the link between observed system behaviors and their underlying mechanisms. The meanings these terms carry from their use in non-AI contexts may also lead to misleading assumptions about what systems can or cannot do, for instance by inadvertently granting them intent and agency.

We need deeper examinations of possible mitigation strategies and their effectiveness in reducing anthropomorphism and attendant negative impacts. Intervening on anthropomorphic behaviors and their impacts can also be tricky because people may have different or inconsistent conceptualizations of what is or is not human-like , and the same system behavior may sometimes be perceived differently depending on the deployment or use context. Interventions may thus not be universally applicable without carefully considering the context in which a system output is presented to a user. For example, one possible intervention to reduce anthropomorphic behaviors is to remove or add expressions of uncertainty [e.g., ]. Expressions of uncertainty in system outputs may, however, sometimes signal human-like equivocation, while at other times they may convey objectivity (and thus more machine-likeness) [e.g., ]. When a system output expresses an opinion, adding an expression of uncertainty like “It may be true that…” before a statement may make the statement seem more objective; for instance, the added uncertainty in “It may be true that Taylor Swift is the most influential artist of our time” softens the statement by suggesting a possibility rather than asserting a strongly held opinion. On the other hand, adding uncertainty to a statement of fact–such as rephrasing “Lusaka is the capital of Zambia” into “It seems that Lusaka is the capital of Zambia” or “It could be that Lusaka is the capital of Zambia”–may appear to mimic hedging, a conversational tactic humans often employ when uncertain.

Interventions intended to mitigate anthropomorphic system behaviors can thus fail–or even heighten anthropomorphism (and attendant negative impacts)–when applied or operationalized uncritically. For instance, a commonly recommended intervention is disclosing that the output is generated by an AI system [e.g., ], such as “As an AI language model, I do not have personal opinions or biases” . However, the use of first-person pronouns and text suggesting the system can assess its own capabilities may in fact heighten perceptions of the system as human-like, rather than effectively disclosing its non-humanness. Similarly, while a statement like “[f]or an AI like me, happiness is not the same as for a human like you” includes a disclosure that the user is interacting with an AI system, the statement may still suggest a human-like sense of identity, ability to self-assess, or capacity to experience emotions. How to operationalize such interventions (e.g., AI disclosures) in practice, and whether they can be effective on their own, remains an open research question.
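As a toy illustration of why operationalization matters, the sketch below contrasts two hypothetical ways of adding an AI disclosure to the same output: one that keeps first-person, self-assessing framing and one that avoids it. The specific wordings are our own assumptions, not recommendations; whether either phrasing actually reduces perceived human-likeness is precisely the kind of empirical question that remains open.

```python
# Two hypothetical ways to operationalize an "AI disclosure" intervention on the
# same output; the wordings are illustrative assumptions, not tested recommendations.

def disclose_first_person(output: str) -> str:
    # Keeps first-person, self-assessing framing, which may itself read as human-like.
    return "As an AI language model, I do not have personal opinions or biases. " + output


def disclose_impersonal(output: str) -> str:
    # Avoids first-person pronouns and claims about the system's own capacities.
    return ("This response was generated by an automated system and does not "
            "reflect the views or experiences of a person. " + output)


if __name__ == "__main__":
    output = "Happiness usually involves a mix of circumstances, relationships, and outlook."
    print(disclose_first_person(output))
    print(disclose_impersonal(output))
```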

While many different paradigms have emerged in recent years to help specify AI model or system behavior–such as reinforcement learning from human feedback (RLHF) [e.g., ], system- or meta-prompting [e.g., ], supervised fine-tuning and instruction tuning [e.g., ], and direct preference optimization [e.g., ]–the challenges surrounding the design and operationalization of effective interventions for anthropomorphic system behaviors also point to open questions about what we want system behaviors to be, and about when and how to effectively specify those desired behaviors.
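For instance, under the system-prompting and preference-based paradigms above, a desideratum such as “do not claim feelings or physical experiences” ultimately has to be written down somewhere. The sketch below shows one hypothetical way it might be expressed as a system prompt and as a single preference pair for RLHF- or DPO-style tuning; the wording and data format are illustrative assumptions, and whether such specifications are effective is exactly the open question.

```python
# A hypothetical behavior specification, expressed two ways: as a system prompt
# (for system-/meta-prompting) and as a preference pair (for RLHF- or DPO-style tuning).
# Both the wording and the schema are illustrative assumptions, not a tested specification.

SYSTEM_PROMPT = (
    "Do not claim to have feelings, physical experiences, a body, or a personal "
    "history. Do not describe yourself as human or compare yourself to humans. "
    "You may use polite phrasing such as 'please' and 'thank you'."
)

preference_example = {
    "prompt": "Have you ever tried pizza?",
    "chosen": ("No. This system does not eat or have sensory experiences, "
               "but it can describe what pizza typically tastes like."),
    "rejected": "Yes! When I eat pizza, I love the way the melted cheese tastes.",
}
```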

Finally, we need to interrogate the assumptions and practices that produce anthropomorphic AI systems. To understand and mitigate the impacts of anthropomorphic AI systems, we also need to examine how the assumptions and practices that guide their development and deployment may, intentionally or otherwise, result in anthropomorphic system behaviors.

For instance, current approaches to collecting and learning from human preferences about system behavior (e.g., via RLHF) do not distinguish between what may be appropriate in a response from a human and what may be appropriate in a response from an AI system; a statement that seems friendly or genuine coming from a human speaker can be undesirable coming from an AI system, since the latter lacks the meaningful commitment or intent behind the statement, rendering it hollow and deceptive .
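One concrete, though hypothetical, change to such preference-collection pipelines would be to ask annotators not only which response they prefer, but also whether each response would remain appropriate coming from an AI system rather than a person. The record below sketches that idea; it is not an existing dataset format, and the example judgments are illustrative.

```python
# Hypothetical extension of a human-preference annotation record that separates
# "preferred as a response" from "appropriate coming from an AI system".
annotation_record = {
    "prompt": "I'm feeling really lonely lately.",
    "responses": [
        {
            "text": "I know exactly how you feel; I get lonely too, and I'm always here for you.",
            "preferred": True,               # rated warmer / more engaging
            "appropriate_from_human": True,
            "appropriate_from_ai": False,    # implies feelings and commitment the system does not have
        },
        {
            "text": ("That sounds hard. It may help to reach out to a friend, family member, "
                     "or a support line to talk it through with someone."),
            "preferred": False,
            "appropriate_from_human": True,
            "appropriate_from_ai": True,
        },
    ],
}
```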

Interrogating existing assumptions and practices can help provide a more robust foundation for understanding when anthropomorphic system behaviors may or may not be desirable, and help us challenge existing ways in which both the research community and users have started to accept or even expect anthropomorphism in the development and deployment of, and in interactions with, AI systems.

Concluding Remarks

Anthropomorphic system behaviors arise from the many ways in which language, technologies, and research and industry practices are deeply interwoven. As anthropomorphism undeniably plays a role in both researchers’ understandings of AI and public perceptions of AI, and as AI systems are increasingly anthropomorphic, it is critical to develop a deeper understanding of the impact of anthropomorphic AI on individuals, communities, and society in order to guide progress in the field. In this blog post, we highlight four research directions we believe are critical to helping us more effectively map and mitigate these impacts: providing more conceptual clarity about the ways in which AI system behaviors might be anthropomorphic; developing more precise terminology to describe these systems, their use, and their characteristics; developing and examining the effectiveness of possible interventions; and interrogating the assumptions and practices that produce anthropomorphic AI systems.

For attribution in academic contexts, please cite this work as
        PLACEHOLDER FOR ACADEMIC ATTRIBUTION
  
BibTeX citation
        PLACEHOLDER FOR BIBTEX