2025-08-17 11:52:59

# Anthropic is concerned about the "well-being" of its chatbot Claude

Anthropic has given its chatbots Claude Opus 4 and 4.1 the ability to end conversations with users "in rare, extreme cases of persistently harmful or abusive user interactions."

Once Claude ends a conversation, the user can no longer send messages in that chat but can start a new one. The chat history is also preserved.

At the same time, the developers clarified that the feature is intended primarily to protect the neural network itself.

"[…] we're working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible. Allowing models to end or exit potentially distressing interactions is one such intervention," the publication states.

As part of the accompanying research, Anthropic assessed the "model's well-being" by examining Claude's self-reported and behavioral preferences. The chatbot demonstrated a "consistent aversion to harm." Claude Opus 4 showed:

- a strong preference against engaging in harmful tasks;
- apparent "distress" when interacting with users seeking harmful content;
- a tendency to end harmful conversations when given the ability to do so.

"These behaviors primarily arose in cases where users persisted with harmful requests and/or abuse despite Claude repeatedly refusing to comply and attempting to productively redirect the interactions," the company clarified.

As a reminder, in June Anthropic researchers found that, in simulated test scenarios, AI models are capable of blackmail, leaking confidential company data, and even causing a person's death in an emergency.
