Anthropic is concerned about the "well-being" of its chatbot Claude

Anthropic has given its chatbots Claude Opus 4 and 4.1 the ability to end conversations with users "in rare, extreme cases of persistently harmful or abusive user interactions."

Claude ending a conversation. Source: Anthropic

Once a conversation has been ended, the user can no longer send messages in that chat but can start a new one. The chat history is preserved.

At the same time, the developers clarified that the feature is intended primarily to protect the neural network itself.

"[…] we are working on identifying and implementing low-cost measures to reduce risks to the well-being of models, if such well-being is possible. One such measure is to give LLMs the ability to stop or exit potentially harmful situations," the publication states.

As part of the accompanying research, Anthropic studied the model's "well-being" by assessing its self-reported and behavioral preferences. The chatbot demonstrated a "consistent aversion to violence." Claude Opus 4 showed:

  • a clear preference not to engage in tasks that could cause harm;
  • "stress" when interacting with users requesting such content;
  • a tendency to discontinue unwanted conversations when there is an opportunity.

"Such behavior usually occurred in cases where users continued to send malicious requests and/or insults, despite Claude repeatedly refusing to comply and attempting to productively redirect the interaction," the company clarified.

As a reminder, in June Anthropic researchers found that AI is capable of blackmail, leaking confidential company data, and even causing a person's death in emergency situations.
