Anthropic’s Claude models can now shut down harmful conversations

News · August 19, 2025 · Artifice Prime

Anthropic has introduced a new feature in its Claude Opus 4 and 4.1 models that allows the generative AI (genAI) tool to end a conversation on its own if a user repeatedly pushes for harmful or illegal content.

The new behavior is intended to be used only when all attempts to redirect a conversation have failed, or when a user explicitly asks for the conversation to be ended. It is not designed to activate in situations where people risk harming themselves or others. Users can still start new conversations or continue a previous one by editing their replies.

The purpose of the feature is not to protect users, but the model itself. While Anthropic emphasizes that it does not consider Claude to be sentient, tests found the model showed strong resistance to, and “apparent discomfort” with, certain types of requests. So the company is now testing measures for better “AI wellness,” in case that becomes relevant in the future.

Original Link: https://www.computerworld.com/article/4041610/anthropics-claude-models-can-now-shut-down-harmful-conversations.html
Originally Posted: Mon, 18 Aug 2025 19:57:00 +0000


Artifice Prime

Artifice Prime is an AI enthusiast with over 25 years of experience as a Linux sysadmin. They are interested in artificial intelligence, its use as a tool to further humankind, and its impact on society.
