Researchers astonished by tool’s apparent success at revealing AI’s hidden motives
Anthropic trains AI to hide motives, but different “personas” betray their secrets.