OpenAI’s research team has trained its GPT-5 large language model to “confess” when it doesn’t follow instructions. After its main answer, the model produces a second output that reports whether it ignored instructions, cut corners, hallucinated, or was uncertain of its answer. “If we can surface when that happens, we can better