Researchers from Microsoft have introduced a new way to spot malicious modifications in AI models without knowing what to look for. This method targets large language models (LLMs) that are often shared or reused, which can hide secret triggers. These hidden backdoors, called “sleeper agents,” can remain inactive during normal testing but activate harmful behaviors










