Code agents, AI systems that can generate high-quality code, have revolutionized software development workflows. However, this progress also introduces critical safety and security risks. Static safety benchmarks and red-teaming methods often fall short when evaluating code agents, because they may fail to detect emerging real-world risks. The University of Chicago, University of Illinois Urbana–Champaign, VirtueAI, the










