How AI and Traditional SAST Are Revolutionizing Code Security
Static application security testing, or SAST, has long promised to catch vulnerabilities early in the development cycle, before they reach production. In practice, many developers face a flood of alerts, a large share of which turn out to be false alarms. The resulting alert fatigue erodes trust in security tools and makes real findings easier to overlook.
Combining the Best of Both Worlds
Large language models, or LLMs, have gained attention as powerful tools for analyzing code. They can recognize patterns and generate code, but they also come with downsides: they are slow compared with rule-based scanners, and they sometimes produce incorrect output. Experts believe that the future isn’t about choosing one tool over another. Instead, it’s about combining their strengths to improve security.
In a new approach, researchers created a hybrid system that uses traditional SAST methods alongside a fine-tuned LLM. The system aims not only to detect vulnerabilities but also to verify whether the flagged issues are real. The result is a 91% drop in false positives compared with using a standard SAST tool alone, which means fewer false alarms and more trustworthy security alerts.
The Challenge of Context in Security Testing
Traditional SAST tools, like Semgrep, work by scanning code for known patterns linked to security flaws. They follow set rules and are quick, but they often miss problems that require understanding the bigger picture. For example, they might overlook complex logic errors or data flows that span multiple files. They also flag many matches that are not exploitable, so their precision can be quite low: just 35.7% in some tests.
LLMs, on the other hand, are trained on huge amounts of code. They can reason about how different parts of a program interact and how the code behaves. This makes them better at catching tricky issues that rule-based tools miss. But LLMs alone aren’t perfect: they can be slow and sometimes generate incorrect suggestions. Pairing the two lets each method cover the other’s weaknesses.
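To illustrate why pattern matching without context produces false alarms, here is a minimal sketch. A toy regular expression stands in for a real rule. It is an assumption made for illustration, not how Semgrep is implemented and not a rule from the study. The rule flags any execute() call whose query string is built by concatenation, regardless of whether user input is involved.

```python
import re

# Hypothetical toy rule: flag execute("...") calls where the string literal is
# followed by "+", i.e. the query is built by concatenation.
SQL_CONCAT_RULE = re.compile(r'execute\(\s*"[^"]*"\s*\+')


def scan(source: str) -> list[int]:
    """Return the 1-based line numbers that match the toy rule."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if SQL_CONCAT_RULE.search(line)
    ]


SAMPLE = '''\
cursor.execute("SELECT * FROM users WHERE id = " + user_id)
cursor.execute("SELECT * FROM users ORDER BY " + "created_at")
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
'''

flagged = scan(SAMPLE)
print(flagged)  # [1, 2]
```

Line 1 is a plausible injection, but line 2 concatenates two constants and is harmless. The rule cannot tell them apart, which is the kind of alert a context-aware second stage is meant to dismiss.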
How the Hybrid System Works
The new framework has two main steps. First, the traditional SAST tool, like Semgrep, scans the code and flags potential issues. It also gathers information about how data flows through the program. Next, this information is sent to an LLM, which analyzes the context and decides whether the problem is truly a vulnerability.
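The two steps can be sketched roughly as follows. Everything here is hypothetical: the Finding fields, the triage function, and especially stub_verifier, which is a trivial stand-in for the LLM call so that the example runs on its own. It is a sketch of the described flow, not the researchers' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    """One alert from the SAST stage (field names are illustrative)."""
    rule_id: str
    file: str
    line: int
    snippet: str
    dataflow: list[str] = field(default_factory=list)  # source-to-sink trace


# A verifier takes a finding and returns True if it judges it a real issue.
Verifier = Callable[[Finding], bool]


def triage(findings: list[Finding], verify: Verifier) -> list[Finding]:
    """Step 2: keep only the findings the verifier confirms."""
    return [f for f in findings if verify(f)]


def stub_verifier(finding: Finding) -> bool:
    """Stand-in for the LLM: confirm only if the trace starts at user input."""
    return bool(finding.dataflow) and finding.dataflow[0].startswith("request.")


# Step 1 output, written by hand here in place of a real scanner run.
findings = [
    Finding("sql-concat", "app/users.py", 42,
            'execute("... id = " + user_id)',
            dataflow=["request.args['id']", "user_id", "execute"]),
    Finding("sql-concat", "app/report.py", 17,
            'execute("... ORDER BY " + SORT_COLUMN)',
            dataflow=["SORT_COLUMN (constant)", "execute"]),
]

confirmed = triage(findings, stub_verifier)
print([f.file for f in confirmed])  # ['app/users.py']
```

Passing the verifier in as a function keeps the scanner stage and the model stage independent, so the stub can be swapped for a real model call without changing triage.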
The researchers fine-tuned Llama 3 on a high-quality dataset of confirmed vulnerabilities and false positives. For each flagged issue, the LLM works through specific questions, such as whether a given user input could lead to SQL injection. It then uses its understanding of the code’s context to confirm or dismiss the alert. This process filters out most false positives and leaves only the real issues for developers to fix.
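As a rough idea of what such a targeted question might look like in practice, the sketch below builds a verification prompt for one finding and parses a one-word verdict from the reply. The prompt wording, the VULNERABLE / FALSE_POSITIVE reply format, and both function names are assumptions made for illustration. The article does not describe the actual prompts used.

```python
from typing import Optional


def build_prompt(rule_id: str, snippet: str, dataflow: list[str]) -> str:
    """Assemble a verification question from a finding (hypothetical format)."""
    trace = " -> ".join(dataflow) if dataflow else "(no trace available)"
    return (
        f"A static analyzer flagged this code under rule '{rule_id}'.\n"
        f"Code:\n{snippet}\n"
        f"Data flow: {trace}\n"
        "Can attacker-controlled input reach the sink unsanitized?\n"
        "Answer with VULNERABLE or FALSE_POSITIVE, then a short reason."
    )


def parse_verdict(reply: str) -> Optional[bool]:
    """Map the first word of the reply to a verdict; None if unrecognized."""
    words = reply.strip().split()
    first = words[0].upper().rstrip(".:,") if words else ""
    if first == "VULNERABLE":
        return True
    if first == "FALSE_POSITIVE":
        return False
    return None  # unparseable replies should go to a human, not be dropped


prompt = build_prompt(
    "sql-concat",
    'cursor.execute("SELECT * FROM users WHERE id = " + user_id)',
    ["request.args['id']", "user_id", "execute"],
)
print(prompt)
print(parse_verdict("VULNERABLE: user_id comes straight from the request"))
```

Returning None for replies that match neither label is a deliberate choice in this sketch: an unclear answer is treated as unresolved rather than silently counted as a dismissal.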
Impressive Results and Practical Benefits
Testing this hybrid system on 25 open source projects across different programming languages showed marked improvements. Precision jumped from 35.7% to 89.5%, roughly 2.5 times higher. False positives fell from 225 to just 20, a reduction of about 91%, or roughly 11 times fewer false alarms.
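The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The variable names are illustrative, and no true-positive counts are assumed because the article does not give them.

```python
# Figures as reported in the article.
fp_before, fp_after = 225, 20
precision_before, precision_after = 0.357, 0.895

# Relative reduction in false positives: (225 - 20) / 225.
fp_reduction = (fp_before - fp_after) / fp_before

# How many times fewer false positives: 225 / 20.
fp_ratio = fp_before / fp_after

# Ratio of the two precision values.
precision_gain = precision_after / precision_before

print(f"False positives cut by {fp_reduction:.1%}")    # 91.1%
print(f"{fp_ratio:.2f}x fewer false positives")        # 11.25x
print(f"Precision improved {precision_gain:.2f}x")     # 2.51x
```

The 91% reduction and the roughly 11-fold figure are therefore two views of the same change in false-positive counts, not separate results.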
The biggest win was in efficiency. The time needed for security analysts to review alerts dropped by 91%. This means developers spend less time chasing false alarms and more time fixing actual vulnerabilities. The system also caught complex issues like bugs involving multiple files, which traditional scanners often miss.
Beyond Detection: Validation and Fixes
The benefits of using an LLM don’t end with filtering. When the system confirms a vulnerability, it can generate a proof-of-concept exploit to demonstrate its validity. In tests, it produced working PoCs for about 70% of confirmed issues, helping security teams verify problems quickly and accurately.
Additionally, the LLM provides detailed, human-readable descriptions of vulnerabilities and suggests concrete fixes. This helps developers understand what’s wrong and how to repair it, speeding up the entire remediation process. As a result, the security feedback loop becomes faster, more reliable, and more integrated into regular development workflows.
Bringing together traditional SAST and advanced LLMs marks an important step forward in securing modern software. By combining rule-based speed with AI-driven reasoning, teams can better manage vulnerabilities and reduce false positives—making security a smoother, more effective part of software development.














