
The integration of artificial intelligence into cybersecurity workflows promises unprecedented speed in vulnerability detection and code generation. A recent investigation by Intruder’s security team tested whether AI can accelerate the creation of vulnerability checks without compromising quality [1]. Their findings, along with broader industry research, confirm that AI serves as a powerful force multiplier, but its effectiveness is entirely dependent on a “trust, but verify” paradigm where human expertise remains non-negotiable for oversight and validation.
For security teams, the allure is clear: automating the tedious process of writing checks for tools like Nuclei could drastically reduce the window of exposure for new threats. However, initial attempts using basic large language model (LLM) chatbots resulted in significant failures. Prompts to models like ChatGPT and Claude produced outputs with hallucinations, inventing non-existent Nuclei functions and delivering poorly formatted code [1]. The breakthrough came from adopting an agentic AI approach, using an AI agent equipped with tools like curl and grep and access to a curated index of existing Nuclei templates. This method, which mimics a human workflow of researching targets, generating templates, and testing them, yielded higher-quality, consistent outputs.
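Intruder has not published the agent’s implementation; as a rough illustration only, the Python sketch below shows what that kind of tool wiring could look like, assuming a curl/grep toolbelt and a local directory of existing Nuclei templates, with the model call itself omitted. All function names and paths are hypothetical.

```python
# Hypothetical sketch of the agentic tool wiring described above; not Intruder's
# implementation. The LLM call is omitted -- the point is that the agent researches
# the target and consults prior templates before drafting a new check.
import subprocess
from pathlib import Path


def research_target(url: str) -> str:
    """Fetch headers and body the way a human would with curl."""
    result = subprocess.run(
        ["curl", "-s", "-i", "--max-time", "10", url],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def find_similar_templates(index_dir: str, keyword: str) -> list[str]:
    """grep a curated index of existing Nuclei templates for prior art."""
    result = subprocess.run(
        ["grep", "-ril", keyword, index_dir],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()


def build_context(target_url: str, keyword: str, index_dir: str = "nuclei-templates") -> str:
    """Assemble the evidence the agent would hand to the model before it drafts a template."""
    evidence = research_target(target_url)
    examples = find_similar_templates(index_dir, keyword)
    snippets = "\n".join(Path(p).read_text()[:500] for p in examples[:3])
    return f"Observed response:\n{evidence[:2000]}\n\nSimilar existing templates:\n{snippets}"
```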
The Agentic AI Workflow in Practice
Intruder’s refined process positions the AI as a productivity tool within a human-in-the-loop framework. Security engineers provide critical inputs such as target endpoints, the type of matchers to use, and the specific data to extract. The AI agent then follows a set of rules to autonomously research the target application, generate the Nuclei template, and validate it against both vulnerable and non-vulnerable hosts [1]. This approach has proven successful in scenarios like rapidly creating checks for exposed administrative panels not covered by other scanners and developing sophisticated multi-request templates for unsecured Elasticsearch instances that confirm unauthenticated data access. Despite these successes, challenges persist. The AI still occasionally hallucinates, requiring manual follow-up prompts to add necessary components like a favicon matcher to reduce false positives, and sometimes uses inefficient methods, such as shell loops instead of Nuclei’s native host-list flag.
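That validation step lends itself to simple automation. The sketch below, with placeholder hostnames and template paths, runs a candidate template against one known-vulnerable and one known-clean host via the nuclei CLI, and notes the native host-list flag that replaces ad-hoc shell loops; exact flags may vary between nuclei versions.

```python
# Sketch of the "test against vulnerable and non-vulnerable hosts" step.
# Hostnames and the template path are placeholders; flags assume a recent nuclei CLI.
import subprocess


def nuclei_matches(template: str, target: str) -> list[str]:
    """Run one template against one target; -silent prints only matched findings."""
    result = subprocess.run(
        ["nuclei", "-t", template, "-u", target, "-silent"],
        capture_output=True, text=True, check=False,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def template_is_sound(template: str, vulnerable: str, clean: str) -> bool:
    """Accept the template only if it fires on the vulnerable host and stays quiet on the clean one."""
    return bool(nuclei_matches(template, vulnerable)) and not nuclei_matches(template, clean)


if __name__ == "__main__":
    ok = template_is_sound(
        "exposed-admin-panel.yaml",
        "https://known-vulnerable.example.com",
        "https://patched.example.com",
    )
    print("template accepted" if ok else "template rejected")

# For many targets, prefer nuclei's host-list flag over a shell loop:
#   nuclei -t exposed-admin-panel.yaml -list hosts.txt -silent
```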
Pervasive Security Risks in AI-Generated Code
The security implications of AI-assisted development extend far beyond vulnerability checks. A large-scale study by Veracode found that 45% of all AI-generated code samples failed security tests, introducing OWASP Top 10 vulnerabilities. The risk was particularly pronounced for Java, which had a 72% security failure rate [4]. Alarmingly, the study found no correlation between a model’s release date, size, or training sophistication and its ability to produce secure code; while syntax improved, security posture remained flat. This debunks the myth that newer models are inherently more secure.
Common vulnerability patterns in AI-generated code are systemic. A critical finding was that 86% of code samples designed to prevent Cross-Site Scripting (XSS) failed to do so correctly [4]. Other frequent issues include the suggestion of outdated dependencies with known security flaws and the hardcoding of secrets like API keys directly into the codebase [7]. Perhaps more concerning is the shift in error types. Research from Apiiro indicates that while AI reduces trivial syntax errors by 76%, it causes a massive spike in deep, architectural flaws. Their data shows privilege escalation paths jumped 322% and architectural design flaws spiked 153% in AI-assisted projects [8]. This represents a fundamental change from simple typos to complex “timebombs” embedded in an application’s core design.
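For concreteness, the snippet below illustrates two of those recurring patterns, a hardcoded secret and unescaped output that enables reflected XSS, alongside corrected versions. It is a generic illustration, not code drawn from the cited studies, and the environment variable name is a placeholder.

```python
# Illustrative only; not taken from the cited studies. Shows a hardcoded secret and
# an unescaped-output (reflected XSS) pattern, followed by the corrected versions.
import html
import os

# Anti-pattern: credentials committed to source control.
API_KEY = "sk-example-not-a-real-key"

def greet_unsafe(name: str) -> str:
    # Anti-pattern: untrusted input interpolated into HTML without escaping.
    return f"<h1>Hello, {name}!</h1>"


# Fix: read secrets from the environment (or a secrets manager) at runtime.
API_KEY_SAFE = os.environ.get("SERVICE_API_KEY", "")

def greet_safe(name: str) -> str:
    # Fix: escape untrusted input before it reaches an HTML context.
    return f"<h1>Hello, {html.escape(name)}!</h1>"
```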
Implementing the “Trust, But Verify” Model
The core philosophy for safely leveraging AI, as outlined by Addy Osmani, is to treat it as a copilot, not an autopilot. In this framing, the AI is a fast but error-prone junior developer that lacks context and accountability [6]. The principle is to trust the AI to generate code, but to verify its output with the same rigor applied to human-written code, or more. This mindset turns the skeptical instincts of experienced engineers into a key asset. The central challenge, as framed by technologist Balaji Srinivasan, is that “AI prompting scales, because prompting is just typing. But AI verifying doesn’t scale, because verifying AI output involves much more than just typing” [6]. For anything subtle, deep understanding is required to correct the AI, making verification the rate-limiting step in the workflow.
Concrete verification processes are essential for any organization harnessing AI. Peer review is non-negotiable; a human must review AI contributions with a fine-toothed comb before code is deployed [6]. Engaging in a real-time dialogue with the AI, similar to pair programming, allows for reviewing each code chunk and asking the AI to explain its reasoning, which surfaces flaws early. Automated testing serves as a critical safety net, whether through a test-first approach or by prompting the AI to generate tests for its own code, which must then be manually reviewed. Furthermore, all AI-generated code must be scanned with Static Application Security Testing (SAST) and Software Composition Analysis (SCA) tools to catch vulnerabilities and outdated dependencies [7].
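As a minimal sketch of that last step, the gate below runs a SAST scan and an SCA scan and fails the build on any finding. Semgrep and pip-audit are assumptions standing in for whichever scanners a team already uses, and exact flags may differ between tool versions.

```python
# Minimal pre-merge verification gate. Semgrep (SAST) and pip-audit (SCA) are
# stand-ins for a team's actual scanners; flags may vary by version.
import subprocess
import sys

CHECKS = [
    ["semgrep", "scan", "--config", "auto", "--error"],  # SAST: non-zero exit on findings
    ["pip-audit"],                                       # SCA: non-zero exit on vulnerable deps
]


def main() -> int:
    for cmd in CHECKS:
        print(f"Running: {' '.join(cmd)}")
        if subprocess.run(cmd).returncode != 0:
            print("Verification failed; hold the merge until findings are reviewed.")
            return 1
    print("All automated checks passed; proceed to human review.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```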
The Scaling Problem and Future Outlook
The core challenge of AI-assisted development is the scaling asymmetry between code generation and code verification. Apiiro’s research highlights a productivity paradox: AI-assisted developers produce 3-4x more commits, but these are bundled into fewer, much larger pull requests [8]. This concentrates change, overloading the code review process and increasing the potential blast radius of any single error. The net gain can therefore be marginal, or even negative, when the time required for verification approaches or exceeds the time saved by AI generation.
The industry is actively seeking solutions to this bottleneck. Improved prompting with better context and constraints can reduce errors upfront. Using one AI model to critique or verify another’s output is an area of active research, though it is not yet a reliable solution [6]. Platforms are emerging that aim to provide real-time governance, using deep code analysis to automatically identify and manage risks as they are generated [8]. The future of AI in cybersecurity is not full automation but a partnership where AI accelerates the creation of a first draft and humans provide the essential polish, strategic risk analysis, and final accountability.
In conclusion, AI has firmly established itself as a transformative tool in vulnerability management and secure coding, acting as a significant speed multiplier. However, its value is entirely contingent on robust human oversight. The “trust, but verify” model is not just a best practice but a necessity for managing the inherent risks of AI-generated content, from simple bugs to critical architectural flaws. As AI handles more routine tasks, the role of human expertise becomes more, not less, critical, shifting the focus to complex logic, verification, and maintaining the judgment required to oversee these powerful systems safely.
References
1. “How We’re Using AI to Write Vulnerability Checks (and Where It Falls Short),” Intruder.io, Sep. 2025.
2. T. Radichel, “Turn Your AI Security Findings Into Repeatable Automated Checks,” Sep. 2025.
3. [Supporting source not cited directly in the text.]
4. “2025 GenAI Code Security Report,” Veracode, Jul. 2025.
5. “Using GenAI to Speed Up Vulnerability Checks,” Immersivelabs, Jun. 2025.
6. A. Osmani, “The ‘Trust, But Verify’ Pattern For AI-Assisted Engineering,” Jun. 2025.
7. R. Derks, LinkedIn post on “Vibe Coding,” Apr. 2025.
8. “4x Velocity, 10x Vulnerabilities,” Apiiro, Sep. 2025.
9. Reddit r/ChatGPTCoding discussion, May 2025.