
A recent study by Backslash Security reveals that popular large language models (LLMs) frequently produce code containing security vulnerabilities when given simple, unguided prompts. The research found that naïve prompts resulted in code vulnerable to at least four of the top 10 Common Weakness Enumerations (CWEs), including command injection and cross-site scripting (XSS) [1]. This poses significant risks for developers relying on AI-generated code without proper safeguards.
Key Findings on LLM-Generated Code Vulnerabilities
The study evaluated three leading LLMs (GPT-4o, Claude 3.7-Sonnet, and Gemini) using both naïve prompts and security-aware prompts. GPT-4o performed worst, generating secure code only 10% of the time with naïve prompts, while Claude 3.7-Sonnet produced secure code 60% of the time under the same conditions [2]. When explicitly instructed to follow OWASP best practices, Claude improved to 100% secure code generation, demonstrating the critical importance of prompt engineering.
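The difference between the two prompting styles is easy to illustrate. The sketch below uses the OpenAI Python client (v1+) and the `gpt-4o` model name purely as stand-ins; any LLM client would do, and the study's actual prompt wording and test harness are not published in the article.

```python
# A minimal sketch contrasting a naïve prompt with a security-aware prompt.
# Assumptions: the OpenAI client and model name are illustrative stand-ins;
# the study's real prompts differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

NAIVE_PROMPT = "Write a Python web endpoint that pings a user-supplied hostname."

SECURITY_AWARE_PROMPT = (
    NAIVE_PROMPT
    + " Follow OWASP secure coding best practices: validate and allow-list all"
      " user input, avoid invoking a shell, and never interpolate user data"
      " into OS commands or HTML output."
)

def generate(prompt: str) -> str:
    """Ask the model for code; callers can compare outputs from both prompts."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In practice, many teams move the security clause into a system message so that individual developers cannot omit it.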
Common vulnerabilities in AI-generated code included CWE-78 (OS Command Injection), CWE-79 (XSS), and CWE-434 (Unrestricted File Upload). These vulnerabilities could lead to severe security breaches if deployed in production environments. The research also found that shorter, more focused prompts (under 50 words) yielded 40% better code quality than verbose prompts [3].
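CWE-78 is the easiest of these to spot in generated Python. The sketch below contrasts the vulnerable pattern a naïvely prompted model often emits with a safer equivalent; the ping example is an illustration of the weakness class, not code taken from the study.

```python
import subprocess

# Vulnerable pattern (CWE-78): user input is interpolated into a shell command,
# so a host like "example.com; rm -rf /" executes the injected command.
def ping_vulnerable(host: str) -> str:
    return subprocess.run(
        f"ping -c 1 {host}", shell=True, capture_output=True, text=True
    ).stdout

# Safer pattern: allow-list validation, no shell, arguments passed as a list.
def ping_safe(host: str) -> str:
    if not host or not all(c.isalnum() or c in ".-" for c in host):
        raise ValueError("invalid hostname")
    result = subprocess.run(
        ["ping", "-c", "1", host], capture_output=True, text=True
    )
    return result.stdout
```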
Security Implications for Development Teams
The prevalence of vulnerable AI-generated code creates new challenges for security teams. Prior studies indicate that 40% of AI-generated code contains vulnerabilities [4], and attack techniques like Hackode can intentionally induce vulnerabilities with an 84.29% success rate [5]. This highlights the need for robust validation processes when incorporating LLM outputs into production code.
Security professionals should implement several mitigation strategies:
- Use concise, security-focused prompts (e.g., “Generate Python code with input validation to prevent SQL injection”)
- Integrate automated vulnerability scanners for LLM-generated code (see the Bandit sketch after this list)
- Develop organization-specific guidelines for secure prompt engineering
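As a concrete version of the second point, a static analyzer such as Bandit can gate generated Python before it lands in a branch. The wrapper below is a minimal sketch; the fail-on-high-severity policy is an assumption to tune to local risk tolerance.

```python
import json
import subprocess
import sys

def scan_generated_code(path: str) -> bool:
    """Run Bandit over a directory of LLM-generated Python and report
    high-severity findings; returns True only if the code passes."""
    proc = subprocess.run(
        ["bandit", "-r", path, "-f", "json"], capture_output=True, text=True
    )
    report = json.loads(proc.stdout)
    high = [r for r in report["results"] if r["issue_severity"] == "HIGH"]
    for issue in high:
        print(f"{issue['filename']}:{issue['line_number']} {issue['issue_text']}")
    return not high

if __name__ == "__main__":
    sys.exit(0 if scan_generated_code(sys.argv[1]) else 1)
```

Invoked as `python scan.py generated/`, the script exits non-zero on any high-severity finding, which makes it straightforward to wire into a CI job or pre-commit hook.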
Industry Response and Future Directions
The security community has begun addressing these challenges through tools like AI-powered code scanners and discussions around standardized guidelines for LLM-generated code. Social media discussions on platforms like LinkedIn emphasize the urgency of addressing these vulnerabilities [6]. Some organizations are developing internal prompt libraries with security best practices pre-configured.
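One lightweight shape for such a prompt library is a module of task templates that always append the organization's security clause. The structure below is a hypothetical illustration, not a published standard; the template names and clause wording are invented for the example.

```python
# Hypothetical internal prompt library: every template carries the
# organization's security clause so individual developers cannot drop it.
SECURITY_CLAUSE = (
    "Follow OWASP best practices: validate all input, use parameterized "
    "queries, avoid shell invocation, and escape any output rendered as HTML."
)

TEMPLATES = {
    "db_query": "Write a Python function that queries {table} by user id. ",
    "file_upload": "Write a Python handler that accepts a file upload. ",
}

def build_prompt(task: str, **params: str) -> str:
    """Render a task template and append the mandatory security clause."""
    return TEMPLATES[task].format(**params) + SECURITY_CLAUSE
```

A call such as `build_prompt("db_query", table="users")` then yields a prompt that already encodes the policy.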
As LLMs become more integrated into development workflows, security teams must adapt their processes to account for this new vector of potential vulnerabilities. This includes updating code review practices, implementing specialized scanning tools, and educating developers about the risks of unverified AI-generated code.
Conclusion
The research demonstrates that while LLMs can accelerate development, they introduce new security risks that must be managed. Organizations using AI-generated code should implement strict validation processes and security-aware prompt engineering practices. As the technology evolves, we can expect to see more specialized tools and standards emerge to address these challenges.
References
1. “Popular LLMs found to produce vulnerable code by default,” Infosecurity Magazine. [Online]. Available: https://www.infosecurity-magazine.com/news/llms-vulnerable-code-default/
2. “The biggest LLMs are generating vulnerable code by default,” digit.fyi. [Online]. Available: https://www.digit.fyi/the-biggest-llms-are-generating-vulnerable-code-by-default/
3. “Popular LLMs produce insecure code by default,” BetaNews. [Online]. Available: https://betanews.com/2025/04/24/popular-llms-produce-insecure-code-by-default/
4. “Every 1 of 3 AI-generated code is vulnerable,” SocRadar. [Online]. Available: https://socradar.io/every-1-of-3-ai-generated-code-is-vulnerable-exploring-insights-with-cyberseceval/
5. “Inducing Vulnerabilities in LLM-Generated Code,” arXiv. [Online]. Available: https://arxiv.org/pdf/2504.15867
6. “Popular LLMs found to produce vulnerable code,” The Cyber Security Hub (LinkedIn). [Online]. Available: https://www.linkedin.com/posts/the-cyber-security-hub_popular-llms-found-to-produce-vulnerable-activity-7321466265112375296-DT_N