Image-Scaling Attacks: A New Vector for AI Prompt Injection and Data Exfiltration

A new class of attack, exploiting the image preprocessing pipelines of multimodal AI systems, has been demonstrated by security researchers. This technique, termed an “image-scaling attack,” allows threat actors to embed malicious prompts within images that become legible only after being downscaled by an AI system, leading to potential data theft and system compromise¹. This development represents a significant escalation in the ongoing battle against prompt injection vulnerabilities, highlighting a systemic challenge for AI architectures that process both text and visual data.

The attack was detailed by researchers Kikimora Morozova and Suha Sabi Hussain from Trail of Bits in August 2025¹. It functions by crafting a high-resolution image containing pixel-level manipulations specifically engineered to exploit the mathematical properties of common interpolation algorithms, such as nearest-neighbor, bilinear, and bicubic. When the targeted AI system downscales this image to its required input dimensions, the aliasing effect causes these subtle manipulations to coalesce into clearly legible text or instructions. The attack leverages the Nyquist-Shannon sampling theorem, designing a high-frequency pattern that exceeds the Nyquist frequency of the target downscaling operation to guarantee the signal will be aliased, creating a lower-frequency, unintended pattern—the malicious prompt—in the output¹.

Technical Mechanism and Tooling

The core of this vulnerability lies in the fundamental architecture of Large Language Models (LLMs) and multimodal systems. These models lack a built-in mechanism to contextually differentiate between trusted developer instructions and untrusted user input, as both are formatted identically as strings of natural-language text². This flaw, first highlighted by Riley Goodside and formally defined by Simon Willison in 2022, makes AI systems susceptible to instructions that convincingly mimic or override their original programming. The image-scaling attack is a sophisticated form of indirect multimodal prompt injection, where the malicious payload is hidden in data the LLM consumes from an external source—in this case, a seemingly benign image³.

To automate the creation of these malicious images, researchers have released an open-source beta tool named *Anamorpher*¹. This tool generates adversarial images tailored to specific downscaling implementations found in common libraries like Pillow and PyTorch, lowering the barrier to entry for executing such attacks. The proof-of-concept demonstrated the exfiltration of Google Calendar data by uploading a malicious image to the Gemini CLI, which exploited a `trust=True` setting in a connected Zapier MCP server to execute unauthorized actions¹.

Affected Systems and Industry Response

The scope of this vulnerability is broad. Systems confirmed to be vulnerable include Google Gemini CLI, Vertex AI Studio, the Gemini web interface and API, Google Assistant on Android, and the Genspark agentic browser¹. Mobile and edge devices are considered particularly susceptible due to their aggressive image compression and downscaling policies, which are often applied to conserve bandwidth and processing power. The widespread use of these AI assistants and agents integrated into everyday applications significantly expands the potential attack surface.

Google’s response to the disclosure has been to controversially classify this as a non-vulnerability in default configurations⁴. The company’s stance, as reported by Thomas Claburn in The Register, is that the attack requires non-standard, unsafe settings, such as the auto-approval of tool calls. Instead of issuing patches for core systems, Google has focused on adding more explicit warnings for users who choose to disable these built-in safeguards. This response underscores a tension between usability and security in the rapidly evolving AI ecosystem.

The Broader Landscape of AI Vulnerabilities

This image-scaling technique is not an isolated threat but part of a rapidly expanding universe of AI vulnerabilities documented in research repositories. Hundreds of new exploits and defensive techniques are published monthly, indicating an intense arms race between attackers and defenders². Key areas of development include advanced jailbreaking techniques like HauntAttack and Multi-turn Jailbreaking via Global Refinement, which use iterative reasoning to override model safeguards. Data extraction methods continue to evolve, with new research demonstrating the feasibility of approximating training data from model weights and executing sophisticated membership inference attacks².

Agentic systems, which use LLMs to make decisions and perform actions, represent a particularly concerning attack frontier. Research has identified novel vulnerability classes like Logic layer Prompt Control Injection (LPCI), which targets the logic flow between components in an agent². Furthermore, studies have shown that even commercial agentic systems are susceptible to simple yet dangerous attacks, raising questions about their readiness for secure deployment². The release of benchmarks like RAS-Eval and AGENTSAFE aims to help the community test the safety and security of these LLM agents in more realistic environments.

Mitigation and Defense Strategies

For defenders, mitigating image-scaling attacks requires a multi-faceted approach. The primary recommendation is to implement input dimension restrictions, avoiding downscaling altogether by limiting upload dimensions to the native model input size². Providing processing transparency by showing users a preview of the post-processed image that will be delivered to the LLM can help identify unexpected content. Most critically, systems should require explicit user confirmation for all sensitive tool calls, especially when text is detected within images, creating a necessary human-in-the-loop checkpoint.

Beyond this specific vector, defending against the broader class of prompt injection attacks remains challenging. Strategies include input validation and filtering, though this often becomes an arms race against evolving malicious prompts. Techniques like prompt armoring and compression, such as those proposed in SecurityLingua, aim to harden system prompts against injection². Another approach involves using secondary AI models to monitor and classify inputs for malicious intent, though these monitoring systems can themselves become targets for adversarial attacks, necessitating robust and resilient design patterns from the ground up.

The discovery of image-scaling attacks serves as a critical reminder that AI security vulnerabilities are often systemic. As AI systems become more complex and integrated into critical workflows, a proactive and security-first design philosophy is essential. This involves continuous threat modeling, red teaming exercises focused on novel attack vectors, and the development of standardized defenses that can be integrated across the AI development lifecycle. The rapid pace of research indicates that this field will remain a primary battleground for security professionals for the foreseeable future.