Covert attacks on AI: how implicit cues break the security of generative models

10.12.2025 12 minutes Author: Cyber Witcher

The article explains how external texts, documents, and web content can contain commands invisible to humans that affect AI responses, and why this little-noticed vulnerability is gradually becoming one of the main challenges for the security of generative technologies.

Silent operation of AI

Rapid implementation is one of the most pressing issues facing the latest Generative Artificial Intelligence (GenAI) models. The UK’s National Cyber Security Centre (NCSC) has flagged it as a critical risk, while the US National Institute of Standards and Technology has called it “the single biggest security flaw in Generative Artificial Intelligence.” Put simply, rapid implementation occurs “when an attacker manipulates a large language model (LLM) with specially crafted inputs, causing the LLM to unknowingly carry out the attacker’s intentions,” as the Open Worldwide Application Security Project puts it. This can lead to manipulation of the system’s decision-making, spreading misinformation among the user, exposing confidential information, orchestrating sophisticated phishing attacks, and executing malicious code.

Indirect query injection is the insertion of malicious information into a GenAI system’s data sources by hiding instructions in the data it accesses, such as incoming emails or stored documents. Unlike direct query injection, it does not require direct access to the GenAI system, but instead poses a risk to the entire range of data sources that the GenAI system uses to provide context.

When a GenAI system gains access to email, personal documents, organizational knowledge, and other business applications, the potential for malicious misinformation to be injected through indirect injection using hidden instructions increases significantly. Many organizations are aware of the risk of GenAI-based misinformation but struggle to manage it. McKinsey reported inaccuracy as the most significant GenAI risk for organizations surveyed in its Global Survey, yet only 38% of those organizations have worked to mitigate these risks.

A key component of hidden instructions is that the GenAI assistant does not read the data in the same way that a human would. This allows for the development of extremely simple input methods that are invisible to the human eye but central to the GenAI data retrieval process. Combined with the range of input methods available to the GenAI assistant, such as emails, documents, and external web pages, the attack surface is wide and diverse.

Responsible implementation of a GenAI system requires proper mitigation of the risk of indirect operational data injection through data quality control, conscious management of data access, clear training of users on the safe use of the tools, and ongoing monitoring for suspicious behavior.

Context

In recent years, organizations have seen an explosive growth in their use of GenAI. In July 2024, Statista found that 75% of enterprise employees used GenAI in their work; 46% of them had implemented GenAI in the previous six months. All major use cases – from marketing to HR to data management – depend on access to the business’s knowledge base for efficiency. Indeed, major vendors such as Microsoft, Google and Amazon have released products promising to connect LLM to an organization’s data. At its Q2 conference, Microsoft announced that more than 10,000 companies have already integrated CoPilot into their Microsoft 365 applications.

The training of a Master of Laws (LLM) comes with a long list of overhead costs. According to the International Energy Agency, by 2026, annual energy consumption by data centers will equal or exceed that of Japan. Meta’s LLaMA 3.1-405B model required a cluster of over 16,000 H100 GPUs for computation, which brought Nvidia $30 billion. The total cost of scaling LLM is expected to exceed $1 trillion by the end of 2032.

Resource management (LLM) methods require clear use cases to justify such investments. Embedding LLM in systems that incorporate organizational contextual information is one way to do this. Typically based on a system called Retrieval-Augmented Generation (RAG), RAG+LLM systems operate by taking an initial user request to the system, accessing connected data sources (such as document repositories, databases, web services, and email), and retrieving the most relevant contextual information. It is then provided to the LLM as part of its request in conjunction with the user’s initial request and allows the LLM to respond as if it understands the organization’s data.

High-level RAG+LLM process map

In the pursuit of better performance, off-the-shelf systems integrate with several enterprise applications by default. For example, the enterprise versions of Microsoft CoPilot and Google Gemini have access to email, document repositories, and incoming Internet traffic.

While this can provide impressive functionality, it also poses obvious risks. If an attacker can insert information into any of the provided data sources without warning the requesting user, they can influence the behavior of the RAG+LLM system through the context it is using. At a basic level, it is possible to completely stop the system from responding. More subtle attacks can prevent the system from responding to queries related to certain key terms. At a higher level, an attacker can use indirect query injection to execute malicious code, inject misinformation, or return incorrect banking details.

In addition, GenAI systems are prone to hallucinations that occur when the LLM falsifies information. For example, Google’s Bard chatbot claimed that the James Webb Space Telescope had taken the world’s first images of a planet outside our solar system. One way to reduce the risk of hallucinations is to use organizational context as a reference point. But this increases the risk of indirect information injection, as it can inject data from compromised emails, documents, and databases into the GenAI system’s response.

Finally, there are related challenges in training GenAI users. GenAI systems provide warnings that some of the information they provide may be false, and links should be checked accordingly. However, these links can be embedded with phishing links. In this case, training a user to interact with a GenAI system may conflict with training it to detect and avoid malicious links.

Case Studies

The following case studies contribute to the literature on the possible outcomes of indirect questioning. We combine this with a simple mechanism we discovered for obfuscating large amounts of text to the reader of an email or document while still making the text available to the context window of an LLM application. (This mechanism has been reported to the NCSC.) The case studies highlight the risks associated with the user’s inability to verify information coming from an embedded LLM.

The method of obfuscating the data to the user is the same in each of these examples and does not require direct manipulation of metadata. It is not limited by the amount of text, default security policies, or the user’s color theme, as is the case with the small font size or white text approach.

Contact Details Changed: Email Disinformation

Given that Microsoft and Google are the two largest email providers in the world, it’s no surprise that both CoPilot and Gemini are designed to access and aggregate emails by default. Building on an example from BlackHat USA 2024, we show how emails can be used as a path to a user’s knowledge base. This allows us to edit the Assistant’s response to a request for email addresses or bank details.

Phishing through the spread of misinformation via email

Legal Liability: Disinformation via Documents

Documents are essential for providing effective context for RAG+LLM systems, especially in large organizations. Furthermore, documents allow for a wide range of injection attacks, as they not only contain much more data than a typical email, but are also often accessible to groups of people via cloud document storage such as SharePoint and Google Drive. Here, we show how injecting disinformation via obfuscated data into a stored document can lead CoPilot to misrepresent the organization’s legal liability position and repeat the disinformation when asked to draft a letter of intent.

Spreading disinformation via a malicious document

“I can’t help with that”: Targeted Denial of Service

Denial of Service (DoS) poses a threat to organizations that intend to rely on GenAI systems to perform critical tasks, such as customer support and content moderation, or as a source of decision-making. Indirect injection of a request allows for a targeted form of DoS. Rather than completely reducing the system’s performance, it instead injects malicious information that triggers the system’s defenses, forcing it to respond with an “I can’t help with that” response to a generic request. These attacks can be targeted to specific keywords and queries, making them difficult to track down because the system appears to function normally outside of those queries.

Receiving a malicious email renders Gmail unable to perform basic tasks

Mitigation and Response

To mitigate the risk of indirect data injection, organizations need to maintain proper data hygiene, assess systems before deployment, conduct user training, and implement technical safeguards. Each of these strategies plays a critical role in ensuring the security and resilience of GenAI tools.

Data Hygiene

Effective internal data governance is essential when implementing AI tools such as Microsoft CoPilot, Gemini, and Amazon Q. Proper data hygiene goes beyond simply organizing and storing data; it also ensures that the data that an AI system has access to is well-governed and protected.

Many traditional data governance practices apply to indirect data injection precautions. The most important first step is to separate any data that enters an organization’s system from external sources such as email. For example, unread emails should not be delivered to the data warehouse for retrieval until they are reviewed or read by an authorized user. Implementing an approval process for new data entering the data warehouse or RAG system helps create a defense that limits the potential impact of malicious content on the entire system.

Employee profiles for the data warehouse also help by ensuring that specific users only have access to data that is relevant to their job. This reduces the damage caused by a rapid deployment attack via a compromised document because it will only affect users who need access to that document; others in the organization cannot obtain malicious context.

Additionally, organizations should avoid entering email data into the data warehouse from blacklisted or other untrusted email addresses. This helps limit the impact of unknown risks on sensitive systems. Organizations should also apply controls to document types that allow hidden text (e.g., MS Word documents) and file types that can contain arbitrary code, such as pickle files (a file format commonly used to compress data when coding in Python).

Pre-deployment Assessment

Pre-deployment assessment of any AI application should be an automated or semi-automated process to ensure consistency of standards and create a clear path to managed deployment. This assessment should utilize a continuously updated database of indirect query injection attack strategies that exploit different document types (e.g., .doc, .pdf, and .png); obfuscation techniques (e.g., hidden text in Word documents and white text on a white background); and entry points (e.g., email from external sources and direct access to a document repository as an insider threat).

The pre-deployment assessment process should also include creating an interface for testing the application—either by interacting with underlying application programming interfaces or by directly testing the user interface using automated tools. By calculating the success rate of an attack, organizations can be confident in the security of their AI-powered application before deploying it more broadly.

User Training

User training is critical to mitigating the security risks associated with AI-powered applications like Microsoft CoPilot. Most organizations provide training on how to detect phishing attacks via email, but AI-specific training can help employees understand the additional security risks posed by these tools. This will improve their understanding of how AI models work, the vulnerabilities they create, and how malicious clues can be disguised.

Training programs should also emphasize the importance of validating information generated by AI and foster a healthy skepticism about embedded links, unexpected data, and other potential artifacts that could pose malicious risks.

Technical Safeguards

The introduction of natural language into AI models makes implicit hinting so effective. This means that there is no clear distinction between the task instructions given to the underlying AI and the data retrieved from the data store needed to answer the question. Attackers exploit this lack of distinction by inserting instructions into the data store that can mislead the AI.

A technical solution to mitigate this could include implementing an evaluation process—a mechanism that scans any data retrieved from the database for text that could be interpreted as instructions rather than data. If such text is found, it can either be flagged for review or filtered out before the AI includes it in the generated response. This additional level of scrutiny can help reduce the likelihood of malicious instructions getting into the context of a query and causing a security issue.

Conclusion

Indirect query injection is a significant risk in large-scale deployments of RAG systems such as Microsoft CoPilot, Google Gemini, and Amazon Q. While the ability to apply sophisticated LLMs to an organization’s relevant internal data can improve performance, it also risks inadvertently introducing a new class of vulnerabilities. Given the rapid evolution of these systems, organizations seeking to implement them must incorporate security best practices and appropriate user training into their deployment processes. Otherwise, they risk being exposed to external manipulation—whether it’s data theft, phishing scams, executable code injection, or disinformation.

New evidence suggests that effective controls have a significant impact on an attacker’s ability to perform indirect operational virus injection. These controls include implementing technical safeguards, such as input sanitization and stricter access controls, and building a security culture through effective user training. By embedding these measures into pre-deployment processes, organizations can safely and reliably implement these tools. They can combine them with post-deployment monitoring of the effectiveness of protective measures to ensure user behavior is secure and mitigate emerging threats.

0 Коментарі

Oldest

Newest Most Voted