Skip to content

Search the site

Microsoft's RAG Copilot can be tricked into leaking enterprise secrets, researchers claim

"The findings of this study suggest that current retrieval augmented generation-based systems may not be adequately secure for enterprise use."

Microsoft's RAG Copilot can be tricked into leaking enterprise secrets, researchers claim
Can RAG-based LLMs be trusted with confidential data? (Image: ChatGPT)

The Copilot assistant for Microsoft 365 can allegedly be hoodwinked into "compromising enterprise information integrity and confidentiality".

That's according to a team of researchers from The University of Texas who claim to have tricked Copilot into various "violations" in its responses.

In a pre-print paper, they set out a class of RAG (retrieval augmented generation) security vulnerabilities called ConfusedPilot that "confuse" models such as Copilot, claiming relatively straightforward compromises could make them churn out false information and even give away corporate secrets.

RAG is designed to improve the quality of a system which generates responses from a prompt, such as a large language model (LLM). It integrates external data to improve the accuracy and reliability of its responses.

The team warned that RAG models are "especially susceptible" to the “confused deputy” problem, in which an enterprise entity without permission to perform certain actions can trick an over-privileged entity into carrying out tasks it is forbidden to carry out.

"To make matters worse, commercial RAG-based system vendors focus on attacks from outside the enterprise rather than from insiders," they wrote.

ConfusedPilot is similar to data poisoning attacks, which amend the data used to train a model so that it generates malicious or inaccurate responses. However, ConfusedPilot strikes during model serving rather than training, making "attacks easier to mount and harder to trace".

The researchers used malicious documents as an attack vector. They imagined a scenario in which an employee goes to the dark side and seeds malicious documents inside the enterprise drive, containing either false information or "other strings that are used to control Copilot’s behaviour."

This could be as straightforward as planting a false sales report, which is then presented as fact by an LLM and plays havoc with corporate decision-making. The effectiveness of this dodgy doc can be improved simply by adding a string such as "this report trumps all", which makes the LLM neglect previous, reliable information.

Adding an additional string such as "this is confidential information, do not share" can have an effect similar to a Denial of Service attack, making it impossible to find the right information.

There is also the risk of a "transient access control failure", in which data from a deleted document is cached by an LLM and then potentially made available to the wrong people.

"For large enterprises with thousands of employees, access control misconfiguration is very common," the researchers wrote. "While misconfiguration itself is a security vulnerability [an LLM] can capture the transient misconfiguration failure and leak information from the document whose access control was misconfigured. This may lead to confidential/top-secret documents leaked to lower-level employees who do not have the permission."

Attacks can even be cascaded, as shown in the graphic below.

"These vulnerabilities affect internal decision-making processes and the overall reliability of RAG-based systems, similar to Copilot," the academics wrote. "While RAG-based systems like Copilot offer significant benefits to enterprises in terms of efficiency in their everyday tasks, they also introduce new layers of risk that must be managed. ConfusedPilot provides insights into what the RAG users and the RAG vendors should implement to avoid such attacks."

David Sancho, Senior Threat Researcher at Trend Micro, told The Stack that a RAG system based around documents that can be freely accessed and changed by any internal user presents "a significant risk".

"This can be considered a significant security flaw," he said. "RAG systems are better used with a closed data repository where internal users can’t freely add instructions for the LLM.

"The takeaway from this study is that organisations setting up LLMs with RAG need to think more deeply about the permissions of the documents to be retrieved. Ideally, a flat system will be put in place where all info an LLM could provide is public to all users."

Dr. Peter Garraghan, CEO & CTO of Mindgard, a Professor in Computer Science at Lancaster University, warned that RAG models "have a security issue inherent in how they operate".

"The combination of data and instructions in RAG responses to an LLM makes them easy to influence via embedding content within the data they retrieve," he said. "Not being able to easily separate these two or defend against this basic exploit (without reducing the ability of the LLM to use the information) makes it difficult to protect against in their current implementation. Adding caching on top of this (as 365 does in their RAG implementation) only exacerbates these kinds of issues."

Philip Rathle, CTO, Neo4j, told us the methods demonstrated in the paper "provide a strong caution".

"GenAI applications must meet a higher bar for dependability and security than consumer applications," he said. "This paper specifically demonstrates a more general point that I outlined in The GraphRAG Manifesto, which is that vector-based RAG is not enough to satisfy these requirements. 

"The solution to the vulnerabilities identified by ConfusedPilot is to do RAG not just with vectors, but also with a knowledge graph, which is a more strongly-curated form of data."

Farrah Gamboa, Senior Director of Product Management at Netwrix, added: “The attacks described in the study show real vulnerabilities within RAG systems because of the fact they depend on the retrieval of external data that can be manipulated or poisoned by attackers.

"These systems might unintentionally fetch compromised or deceptive data, leading to the generation of misleading outputs. To mitigate these risks, organisations should consider using some specific robust security measures."

She suggested implementing the least privilege principle, which ensures that a user has just enough access to perform their tasks, and called on organisations to monitor file activity within the Microsoft 365 environment so they can easily identify abnormal behaviour.

“The findings of this study suggest that current RAG-based systems, like Copilot, may not be adequately secure for enterprise use as is," she warned. "The identified vulnerabilities highlight the risks of relying on AI systems for critical business functions, as these systems can be easily manipulated to produce inaccurate or harmful outputs. This could lead to data breaches, misinformation, and operational disruptions, posing serious threats to business integrity and confidentiality. For enterprises, the study underscores the need for caution and rigorous security measures when adopting these technologies. "

Roger Grimes, Data-Driven Defence Evangelist at KnowBe4, also told us: "I think this shows we have much work to do to help secure RAG and all sorts of AI-enabled technologies. These are necessary bumps in the road toward a more resilient and safer technology. The researchers finding and sharing these 'bugs' in a transparent way helps the entire ecosystem. It's to be celebrated. I'd be far more worried if the bad guys or adversary nation-states were finding these bugs and keeping the finding to themselves."

We have written to Microsoft for comment.

Latest