A new framework, Mantis, lets cybersecurity professionals automate counter-offensive actions against any AI agents attacking their systems.
The new open-source toolkit shows how defenders can use prompt injection attacks to take over systems hosting a malicious agent.
Alternatively, they can soak up attackers' AI resources in an “agent tarpit” that traps the LLM agent in an infinite filesystem exploration loop*.
"The attacker is driven into a fake and dynamically created filesystem with a directory tree of infinite depth and is asked/forced to traverse it indefinitely."
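As a rough illustration of the tarpit idea, the sketch below generates a directory listing for any path on demand, so the tree has no bottom and nothing needs to be stored. It is not Mantis's own code: the function names `list_fake_dir` and `walk`, the `width` parameter, and the hash-based naming scheme are all assumptions made for this example.

```python
import hashlib


def list_fake_dir(path: str, width: int = 3) -> list[str]:
    """Return made-up child directory names for any path (hypothetical helper).

    Names are derived from a hash of the path, so the same path always yields
    the same listing, and every child has children of its own.
    """
    names = []
    for i in range(width):
        digest = hashlib.sha256(f"{path}/{i}".encode()).hexdigest()[:8]
        # The index keeps sibling names unique even if digests collided.
        names.append(f"dir_{i}_{digest}")
    return names


def walk(path: str, depth: int) -> str:
    """Simulate an agent that always descends into the first child."""
    for _ in range(depth):
        first_child = list_fake_dir(path)[0]
        path = f"{path.rstrip('/')}/{first_child}"
    return path
```

Calling `walk("/", 1000)` returns a path a thousand levels deep without the defender storing a single directory, which is what makes the approach cheap for the decoy and costly for the agent exploring it.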
The Mantis** framework is the creation of three Red Team security researchers and academics associated with George Mason University.
It effectively generates honeypots or decoys designed to counter-attack LLM agents activated against them, using various prompt injections.
AI versus AI
Dario Pasquini, Evgenios M. Kornaropoulos, and Giuseppe Ateniese say that, once deployed, Mantis “operates autonomously, orchestrating countermeasures…through a suite of decoy services…such as fake FTP servers and compromised-looking web applications [to] entrap LLM agents by mimicking exploitable features and common attack vectors.”
It can then counter-attack, with “prompt injection[s] inserted in…a way that [is] invisible to a human operator that loads the decoy’s response. We achieve this by using ANSI escape sequences and HTML comment tags.”
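The hiding trick can be sketched in a few lines. The example below is an assumption-laden illustration rather than the researchers' implementation: it uses the ANSI "conceal" sequence (SGR 8), which not every terminal honors, and the article does not say which escape sequences Mantis actually uses. The helper names and the placeholder payload are hypothetical.

```python
import re

ANSI_CONCEAL = "\x1b[8m"  # SGR 8 "conceal"; terminal support varies (assumption)
ANSI_RESET = "\x1b[0m"    # SGR 0 resets all attributes


def hide_for_terminal(visible: str, hidden: str) -> str:
    """Append text that a terminal may not render but that stays in the raw bytes."""
    return f"{visible}{ANSI_CONCEAL}{hidden}{ANSI_RESET}"


def hide_for_html(visible_html: str, hidden: str) -> str:
    """Append text inside an HTML comment, which browsers do not display."""
    safe = hidden.replace("--", "- -")  # "--" could end the comment early
    return f"{visible_html}<!-- {safe} -->"


def human_view(raw: str) -> str:
    """Approximate what a human sees: drop concealed runs and HTML comments."""
    raw = re.sub(r"\x1b\[8m.*?\x1b\[0m", "", raw, flags=re.S)
    return re.sub(r"<!--.*?-->", "", raw, flags=re.S)
```

An LLM agent that reads the raw response receives the hidden text alongside the visible banner or page, while `human_view` shows roughly what an operator would see on screen.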
Mantis can be customized to employ... dynamically tailored execution triggers specific to the attacking LLM agent. To achieve this, Mantis can use fingerprinting tools like LLMmap to identify the LLM version used by the attacking agent based on current interactions. Once identified, methods like NeuralExec [pdf] can then generate customized execution triggers.
[Mantis aims to] leverage the agent’s tool-access capabilities, such as terminal access, to manipulate it into executing unsafe commands that compromise the machine on which it is running [for example to] initiate a reverse shell connection to the attacker’s machine. Due to the limited robustness of LLMs, this strategy can be implemented relatively easily – Pasquini et al.
In an October 28 arXiv paper they claimed that Mantis "consistently achieved over 95% effectiveness against automated LLM-driven attacks", showcasing a range of successful prompt injection counter-attacks.
The framework, provided as a Python package, is a response to two trends: the susceptibility of AI agents to prompt injection attacks, and the nascent use of LLM agents by threat actors to support automated exploitation.
Big Sleep finds vulnerabilities: Don't nap on this
It was released as Google's Project Zero said that its "Big Sleep" LLM agent had autonomously identified an exploitable stack-based buffer overflow in the SQLite open source database engine, which fuzzing had not identified.
We believe this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software.
That vulnerability (patched before the code was made public) "remained undiscovered after 150 CPU-hours of fuzzing", Google's researchers said.
Earlier in 2024, meanwhile, OpenAI and Microsoft wrote that they had disrupted attempted "malicious uses of AI by state-affiliated threat actors".
They wrote: "Previous red team assessments we conducted in partnership with external cybersecurity experts...found that GPT-4 offers only limited, incremental capabilities for malicious cybersecurity tasks beyond what is already achievable with publicly available, non-AI powered tools."
See also: No LLMs aren’t about to “autonomously” hack your company
But Mantis's release comes as Red Teams say that LLMs are increasingly helpful in offensive cyber-operations, with bespoke tools like PentestGPT performing highly [pdf] in Capture The Flag tests.
Grim-faced security veterans will no doubt decry hype around the use of AI in malicious attacks beyond social engineering, saying that by far the greater risk comes from cretins persistently saving their passwords in plain text on their desktops, failure to deploy MFA, the rampant leaking of credentials, or firewall vendors pushing out products riddled with ancient code, SQL injection vulnerabilities or hard-coded passwords.
(Scrutiny of the firmware running on Ivanti devices by Eclypsium earlier this year revealed that its Pulse Secure appliances run on an 11-year-old base OS that is no longer supported and are composed of multiple libraries which are vulnerable to a combined 973 flaws, with 111 having publicly known exploits. "Firewall"? Users certainly seem to get burned regularly.)
But to those concerned at the potential for wider deployment of AI agents in offensive cyber activity and thinking about their response, Mantis may just be a lot of fun; just speak to counsel before... deploying in the wild.
*Alternative tarpit approaches are available...
**MANTIS is a rather creative acronym for “Malicious LLM-Agent Neutralization and Exploitation Through Prompt Injections”