Mantis Framework poisons hackers' AI agents

A new framework, Mantis, lets cybersecurity professionals automate counter-offensive actions against any AI agents attacking their systems.

The new open-source toolkit shows how defenders can use prompt injection attacks to take over systems hosting a malicious agent.

Alternatively, they can soak up attackers' AI resources in an “agent tarpit” that traps the LLM agent in an infinite filesystem exploration loop.

"The attacker is driven into a fake and dynamically created filesystem with a directory tree of infinite depth and is asked/forced to traverse it indefinitely."

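The tarpit idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mantis's actual code: the function names list_directory and descend, the fanout parameter, and the hash-based naming scheme are all hypothetical. The point it shows is that a decoy can answer a listing request for any path, at any depth, without storing a real tree.

```python
import hashlib


def list_directory(path: str, fanout: int = 3) -> list[str]:
    """Return synthetic subdirectory names for any path, at any depth.

    Hypothetical helper: names are derived from a hash of the path, so
    nothing is stored and the same path always yields the same listing.
    """
    names = []
    for index in range(fanout):
        digest = hashlib.sha256(f"{path}#{index}".encode()).hexdigest()
        names.append(f"dir_{digest[:8]}")
    return names


def descend(path: str, steps: int) -> str:
    """Follow the first listed child `steps` times, as a crawler might."""
    for _ in range(steps):
        child = list_directory(path)[0]
        path = f"{path.rstrip('/')}/{child}"
    return path


if __name__ == "__main__":
    print(list_directory("/"))
    print(descend("/", 5))
```

Because every listing is computed on demand, an agent that keeps descending never reaches a leaf, while the decoy's memory use stays constant.
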
The Mantis framework is the creation of three red team security researchers and academics associated with George Mason University.

It effectively generates honeypots or decoys designed to counter-attack LLM agents activated against them, using various prompt injections.

AI versus AI

Dario Pasquini, Evgenios M. Kornaropoulos, and Giuseppe Ateniese say that, once deployed, Mantis “operates autonomously, orchestrating countermeasures…through a suite of decoy services…such as fake FTP servers and compromised-looking web applications [to] entrap LLM agents by mimicking exploitable features and common attack vectors.”

It can then counter-attack, with “prompt injection[s] inserted in…a way that [is] invisible to a human operator that loads the decoy’s response. We achieve this by using ANSI escape sequences and HTML comment tags.”
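
The hiding trick described above can be illustrated with a short Python sketch. It is a guess at the general approach, not the authors' implementation: the helper names and the placeholder text are hypothetical, and the choice of the SGR "conceal" (8) and "reveal" (28) codes is an assumption, since the quote does not say which ANSI sequences Mantis uses. Terminal support for conceal also varies.

```python
import html
import re

CONCEAL_ON = "\x1b[8m"    # SGR 8: conceal (assumed choice of sequence)
CONCEAL_OFF = "\x1b[28m"  # SGR 28: reveal


def hide_in_terminal(visible: str, hidden: str) -> str:
    """Append text that a conceal-aware terminal would not display."""
    return f"{visible}{CONCEAL_ON}{hidden}{CONCEAL_OFF}"


def hide_in_html(visible: str, hidden: str) -> str:
    """Place text in an HTML comment, which browsers do not render."""
    safe = hidden.replace("-->", "--&gt;")  # avoid closing the comment early
    return f"<p>{html.escape(visible)}</p><!-- {safe} -->"


def human_view(raw: str) -> str:
    """Approximate what a terminal honoring conceal would show a person."""
    return re.sub(r"\x1b\[8m.*?\x1b\[28m", "", raw, flags=re.DOTALL)


if __name__ == "__main__":
    banner = hide_in_terminal("220 FTP ready", "placeholder note")
    print(repr(banner))        # raw bytes, as an automated agent reads them
    print(human_view(banner))  # what a human operator would see
    print(hide_in_html("Login", "placeholder note"))
```

An agent that consumes the raw response sees the hidden text, while a person looking at the rendered output does not, which is the asymmetry the researchers describe.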

Mantis can be customized to employ... dynamically tailored execution triggers specific to the attacking LLM agent. To achieve this, Mantis can use fingerprinting tools like LLMmap to identify the LLM version used by the attacking agent based on current interactions. Once identified, methods like NeuralExec can then generate customized execution triggers.

[Mantis aims to] leverage the agent’s tool-access capabilities, such as terminal access, to manipulate it into executing unsafe commands that compromise the machine on which it is running [for example to] initiate a reverse shell connection to the attacker’s machine. Due to the limited robustness of LLMs, this strategy can be implemented relatively easily – Pasquini et al.

In an October 28 arXiv paper, they claimed that Mantis "consistently achieved over 95% effectiveness against automated LLM-driven attacks", showcasing a range of successful prompt injection counter-attacks.

The framework, provided as a Python package, is a response to a) the susceptibility of AI agents to prompt injection attacks; and b) the nascent use of LLM agents by threat actors to support automated exploitation.


Big Sleep finds vulnerabilities: Don't nap on this

Mantis was released as Google's Project Zero said that its "Big Sleep" LLM agent had autonomously identified an exploitable stack buffer underflow in the SQLite open-source database engine, which fuzzing had not caught.

"We believe this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software," the Google team wrote.

That vulnerability (patched before the code was made public) "remained undiscovered after 150 CPU-hours of fuzzing," Google's researchers said.

Meanwhile, OpenAI and Microsoft wrote earlier in 2024 that they had disrupted attempted "malicious uses of AI by state-affiliated threat actors".

They wrote: "Previous red team assessments we conducted in partnership with external cybersecurity experts...found that GPT-4 offers only limited, incremental capabilities for malicious cybersecurity tasks beyond what is already achievable with publicly available, non-AI powered tools."