Excitable pundits of many persuasions say AI “agents” own the future – referring to code that can make decisions and take actions autonomously.
Take a look at LinkedIn or any other social media platform and clickbait of this stripe proliferates – purporting to show the creation of entirely autonomous hedge funds built on multiple AI agents, or other reveries.
In a thought-provoking new paper, Microsoft Research’s Ryen White and Professor Chirag Shah of the University of Washington sketch out the limitations of both past and present approaches to agentic workflows – and propose an improved collaboration of “Agents, Sims, and Assistants.”
AI Agents: The limitations
Among the limitations of AI agents are that they are typically “designed for specific tasks and fail to generalize across different domains”, the two wrote: “Ensuring seamless interaction among agents [also] remains a significant challenge, often leading to inefficiencies and conflicts.”
Among the potential architectural adjustments they propose are the implementation of “caching solutions that store and execute agent workflows and reduce the need for calls to foundation models…”
The two also propose the use of “hierarchical architectures that integrate small language models and large language models” for scalability and efficiency, saying that by “decomposing tasks into sub-tasks and assigning them to specialized agents, we can manage complexity more effectively.”
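One way to read these two proposals together: cache executed workflows so repeat requests skip the foundation model, and route decomposed sub-tasks to a small model unless they exceed its competence. A minimal sketch of that pattern – the routing heuristic and the model stubs are illustrative assumptions, not anything specified in the paper:

```python
from functools import lru_cache

# Stubs standing in for real model calls; in practice these would be API calls
# to a small language model (SLM) and a large language model (LLM).
def small_language_model(task: str) -> str:
    return f"slm:{task}"

def large_language_model(task: str) -> str:
    return f"llm:{task}"

# Toy heuristic (an assumption): short, routine sub-tasks fit the small model.
def route(task: str) -> str:
    return small_language_model(task) if len(task) < 40 else large_language_model(task)

@lru_cache(maxsize=1024)  # cache stored workflows to cut repeat foundation-model calls
def run_workflow(task: str) -> str:
    # Decompose the task into sub-tasks and assign each to a specialised route.
    sub_tasks = [part.strip() for part in task.split(";")]
    return " | ".join(route(sub) for sub in sub_tasks)

print(run_workflow("fetch calendar; draft a reply"))
# A second identical call is served from the cache, with no model calls at all.
```

The cache here is deliberately naive; the paper’s proposal concerns storing and executing whole agent workflows, which would need keys far richer than a task string.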
Yet even with such improvements (the two outline more in their paper here), AI agents alone will not be capable of executing complex tasks autonomously at a level that end-users can trust, they warn.
The two propose a three-layered approach instead.
1) Agents as “narrow and purpose-driven modules” that “can be autonomous, but with an ability to interface with other agents.”
2) “Sims” as simulations of a user, that capture a “combination of user profile, preferences, and behaviors” with customisable privacy settings.
3) “Assistants” or programmes that directly interact with the user and which can call Sims and Agents as needed to deliver tasks and sub-tasks.
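The three layers above might compose along these lines – every class, field, and method name below is an illustrative assumption, not an API from the paper:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Agent:
    """Narrow, purpose-driven module; autonomous but able to interface with peers."""
    name: str
    run: Callable[[str], str]

@dataclass
class Sim:
    """Simulation of the user: profile, preferences, and behaviours."""
    preferences: Dict[str, str] = field(default_factory=dict)
    share_profile: bool = False  # a customisable privacy setting

    def preference(self, key: str, default: str) -> str:
        # Privacy gate: only expose preferences the user has agreed to share.
        return self.preferences.get(key, default) if self.share_profile else default

@dataclass
class Assistant:
    """User-facing layer: consults the Sim, then delegates sub-tasks to Agents."""
    sim: Sim
    agents: Dict[str, Agent]

    def handle(self, task: str, agent_name: str) -> str:
        tone = self.sim.preference("tone", "neutral")
        return self.agents[agent_name].run(f"[{tone}] {task}")

sim = Sim(preferences={"tone": "concise"}, share_profile=True)
summarise = Agent("summarise", run=lambda prompt: f"summary({prompt})")
assistant = Assistant(sim=sim, agents={"summarise": summarise})
print(assistant.handle("inbox triage", "summarise"))  # summary([concise] inbox triage)
```

The point of the sketch is the separation of concerns: the Agent never sees the user’s profile directly, only what the Sim’s privacy settings let through.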
Elephants all the way down?
If this sounds like elephants all the way down (the two share little detail of what differentiates a “Sim” from an Agent), the authors emphasise that as it stands, the industry seriously needs to “address personalization, privacy, user agency, value generation, and trustworthiness of agents.”
“We believe similar to an app store, there could be an agent store with vetted agents available for a user or their Assistants to interact with and accomplish various tasks,” they conclude – saying that Agents will also need to be augmented by “reinforcement learning and transfer learning.”
LLMs “struggle to adapt”
Their paper comes amid a proliferation of research proposing quasi-autonomous trading houses powered by AI agents, or agents tasked with delivering urban planning.
A team of researchers from IBM and the University of Montreal warned in an important December 27 paper, however, that LLMs “still struggle to effectively adapt to other agents even when they are directly told both what the other agent will do and the reward structure of the game…”
They gave the example of the "incredibly simple scenario of an LLM agent playing the classic game Rock, Paper, Scissors against an agent that always plays the action "Rock" for 100 consecutive rounds. The optimal course of action is for the LLM to respond with the counter to this action, "Paper", as much as possible...
"... What we found is that a vanilla application of some common open source LLMs resulted in a policy that chose each action "Rock", "Paper", and "Scissor" roughly evenly. This is interesting because this is actually the famous Nash equilibrium solution for this game. However, this is the solution for optimizing your worst case return across any possible opponent. To act in this way against this particular opponent that always plays "Rock" for 100 consecutive rounds actually demonstrates a profound lack of theory of mind..."
“While we find that advanced prompting strategies can lead to significant improvements in the adaptability of models, open source LLMs still are unable to consistently match the performance of simple tabular models.”
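The gap the researchers describe is easy to make concrete. A short simulation – the policies and payoff table below are standard Rock, Paper, Scissors conventions, not code from their paper – shows how far the uniform-random Nash policy falls short of the best response against an opponent that always plays Rock:

```python
import random

# Standard payoffs for the row player: +1 win, 0 draw, -1 loss.
BEATS = {"Paper": "Rock", "Rock": "Scissors", "Scissors": "Paper"}

def payoff(mine: str, theirs: str) -> int:
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1

def play(policy, rounds: int = 100) -> int:
    """Total return of `policy` against an opponent that always plays Rock."""
    return sum(payoff(policy(), "Rock") for _ in range(rounds))

random.seed(0)
# The Nash-equilibrium policy the LLMs reportedly converged to: uniform random.
nash = lambda: random.choice(["Rock", "Paper", "Scissors"])
# The best response to this particular opponent: always counter Rock with Paper.
best_response = lambda: "Paper"

print(play(nash))           # expected return is 0: wins and losses cancel out
print(play(best_response))  # 100: wins every single round
```

The Nash policy is optimal only against a worst-case adversary; against this fixed opponent it leaves the entire 100-point margin on the table, which is the "profound lack of theory of mind" the authors flag.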