
"The Real AI Coding Race"

"Model development is increasingly a distraction, a costly effort that will be eclipsed by the hyperscalers..."

Editor's note: This thought-provoking guest piece first appeared on Medium and is republished in The Stack with the kind permission of Ivan Burazin. We very rarely publish previously published articles, but thought this one was worth it.

“After spending considerable time observing the evolution of AI in software development, I’ve come to a firm conclusion: the company that will dominate the AI coding agent space won’t be the one that builds the best AI models. Instead, it will be the one that perfects the human interface and effectively manages the middleware between this interface and multiple AI models,” writes Ivan Burazin, co-founder, Daytona.

Model development is increasingly a distraction, a costly effort that will be eclipsed by the hyperscalers — leaving others to focus on the true battleground: experience and infrastructure. It will always be an arms race, but one where end consumers will reap the benefits.

The Four Approaches to AI-Driven Software Development

Right now, we see four distinct approaches in the AI-driven software development space, and all have raised significant funding:

  1. AI Augmenting Humans: Tools like Cursor, GitHub Copilot and Tabnine focus on boosting developer productivity by having AI assist in the coding process.
  2. Humans Augmenting AI: Platforms like Pythagora, GitStart and Fume leverage human oversight to guide AI in specific tasks, ensuring quality control and human insight in more complex problem-solving.
  3. Full Autonomous AI Agents: Companies like Devin.ai, AutoCodeRover, and SWE-Agent are working on creating fully autonomous AI that can write and deploy software with minimal human input.
  4. Specialized Coding Models: Players like Poolside and Magic are building models tailored to coding, whether code-specific foundation models or domain-focused ones.
Current State of Market [1]

While these approaches may seem distinct today, all of these companies are chasing the same end goal: defining the future of software development. To justify their billion-dollar valuations, they will need to converge on the same thing — a unified paradigm where Autonomous AI Coding Agents are seamlessly integrated into the development workflow. This is where the importance of the interface and middleware comes in. Regardless of their starting point, the real challenge these companies face is crafting the user experience and managing the complex middleware needed to connect multiple AI models into a cohesive and functional tool.

The company that will dominate the AI coding agent space won’t be the one that builds the best AI models. Instead, it will be the one that perfects the human interface and effectively manages the middleware between this interface and multiple AI models.

The Billion-Dollar AI Engineering Race

Companies are pouring vast amounts of capital into building AI-assisted coding tools, ranging from augmenting humans (Cursor, Tabnine, GitHub Copilot) to humans augmenting AI (Fume, GitStart) to completely autonomous systems (Devin.ai, AutoCodeRover, SWE-Agent). This space has already attracted over $1.5 billion in funding, with sophisticated investors like a16z and Founders Fund throwing their weight behind it.

The vision for many of these companies is an autonomous AI software engineer, at least at a junior developer level. The total addressable market for such a solution is estimated to be in the trillions. However, the real challenge is not creating the most advanced AI model for coding, but crafting a compelling interface that bridges human-AI collaboration.

The Layers of an AI Coding Agent

An AI coding agent is composed of three key layers:

  1. Models: The foundational Large Language Models (LLMs) and Small Language Models (SLMs) that power the agent’s intelligence.
  2. Middleware: Sometimes referred to as the “thin layer” of AI tools, this is the connective tissue that manages how multiple models interact with the interface.
  3. Interface: The layer where human-AI interaction occurs.
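
One way to make this division concrete is a minimal sketch of the three layers as Python protocols. The names and signatures below are illustrative assumptions, not any product’s actual API:

```python
from typing import Protocol


class Model(Protocol):
    """Model layer: an LLM or SLM exposed behind a uniform completion call."""

    name: str

    def complete(self, prompt: str) -> str: ...


class Middleware(Protocol):
    """Middleware layer: decides which model handles which task."""

    def dispatch(self, task: str, prompt: str) -> str: ...


class UserInterface(Protocol):
    """Interface layer: where human-AI interaction actually happens."""

    def handle_input(self, text: str) -> str: ...
```

Keeping these boundaries explicit is what allows the model layer to be swapped out underneath the other two.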

Of these, the model layer is the least critical to focus on if you’re not a hyperscaler[2].

Models Are a Hyperscaler’s Game

Building new models is becoming increasingly futile unless you’re a hyperscaler with near-unlimited resources. Consider companies like Replit, which initially fine-tuned its own code-repair model but later switched to OpenAI’s off-the-shelf models. Why? Because it’s a losing battle. The rapid pace of model improvements and the plummeting cost of inference make it impractical for startups to sink capital into building their own models. And as today’s benchmarks show, open- and closed-source models are converging.

Llama 3.1 405B closes the gap with closed-sourced models [3]

Instead, the smarter approach is to leverage multiple, highly specialized small language models (SLMs) that are optimized for specific tasks and tie them together with middleware. This strategy mirrors how multi-core processors revolutionized computing by distributing cost and complexity across different components. Furthermore, the cost per million tokens has dropped dramatically in just 18 months — from $180 to under $1 — showing that hyperscalers will continue driving down the cost of large models. So why compete with them?

Minimum price per million tokens for an LLM with a 42 MMLU score [4]

Breakthroughs in model development will continue, driven by these hyperscalers. But even when new, more powerful models are developed, they often come with prohibitively high inference costs, making them impractical for widespread, real-time use in coding agents. For companies building Autonomous AI Coding Agents, the real opportunity lies elsewhere — in focusing on intuitive, efficient interfaces and robust middleware.

Building new models is becoming increasingly futile unless you’re a hyperscaler with near-unlimited resources

This strategy allows companies to remain agile, integrating improved models as they become available and economically viable, without the need to compete directly in model development.

These are complex and challenging problems, and not always enjoyable ones, unlike the brute-force approach of competing on scale and performance. There is a persistent sense that AI models are being released into the market in the hope that someone else will eventually solve these issues and find product-market fit. This can be seen in companies like OpenAI and Anthropic, which have been integrating features such as Artifacts and GitHub integration.

The reality is that we’re still constrained by the process of finding product-market fit and building defensible moats. No matter what, model trainers will always have a competitive advantage due to their control over the underlying resources.

Middleware: The New Core Infrastructure

While the model layer often gets the most attention, the middleware layer is where real innovation happens for companies building scalable Autonomous AI Coding Agents. Middleware acts as the backbone, allowing multiple models to interact seamlessly and orchestrating a combination of specialized models (SLMs) and large language models (LLMs) to achieve complex tasks efficiently. Enabling the right models for each specific task is critical. As noted by Continue’s Co-Founder Ty Dunn[5], the importance of selecting and integrating the correct models through middleware cannot be overstated.

Companies that understand the importance of middleware are building systems capable of efficiently coordinating dozens of specialized models or large language models, each optimized for specific tasks. These systems dynamically select the appropriate model for the job at hand, enabling more efficient use of resources. Instead of relying on a single, massive model, they are developing middleware to coordinate the use of different models, each handling specific tasks. A key aspect of this process involves understanding intricate code structures and leveraging knowledge graphs to enhance middleware efficiency [6]. This approach not only cuts the inference costs of running the models but also ensures that the most suitable model is deployed for each job. Moreover, this flexibility makes it easy to swap out models when a more capable one is trained for a specific task, allowing continuous improvements without significant overhead.
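
As a rough illustration of this orchestration pattern, the sketch below routes each task to a registered specialized model and falls back to a general-purpose LLM. The task names and model labels are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable


def call_large_model(prompt: str) -> str:
    # Stand-in for a general-purpose LLM call (hypothetical).
    return f"[general-llm] {prompt}"


@dataclass
class ModelRouter:
    """Middleware that picks the right model for each task."""

    routes: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, task: str, handler: Callable[[str], str]) -> None:
        # Swapping in a newer, better model is a one-line change here.
        self.routes[task] = handler

    def dispatch(self, task: str, prompt: str) -> str:
        # Use a cheap specialized model when one exists; otherwise fall back.
        return self.routes.get(task, call_large_model)(prompt)


router = ModelRouter()
router.register("autocomplete", lambda p: f"[completion-slm] {p}")
router.register("test-generation", lambda p: f"[testing-slm] {p}")

print(router.dispatch("autocomplete", "def fib(n):"))      # specialized SLM
print(router.dispatch("refactor", "class LegacyService"))  # general LLM fallback
```

Because callers only ever see dispatch, a cheaper specialized model lowers inference cost wherever one exists, and model swaps never touch the interface layer.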

GitHub Copilot exemplifies this approach. As GitHub CEO Thomas Dohmke explained in a recent interview, “Copilot today not only uses one but a variety of models, including OpenAI’s 3.5 turbo for autocompletion, 4 turbo for chat, and 4o for workspace.” [7]. This highlights how Copilot has developed middleware to orchestrate these different models, ensuring that the right model is used for each task, and making it easy to integrate newer or better models over time.

Another company successfully leveraging this strategy is Codium.ai. In a recent conversation, CEO and Co-Founder Itamar Friedman mentioned that Codium.ai’s product integrates a variety of models to tackle over 60 different tasks. That model diversity is managed through sophisticated middleware, ensuring each model is used optimally for its respective task. The middleware not only allows Codium.ai to deploy the best model for each job but also provides the flexibility to rapidly swap in new models as they become available, giving the company a significant competitive advantage in terms of both performance and cost efficiency.

By focusing on building middleware that integrates multiple models, companies like GitHub and Codium.ai are not only reducing inference costs but also remaining agile in the face of AI advancements. This strategic focus on middleware allows them to build robust solutions that leverage existing AI technologies, continuously upgrading their capabilities without needing to reinvent the wheel.

Moreover, a well-designed middleware can dynamically decide whether to run inference locally or in the cloud based on factors like the task at hand, security requirements, or confidentiality concerns. This flexibility ensures that companies can balance performance, cost, and compliance, tailoring their AI systems to the unique needs of each scenario.
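
A minimal sketch of such a decision might look like the following; the thresholds and labels are illustrative assumptions, not any real product’s policy:

```python
def choose_runtime(gpu_gb_needed: float, touches_confidential_code: bool,
                   local_gpu_gb: float = 8.0) -> str:
    """Decide whether middleware runs a task's inference locally or in the cloud."""
    if touches_confidential_code:
        return "local"  # keep sensitive source off remote servers
    if gpu_gb_needed > local_gpu_gb:
        return "cloud"  # the task exceeds what the developer's machine can serve
    return "local"      # small enough that cloud cost isn't justified


print(choose_runtime(gpu_gb_needed=40.0, touches_confidential_code=False))  # cloud
print(choose_runtime(gpu_gb_needed=2.0, touches_confidential_code=True))    # local
```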

The Infrastructure Layer: Why Local Computing Is a Bottleneck

One of the most overlooked challenges in building effective Autonomous AI Coding Agents is the reliance on local machines to run these agents. Most companies in this space expect their agents to function on the user’s local environment, which introduces several critical issues:

  1. Operability Issues: The agent doesn’t always know whether it’s running on macOS, Windows, or Linux, which can lead to performance discrepancies or compatibility issues. Developers working on different systems can face wildly different experiences when using the same AI agent. This echoes the classic developer frustration, “It works on my machine.” [8]. By relying on local computing, we’re carrying unresolved issues like inconsistent environments and resource limitations into the future of development. Instead, embracing standardized, scalable infrastructure — such as the cloud-based sandboxes of Cognition’s Devin.ai — offers a unified solution, eliminating these inefficiencies and providing consistent, high-performance experiences across all environments.
  2. Compute Constraints: Local machines often lack the necessary compute power for Autonomous AI Coding Agents to perform complex tasks effectively. Models requiring significant GPU resources or parallel processing can overwhelm standard developer machines. As these agents become more sophisticated, they will need to handle multiple tasks simultaneously, leading to greater demand for compute power. In a future where Autonomous AI Coding Agents are widespread, developers will likely want to spin up an army of these agents to perform tasks in parallel, ensuring efficiency and speed in executing complex operations. This makes relying on local machines even more of a bottleneck.
  3. Security Risks: Running AI agents locally can introduce potential security concerns, as the agent requires access to the user’s files and data to function effectively. While the model itself doesn’t retain memory or care about the data it’s processing, if the provider caches this data on remote servers, it introduces privacy risks. Additionally, if an AI agent encounters an error or malfunctions, it could inadvertently cause damage to the local environment, potentially corrupting files or, in extreme cases, even bricking the user’s machine. This makes fully isolated and sandboxed agents, where security and data integrity can be centrally managed, a safer alternative in many scenarios.

OpenHands (formerly known as OpenDevin) takes a step in the right direction by using a local Docker-based sandbox to execute agent commands in isolation. This ensures that the AI agent doesn’t directly interact with the user’s local machine, enhancing security. For each task session, OpenHands spins up a securely isolated Docker container where all bash commands are executed, protecting the user’s local environment from errors.
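
In the same spirit (though not OpenHands’ actual implementation), per-session sandboxing can be sketched in a few lines on top of the Docker CLI; the image choice is an arbitrary example:

```python
import subprocess


def run_in_sandbox(command: str, image: str = "python:3.12-slim") -> str:
    """Run a shell command in a disposable container, never on the host."""
    result = subprocess.run(
        ["docker", "run", "--rm", "--network=none", image, "bash", "-lc", command],
        capture_output=True, text=True, timeout=120,
    )
    return result.stdout


# Each call gets a fresh, isolated environment that is destroyed afterwards.
print(run_in_sandbox("echo hello from the sandbox"))
```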

While OpenHands has introduced a Remote Runtime feature to support cloud-based execution, this solution does come with some limitations. Developers must manually spin up a remote machine, install the OpenHands runtime, and connect it to their OpenHands instance. This manual configuration process and the lack of automatic scaling for remote environments introduce operational friction. According to the OpenDevin whitepaper[9], one of the core limitations in systems that rely on local execution, even with cloud options, is the lack of seamless transitions between environments.

A more promising infrastructure solution comes from Devin.ai by Cognition, which offers cloud-based sandbox environments. These environments eliminate local machine constraints, providing the necessary compute power to run multiple tasks in parallel without overloading the developer’s hardware. The cloud-based approach allows for better scalability, parallelization, and a more secure way to manage AI agent operations, reducing the risks to the local environment and maximizing performance.

However, what if you need both local and cloud options, or even a combination of the two? There are several reasons why developers might want to mix and match — such as cost, control, scaling flexibility, and security. Some tasks might be more efficiently handled locally to save on cloud costs, while more intensive, parallelized workflows could be delegated to cloud infrastructure. By blending local and cloud resources, developers can tailor their approach based on the specific requirements of each task, gaining flexibility while still ensuring both security and optimal performance.

Connecting to Middleware

These advanced infrastructure solutions are a critical part of the middleware layer for Autonomous AI Coding Agents, especially because they handle both the writing and execution of code. Middleware that incorporates standardized and sandboxed infrastructure like that offered by Devin.ai allows for the orchestration of multiple models and tasks, bridging the gap between model capabilities and practical, scalable application. By providing a robust, secure, and scalable infrastructure, middleware solutions ensure that Autonomous AI Coding Agents can operate effectively, responding dynamically to the needs of the software development process.

The Interface: The Real Challenge Yet to Be Solved

The challenge with Autonomous AI Coding Agents today is that most solutions either rely heavily on existing IDEs like VS Code, which is a losing battle, or they attempt to create entirely new interfaces, which are inherently flawed. Neither approach has solved the core problem, and this is where the winner of the AI coding race will likely be decided.

Relying on Existing IDEs: A Losing Strategy

Many AI coding tools fall back on existing code editors or IDEs like VS Code or JetBrains. This is a fundamental weakness. By relying on VS Code, for example, you’re handing control of your interface to Microsoft, a company that already dominates both the middleware and interface layers with GitHub Copilot and VS Code.

Microsoft owns the entire ecosystem: VS Code, the most widely used IDE; GitHub Copilot, which leads in middleware; and even Codespaces/Azure, which provides infrastructure. This tight integration creates a significant moat.

In essence, building on top of VS Code tools puts you in a precarious position, where your product’s success can be undermined at any time by the very platform you depend on. Even if your middleware is innovative, the reliance on their interface limits your ability to grow and compete.

If you’re building middleware but relying on VS Code as your interface, you’re directly competing with Microsoft on their own platform. They can easily outmaneuver you by either disabling your product’s key integrations, removing you from their marketplace, or simply incorporating a better version of your feature natively within their product. This makes it extremely difficult to differentiate or succeed in the long term.

Creating New Interfaces: The Flawed Alternative

Many companies are attempting to move away from traditional IDEs by creating new interfaces — often referred to as “studios” or “slices” — that combine chat, text editors, terminals, and browsers. These aim to revolutionize coding but face fundamental flaws.

These new interfaces are typically built around the limitations of current AI models rather than aligning with best practices in software development. They often ignore essential tools like version control (Git) and infrastructure as code (IaC), prioritizing AI-centric features like saving prompts. This is problematic because AI outputs are often inconsistent, leading to unpredictable results that create friction in the development process instead of eliminating it. Starting a project from a prompt leads to variability, whereas using a structured template ensures consistency — an essential trait for software development.

For example, rather than saving a prompt to set up a “template” development environment, it would be more effective for AI to generate reusable templates like a Dev Container, Nix, or a Terraform file. These templates provide consistent environments and workflows, which are vital for scalable and predictable development.
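
As a minimal sketch of that idea, an agent tool could emit a Dev Container template instead of saving a prompt; the base image and extension list here are illustrative defaults:

```python
import json
from pathlib import Path


def write_devcontainer(project: Path) -> None:
    """Persist the environment as a reusable template rather than a prompt."""
    template = {
        "name": project.name,
        "image": "mcr.microsoft.com/devcontainers/python:3.12",  # example base image
        "customizations": {"vscode": {"extensions": ["ms-python.python"]}},
        "postCreateCommand": "pip install -r requirements.txt",
    }
    target = project / ".devcontainer" / "devcontainer.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(template, indent=2))


write_devcontainer(Path("."))
```

Unlike a saved prompt, the same template yields the same environment on every run.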

Human-AI Collaboration: The Missing Piece

While companies attempt to innovate with new AI-driven interfaces, one crucial aspect that remains unresolved is how human developers and AI engineers can effectively collaborate. Currently, most solutions that incorporate AI coding agents allow the human developer to take control by connecting to the sandbox via a classic editor like VS Code or JetBrains, working in their familiar environment. Once their tasks are completed, they switch back to the AI interface, allowing the AI to resume its operations.

However, this workflow introduces significant problems. The human and AI often miss out on the context and history of what the other has done. The AI may not be fully aware of changes made by the human during manual interventions, and the human might lose sight of the AI’s prior work. This lack of shared context results in inefficiencies and increases the cognitive load due to frequent context switching between the AI interface and traditional IDEs.
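
One plausible shape for that shared context, sketched here with a hypothetical event format, is a single append-only session log that both sides write to and replay on handoff:

```python
import time
from dataclasses import dataclass


@dataclass
class Event:
    actor: str       # "human" or "agent"
    action: str      # e.g. "edited src/app.py", "ran test suite"
    timestamp: float


class SharedSession:
    """One history that both the human's editor and the AI agent record into."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def record(self, actor: str, action: str) -> None:
        self.events.append(Event(actor, action, time.time()))

    def handoff_context(self, actor: str) -> list[Event]:
        # Everything the *other* party did, so nothing is lost on handoff.
        return [e for e in self.events if e.actor != actor]


session = SharedSession()
session.record("agent", "generated tests for src/app.py")
session.record("human", "fixed a flaky assertion in test_app.py")
for event in session.handoff_context("agent"):  # the agent replays human edits
    print(event.actor, "->", event.action)
```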

As we can see, neither relying on traditional IDEs nor adopting these new AI-first interfaces offers a viable long-term solution for a streamlined, unified human/AI coding experience. Both approaches introduce friction, inconsistency, and inefficiency, which hinder seamless collaboration between human developers and AI. The cognitive burden of switching contexts and the lack of shared understanding between human and AI efforts create barriers rather than solutions.

This is why the battle for dominance in Autonomous AI Coding Agents will ultimately be decided at the interface level. The company that can solve these interface challenges — delivering a unified, fluid experience where humans and AI collaborate effortlessly — will define the future of AI-driven software development. A unified platform that retains context and history, while enabling smooth transitions between human and AI tasks, will set the standard for the next generation of development tools. Without this, the true potential of AI-assisted development will remain out of reach.

The Future: Building the iPhone of AI Engineers

Right now, we are at the PalmPilot stage of Autonomous AI Coding Agents. The technology is there, and pieces are starting to come together, but no one has yet built the iPhone of AI development — a product that harmonizes all the needed components into a seamless, delightful experience. The company that manages to do this will ultimately win the Autonomous AI Coding Agents race.

Like the iPhone, this won’t be achieved by inventing new models or even groundbreaking AI architectures. The real key to victory lies in leveraging existing models and integrating them into a user-centric experience. This approach doesn’t just prevent the unnecessary R&D overhead of developing models from scratch — it also ensures agility. Companies that focus on middleware and the interface can remain flexible, swiftly adopting new models as they become more capable and cost-effective.

Instead of competing with hyperscalers in the model development arms race, companies should focus on solving the critical challenges that remain unsolved: crafting intuitive interfaces and orchestrating robust middleware. These are the key differentiators. How AI interacts with developers and how models are used behind the scenes are deeply complex, and the companies that can master these layers will define the next generation of software development.

Just as the iPhone revolutionized the mobile phone industry by elegantly integrating pre-existing technologies, the winning Autonomous AI Coding Agent will transform the software development landscape by seamlessly embedding AI into the developer workflow. The focus must be on creating a streamlined, unified interface and robust middleware that allows humans and AI to collaborate effortlessly, without friction.

The pieces of the puzzle — cloud infrastructure, AI models, and middleware orchestration — already exist. But, like with the iPhone, success will come to the company that combines these components in a way that developers love to use. This is the battleground where the AI engineering race will be won.

References:

  1. Code Smarter, Not Harder — https://greylock.com/greymatter/code-smarter-not-harder/
  2. Definition of Hyperscaler — For this article, we use Gartner’s definition of hyperscalers: https://www.gartner.com/interactive/mq/4970831?ref=ddisp&refval=5001531, plus major LLM vendors like OpenAI and Anthropic. The debate on whether LLMs qualify as hyperscalers is beyond this article’s scope.
  3. Llama 3.1 405B closes the gap with closed-sourced models — https://www.linkedin.com/in/maxime-labonne/
  4. Minimum price per million tokens for an LLM with a 42 MMLU score by Martin Casado — https://x.com/martin_casado/status/1832597193137717672
  5. Enable the right models for the job — https://amplified.dev/#2-enable-the-right-models-for-the-job
  6. Building a Knowledge Graph of Your Codebase by Nikola Balic — https://www.daytona.io/dotfiles/building-a-knowledge-graph-of-your-codebase
  7. Unsupervised Learning: Redpoint’s AI Podcast — https://www.youtube.com/watch?v=iaoxbVqhewo
  8. It works on my machine — https://codingforspeed.com/but-it-works-on-my-machine/
  9. OpenDevin (now known as OpenHands) White Paper — https://arxiv.org/pdf/2407.16741

Thanks to Shawn “Swyx” Wang (Smol AI), Graham Neubig (OpenHands), Ty Dunn (Continue), Zvonimir Sabljic (Pythagora), and Itamar Friedman (CodiumAI) for commenting on drafts of this piece.
