As containers started gaining traction, we needed a way to manage them across multiple machines, and for a period of time we had a wide range of options, writes Charles Humble. Docker had Docker Swarm and Docker Swarm mode; companies like Rancher and CoreOS came up with their own takes; and more general-purpose platforms like Mesos, which Apple used for Siri (and possibly still does, but I’ve not been able to confirm), ran containers alongside other sorts of workloads.
But gradually Kubernetes came to dominate the space, sucking out all the oxygen and leaving us with just Kubernetes, alongside options such as Red Hat’s OpenShift that are built on top of it. For any competitor, it is hard to get attention.
There are some good reasons for this. Kubernetes’ dominance means it is relatively easy to find people who know and understand it; it is also undeniably powerful and flexible. But I’ve come to see it as the J2EE of orchestrators. “Simple is not a word anyone would use to describe Kubernetes any more, and it isn’t elegant either,” Matt Butcher, creator of the Kubernetes package manager Helm and CEO at Fermyon Technologies, told me. “In their haste to make it palatable to the enterprise they tried to make it all things to all people, and it has become very big and clunky.”
The explosion of tools and options that the CNCF landscape represents provides so many different ways of building a platform that curation is required to use Kubernetes effectively. Because of this, it isn’t uncommon for organisations to have platform teams who are dedicated to maintaining their Kubernetes platform, and who still struggle to keep it up to date. A Datadog survey from 2021 reported that the most popular Kubernetes version at the time was 17 months old.
What this means, according to Butcher, is that there is room for a new contender. “Kubernetes has become so big and so difficult to use that small and medium businesses don’t find it palatable, and it’s so expensive to operate that enterprises are saying, ‘This is too much. How do we reduce the total cost of ownership?’ Kubernetes isn’t invincible; I actually think it’s in quite a precarious place and open to a challenger. .NET disrupted Java in the early days. What would be the challenger for Kubernetes?”
There are three possibilities: HashiCorp’s Nomad, VMware Tanzu Application Service, and cycle.io.
Nomad
HashiCorp’s Nomad is deployed as a single binary, written in Go, and has a responsive community of maintainers on GitHub. Well-known users of the platform include Autodesk, Cloudflare and Roblox.
Although Nomad is a container orchestrator, one of its key advantages is that it isn’t restricted to deploying containerised workloads, so it’s perhaps best thought of as a workload scheduler. It can deploy pretty much anything, including VMs, Java JARs, QEMU, raw executables, Firecracker microVMs and WebAssembly, and can also be used to schedule batch jobs. Those workloads can be deployed across on-prem data centres, at the edge or on any public cloud.
Nomad leverages Consul for configuring and discovering cluster services. A Nomad cluster is composed of between three and seven servers, connecting with client agents through RPC. The cluster infrastructure is divided into regions that manage one or more availability zones or data centres, where regions are loosely coupled and communicate with each other using a gossip protocol.
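The three-to-seven range makes more sense once you look at the arithmetic of majority voting. Nomad’s servers keep their state consistent using the Raft consensus protocol, which requires a majority of servers to agree on every change. The following is a back-of-the-envelope sketch of what each cluster size buys you; the function names are mine, for illustration only, and are not part of any Nomad API.

```python
def quorum(servers: int) -> int:
    """Smallest majority of a server group of the given size."""
    return servers // 2 + 1


def failures_tolerated(servers: int) -> int:
    """How many servers can be lost while a majority still remains."""
    return servers - quorum(servers)


# Compare the odd cluster sizes within the three-to-seven range.
for n in (3, 5, 7):
    print(f"{n} servers: quorum of {quorum(n)}, "
          f"tolerates {failures_tolerated(n)} failure(s)")
```

An even number of servers adds no extra resilience (four servers tolerate one failure, the same as three), and every additional server is one more participant in each round of agreement, which is a plausible reason for keeping the server group small.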
Nomad is highly scalable. Whilst the current Kubernetes version, 1.29, is built to support clusters with up to 5,000 nodes orchestrating a maximum of 300,000 containers, Nomad can scale to clusters exceeding 10,000 nodes in production. In 2020, HashiCorp demonstrated that they could scale to 2 million Docker containers on 6,100 hosts in 10 AWS regions in just 22 minutes.
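To put those numbers side by side, here is a rough calculation using only the figures quoted above. Bear in mind that it compares a documented support limit for Kubernetes with a one-off benchmark for Nomad, so treat it as an indication of scale rather than a like-for-like comparison.

```python
# Figures as quoted in the text above.
K8S_MAX_NODES = 5_000
K8S_MAX_CONTAINERS = 300_000
NOMAD_HOSTS = 6_100
NOMAD_CONTAINERS = 2_000_000
NOMAD_MINUTES = 22

# Average container density implied by each set of figures.
k8s_per_node = K8S_MAX_CONTAINERS / K8S_MAX_NODES
nomad_per_host = NOMAD_CONTAINERS / NOMAD_HOSTS

# Average scheduling rate during the HashiCorp demonstration.
nomad_per_second = NOMAD_CONTAINERS / (NOMAD_MINUTES * 60)

print(f"Kubernetes limit: {k8s_per_node:.0f} containers per node")
print(f"Nomad benchmark: {nomad_per_host:.0f} containers per host")
print(f"Nomad benchmark: {nomad_per_second:.0f} containers scheduled per second")
```

That works out at 60 containers per node at the Kubernetes limit, against roughly 328 per host in the Nomad demonstration, placed at an average of about 1,500 containers per second.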
Nomad also has a well-deserved reputation for operational simplicity. The aforementioned Roblox employs just 4 SREs to manage Nomad, Consul and Vault for 11,000+ nodes across 22 clusters, serving 420+ internal developers.
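The Roblox figures are easier to appreciate as ratios. This sketch uses the numbers quoted above; since the node and developer counts are given as “11,000+” and “420+”, the results should be read as lower bounds.

```python
# Roblox figures as quoted in the text above (node and developer
# counts are minimums, so the ratios are lower bounds).
SRES = 4
NODES = 11_000
CLUSTERS = 22
DEVELOPERS = 420

nodes_per_sre = NODES / SRES
nodes_per_cluster = NODES / CLUSTERS
developers_per_sre = DEVELOPERS / SRES

print(f"At least {nodes_per_sre:.0f} nodes per SRE")
print(f"About {nodes_per_cluster:.0f} nodes per cluster")
print(f"At least {developers_per_sre:.0f} developers supported per SRE")
```

In other words, each SRE looks after at least 2,750 nodes and supports at least 105 developers.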
Butcher adopted it as part of a move to WebAssembly, which he likes in part because the WebAssembly virtual machine is just a simple stack machine, which means it starts incredibly quickly. “We can cold start it in under a millisecond,” Butcher said. “But we were having trouble finding an orchestrator for it because most were built with only containers in mind. With Krustlet we tried to retrofit Kubernetes to use WebAssembly as a runtime, and we just couldn’t get it done. We spent three or four months just getting to the point where we could start up a WebAssembly binary the way we thought it should be executed. We got frustrated hitting the various assumptions built into Kubernetes that we couldn’t easily correct, so we took Nomad for a spin for a weekend. Started on Friday and had it running as we wanted by Monday.”
Butcher and his team were happy with their choice, and able to get very high levels of density, running about 3,000 applications per 2XL-sized AWS virtual machine. At the time he also believed that Nomad would ultimately supplant Kubernetes as the container orchestrator of choice. But then HashiCorp took the decision to switch from the Mozilla Public License v2.0 to the Business Source License (BSL) v1.1, which is “source-available” but not open source in any traditional sense.
“HashiCorp shot themselves in the foot by changing their licensing agreement,” Butcher told me. “All the conversations we had about installing and running Nomad on-prem made us realise that Nomad was no longer a valid option. I don’t think it was so much a legal problem with the BSL licence as a sense of betrayal that was felt as a result of that change. So we’ve had to pivot back to Kubernetes because we bet on the wrong horse.”
We should say that IBM has just purchased HashiCorp, and it isn’t beyond the bounds of possibility that they might change the licensing model back to an open source one. However, they have so far given no indication that they will do so, and IBM’s plans for HashiCorp’s suite of products are also not yet clear.
There are also some technical limitations with Nomad. It doesn’t have the concept of ingress controllers for managing network connectivity. It is also primarily a task-scheduling platform, so it can’t orchestrate things like load balancing, configuration management or routing. There are also no managed services for Nomad on any cloud platform including, rather surprisingly, HashiCorp’s own HCP. In view of this, if you are looking for something more like a PaaS, VMware’s Tanzu and cycle.io are other options to consider.
Tanzu
Tanzu Application Service, previously known as Pivotal Cloud Foundry (PCF), is designed to run microservice applications across clouds. It works on vSphere and all the major cloud providers, and is particularly suited to .NET, Spring and Spring Boot-based applications. Where it really excels is with developer experience: cf push offers a Heroku-like experience for developers at an enterprise scale; in effect, “Here is my code, run it for me, I don’t care how.” It is lovely to work with.
However, because it is highly opinionated it also has limitations, a point which ex-Pivotal employee and now Syntasso COO, Paula Kennedy, made when I chatted to her last year. “We really tried to offer Cloud Foundry as the platform that could do everything,” she said. “It worked fantastically for certain types of applications, particularly stateless 12-factor applications. But for many of the big enterprises we worked with, there’s a lot of other estate out there that doesn’t fit that model.”
Alongside their Application Service, VMware has a direct competitor to OpenShift with Tanzu Application Platform. One option is to take a mixed approach, with Tanzu Application Service where that model works, and Tanzu Application Platform or OpenShift where it doesn’t. I’m aware of at least one company doing exactly this, and the approach works well for them.
I have a couple of other reservations with Tanzu, however. The first is that their array of offerings originates from both internal R&D projects and a number of acquisitions. The resulting portfolio can be confusing, and it isn’t always clear how the various offerings are differentiated or integrated.
A second, perhaps more significant, issue is that their future roadmap is, at the time of writing, uncertain following their acquisition by Broadcom. For a lot of enterprises that lack of clarity makes Tanzu a non-starter. At the same time, competitor software companies are aggressively targeting VMware’s customers (SoftIron’s VM Squared is one recent example), and whilst it is impossible to know how this will play out, I would be extremely cautious about committing to Tanzu at the moment.
Cycle
Which leaves us with Cycle. “I think Cycle is an elegantly constructed system,” Butcher told me. “Their strength is that they made it much more palatable to a developer or a small operations team who needs to get things deployed, monitored and managed, and doesn’t want to run the show, handle the operating system upgrades and so on. That has been a huge Achilles heel for the Kubernetes world. Even the hosted control plane offerings you get from Azure and AWS require a depth of knowledge of how the orchestrator works, and what dependencies are going to break when they automatically update your control plane. That’s an area where Cycle has overtaken everyone and we really like it. When we looked at it we thought, ‘Wow! We could literally run Fermyon Cloud without an operations team.’”
Cycle is proprietary. Their customers include Busify, which has started to migrate from Amazon ECS to Cycle because it “gives us the ability to empower our engineers to control the operation infrastructure without struggling or being experts with ECS,” according to Casey Dement, Busify’s head of engineering.
In common with Nomad, Cycle can be run more or less anywhere. It is a container orchestrator and infrastructure management platform. Supported sources for the container images are all OCI-compatible or Docker-based (Docker Hub, Docker Registry and Dockerfile), and servers can exist on multiple cloud providers, with built-in support for AWS, GCP, Equinix Metal and Vultr. In addition, what Cycle refers to as their Infrastructure Abstraction Layer (IAL) allows organisations to add support for anything from another cloud provider to on-premises infrastructure by implementing a REST-based middleware.
On each compute node, Cycle automatically instals a minimal Linux-derived operating system called CycleOS which provides basic networking, storage protocols and plugins for the container layer that runs on top of it. Every time a server boots, it connects to Cycle and pulls down a copy of the OS, which then runs in RAM—the OS is never installed to disk. This means that the operating system can be automatically updated by the company whenever necessary, with new releases on a two-week cadence automatically deployed to all of their customers, and no risk of having an out-of-date or unpatched version.
When I spoke to the company in 2022, they told me that the majority of their customers had never adopted Kubernetes. But according to Cycle’s co-founder and CEO, Jake Warner, that is changing. “I think we hit the top of the Kubernetes hype-cycle about six months ago,” he told me. “This is based on the number of companies who are coming to us looking not for a Kubernetes wrapper, but for something that has nothing to do with Kubernetes at all.” What’s behind this shift, he believes, is that, “Now that interest rates are higher, more organisations have CTOs and CEOs who are saying, ‘Is Kubernetes solving problems for us, or did we just adopt it so a DevOps engineer could put it on their resume?’”
Warner tells me that companies who move to Cycle can expect to save money. Echoing what Butcher said, Warner expects that on average, a team of up to 20 developers can use Cycle without any DevOps engineers. “We have a customer who recently moved from ECS and expects to save $40,000/year on infrastructure costs because Cycle focuses on better server density, and another $180,000/year on DevOps costs,” he said.
Cycle does have some limitations. CycleOS only runs on x86 machines (ARM is not supported) with a minimum of 4GB RAM and 30GB+ of disk space. The reliance on their own operating system means that you can’t pick the host operating system and underlying kernel; if that is important to you, Nomad is likely a better option. Cycle also doesn’t support non-containerised workloads, and it doesn’t yet work with Microsoft Azure. Of course, Cycle is a much smaller company than either HashiCorp or Broadcom, but since the platform consumes just raw OCI-compliant containers, users could easily take their containers elsewhere if they needed to.
On the other hand, Cycle is a very pleasant platform to work with, and it is clear that a great deal of thought and care has gone into the features they support and how they’ve created it. Whilst it is opinionated, it manages to achieve a level of flexibility that is well beyond anything that a PaaS like Heroku or Railway offers. “Our goal,” Warner told me, “is to be smack-dab in the middle of that spectrum between opinionated and flexible. We’re opinionated about many things where that makes it easier, but we want to remain flexible and powerful enough that organisations can build whatever they want on top of the platform.” For that reason, it is definitely worth having on your shortlist.
Delivered in partnership with Cycle.