When Broadcom started putting up prices, cutting out channel vendors and moving VMware customers from perpetual licences to subscriptions, both enterprises and partners started looking for alternatives.
That’s led to a lot of interest in KubeVirt, a CNCF project that relies on KVM (the same type 1 bare-metal hypervisor used in OpenStack, Proxmox and Nutanix) and the libvirt virtualisation API to run virtual machines alongside containers in Kubernetes clusters, using the same commands and tools to orchestrate both.
KubeVirt offers VM lifecycle management, resource allocation, storage and network management, GPU passthrough, live migration and other familiar virtualisation features, although it actually delegates functions like scheduling, networking and storage to Kubernetes itself. That makes it an attractive strategy for gradually modernising your infrastructure as well as supporting legacy VM workloads, and maybe even having the same team manage both kinds of workload.
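To give a flavour of what that looks like in practice, here is a minimal sketch of a KubeVirt VirtualMachine manifest (the VM name is illustrative); it is created and managed with kubectl like any other Kubernetes resource:

```yaml
# Minimal KubeVirt VirtualMachine: apply with `kubectl apply -f vm.yaml`,
# then manage it like any other Kubernetes object.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-vm                  # illustrative name
spec:
  runStrategy: Always            # keep the VM running; restart it on failure
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 2Gi
      volumes:
        - name: rootdisk
          containerDisk:
            # boot disk pulled from a container registry, like any other image
            image: quay.io/containerdisks/fedora:latest
```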
“Pure cloud or pure on-prem deployments are rare now, so in a world of heterogeneous compute, KubeVirt allows a better integration of VM and container workloads,” says Andrew Wafaa, senior director of software communities at Arm (a KubeVirt contributor, bent on making sure it runs well on the Arm systems hyperscale clouds are investing in).
“KubeVirt allows you to modernise over time at your own pace,” KubeVirt maintainer Fabian Deutsch explains.
Already popular with organisations that embraced cloud native early, the project is seeing renewed interest from both users and contributors (many of them vendors with an eye on those new users). “The push came from Broadcom, and the primary need right now is to run virtual machines – but while they’re learning about KubeVirt, they start to realize that this gives them the flexibility, time wise and technology wise, to modernize their own applications and internal infrastructure.”
Tangled up in infrastructure
You won't always see KubeVirt named in migration stories like Reist's recent move to OpenShift, because it’s not just the vSphere virtualisation management platform that organisations need to replace; they may find they’re also dependent on (or at least optimised for) VMware for networking, storage, security, identity and workload management – and on third-party tools that integrate with those VMware products for critical features like backup and disaster recovery.
Untangling all that makes migration slow and expensive: Gartner’s rather eye-watering estimates of the costs involved ($300-$3,000 per VM for commercial migration services, plus perhaps two years of continuing to pay for a Broadcom subscription until the move is complete) reflect staff time as well as partner and vendor costs, plus retraining, certifications and potential downtime.
See also: $300 million cloud bill triggered a rethink - and a shopping spree on modular hardware
That’s a lot of money to replace something you’ve already got, which means many organisations may end up staying with VMware while they look for longer-term solutions. For one thing, organisations don’t want to pick another virtualisation solution only to have to repeat what’s likely to be an expensive, painful and disruptive replatforming exercise.
“A lot of these customers are realizing they would go through a huge migration effort, and at best, they would have about the same as what they have today, just on a different vendor. There’s a risk that you'd invest millions and millions in migration projects and maybe end up in three years having to go to yet another platform because they get acquired, or they go bust, or whatever,” warns Spectro Cloud group product manager Romain Decker.
“If they already have a containerization strategy or a Kubernetes first strategy, they’re asking if it makes more sense to look at what can I containerize up front and replace outright? Or is there a way where I can start mixing these together and lift and shift over what I have today as a temporary solution to a platform that also is my target platform for my containers, and then start refactoring piecemeal.”
KubeVirt can be the first step on that journey, but it’s worth remembering that VMs still have advantages for some workloads. It’s not just the additional security (and regulatory compliance) that comes from full kernel isolation, but also the option of using VMs to spin up a cluster within a cluster when you want to sandbox a new containerised application without affecting existing dependencies. And if you need a Windows application as part of a CI/CD pipeline, a VM is the obvious way to do it: “[you can] take in an amount of data, spin up the VM and then process that data, spit out the artifact and use it in the pipeline,” explains Andrew Burden, Red Hat’s KubeVirt community facilitator.
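As a hedged sketch of that pipeline pattern: a CI step can create a throwaway VirtualMachineInstance (which, unlike a VirtualMachine, isn’t restarted once it finishes), wait for it to come up, run the job and delete it. The name and Windows build image below are hypothetical:

```yaml
# Ephemeral VM for a pipeline step; created with `kubectl create -f vmi.yaml`,
# waited on with `kubectl wait vmi/build-worker --for=condition=Ready`,
# and deleted when the job is done.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: build-worker                         # hypothetical name
spec:
  domain:
    devices:
      disks:
        - name: rootdisk
          disk:
            bus: virtio
    resources:
      requests:
        memory: 4Gi
  volumes:
    - name: rootdisk
      containerDisk:
        image: registry.example.com/windows-build:latest   # hypothetical image
```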
KubeVirt is flexible enough to cover all these approaches, because you’re not just adopting KubeVirt – you’re moving to the Kubernetes ecosystem.
It’s deliberately not designed to be a full-fledged, standalone solution. “Our belief is if we fit into the CNCF ecosystem perfectly, then you can leverage the CNCF ecosystem, like Cluster API, like Tekton Pipelines, like Argo. Our users have access to a wide ecosystem of tools, whether it's for policy enforcement or automation monitoring.”
The state of KubeVirt
The KubeVirt project joined the CNCF in 2019 and stayed in beta for a long time. The v1 release came out in July 2023 and the project is currently at the incubation stage, although it’s likely to graduate within the next few months, marking the project’s maturity.
“KubeVirt is rapidly catching up to VMware for all the major features,” claims Decker (like a number of people in the KubeVirt ecosystem, he previously worked at VMware). “Two years ago, there were significant gaps between what KubeVirt could deliver and what VMware customers expected. In the last 18 months, the number of improvements, feature enhancements and new capabilities that went into KubeVirt boggles the mind, including features that were not even thought possible, like storage migration.”
Now, he maintains, KubeVirt has 80-90% of the functionality customers expect from a virtualisation platform. “You can start and stop VMs. You can snapshot and clone VMs. There's a backup and restore mechanism. You can do live migration. You can do storage migration, you can run encryption, you can run a TPM.” More niche functionality like full orchestration of replication for disaster recovery or failing over to a copy of a VM running in lockstep may not be available yet, but cross-cluster migration (for backup or resiliency) is in development.
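Most of those operations are plain Kubernetes API calls, with day-to-day conveniences exposed through the virtctl CLI (virtctl start demo-vm, virtctl stop demo-vm, virtctl migrate demo-vm). A snapshot, for example, is just another resource; a sketch against recent KubeVirt versions, reusing the illustrative demo-vm name from earlier:

```yaml
# Snapshot a running VM by creating a VirtualMachineSnapshot object;
# restore works the same way, with a VirtualMachineRestore resource.
apiVersion: snapshot.kubevirt.io/v1beta1
kind: VirtualMachineSnapshot
metadata:
  name: demo-vm-snap
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: demo-vm
```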
Recent features include network interface hotplugging, GPU and (contributed by Nvidia) vGPU assignment, (limited) NUMA support, and storage and volume migration (delivered as an API that tool vendors can build on), as well as an option previously only available in commercial solutions built on KubeVirt like OpenShift Virtualization: common instance types, which simplify creating VMs with predefined resources and performance. The upcoming 1.5 release will add directed live migration to a specific node, as well as VM reset.
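Instance types work roughly like this sketch (u1.medium and the fedora preference follow the common instance types bundled with recent releases; exact names depend on what your cluster ships): instead of hand-tuning CPU and memory, the VM references a predefined size.

```yaml
# VM sized by a cluster-wide instance type rather than explicit CPU/memory.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: sized-vm                       # illustrative name
spec:
  instancetype:
    kind: VirtualMachineClusterInstancetype
    name: u1.medium                    # predefined resources and performance
  preference:
    kind: VirtualMachineClusterPreference
    name: fedora                       # guest-OS tuned defaults
  runStrategy: Always
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
```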
These kinds of familiar virtualisation features make up the last 10-15% of KubeVirt functionality required for what Deutsch calls ‘serious’ adopters. “We didn’t have reset for a long time, we didn’t have storage live migration; now we know we need it to close those gaps to more traditional virtualization solutions.”
Sometimes, missing features were about limitations in Kubernetes, Burden notes. “CPU hotplug and memory hotplug were popular features in the virtualisation space, but Kubernetes didn’t permit changing CPU or memory dynamically for pods, so KubeVirt was limited.” Both are now supported. “Because of the changed landscape, we decided we have to deliver these features because they’re just expected.”
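In current releases the pattern is to declare headroom up front and then edit the running VM’s spec; a hedged sketch (the field names are from recent KubeVirt versions, and the feature gates required vary by release):

```yaml
# Declare maximums at creation time so CPU and memory can be hot-plugged later
# by patching sockets/guest up to those limits on the running VM.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: hotplug-vm                     # illustrative name
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        cpu:
          sockets: 2
          maxSockets: 4                # headroom for CPU hotplug
        memory:
          guest: 2Gi
          maxGuest: 8Gi                # headroom for memory hotplug
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/containerdisks/fedora:latest
```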
That doesn’t mean abandoning the long-standing KubeVirt principle that if something isn’t virtualisation specific, it’s better to solve it in Kubernetes first. The KubeVirt team has been working with other projects like Medik8s, which delivers high availability for Kubernetes on bare metal, and helped deliver network microsegmentation in OVN-Kubernetes. “This is great because we solve [the problem] for pods and VMs alike,” says Deutsch. Similarly, the project contributed load-aware rebalancing (which he describes as similar to vSphere Distributed Resource Scheduler) to the Kubernetes Descheduler, which moves and evicts both pods and VMs.
Developments like this may be why Charles Ruffino, cloud architecture fellow at SoftIron, expects modern, cloud native implementations to replace 40% of traditional VMware deployments, although he warns that the transition may not be easy.
“While KubeVirt offers an interesting way to run traditional VMs within Kubernetes by attempting to bridge the gap between legacy VMware environments and modern cloud-native architectures, and while it can definitely play a role in easing the transition, achieving a fully cloud-native outcome is going to need a different strategy focused on scalability, security, and operational efficiency across the board. KubeVirt leaves a lot to the operator to figure out.”
But partners and vendors are jumping in to fill that gap.
“It's an opportunity for vendors to go after users that are incentivized to find something else and build it on cloud native,” CNCF developer relations manager Jorge Castro suggests. “The major lesson learned here, from a business perspective, is that a vendor chose that you weren't important enough [to them]. One of the values of open source is you're never stuck in a lurch, you always have that option of choosing your own fate.”
KubeVirt under the covers
For several years, KubeVirt has been a useful tool for building platforms, and while those were originally rather specialised, it’s showing up in more general purpose offerings. Cloudflare uses KubeVirt for its CI runners, Intel uses it in its Gaudi AI cloud and Nvidia built its GeForce NOW online gaming service with KubeVirt, so you can play PC games on a phone or a VR headset in the cloud. Equinix uses it as part of its bare metal service, Civo runs KubeVirt on bare metal to deliver its cloud services, NCR Voyix uses it to offer retailers and restaurants a platform that runs both containerised and virtualised apps at the edge, and it’s in tools like KubeSphere and Cozystack, a free PaaS for building your own cloud.
“The Fortune 50 are using KubeVirt extensively,” says NASA Cloud Security Engineer (and former KubeVirt community lead) Kathryn Morgan. “It’s used in telecoms because it’s suitable for highly regulated production environments where they can't afford dropped connections and have huge efficiency demands on the power envelope.”
Microsoft uses KubeVirt for VM workloads in the Kubernetes-based Azure Operator Nexus offering for mobile operators that it took over from AT&T in 2021. In the same year, Platform 9 added KubeVirt to its range of managed Kubernetes services, again initially for telcos that need to run control applications that aren’t a good fit for containers – but it’s now pushing that as a solution for enterprises looking for a VMware alternative. Spectro Cloud offers what it calls a Virtual Machine Orchestrator reference architecture for its Palette Kubernetes management platform, which includes KubeVirt and is designed to scale to multiple locations including the edge; a space VMware experimented with repeatedly but never served well.
“We have customers that run KubeVirt on drones that pick apples. There’s no way you’re going to run a full stack of VMware servers for a solution like that,” Decker points out.
These days, KubeVirt is showing up not just in Kubernetes distros like Oracle Cloud Native Environment but in products aimed specifically at competing with VMware. Red Hat’s OpenShift Virtualization is based on KubeVirt, and it’s used in Rancher’s open source Harvester bare metal hyperconverged infrastructure project, which the SUSE Virtualization and Edge Virtualization solutions build on. Google Anthos uses KubeVirt.
Familiar tools that offer backup and disaster recovery for Kubernetes are starting to work with at least the commercial versions of KubeVirt; Veeam Kasten supports OpenShift Virtualization and SUSE Virtualization.
See also: Google praises UniSuper’s CIO after GCP error deleted $124 billion firm’s entire private cloud
No one is going to rebuild their hundreds or thousands of VMs to move to KubeVirt, but there are migration tools. Forklift offers a GUI for migrating even running VMs with minimal downtime (including converting guest VMs from ESXi to KVM), but it does require collecting a lot of VM metadata and may not be sufficient for more complex workloads with multiple network interfaces, non-standard operating systems or virtual appliances.
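Under the covers, Forklift is itself driven by Kubernetes resources: you register source and destination providers, map networks and storage, then describe the move in a Plan. A hedged sketch (the provider, map and VM names are hypothetical):

```yaml
# Forklift migration plan: which VMs to move, from where to where, and how
# their networks and datastores map onto the target cluster.
apiVersion: forklift.konveyor.io/v1beta1
kind: Plan
metadata:
  name: vsphere-to-kubevirt
  namespace: konveyor-forklift
spec:
  warm: true                    # pre-copy disks while the source VM keeps running
  provider:
    source:
      name: vsphere-prod        # hypothetical vSphere provider
      namespace: konveyor-forklift
    destination:
      name: host                # the local KubeVirt cluster
      namespace: konveyor-forklift
  map:
    network:
      name: network-map
      namespace: konveyor-forklift
    storage:
      name: storage-map
      namespace: konveyor-forklift
  targetNamespace: virtual-machines
  vms:
    - name: legacy-app-01       # hypothetical source VM
```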
But again, vendors are starting to invest here. For example, Spectro Cloud offers a web-based VM Migration Assistant based on Forklift that fetches VMs from your environment, suggests migration options like storage and network mappings and then creates resources and migrates VMs into your cluster. Platform 9 and Cloudbase have their own migration tools with KubeVirt support.
There's also work going on to integrate KubeVirt with other CNCF projects that haven’t previously supported VMs, like using Helm charts for VM deployments.
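A hypothetical Helm template makes the point: a VM becomes just another templated Kubernetes resource, with its size and disk image set per release through values.

```yaml
# templates/vm.yaml in a hypothetical chart: `helm install my-vm ./vm-chart
# --set image=quay.io/containerdisks/fedora:latest` stamps out a VirtualMachine.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: {{ .Release.Name }}-vm
spec:
  runStrategy: Always
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: {{ .Values.memory | default "2Gi" }}
      volumes:
        - name: rootdisk
          containerDisk:
            image: {{ .Values.image }}
```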
These tools are important for KubeVirt’s new audience. Early adopters were looking for efficiency and wanting to build hybrid applications on their greenfield bare metal Kubernetes deployments, incorporating complex or monolithic workloads like Postgres that they couldn’t easily containerise. “Containers are much more efficient than virtual machines and more dynamic; they support different development models,” Deutsch notes.
When a VM is what you need, KubeVirt lets you take advantage of those same efficiencies, Morgan points out. “In VMware, you have to lug around a lot of ISOs and design entire storage architectures based on being able to ship these ISOs to your nodes; in Kubernetes, that's a completely solved problem. You just write the name of the image that you want just like any other container image in your manifest, and it rolls out.”
Instead of needing highly efficient tiered caching to create a network that can move a 5GB ISO around for ephemeral VMs that spawn and are destroyed frequently, you can just load it onto each worker node once. “Just in terms of efficiencies around the data centre, there's a lot of benefit to your VM workloads once you've modernized on more efficient compute and distribution mechanisms like OCI and Kubernetes.”
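For disks that do need to land on cluster storage rather than stream in as a containerDisk, KubeVirt’s containerized-data-importer can pull an image from a registry into a PersistentVolumeClaim once; a sketch with a hypothetical registry path:

```yaml
# Import a disk image from an OCI registry into cluster storage once,
# instead of shipping ISOs around the data centre.
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: win11-base                  # hypothetical name
spec:
  source:
    registry:
      url: docker://registry.example.com/images/win11:latest   # hypothetical image
  storage:
    resources:
      requests:
        storage: 40Gi
```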
Those efficiencies extend to using declarative tooling pipelines like Flux or Argo CD to define workloads, whether that’s a Kubernetes app or Windows 11 VDI infrastructure.
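With Argo CD, for instance, a VM fleet is just another Application synced from Git; a hedged sketch with a hypothetical repository:

```yaml
# Argo CD keeps the cluster's VirtualMachine objects in sync with the
# manifests in Git, the same way it would for any containerised app.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: vm-fleet
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/vm-manifests   # hypothetical repo
    targetRevision: main
    path: vms/production
  destination:
    server: https://kubernetes.default.svc
    namespace: virtual-machines
  syncPolicy:
    automated:
      prune: true                   # remove VMs deleted from Git
      selfHeal: true                # revert out-of-band changes
```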
Those benefits may be a pleasant surprise for newer users adopting KubeVirt for different motivations – primarily driven by concerns about Broadcom – but they also have different needs. They’re used to working with partners; they need helpful migration tools that can move VMs wholesale while keeping the same IP addresses; and they’re probably looking for a vendor who can offer 24-7 support and even consultancy, rather than a set of open source projects they can assemble themselves. Even for organisations with experience of open source, navigating multiple CNCF projects can be challenging.
VMware has monitoring, automation and observability tools – but so does the Kubernetes ecosystem, Deutsch points out. “Vendors pick a nicely integrated, opinionated set of Kubernetes and CNCF projects.” That will include not just observability tools but certifications from storage and networking partners. “That’s something you can start to compare to VMware because you’ve got the partner integration because the vendor is helping to do certifications, they’re providing an observability solution, they're providing networking and storage options.”
The skills transition may be a bigger challenge for many organisations. For one thing, it’s probably not the same team running Kubernetes and VMware infrastructure inside an organisation. VMware operators have invested in certifications for what’s been a secure career up until now; Kubernetes certifications won’t match up exactly to their expertise.
Plus, they will need to deal with a different approach to infrastructure, with a very different lifecycle and maintenance schedule. Even with long-term support releases of Kubernetes, organisations need to be prepared for regular upgrades, Decker notes. “A VMware customer could manually deploy vSphere onto a box of machines and then basically run it for years and years and years with very little maintenance on it. With the Kubernetes and the open source stack, you have to be able to do maintenance in an automated way, in a structured way, continuously.”
That’s where vendors like Spectro Cloud want to help. The KubeVirt project itself has also adapted to the needs of this new breed of users. In the early days, it had short release cycles to keep up with how fast Kubernetes itself was moving. As vendors started to include KubeVirt in their products, the project slowed down its release schedule to give both enterprise adopters and vendors more time to rebase on top of new KubeVirt releases and it’s now tied to the Kubernetes release schedule. The development model has also changed to reflect the fact that VMs run critical workloads and new (and increasingly complex) features can’t break those workloads.
The 1.5 release is the first to use the new KubeVirt Enhancement process with Kubernetes-style Special Interest Groups for storage, compute, network, migration and other key areas, to spread the maintainer workload and deliver high quality features. That’s a sign of an increasingly mature project (and something Burden hopes more external vendors will get involved with).
Futureproofing
What KubeVirt really delivers is the flexibility to take advantage of VMs in conjunction with cloud native projects and tools.
Kubernetes is emerging as a more universal infrastructure orchestrator: it’s not just for containers. VMs can find a home there, particularly as part of workloads that need components that don’t fit neatly into a container or as part of a longer-term migration to cloud native – but also for legacy workloads that it’s not worth rewriting. But Kubernetes is also interesting for running smaller chunks of compute, whether that’s serverless functions or WASM code with projects like runwasi.
Organisations frustrated by the need to move off VMware may be tempted to try using KubeVirt as a transition to a purely containerised future, and this is a good opportunity to reevaluate preconceived ideas about what you need, Morgan suggests. “Should they be doing containerized workloads? Should they be doing virtualized workloads? Should they be skipping over a modernization journey that is expensive and takes a long time, and going straight to WASM?”
The capabilities of KubeVirt are still evolving; even as part of a commercial product, it may not yet offer all the enterprise features customers expect from the mature VMware solutions, and in practice, optimising it takes work. But if you treat it as part of rethinking your infrastructure stack rather than a quick fix for a vendor issue like Broadcom’s price rises, you can start to plan for the future possibilities of a single, unified management plane.