Bad Ubuntu update crashes global Azure Kubernetes services
A flawed systemd update for Ubuntu appears to have knocked Ubuntu-based Azure virtual machines offline by breaking DNS, causing a significant Azure Kubernetes outage for users running Ubuntu nodes.
Canonical has confirmed the bug and pulled the affected update, systemd version 237-3ubuntu10.54 for Ubuntu 18.04. According to Microsoft, the issue first became apparent at 0600 UTC (7am UK time) on Tuesday 30 August.
The affected OS version is the default OS for Kubernetes nodes on Azure, which may explain the outsized effect that a bug in a single version of one distro is having on Azure's services worldwide. A Reddit user also documented the Azure Kubernetes outage and reported on how well their workarounds performed.
The DNS issue can be resolved either by hard-coding a DNS server address or by rebooting affected VMs, which renews their DHCP lease. Although the affected update is no longer available and should now be cleared from mirrors, Microsoft recommends that users running Ubuntu VMs on Azure stop automatic updates.
A poster on Canonical's bug thread offered a way to detect whether nodes are affected by the Azure Kubernetes outage, and other posters shared their own advice and observations. However, some reported that restarting their VMs had no effect.
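As a rough illustration of one simple check, the Python sketch that follows tests whether a node has the pulled systemd build installed. It is not the method posted on the bug thread. It assumes a Debian-based system where `dpkg-query` is available, and the helper names `installed_systemd_version` and `is_affected` are hypothetical. It only matches the exact version Canonical pulled, so it cannot confirm whether DNS on a given node is actually broken.

```python
import shutil
import subprocess
from typing import Optional

# The systemd build for Ubuntu 18.04 that Canonical pulled (per the article).
BAD_SYSTEMD_VERSION = "237-3ubuntu10.54"


def installed_systemd_version() -> Optional[str]:
    """Hypothetical helper: return the installed systemd package version
    using dpkg-query, or None if dpkg-query is missing or the package
    is not installed."""
    if shutil.which("dpkg-query") is None:
        return None
    result = subprocess.run(
        ["dpkg-query", "--show", "--showformat=${Version}", "systemd"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    # An empty version means dpkg knows the package but it isn't installed.
    return result.stdout.strip() or None


def is_affected(version: Optional[str]) -> bool:
    """Hypothetical helper: True only for an exact match with the pulled build."""
    return version is not None and version.strip() == BAD_SYSTEMD_VERSION


if __name__ == "__main__":
    version = installed_systemd_version()
    if version is None:
        print("systemd version unknown (dpkg-query unavailable?)")
    elif is_affected(version):
        print(f"systemd {version}: the pulled build is installed")
    else:
        print(f"systemd {version}: not the pulled build")
```

Since some users found that restarts did not help, a version match like this is best treated as one signal alongside checking whether DNS lookups on the node actually succeed.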
The Stack has contacted Canonical and Microsoft for comment.
According to figures from 2020, more than half of all VMs running on Azure used Linux, with supported distros including Ubuntu, Red Hat, SUSE and Debian, among others.
Ubuntu is a popular choice for running Kubernetes clusters on Azure, with Microsoft supporting Ubuntu 18.04 (the version affected by this issue) as the default node OS for the Azure Kubernetes Service. Canonical also produces a number of other Ubuntu versions specifically tuned for use on Azure.
Image from Jeff Geerling’s ever-relevant t-shirt.