The Stack

Google, NHS count cost of heatwave IT outages – what happens next time?

Image courtesy the Met Office.

Both Google’s cooling and failover systems struggled to cope with the UK’s July heatwave, while an NHS trust is still dealing with heatwave IT outages – so, what happens the next time temperatures reach 40C?

The UK’s extreme heat brought much of the country to a standstill, but it also pushed datacentres to their limits – and, in the case of the cooling systems at Google’s europe-west2-a site in London, beyond them. In a detailed report on the heat-induced failure, published on Friday, Google said it was forced to shut down the datacentre to prevent damage to its systems – but then accidentally routed traffic away from functioning parts of the europe-west2 region.

“At the start of the incident, we inadvertently modified traffic routing for internal services to avoid all three zones in the europe-west2 region, rather than just the impacted europe-west2-a zone,” said Google’s report.

“Our regional storage services, including GCS and BigQuery, replicate customer data across multiple zones. Due to the regional traffic routing change, they were unable to access any replica for a number of storage objects. This prevented customers from reading these objects until the traffic routing was corrected, at which point access was immediately restored.”
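To make the difference between the intended and the actual routing change concrete, here is a minimal Python sketch. It is an illustration only, not Google's tooling: the function `readable_objects`, the object names and the replica placement are all hypothetical, and only the zone and region names come from the report. It is also cruder than the real incident, in which only some storage objects lost access to every replica.

```python
# Hypothetical model of zone-scoped versus region-scoped traffic avoidance.
# Only the zone names come from Google's report; the rest is assumed.
REGION_ZONES = ["europe-west2-a", "europe-west2-b", "europe-west2-c"]


def readable_objects(replicas, avoided_zones):
    """Return the objects with at least one replica in a zone still receiving traffic."""
    routable = set(REGION_ZONES) - set(avoided_zones)
    return {name for name, zones in replicas.items() if routable & set(zones)}


# Assumed placement: each object is replicated across two zones.
replicas = {
    "object-1": ["europe-west2-a", "europe-west2-b"],
    "object-2": ["europe-west2-b", "europe-west2-c"],
    "object-3": ["europe-west2-a", "europe-west2-c"],
}

# Intended change: avoid only the overheated zone.
zone_drain = readable_objects(replicas, ["europe-west2-a"])

# Change described in the report: avoid all three zones in the region.
region_drain = readable_objects(replicas, REGION_ZONES)

print(sorted(zone_drain))
print(sorted(region_drain))
```

In this toy model, avoiding a single zone leaves every object readable through a replica elsewhere, while avoiding the whole region leaves none reachable, even though the data itself is intact. That matches the report's account that access returned as soon as routing was corrected.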

Google shut down the London datacentre at 18:05 BST on 19 July, and managed to repair the cooling systems around four hours later, after which it began restoring its service. The main outage was resolved after around 18 hours, but the “long tail” issues associated with shutting down an entire hyperscaler datacentre took an extra 17 hours to resolve – for a total of 35 hours’ disruption.

The search giant said it was conducting a detailed analysis of what happened to its cooling systems, and working on ways to decrease the thermal load within a datacentre, to prevent the need for a full shutdown. The report also said it would “repair and carefully re-test” its failover automation, to prevent a recurrence of the routing mess.

NHS trust apologises for heatwave IT outages

But while Google and Oracle – the other high-profile IT player to see heatwave IT outages – were able to restore their systems relatively quickly, other organisations were less fortunate. The most prominent example is Guy’s and St Thomas’ NHS Trust in London, which saw both its datacentres wilt in the heat – and which is still dealing with the aftermath.

Last week the trust’s chief executive, Ian Abbs, published an apology for the ongoing heatwave IT outages, and promised a full external review of the incident. Many of the trust’s IT systems are still offline, including its management systems – resulting in a lot of manual work.

“On Tuesday 19 July, as a result of extreme heat, the two data centres on our Guy’s and St Thomas’ sites failed – this is unprecedented and happened very quickly. While our immediate priority is to get our IT systems back up and running as safely and quickly as possible, we will be commissioning an external, independent review to look in detail at what happened and to ensure we identify and learn all the lessons,” said Abbs in his apology.

What about the next heatwave?

While the specific causes of all these heatwave IT outages need to be investigated, the ultimate cause is both simple and obvious: nobody expected these systems to have to cope with such extreme heat. There are plenty of datacentres operating in consistently hotter climates, but it would make no sense for Google to build its London datacentre with the same cooling capacity as its Nevada facility.

Unfortunately, the equation is shifting when it comes to the UK’s climate, with the Met Office predicting peak summer temperatures could be 4-7C hotter within the next 50 years, and with much higher chances of temperatures exceeding 40C. Those predictions came in 2020, in the aftermath of 2019’s record-breaking summer – records which were smashed last month.

In the wake of 2019’s record highs, academics looked into how the UK was dealing with heatwave planning – and the answers were not encouraging. In a paper published in the February 2021 edition of Environmental Science & Policy, Chloe Brimicombe et al found heatwaves are an “invisible risk” in the UK, with most policy work focusing on narrow impacts.

“Communication over what UK residents should do, the support needed to make changes, and their capacity to enact those changes, is often lacking. In turn, there is an inherent bias where research focuses too narrowly on the health and building sectors over other critical sectors, such as agriculture,” wrote the authors.

In September 2021 another team of researchers found the wider economic impact of climate change was being mostly ignored, with most models assuming the effect on long-term economic growth would be zero. Their findings were published in Environmental Research Letters.

“Climate change makes detrimental events like the recent heatwave in North America and the floods in Europe much more likely. If we stop assuming that economies recover from such events within months, the costs of warming look much higher than usually stated,” said co-author Chris Brierley in a UCL article about the research.

Many commentators dismissively compared last month’s heatwave to the sustained high temperatures of 1976. And while in many ways that summer was worse thanks to the nearly three months of heat, the UK’s peak temperature only reached 35.9C, nearly five degrees lower than the July 2022 peak, and three degrees lower than the 2019 peak.

As the Guy’s and St Thomas’ heatwave IT outages demonstrate, peak temperatures can have outsized effects, even if they only last for a short time. As it stands, it seems the UK – and particularly the government – is not prepared to deal with these incidents, and an urgent reassessment of the impact of extreme climate events is needed.
