Microsoft suffers global incident: Triggers manual server restarts

"Our mitigative actions haven't provided relief as expected, and a portion of infrastructure remains in an unhealthy state"

Image credit: https://unsplash.com/@superadmins

Microsoft said Monday that it was having to manually restart servers after an incident that dragged on for over 11 hours without full resolution. 

The Exchange Online outage appeared to have global impact and also affected Teams Calendar – and even Defender for some customers. 

Redmond first acknowledged the incident at 9:06 am UK time. 

It said that it was “routing traffic to alternate infrastructure and [has] reinitiated targeted server restarts” after an incident knocked Exchange Online/Outlook services offline for some customers globally.

“A portion of infrastructure which supports mailbox and calendar functionality isn't operating as expected, resulting in impact,” Redmond said in a gated incident report, later pointing to “a recent change which we believe has resulted in impact.” By 13:00 UK time, it said it was testing a patch “on a subset of components before full deployment.”

Angry users vented on X in response to updates there. "Any chance Microsoft could actually learn to stop making untested changes to live services please? Well, they're either untested, or your testing methodology is completely inadequate. Both are equally as bad and equally inexcusable," as one put it. Others expressed delight and took off for the pub.

Some four hours into the Exchange Online outage, Microsoft said that it was “continuing our manual restarts on the remaining impacted machines.”

But by around 6pm UK time, it admitted in gated updates that “Our mitigative actions haven't provided relief as expected, and a portion of infrastructure remains in an unhealthy state. We determined that some of the targeted server restarts did not succeed due to processing issues, which are under investigation. We’re currently focused on spreading traffic to healthy infrastructure, and we're seeing some recovery.”

As the incident dragged into what appeared to be its eleventh hour, Microsoft posted to its @MSFT365Status page on X that “We’re facing delays in our recovery efforts and are taking immediate action to address them. We understand the significant impact of this event to your businesses and are working to provide relief as soon as possible…”

Reports across social platforms and user forums suggested that the impact was intermittent across certain services (with some users seeing outright Exchange Online outages) rather than a blanket outage.

Updates to follow later.

We look forward to the post-mortem and will share it when we have it.