Exchange Online outage: Microsoft manually restarts servers

Microsoft said Monday that it was having to manually restart servers after an incident that dragged on for over 11 hours without full resolution.

The Exchange Online outage appeared to have global impact and also affected Teams Calendar – and even Defender for some customers.

Redmond first acknowledged the incident at 9:06 am UK time.

It said that it was “routing traffic to alternate infrastructure and [has] reinitiated targeted server restarts” after an incident knocked Exchange Online/Outlook services offline for some customers globally.

“A portion of infrastructure which supports mailbox and calendar functionality isn't operating as expected, resulting in impact,” Redmond said in a gated incident report, later pointing to “a recent change which we believe has resulted in impact.” By 13:00 UK time it said it was testing a patch “on a subset of components before full deployment.”

Angry users vented on X in response to updates there. "Any chance Microsoft could actually learn to stop making untested changes to live services please? Well, they're either untested, or your testing methodology is completely inadequate. Both are equally as bad and equally inexcusable," as one put it. Others expressed delight and took off for the pub.

See also: Microsoft’s “top notch” China hack post-mortem was "troubling" speculation

Some four hours into the Exchange Online outage, Microsoft said that it was “continuing our manual restarts on the remaining impacted machines.”

But by around 6pm UK time it admitted in gated updates that “Our mitigative actions haven't provided relief as expected, and a portion of infrastructure remains in an unhealthy state. We determined that some of the targeted server restarts did not succeed due to processing issues, which are under investigation. We’re currently focused on spreading traffic to healthy infrastructure, and we're seeing some recovery.”

See also: Massive Azure outage blamed on WAN update