Amazon has blamed a “subsystem responsible for capacity management for AWS Lambda” for an AWS outage in its US-EAST-1 region that took down over 100 services for approximately four hours late on June 13.
Despite recent architectural changes to improve AWS Support resilience, AWS said that “customers may also have experienced issues when attempting to initiate a Call or Chat to AWS Support” during the incident.
The AWS outage took place from 11:49am PDT to 3:37pm PDT.
Given its scale, few companies have the ability to affect as many customers in one fell swoop as AWS does; the incident affected FIFA, Fox News, and the McDonald's app amongst many hundreds of others.
Some 104 AWS services, including AWS Account Management, CloudWatch, Glue, Fargate, Secrets Manager and more, saw “increased error rates and latencies”, and AWS said that “customers may have experienced authentication or sign-in errors when using the AWS Management Console, or authenticating through Cognito or IAM STS…”
The root cause of the issue was rapidly resolved, and no cloud hyperscaler (or indeed on-premises data centre) can avoid sporadic issues. The impact on AWS Support may leave the hyperscaler with some questions to answer, however, after promised changes to AWS Support infrastructure.
In late 2021, AWS promised to build a “new support system architecture that actively runs across multiple AWS regions” amid criticism of US-EAST-1 (AWS’s most fragile region) being a single point of failure for AWS Support when severe outages, like the one in December 2021, happen.
This issue was flagged in Gartner’s 2022 Magic Quadrant for Cloud Infrastructure and Platform Services, which pointed to AWS’s “regional dependencies and communication” as cause for some concern.
As Gartner’s analysts put it in October: “AWS’s operational incident of 7 December 2021 revealed some multiregion dependencies on the internal AWS network, which is hosted in US-EAST-1. Because US-EAST-1 also hosts support ticketing for North America, AWS customers also had difficulty communicating with technical support during the incident…”
Yet in a note published on August 1, 2022, AWS said it had launched a new “AWS Support Center console URL… [that] ensures you can always contact AWS Support via the AWS Support Center Console… built using the latest architecture standards for high availability and region redundancy.”
Why this rebuild did not resolve the perennial issue of US-EAST-1 outages taking out AWS Support with them remains an open question.
The Stack has put the question to AWS and will update this article if we get an answer.