Gmail down for many after 3-hour Google Cloud outage

Google faced a series of global outages today (Friday November 12), with Gmail down (IMAP servers not responding) for some users and Cloud SQL and Google Cloud Console down for others. The issues appear to have begun around 8:30am BST, with support recognising it as "an issue with Google Cloud infrastructure components" and the incident apparently largely resolved by 11:38am BST.

While the Gmail outage attracted the most immediate attention from users globally, the incident also affected the following services for some hours, with Google Cloud saying at 10:57am BST:

Cloud App Engine: Customers may see traffic drop for us-central1 and europe-west1
Cloud Bigtable: Mitigation still in progress, ETA for resolution still unknown
Cloud Monitoring UI: There is a mitigation in place at the GFE infrastructure level that is rolling out and is expected to resolve this issue.
Cloud Console: All Cloud Console paths may be unavailable.
Cloud Spanner: Customers coming through GFE (not CFE or cloud interconnect) will experience UNAVAILABLE error and latency for both DATA and ADMIN operations

See also: Facebook outage triggered by BGP configuration issue as services fail for 6 billion

Software updates that are pushed to production and slip through the checks designed to catch problems are regularly to blame for this kind of outage. Slack, AWS, Azure, Fastly and Facebook have all faced outages this year. Fastly blamed a "service configuration" issue; Azure blamed a March outage on "a service update targeting an internal validation test ring [that] was deployed, causing a crash upon startup in the Azure AD backend services. A latent code defect in the Azure AD backend service Safe Deployment Process (SDP) system caused this to deploy directly into our production environment, bypassing our normal validation process". AWS customers, meanwhile, are still awaiting a post-incident write-up for a sustained outage in its US-East-1 region in September 2021.