ITarian blames AWS issues for Comodo service outage

ITarian, which provides cloud-based IT management software for managed service providers (MSPs) and enterprises, says a protracted outage late last week was not the result of a security incident.

Instead, the company appears to have blamed AWS for the issues on February 24 and 25.

The hyperscaler meanwhile says it has not suffered any incidents. Critics say ITarian is shifting blame.

ITarian's statement came some 24 hours into the incident as customers began to fear -- amid a febrile international climate in which IT incidents immediately arouse fears in many quarters of a security breach -- that the company had been attacked and data was at risk. MSPs have increasingly been targeted in cyber-attacks.

Privately held, New Jersey-based ITarian provides a wide range of software tools and services, from patch management and help desk tools through to remote monitoring and management software.

The Stack could only reach ITarian by raising a ticket ourselves. We were eventually told by support on Sunday, February 27 that "[the] cause for the outage is definitely not a cyber attack. It is a server issue."

Whose servers?

The company's support told one customer, as posted in a forum thread on the issue: "Yesterday we experienced an outage which affected most if not ALL our US based customers. This issue has the full attention of Comodo Sr Leadership and we are doing everything we can to rectify this situation as fast as possible.

"We have isolated an issue within AWS load balancers which have been crashing.

"This service has been going up and down as AWS looks to resolve their issue. We have escalated and will continue to escalate this outage within AWS however we cannot provide an ETA on resolution. We will continue to provide updates as they become available," the company said in a response to a ticket shared by one customer.

That's controversial. AWS had told The Register Thursday that it could "confirm there are no issues with AWS services," after several customers reported problems. A spokesperson for the hyperscaler said there was no breakdown on Tuesday nor on Thursday, adding: "We have not had a single service event this week."

It is, of course, conceivable that ITarian finds it convenient to point the finger at a third-party provider for its issues, or even that it triggered errors within its own AWS setup through mismanagement or misconfiguration.

Many customers meanwhile were left frustrated by the lack of regular status updates from ITarian.

As one customer noted on February 24: "It's amazing how frequently the Comodo/iTarian tools are down but the status page says everything's fine. Two hours. Still can't connect from sites in WA, TX, FL, and NC" -- adding the next day that their tooling was "down for over 24 hours, no update on the status page, and their staff keeps saying it'll be back up in an hour or two. Contract renewal is up in a few months. I can't wait."

They told us by DM: "As of Monday February 28th ~10 AM I still have ~75% of my devices offline. I just got off the phone with them and they said they 'changed IPs' and that I should reboot any workstation that isn't online. I'm trying that now." In an update on March 1, they added: "A majority of my machines are back online now. Rebooting initially didn't help it. I think they're still having intermittent issues. Comodo is garbage.

"I'm in the middle of writing my own remote support tool. Probably going to open source it when I'm done. I'm sick of all these terrible providers and their crap products."

Data from Downdetector, a website that tracks reports of outages, had suggested AWS was having issues last week. An AWS spokesperson told The Reg, however, that Downdetector had "walked back its own false reporting", adding: "The AWS Service Health Dashboard (SHD) is the only reliable source of AWS availability data, providing customers with timely and accurate information on AWS services and regions."

The "timely and accurate" line may cause hollow laughs in some quarters.

As one frustrated AWS customer, developer Ryan Scott Brown, wrote in the wake of substantial US-EAST-1 AWS data centre issues in December 2021: “It really grinds my gears to hear AWS yammer on about cell-based deploys and minimizing blast radius during Re:Invent and watch the following week as the status page stays green for at least an hour during a regional outage, and then only admit AWS Console problems…”

In a blistering blog post on AWS’s post-incident write-up, Brown added: “AWS has made US-East-1 a source of systemic risk for every single AWS customer...” He said the issue has been neglected for years, noting that in the wake of a 2017 AWS outage the company wrote: “We were unable to update the individual services’ status on the AWS Service Health Dashboard (SHD) because of a dependency the SHD administration console has on Amazon S3.”

Alongside the ITarian outage, other AWS users had suffered issues around the same time last week.

Heroku, which is hosted by AWS, suffered a partial outage at about 17:00 UTC on February 24, around the same time complaints against AWS rose on Downdetector. That was just a few hours after ITarian customers started reporting problems. The Rust programming language's Crates.io, which relies on Heroku and Amazon's cloud, was also temporarily down at the same time and blamed an unnamed infrastructure provider.

Correlation, of course, does not imply causation.

AWS in late 2021 meanwhile said it expects to "release a new version of our Service Health Dashboard early next year that will make it easier to understand service impact and a new support system architecture that actively runs across multiple AWS regions to ensure we do not have delays in communicating with customers."