Multiple major websites went offline on Tuesday after an hour-long outage at the cloud service company Fastly

Multiple major websites went offline on Tuesday after an apparent outage at the cloud service company Fastly, and there were still reports of sporadic disruptions after the company patched the problem about an hour later.

Dozens of sites - including the New York Times, CNN, Twitch, Reddit, Amazon, the Guardian, and the UK government’s home page - could not be reached. Affected websites displayed the message: “Error 503 Service Unavailable”.

San Francisco-based Fastly acknowledged a problem just before 1000 GMT. It said in repeated updates on its website that it was “continuing to investigate the issue”.

In a blog post on Wednesday, Nick Rockwell, senior vice president of Engineering and Infrastructure at Fastly, said the disruption was triggered by a single customer.

“We experienced a global outage due to an undiscovered software bug that surfaced on June 8 when it was triggered by a valid customer configuration change,” he wrote, adding: “Within 49 minutes, 95% of our network was operating as normal.”

“This outage was broad and severe, and we’re truly sorry for the impact to our customers and everyone who relies on them,” he said.

Gaps in operational resilience

The outage shows there is still a long way to go on operational resilience, according to Guy Warren, CEO of ITRS. With the world’s telecommunications infrastructure becoming even more complex, firms need to act now to prevent similar outages, he explained.

“Yesterday, the websites of some of the world’s biggest institutions experienced outages,” he said. “Faced with increasing options, customers and consumers won’t remain patient forever.”

“Firms must realise that investing in operational resilience – whether that be allocating funding to the endeavor or providing those internally responsible for operational resilience with more authority – is key to their survival.”

“The reputational damage and customer losses following repeated outages cannot be underestimated. Firms must act now or face the consequences later.”

Fastly describes itself as an “edge cloud platform.” It provides behind-the-scenes cloud computing services to many of the web’s high-profile sites by helping them store content on servers around the world.

It is designed to speed up loading times for websites, protect them from denial-of-service (DoS) attacks, and help them cope when traffic peaks.

The 2021 Allianz Risk Barometer identified cloud outages as a potential source of systemic or catastrophic risk. “A major blackout or cloud outage could have a massive impact, simultaneously affecting companies around the world”.

“Future ‘Black Swan’ events cannot be ruled out,” said Jens Krickhahn, a regional cyber practice leader at AGCS. “It will be important to identify and prepare for such scenarios quickly before they become true events.”