Tech Insight: When the Internet Blinked — Amazon’s Big Cloud Outage Explained

If your favourite app, game, or online bank stopped working recently, you weren’t alone. In October, a major problem inside Amazon Web Services (AWS) — the world’s largest cloud computing provider — took thousands of websites and apps offline for hours. Amazon has now apologised and explained what went wrong, but the incident has left many businesses wondering just how fragile the internet really is.

Read time: 5 mins

Amazon's big cloud outage caused headaches for millions

What Actually Happened?

Late on 19 October, engineers at Amazon’s massive data centre complex in Northern Virginia noticed something odd. Parts of their system suddenly couldn’t talk to each other. The problem started in a behind-the-scenes piece of technology called DNS, which acts like the internet’s phone book. Every website and app has a name (e.g. lloydsbank.com) and a number (an IP address, such as 23.73.138.26) that computers use to find it. DNS connects the two.

A software bug in Amazon’s system accidentally deleted one of those phone book entries — a critical one used by DynamoDB, a database service that thousands of other systems depend on.
Without that entry, servers simply couldn’t find each other. Imagine if every contact in your phone vanished — you’d still have a phone, but you couldn’t call anyone. That’s exactly what happened to much of the internet for about 15 hours.
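For readers who like to see the mechanics, the “phone book” step is something software does every time it connects to a service. Here is a minimal Python sketch of that lookup using only the standard library; the DynamoDB hostname is shown purely for illustration of the kind of regional endpoint involved, and the error branch is roughly what applications experienced during the outage.

```python
import socket

def resolve(hostname):
    """Ask DNS for the IP addresses behind a hostname, like looking up a contact."""
    try:
        results = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
        # Each result carries a socket address; the first element is the IP.
        return sorted({entry[4][0] for entry in results})
    except socket.gaierror as err:
        # The name still exists, but no address comes back - the "empty phone book" case.
        return f"Could not resolve {hostname}: {err}"

# Illustrative lookups: a regional DynamoDB endpoint and an everyday website.
print(resolve("dynamodb.us-east-1.amazonaws.com"))
print(resolve("lloydsbank.com"))
```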

How Amazon’s Big Cloud Outage Affected Everyday Life

More than 1,000 popular services went down or slowed to a crawl. Among them were Snapchat, Reddit, Roblox, Fortnite, Lloyds Bank, Halifax, and Venmo.
Some banking customers couldn’t access their accounts or make payments until mid-afternoon the next day. Even smart home gadgets misbehaved — one brand of connected mattress overheated or got stuck in a raised position because its control app stopped communicating with the cloud.

In short, millions of devices and websites were still “on” — but blind to each other. For hours, big chunks of the internet simply forgot how to connect.

Why Did It Take So Long to Fix?

Amazon’s engineers quickly found the cause and restored the missing record manually. But by then, the damage was done.
Their automated systems — the ones that normally spin up servers, balance traffic, and reconnect users — were overwhelmed trying to catch up. Some services had to be restarted completely to untangle the mess.

By the afternoon of 20 October, things were mostly back to normal, though a few smaller systems took longer to recover.

Amazon’s Apology

Amazon published a detailed technical report (about 7,000 words long!) and admitted that automation had failed to spot the error or fix it automatically.
The company said it had turned off the faulty system worldwide until it’s sure the bug is gone and promised new safeguards to stop similar issues in future.

In its statement, AWS said:

“We know how critical our services are to our customers… and we apologise for the impact this event caused.”

What the Experts Are Saying about Amazon’s Big Cloud Outage

Tech specialists have called the incident a classic example of how automation can go wrong. The more automated and complex a system becomes, the bigger the ripple effect when a small thing fails.
The phrase “It’s always DNS” — long a running joke among IT professionals — turned out to be painfully true again.

Experts also say the outage shows the danger of putting all your eggs in one basket. Businesses that ran everything in a single AWS region were completely offline. Those that spread their systems across multiple regions, or used a mix of cloud providers, stayed up and running.
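What “spreading your eggs” looks like in practice is often just a health check and a fallback. The sketch below is a minimal illustration in Python, not a production pattern: the two regional endpoint URLs and the health-check path are entirely hypothetical.

```python
import urllib.request
import urllib.error

# Hypothetical endpoints for the same service deployed in two separate regions.
ENDPOINTS = [
    "https://api.eu-west-2.example.com/health",     # primary region
    "https://api.eu-central-1.example.com/health",  # secondary region
]

def first_healthy_endpoint(endpoints, timeout=3):
    """Return the first endpoint that answers its health check, or None."""
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return url
        except (urllib.error.URLError, TimeoutError):
            # DNS failure, network error, or timeout: try the next region.
            continue
    return None

target = first_healthy_endpoint(ENDPOINTS)
print(target or "No region reachable - trigger the disaster recovery plan.")
```

The point isn’t the code itself; it’s that failover has to be designed and tested in advance, because during an outage is the worst time to discover you only ever had one region.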

Why It Matters to UK Businesses

For UK organisations, Amazon’s big cloud outage is a reminder that resilience can’t be someone else’s problem. Many British firms rely on US-based cloud platforms for their most critical systems — from payments and logistics to customer service. When one region on the other side of the world goes dark, the knock-on effects can be immediate.

It’s worth asking:

  • Where does your data actually live?
  • What happens if one cloud region fails?
  • Do you have a backup plan to keep key systems online?

The answers don’t have to be complicated, but they do need to exist. Regular testing, regional backups, and disaster recovery plans are now essential, not optional.
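One easy place to start with “regular testing” is an automated check that your off-site backup actually exists and is recent. A minimal sketch follows; the backup path and the 24-hour threshold are purely illustrative assumptions.

```python
import datetime
import pathlib

# Hypothetical location of last night's off-site backup copy.
BACKUP_FILE = pathlib.Path("/mnt/offsite/backups/customer-db-latest.dump")
MAX_AGE = datetime.timedelta(hours=24)

def backup_is_fresh(path, max_age):
    """A backup only counts if it exists and is recent enough to be useful."""
    if not path.exists():
        return False
    modified = datetime.datetime.fromtimestamp(path.stat().st_mtime)
    return datetime.datetime.now() - modified <= max_age

if backup_is_fresh(BACKUP_FILE, MAX_AGE):
    print("Off-site backup is present and under 24 hours old.")
else:
    print("Backup missing or stale: investigate before you need it.")
```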

The Bigger Picture

Even Amazon’s own products, like Alexa and Ring doorbells, slowed down during the outage — proof that no one is immune.
For Amazon, it’s a lesson in how success brings risk: when so much of the world’s online activity runs through one company, a small internal hiccup can have global consequences.

For everyone else, it’s a wake-up call. The modern internet is powerful but fragile — a complex web of connections where a single broken link can make everything feel like it’s disappeared.


Takeaway for all businesses:

Incidents like this highlight why redundancy, visibility, and good backup practices matter more than ever. Even the biggest players can stumble.

By understanding where your systems depend on others — and having a plan B — you can make sure your business stays connected, no matter who’s having a bad day in the cloud.

If you’re not sure where your data lives, whether it’s safely backed up to a separate facility, or how quickly you could recover it if you lost access, get in touch with us. We can help you build a business continuity plan for outages like this one — and a whole host of other scenarios — so you can keep your show on the road even when your competitors can’t.

Cyber resilience is everyone’s business!
