AWS outage: the lessons that must be learned

admin 30th October 2025 News, Views and Opinion

Aras Nazarovas explains what business leaders can do to protect operations in the future

The biggest lesson from the AWS US-East-1 region outage on October 20th is that most of the affected companies lacked the resources to ensure business continuity. It shows that even with some of the most trusted and reliable cloud solutions, it is unreasonable to expect 100% uptime.

Therefore, to keep your business operating when a major cloud provider is experiencing issues, you need backups of your data, as well as computational resources, hosted elsewhere: either with other cloud providers or on site.

When a single cloud region fails, the most cost-effective solution is to have your services hosted in other cloud regions as well. This can still fail if multiple regions go down, or if something breaks at the provider level, upstream of the regions. It is therefore best to have usable fallback infrastructure, either on site or with other cloud providers such as Google Cloud, Microsoft Azure, or DigitalOcean.
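As a rough illustration of that fallback order, the sketch below tries a primary region first, then a second region, then infrastructure outside the provider. The endpoint URLs, the `/health` path and the function names are hypothetical and do not come from any particular provider; a real setup would more likely rely on DNS or load-balancer failover than on application code.

```python
import urllib.request
from typing import Callable, Optional, Sequence

# Hypothetical endpoints, ordered by preference (assumed for illustration only).
ENDPOINTS = [
    "https://api.us-east-1.example.com",  # primary cloud region
    "https://api.eu-west-1.example.com",  # second region, same provider
    "https://api.fallback.example.net",   # other provider or on-site
]


def http_health_check(url: str, timeout: float = 2.0) -> bool:
    """Return True if the (assumed) /health path answers with HTTP 200."""
    with urllib.request.urlopen(url + "/health", timeout=timeout) as resp:
        return resp.status == 200


def pick_endpoint(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool] = http_health_check,
) -> Optional[str]:
    """Return the first endpoint whose health check passes, or None."""
    for url in endpoints:
        try:
            if is_healthy(url):
                return url
        except Exception:
            # A check that raises (timeout, DNS failure, HTTP error)
            # is treated the same as an unhealthy endpoint.
            continue
    return None
```

Passing the health check in as a parameter keeps the selection logic testable without network access, for example by simulating an outage of both cloud regions.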

This outage exposed several operational blind spots. For years, businesses have chosen to host most of their infrastructure with a single provider to reduce costs and complexity. This, however, comes with risks, as seen on October 20th. If your single provider fails, your business is left stranded, with very limited control over the situation.

Such potential failures should be taken into account when designing services that must always remain operational. In these cases, infrastructure mirrored across providers, or on-site fallback solutions, may be worth investing in, even when the cost increases significantly.

Following this outage, business leaders should take several steps.

First, create an inventory of the hosts and services that they operate and offer to their customers. Prioritise which services and hosts must remain operational at any cost, and which can be temporarily disrupted without significant impact. For a messaging app, for example, you may prioritise the ability to send and receive text messages and give lower priority to sending images, videos, stickers, read receipts, etc. Then create a plan, budget and timeline to implement this additional redundancy.
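One way to picture such an inventory is as a small table of services, each with a priority tier and a note on whether a fallback already exists. The sketch below uses the messaging-app example; the tier names, the service names and the `has_fallback` values are invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class Tier(IntEnum):
    """Hypothetical priority tiers; lower value means more critical."""
    CRITICAL = 1    # must stay up at any cost
    IMPORTANT = 2   # short disruption is tolerable
    DEFERRABLE = 3  # can wait until the provider recovers


@dataclass(frozen=True)
class Service:
    name: str
    tier: Tier
    has_fallback: bool  # already mirrored in another region, provider or on site


# Illustrative inventory for a messaging app (values are assumptions).
INVENTORY = [
    Service("stickers", Tier.DEFERRABLE, has_fallback=False),
    Service("text-messaging", Tier.CRITICAL, has_fallback=False),
    Service("read-receipts", Tier.DEFERRABLE, has_fallback=True),
    Service("image-video-upload", Tier.IMPORTANT, has_fallback=False),
]


def redundancy_gaps(inventory: Iterable[Service]) -> List[Service]:
    """List services that lack a fallback, most critical first."""
    missing = (s for s in inventory if not s.has_fallback)
    return sorted(missing, key=lambda s: (s.tier, s.name))


for service in redundancy_gaps(INVENTORY):
    print(f"{service.tier.name:<10} {service.name}")
```

The ordered list of gaps is a natural starting point for the plan, budget and timeline: the critical entries at the top are the ones to fund first.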

It’s also important to create a disaster recovery plan, or update the existing one, by reconsidering the risks and issues that may arise from similar failures and preparing solutions for them. This saves valuable time when a similar issue occurs, allowing teams to restore critical systems as soon as possible, efficiently and without hesitation or second-guessing.

One of the most effective ways to ensure a swift recovery from third-party cloud failures is having a predetermined plan for what to do when such an issue occurs. It ensures that teams do not need to come up with solutions or make difficult decisions when time is of the essence, allowing them to promptly implement any required changes. Having multiple alternative lines of communication is also critical to ensure efficient coordination of recovery efforts.

Aras Nazarovas is a Senior Information Security Researcher at Cybernews, a research-driven online publication. Aras specialises in cybersecurity and threat analysis. He investigates online services, malicious campaigns, and hardware security while compiling data on the most prevalent cybersecurity threats.

Engineer News Network The ultimate online news and information resource for today’s engineer
